Suppose a company builds a hiring algorithm and tells it not to use gender. At first, that sounds fair. The system will not be told whether an applicant is a man, woman, or non-binary person, and the employer can say that the model treats everyone the same because it never sees the protected category. The difficulty is that the model may still rely on employment gaps, previous job titles, school prestige, commute distance, availability for irregular hours, or patterns of promotion in earlier workplaces, each of which may carry the history of gendered inequality into the decision.
This is one reason apparently neutral algorithms can still discriminate. The problem is usually not that the system has bad intentions, since algorithms do not have intentions in the ordinary sense; it is that social inequality leaves traces in data. If a model is trained on a world structured by unequal opportunity, it can learn those structures without ever being told to look for them.
A simple version of algorithmic fairness says that protected traits should be removed. Do not use race, gender, disability, religion, or other protected categories, and the model will be neutral. That rule has some value, but it leaves too much untouched, because many features can work as proxies, standing in for a protected trait or for the disadvantage attached to it. A postal code may track race or income, a school name may track class, a gap in employment may track disability, caregiving, illness, or pregnancy, and a credit history may track generations of unequal access to housing, banking, and stable employment.
Once proxies are available, removing the protected category can become more symbolic than substantive. The model may not know an applicant's race, but it may know where they live, which school they attended, what jobs they previously held, and how people with similar profiles were treated in the past. If those patterns are shaped by discrimination, the model can reproduce them in a cleaner and more technical form.
The issue becomes especially clear in screening contexts such as hiring, admissions, lending, insurance, and policing, because these systems do not merely describe people but allocate opportunities, burdens, and risks. A prediction about who is likely to succeed at a job can affect who gets the chance to succeed. A prediction about who is likely to repay a loan can affect who gets access to credit. A prediction about who is likely to perform well at university can affect who is admitted into the conditions under which that performance becomes possible.
This is where the defence that “the model is only reflecting the data” begins to fail. A mirror may be passive, but a decision system is not. When an institution uses a prediction to sort applicants, it turns a pattern from the past into a rule for the future. If women were historically underpromoted in a field, past promotion data may make women appear less suited for leadership. If racialised applicants had less access to elite schools, school prestige may look like an individual measure of merit while carrying structural advantage inside it. If disabled workers faced unstable employment because workplaces failed to accommodate them, employment continuity may appear as personal reliability while encoding institutional failure.
None of this means accuracy is irrelevant, since a model that performs badly can harm everyone, including the people it is meant to help. The problem is that accuracy does not settle whether a prediction should be used. A variable can be predictively useful and still be normatively suspect, because the fact that a feature helps predict an outcome does not show that an institution is entitled to rely on it.
Fairness therefore requires more than removing explicit protected traits. Institutions need to ask which features are permissible, which outcomes should be predicted, how errors will be distributed, whether affected people can challenge decisions, and whether the system should be used in that context at all. Some systems may be improved through better data and auditing, while others may be inappropriate because the decision is too consequential, the proxies too pervasive, or the institution unable to explain the result or to let affected people contest it.
The public debate around AI often makes discrimination sound like a bug that can be patched once discovered. More often, algorithmic unfairness reveals a conflict between prediction and justice, because a model can be accurate by learning from an unequal world, and an institution can become more efficient at reproducing that inequality by treating prediction as if it were neutral. Neutrality at the level of input does not guarantee fairness at the level of decision, because the past does not become harmless when it is translated into data.