AI and the Nature of Bias

MAY 1, 2019

In our last newsletter, we spoke about the increasingly fraught question of AI ethics and the way in which machines force clarity on age-old moral dilemmas. Nowhere is this clearer than in the arena of algorithmic bias, which, unlike fears about AGI and existential risk, has immediate, near-term implications. Concerns have already been raised in areas from facial recognition to recruiting algorithms to criminal risk assessment, with special emphasis placed on the disproportionate harm to women and minorities. Humans build AI, we are told, and their biases affect the systems they build. 

The reality is unfortunately more complicated, and the devil is to be found in the rather inscrutable details of the algorithms themselves. Most modern AI systems are deep neural networks, which work by spotting statistical patterns in vast quantities of data. As Benedict Evans notes, it’s vital to remember that such systems have no semantic understanding of the data they’re learning from. From the point of view of the machine, criminal sentencing data is not meaningfully distinct from pixels or vocal frequencies: all are numbers with patterns. Bias, where it emerges, comes from spotting the wrong patterns. But it is not always obvious where the error enters the system.
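To make that concrete, here is a minimal sketch (random, made-up data; scikit-learn assumed): the same few lines of training code run unchanged whether the rows are meant to be pixels or case records, because the model only ever sees arrays of numbers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend these are flattened image pixels with binary labels.
pixels = rng.random((200, 64))
image_labels = rng.integers(0, 2, 200)

# Pretend these are tabular features drawn from case records.
case_features = rng.random((200, 12))
case_labels = rng.integers(0, 2, 200)

# Identical code path for both: the model never knows what the numbers mean.
for X, y in [(pixels, image_labels), (case_features, case_labels)]:
    model = LogisticRegression(max_iter=1000).fit(X, y)
    print(model.score(X, y))
```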

Consider an algorithm that comes to associate a certain trait X with a certain group Y. This might be something innocuous, like fuzziness in teddy bears, or something with dangerous social consequences, like a negative stereotype about an ethnic group. If X and Y in fact have no strong association in the training data, this is an engineering problem: the algorithm can be re-tuned. So too if the association exists in the data but not in the world it is meant to represent: the dataset can be made more representative (in principle, though in practice this is often difficult). Questions of bias tend to arise after these two options have been exhausted, at least to the satisfaction of engineers.
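As a rough sketch of that first diagnostic (toy data, hypothetical column names, pandas assumed), one can simply measure how often trait X appears within each group Y in the training set. If the rates are comparable in the data but the model still links the two, the fault lies with the model; if they differ in the data but not in the world, the dataset is the problem.

```python
import pandas as pd

def association_by_group(df: pd.DataFrame, trait: str, group: str) -> pd.Series:
    """Rate of a binary trait within each group in the training data."""
    return df.groupby(group)[trait].mean()

# Toy usage with hypothetical columns.
toy = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "A"],
    "trait": [1, 0, 1, 0, 1, 0],
})
print(association_by_group(toy, "trait", "group"))
```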

The clearest case is when the AI is trained on the judgments of biased humans, which it then mirrors faithfully. This is not a problem with the system, but with the function it is designed to optimize. If, for example, human judges tend to interpret neutral black faces as angry, so too will a system trained on those judgments. But as controversy around the Implicit Association Test demonstrates, it is not always obvious that such patterns are correlated with explicitly racist attitudes. In a 2006 study that is particularly instructive for machine learning, white students were conditioned to show negative bias toward an invented group (“Noffians”) simply by having that group described to them as oppressed. In the world of blind statistical association, a group that is bad and a group to whom bad things happen are not easily distinguished.
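A toy illustration of that blindness, loosely in the spirit of word-association probes (the corpus and word list below are invented): a system that merely counts how often a group’s name appears near negative words assigns the same negative association whether the group is described as committing harm or as suffering it.

```python
# Invented word list and corpus; "noffians" is the fictional group from the study.
NEGATIVE = {"attacked", "violent", "oppression", "suffering", "crime"}

corpus = [
    "noffians were attacked by a violent mob",    # bad things happen to them
    "noffians endure oppression and suffering",   # bad things happen to them
    "report links noffians to a rise in crime",   # bad things attributed to them
]

def negative_cooccurrence(sentences, target):
    """Fraction of negative tokens among words co-occurring with the target."""
    hits, total = 0, 0
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            others = [w for w in words if w != target]
            hits += sum(w in NEGATIVE for w in others)
            total += len(others)
    return hits / total if total else 0.0

# Victim sentences and perpetrator sentences push the score the same way.
print(negative_cooccurrence(corpus, "noffians"))
```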

Things grow messier still when one attempts to build systems that rectify existing disparities. An algorithm designed to predict criminality, for example, would of necessity be trained on the present status quo. Even if blinded to race, it would likely begin to use race as a variable (derived from other correlated characteristics, like name or birthplace). This would serve to perpetuate the disparity, not because of bias in the system, but because of bias in the society. Provided the system is transparent, one could intervene to prevent the use of race as a variable, but at a cost: the system may, in the short term, become less accurate. As philosophers have long known, justice comes at the expense of consequentialism. Indeed, the human ideal of treating people as individuals may be unrealizable in systems that see only correlations with past data.
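The sketch below shows the mechanism on synthetic data (invented features; scikit-learn assumed): the protected attribute is withheld from the model, yet a classifier trained only on correlated proxies, here a fake neighborhood code and income, recovers it well above chance. That recoverability is all a “blinded” prediction system needs in order to use the attribute implicitly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000

protected = rng.integers(0, 2, n)                       # hidden group membership
neighborhood = protected * 2 + rng.integers(0, 3, n)    # proxy correlated with the group
income = rng.normal(50 + 10 * (1 - protected), 15, n)   # another correlated proxy
X_blinded = np.column_stack([neighborhood, income])     # the "blinded" features

X_train, X_test, y_train, y_test = train_test_split(
    X_blinded, protected, random_state=0)
leak_model = LogisticRegression().fit(X_train, y_train)

# Well above the 50% chance level: the proxies redundantly encode the group,
# so any model trained on them can effectively use the withheld attribute.
print(leak_model.score(X_test, y_test))
```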

This has tempted some to argue that certain AI systems just shouldn’t be built or widely distributed. So-called “algorithmic gaydar,” which purports to predict sexual orientation from facial characteristics, is an oft-cited example. The privacy and human rights concerns are obvious. But as the previous case shows, declining to ask a system to spot a correlation does not prevent it from discovering one on its own. If, for example, sexual orientation meaningfully correlates with other traits, like shopping behavior, then an advertising algorithm may form a “concept” of sexual orientation all the same. And if the system were then made transparent in the name of justice, its judgments about sexual orientation could surface along with everything else.
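A minimal sketch of that possibility, again on synthetic data with scikit-learn assumed: an unsupervised clustering of shopping features, given no labels of any kind, ends up sorting people along a hidden trait that merely correlates with their behavior.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n = 2000

hidden_trait = rng.integers(0, 2, n)            # never shown to the model
shopping = rng.normal(0, 1, (n, 5))             # five generic shopping features
shopping[:, 0] += 1.5 * hidden_trait            # one behavior shifts with the trait
shopping[:, 3] -= 1.0 * hidden_trait            # another shifts the other way

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(shopping)

# Agreement between the unlabeled clusters and the hidden trait
# (taking the better of the two possible cluster-to-trait labelings).
agreement = max((clusters == hidden_trait).mean(), (clusters != hidden_trait).mean())
print(agreement)
```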

What this highlights is that algorithmic bias often manifests not in the machines, the data, or the engineers, but in the disparate harms of particular applications. One scenario Evans cites is skin cancer detection, where systems may be worse at spotting cancer on dark skin than on light skin. This is neither a weakness in the algorithm, which may perform far better across the entire population than a human, nor a case of unrepresentative training data, since a representative dataset will by definition contain fewer samples from minority groups. Rather, the problem has been construed too broadly:

“...you might need to construct the model differently to begin with to pick out different characteristics. Machine learning systems are not interchangeable, even in a narrow application like image recognition. You have to tune the structure of the system, sometimes just by trial and error, to be good at spotting the particular features in the data that you’re interested in, until you get to the desired degree of accuracy. But you might not realise that the system is 98% accurate for one group but only 91% accurate for another group (even if that accuracy still surpasses human analysis).”
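The check Evans describes is easy to express in code. The sketch below (placeholder data; his 98% and 91% figures are not reproduced, only the method) reports accuracy per group rather than in aggregate, which is where such gaps hide.

```python
import numpy as np

def accuracy_by_group(y_true, y_pred, group):
    """Overall accuracy plus accuracy within each group label."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    report = {"overall": (y_true == y_pred).mean()}
    for g in np.unique(group):
        mask = group == g
        report[str(g)] = (y_true[mask] == y_pred[mask]).mean()
    return report

# Toy usage: a model that looks acceptable overall but is weaker for group "B".
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(accuracy_by_group(y_true, y_pred, group))
```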

These examples barely scratch the surface of AI bias concerns, but they do serve to highlight a crucial point: addressing these problems depends on understanding the technology as well as the social context. Neither alone is sufficient. 

Nathanael Fast