“One of the perennial problems with the statistical and machine learning techniques that underpin “big data” analytics is that they rely on data entered as input. And when the data you input is biased, what you get out is just as biased. These systems learn the biases in our society. And they spit them back out at us.
Consider the work done by Latanya Sweeney, a brilliant computer scientist. One day, she was searching for herself on Google when she noticed that the ads displayed were for companies offering criminal record background checks with titles like: “Latanya Sweeney, Arrested?”, thereby implying that she may indeed have a criminal record. Suspicious, she started searching for other, more white-sounding names, only to find that the advertisements offered in association with those names were quite different. She set out to test the system more formally, finding that, indeed, searches for black names were much more likely to produce ads for criminal justice products and services.
This story attracted a lot of media attention. What the public failed to understand was that Google wasn’t intentionally discriminating or selling ads based on race. Google was unaware of the content of the ad. All it knew was that people clicked on those ads for some searches but not others, and so it was better to serve them up when a search query had statistical properties similar to queries where a click had happened. Because racist viewers were more likely to click on these ads when searching for black names, Google’s algorithm quickly learned to serve up these ads for names that are understood as black. In other words, Google was trained to be racist by its very racist users.
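To make that mechanism concrete, here is a minimal sketch of an ad selector that optimizes for clicks alone. It is not Google's system; the class name, the ad labels, the query groups, and the click counts are all hypothetical and chosen only for illustration. The selector never sees what an ad says or who is being searched for, yet it ends up serving different ads to different groups of queries purely because past clicks differed.

```python
from collections import defaultdict


class ClickRateAdSelector:
    """Serve whichever ad has the highest observed click rate for a query group.

    Hypothetical illustration: the selector knows nothing about ad content
    or about race, only which ads were clicked for which kinds of queries.
    """

    def __init__(self):
        self.shown = defaultdict(int)
        self.clicked = defaultdict(int)

    def record(self, query_group, ad, was_clicked):
        """Log one impression and whether it led to a click."""
        self.shown[(query_group, ad)] += 1
        if was_clicked:
            self.clicked[(query_group, ad)] += 1

    def click_rate(self, query_group, ad):
        """Observed clicks divided by impressions (0.0 if never shown)."""
        shown = self.shown[(query_group, ad)]
        return self.clicked[(query_group, ad)] / shown if shown else 0.0

    def choose(self, query_group, ads):
        """Pick the ad with the best click rate for this query group."""
        return max(ads, key=lambda ad: self.click_rate(query_group, ad))


ADS = ["arrest_record_check", "contact_lookup"]

# Made-up click log: (query group, ad, impressions, clicks).
# The only difference between the groups is how users behaved.
LOG = [
    ("group_a", "arrest_record_check", 100, 12),
    ("group_a", "contact_lookup", 100, 5),
    ("group_b", "arrest_record_check", 100, 3),
    ("group_b", "contact_lookup", 100, 6),
]

selector = ClickRateAdSelector()
for group, ad, impressions, clicks in LOG:
    for i in range(impressions):
        selector.record(group, ad, was_clicked=i < clicks)

chosen = {group: selector.choose(group, ADS) for group in ("group_a", "group_b")}
print(chosen)
```

Nothing in the selector refers to race or to the wording of an ad. The disparity in what it serves comes entirely from the clicks it was trained on.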
Our cultural prejudices are deeply embedded into countless datasets, the very datasets that our systems are trained to learn on. Students of color are much more likely to have disciplinary school records than white students. Black men are far more likely to be stopped and frisked, arrested for drug possession, or charged with felonies even when their white counterparts engage in the same behaviors. Poor people are far more likely to have health problems, live farther away from work, and struggle to make rent. Yet all of these data are used to fuel personalized learning algorithms, risk-assessment tools for judicial decision-making, and credit and insurance scores. And so the system “predicts” that people who are already marginalized are higher risks, thereby constraining their options and making sure they are, indeed, higher risks.
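The self-confirming quality of such predictions can be sketched in a few lines. This is a toy model under loudly stated assumptions, not a description of any real risk-assessment tool: two hypothetical groups behave identically, one starts out watched twice as closely, the "risk score" is simply each group's share of recorded incidents, and a fixed amount of scrutiny is reallocated in proportion to that score each round. All names and numbers are invented.

```python
def risk_scores(true_rate, scrutiny):
    """Score each group by its share of recorded incidents.

    Recorded incidents depend on how closely a group is watched,
    not only on how its members actually behave.
    """
    recorded = {group: true_rate[group] * scrutiny[group] for group in scrutiny}
    total = sum(recorded.values())
    return {group: count / total for group, count in recorded.items()}


def run_feedback_loop(true_rate, scrutiny, rounds):
    """Reallocate a fixed scrutiny budget in proportion to the scores each round."""
    budget = sum(scrutiny.values())
    history = []
    for _ in range(rounds):
        scores = risk_scores(true_rate, scrutiny)
        history.append(scores)
        scrutiny = {group: budget * score for group, score in scores.items()}
    return history


# Assumption: both groups engage in the behavior at the same rate...
TRUE_RATE = {"group_a": 0.05, "group_b": 0.05}
# ...but group_a starts out being watched twice as closely.
INITIAL_SCRUTINY = {"group_a": 0.2, "group_b": 0.1}

HISTORY = run_feedback_loop(TRUE_RATE, INITIAL_SCRUTINY, rounds=5)
for round_number, scores in enumerate(HISTORY, start=1):
    print(round_number, {group: round(score, 3) for group, score in scores.items()})
```

In this toy model the initial disparity never washes out: the more closely watched group keeps the higher score in every round, even though the underlying behavior is identical, because the score only ever sees the records that the scrutiny produces.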
This was not what my peers set out to create when we imagined building tools that allowed you to map who you knew or enabled you to display interests and tastes. We didn’t architect for prejudice, but we didn’t design systems to combat it either.
We are moving into a world of prediction. A world where more people are going to be able to make judgments about others based on data. Data analysis that can mark the value of people as worthy workers, parents, borrowers, learners, and citizens. Data analysis that has been underway for decades but is increasingly salient in decision-making across numerous sectors. Data analysis that most people don’t understand.
Many activists will be looking to fight the ecosystem of prediction and to regulate when and where it can be used. This is all fine and well when we’re talking about how these technologies are designed to do harm. But more often than not, these tools will be designed to be helpful, to increase efficiency, to identify people who need help. And they will be used for good alongside uses that are terrifying. How can we learn to use this information to empower?
One of the most obvious issues is that the diversity of people who are building and using these tools to imagine our future is extraordinarily narrow. Statistical and technical literacy isn’t even part of the curriculum in most American schools. In our society, where technology jobs are high-paying and technical literacy is needed for citizenship, fewer than 5% of high schools even offer AP computer science courses. Needless to say, black and brown youth are much less likely to have access, let alone opportunities. If people don’t understand what these systems are doing, how do we expect people to challenge them?
We must learn how to ask hard questions of technology and of those making decisions based on its analysis. It wasn’t long ago that financial systems were total black boxes and we fought for fiduciary accountability to combat corruption and abuse. Transparency of data, algorithms, and technology isn’t enough; we need to make certain assessment is built into any system that we roll out. You can’t just put millions of dollars of surveillance equipment into the hands of the police in the hope of creating police accountability. Yet, with police-worn body cameras, that’s exactly what we’re doing. And we’re not even trying to assess the implications. This is probably the fastest roll-out of a technology out of hope, but it won’t be the last. So how do we get people to look beyond their hopes and fears and actively interrogate the trade-offs?
More and more, technology is going to play a central role in every sector, every community, and every interaction. It’s easy to screech in fear or dream of a world in which every problem magically gets solved. But to actually make the world a better place, we need to start paying attention to the different tools that are emerging and learn to ask hard questions about how they should be put into use to improve the lives of everyday people. Now, more than ever, we need those who are thinking about social justice to understand technology and those who understand technology to commit to social justice.”