Friday, 3 July 2020

Controlling the Algorithm

Can algorithms be racist? If they can, how can we control them? In this post, I look at the presentation of bigotry in technology and what we can do about it. [blog.mindrocketnow.com]


As part of my Python course, I’ve been learning machine learning techniques: how bots recognise patterns in data sets by calculating correlations. In other words, how to create a decision-making algorithm. When I looked up from my keyboard, I noticed that algorithms are getting some very bad press at the moment, which made me think a bit more deeply. 


History of bad ideas

Facebook is facing a growing number of sponsors withholding ad spend. Despite Facebook claiming today, “There is no profit to be had in content that is hateful”, the company still cannot stop big brands’ ads being placed next to racist posts. Big brands are responding in an eye-catching way, by withholding ad spend. Eye-catching, but perhaps not ultimately effective, as just 6% of Facebook’s revenue comes from big brands. The remaining 94% comes from hundreds of thousands of businesses around the world, which cannot afford alternative methods of reaching their audience. The extraordinarily broad success of Facebook’s algorithm emboldens the company to be blasé towards both government and big business, despite civic and corporate activism. And Facebook has some truth on its side: its algorithm wasn’t designed to promote racism, so how can it be responsible for the racist outcomes?


Perhaps you remember Microsoft’s Tay bot, born in 2016. It was taken offline within 16 hours, and has since become a staple of AI courses as a cautionary tale. Unwisely, Microsoft designed Tay to learn language from people on Twitter, so it learned their values too. Trolls targeted the bot and trained it to be a racist conspiracy theorist, presumably just for fun. Which is essentially how trolls beget other trolls in their online echo chambers.


Things haven’t improved over time. Let’s try an experiment together right now. Perform a Google image search for unprofessional hair. What do you see? I did that just now, and saw pictures of mostly black women, which implies that most women with unprofessional hair are black. Which is racist. Was the algorithm that presented the results racist?


Enough people labelled pictures of black women with “unprofessional hair” that Google’s algorithm made the correlation and applied that inference to all the photos it came across. The story first broke in 2016. Before then, you saw only pictures of black women. Now, you see news stories about the controversy interspersed with pictures of black women, which shows that Google’s algorithm can’t distinguish between the two types of results.


It’s worth repeating: the algorithm cannot differentiate between non-discriminatory stories about discrimination and results that imply discriminatory conclusions. This is because digital footprints never go away; digital history is only additive. But at least, whilst the search algorithm reductively simplifies and generalises, it doesn’t confer moral value, as both types of results are shown together.


It occurs to me that this is no different to people simplifying and generalising. But people normally understand that individual interactions are nuanced and should be judged on their own merits. Algorithms inherently do not. And it’s so much worse when algorithms enable ill-judged conclusions, because they’re so impactful when they get it wrong. Algorithms now control many of the complex transactions in life:


  • Presenting options for what to watch next on YouTube;

  • Adjusting your insurance premium based on how hard you brake and accelerate;

  • Sequencing traffic lights in city centres;

  • Filtering your CV for keywords, to see if you’re a good candidate to interview;

  • Then assessing how well you might fit into the company in your first video interview by measuring how you fidget;

  • Analysing credit card transactions to spot fraudulent activity;

  • Predicting crime hot spots based on the wealth of neighbourhoods;

  • Spotting the faces of terrorists flagged on watch lists on public transport CCTV.


Correlation is not causality

The core of the problem is that algorithms present correlation, and we interpret it as causality. At best that interpretation is spurious; at worst it is bigoted. Algorithms aren’t inherently problematic, but can become so because people are, and the algorithms learn from people. The results aren’t inherently problematic either, but they can imply problematic conclusions if we don’t understand their limitations. This train of logic is how we end up with discriminatory health insurance pricing.
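To make the distinction concrete, here’s a toy Python sketch of my own (the variables are made up for illustration, not taken from any real data set). A hidden factor, how hot the day is, drives both ice-cream sales and sunburn cases. Neither causes the other, yet a pattern-matching algorithm would find a strong correlation between them. Holding the hidden factor roughly fixed makes the correlation all but disappear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_days(n=10_000, seed=0):
    """Hypothetical toy data: a hidden confounder (how hot the day is)
    drives BOTH ice-cream sales and sunburn cases. Neither causes the other."""
    rng = random.Random(seed)
    days = []
    for _ in range(n):
        heat = rng.random()                   # hidden cause, between 0 and 1
        ice_cream = heat + rng.gauss(0, 0.1)  # depends only on heat, plus noise
        sunburn = heat + rng.gauss(0, 0.1)    # depends only on heat, plus noise
        days.append((heat, ice_cream, sunburn))
    return days


days = simulate_days()
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Hold the confounder roughly fixed: only compare days of similar heat.
similar = [d for d in days if 0.50 <= d[0] < 0.55]
within = pearson([d[1] for d in similar], [d[2] for d in similar])

print(f"all days: r = {overall:.2f}")      # strong positive correlation
print(f"similar heat: r = {within:.2f}")   # close to zero
```

The correlation is real; the causal story is not. Swap the labels for something like postcode and claims, and the same mechanism could feed straight into a pricing model.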


Algorithms can be inspected, whereas humans cannot. But let’s not mistake this transparency for understanding. Even if the algorithm itself is clear and concise, the data sets are often complex, which makes outcomes hard to predict and hard to understand. Then, because we don’t understand them, yet believe that some other smart person could if they wanted to, we over-trust them. There’s no civic demand to examine them, and there is civic acceptance of their conclusions. Which is how we end up with algorithms that allocate law enforcement resources over-policing poor neighbourhoods, and so becoming part of the problem. Funding according to the algorithm targets the correlation, not the cause.
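Here’s a deliberately simplified Python sketch of how that loop can run away. It’s a hypothetical model, not how any real police force allocates resources: two neighbourhoods have exactly the same true crime rate, a single patrol goes wherever the most crime has been recorded so far, and crime is only recorded where the patrol happens to be.

```python
def patrol_feedback(days=100, true_rate=5, recorded=None):
    """Hypothetical runaway feedback loop between two neighbourhoods.

    Both neighbourhoods have the SAME true crime rate (true_rate incidents per day),
    but crime is only recorded where the single patrol is sent, and the patrol is
    always sent to the neighbourhood with the most recorded crime so far.
    """
    # Copy the starting counts so the caller's dict is never modified.
    recorded = dict(recorded or {"A": 11, "B": 10})
    for _ in range(days):
        target = max(recorded, key=recorded.get)  # "follow the data"
        recorded[target] += true_rate             # only patrolled crime is recorded
    return recorded


result = patrol_feedback()
print(result)  # {'A': 511, 'B': 10}
```

Neighbourhood A starts with one extra recorded incident and ends up with every subsequent record, while B’s count never moves. The data “confirms” the algorithm’s choice, even though the two neighbourhoods were identical all along.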


So we have seen how spurious correlation and inherently biased data sets are major weaknesses of algorithms. The third major problem is that algorithms use past data to predict the likelihood of future outcomes. As every investor knows, past performance does not necessarily predict future results. Decisions based on likelihoods are very bad at handling edge cases. When you apply those algorithms to millions of decisions, the number of bad decisions at the edges mounts up. And each of those bad decisions changes someone’s life. Each edge case matters to someone.
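A bit of back-of-envelope arithmetic shows why edge cases matter at scale. The numbers below are purely hypothetical: a model that is right 99.5% of the time for a large typical group, but only 90% of the time for a small group at the edges.

```python
def error_breakdown(groups):
    """groups: mapping of group name -> (number of decisions, accuracy).
    Returns the number of wrong decisions per group and the overall accuracy."""
    errors = {name: round(n * (1 - acc)) for name, (n, acc) in groups.items()}
    total = sum(n for n, _ in groups.values())
    overall_accuracy = 1 - sum(errors.values()) / total
    return errors, overall_accuracy


# Hypothetical numbers: a large "typical" group and a small "edge" group.
groups = {
    "typical": (950_000, 0.995),
    "edge": (50_000, 0.90),
}
errors, overall = error_breakdown(groups)
print(errors)                               # {'typical': 4750, 'edge': 5000}
print(f"overall accuracy: {overall:.1%}")   # overall accuracy: 99.0%
```

The headline figure is an accuracy of about 99%, yet one in ten decisions about the edge group is wrong, and that small group accounts for more than half of all the bad decisions.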


Is it beneficial to allow people to game algorithms? For instance, part of the duty of hospital administrators is to work the NHS appointments system so that patients can reorganise treatments at their own convenience. Human intervention is needed because the appointments system optimises for hospital resources, not for patients.


Last Monday, TikTok users and K-pop fans claimed responsibility for the empty seats at Donald Trump's campaign rally in Tulsa, having reserved tickets they never intended to use. They understood how the TikTok algorithm boosts videos to like-minded activists, and deleted their posts after a day or two to avoid the plan leaking to Trump’s team.


Not understanding these algorithms is no longer an option. The world has become too data-rich to navigate without help from artificial intelligence. We can either manipulate the algorithms to our benefit, or they will manipulate us to theirs. So what can we do about it?


Legislation and activism

It’s not illegal for a business to prioritise its resources to serve the customers that are willing to pay the most. For example, it’s not illegal to prioritise call centre pick-up times based on whether your number matches a list of high-value customers, even if it means calls from low-value customers are never answered at all. But it is discriminatory. 


In the EU, GDPR gives citizens the right to an explanation of automated decision-making. On request, companies are obliged to explain how the data customers share links to decisions made about them, and the impact of those decisions. This lays an important legislative foundation, as it forces companies to understand how data links to decisions, which many do not. However, this legislation falls short of protecting against bad algorithms, data and decisions. 


As we’ve seen, the combination of complex logic, inherently biased data sets and prescriptive application makes algorithms overly blunt instruments. This is now recognised by leading tech companies, perhaps more so than by governments. Amazon notes that technology like Amazon Rekognition should only be used to narrow the field of potential matches, but because legislation has not yet caught up with this, it is implementing a one-year moratorium on police use of the technology. Amazon recognises that asking individuals to safeguard themselves against misapplication of its algorithms is unfair, because it’s just too complex for end consumers.


This is exactly where we need legislation to protect us. My hope is that parliaments will enact well-considered limits on the use of algorithms in industry and government, focusing on public safety and law enforcement. We should then use these limits to hold to account the companies and agencies that do not safeguard their algorithms.


Legislation is not the only tool that we have. As I noted earlier, I do agree that inspecting algorithms and data sets is out of the reach of all but data scientists, and so of little use to most agencies. However, specific testing for discriminatory outcomes isn’t. So another important tool is to empower trading standards bodies to test for algorithmic bias, in the same way that food hygiene is tested.
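Testing outcomes doesn’t require anyone to open up the algorithm. As a sketch of the kind of check a testing body could run, the Python below compares selection rates between groups for a black-box CV filter. The 0.8 threshold is the “four-fifths” rule of thumb from US employment-selection guidelines; I’m using it here purely as an example, and the group names and counts are hypothetical.

```python
def adverse_impact_ratios(outcomes, threshold=0.8):
    """outcomes: mapping group -> (number selected, number of applicants).

    Compares each group's selection rate with the most-selected group's rate
    and flags any ratio below the threshold (0.8 is the "four-fifths" rule of thumb).
    Returns group -> (ratio, passes_threshold).
    """
    rates = {g: selected / applicants for g, (selected, applicants) in outcomes.items()}
    best = max(rates.values())
    return {g: (rate / best, rate / best >= threshold) for g, rate in rates.items()}


# Hypothetical results from a CV-filtering algorithm, tested only on inputs and outputs.
cv_filter = {"group_x": (60, 100), "group_y": (30, 100)}
result = adverse_impact_ratios(cv_filter)
for group, (ratio, passes) in result.items():
    print(f"{group}: ratio {ratio:.2f}, {'ok' if passes else 'possible adverse impact'}")
```

Group X is shortlisted 60% of the time and group Y 30%, giving group Y a ratio of 0.50, well under the threshold. You only need the algorithm’s inputs and outputs, not its source code.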


Finally, there are things that end consumers can do to train the algorithms. We can increase our social connections, because online segregation is as destructive as physical segregation. When we expose ourselves to more opinions, the algorithms are exposed to more diversity and present less echo-chamber click bait. 


We shouldn’t engage with the click bait, the posts that elicit strong emotion. Imagine if the whole world scrolled past the trolls: the oxygen would be removed, and the trolls would wither away, because the algorithms would see that bilious content is not clickworthy.


Fundamentally, we should play nice so that algorithms don't make bigots of us all; if they do, we only have ourselves to blame.

