Removing bias from machine learning

Written by Sally Doherty

Posted Nov 29, 2016

The AI industry is conscious of its unconscious bias. Both men and women in machine learning and in the media are speaking up about the issue.

Yet even as we vow to ‘lean in’ collectively, and as we become more aware of the importance of diversity (socially, ethically and economically), there are still far more men working in the machine learning industry than women. And machine learning technology is still not neutrally scrubbing out biases. Instead, it seems to be amplifying them.

Sexism, racism and other forms of discrimination are potentially being built into the machine-learning technology behind many “intelligent” systems that shape how we are categorized and advertised to. These systems may even end up exacerbating inequality in the workplace, at home and in our legal and judicial systems.

This is partly a data problem, but because many of the biases are ‘unconscious’ we only spot them when the machine learning algorithms trained on the biased data sets produce outrageously sexist or racist results. Would we spot potential issues earlier if our industry itself were more diverse?

What is being done about it? On the machine learning algorithm front, a team from Boston University and Microsoft Research has come up with a method of ‘hard de-biasing’ the word embeddings that machine learning systems are trained on. See Man Is to Computer Programmer as Woman is to Homemaker? De-biasing Word Embeddings.

Their de-biasing system uses real people to identify examples of the types of word connections that are appropriate (brother/sister, king/queen) and those that should be removed from the massive language datasets used to train intelligent systems. Then, using these human-generated distinctions, they quantify the degree to which gender is a factor in those word associations (as opposed to, say, family relationships) and tell the algorithm to remove the biased connections.
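The core of that removal step can be sketched in a few lines: estimate a gender direction from definitional word pairs, then subtract each biased word vector's component along that direction. This is a minimal illustration with toy 3-dimensional vectors, not the paper's actual embeddings or its full equalization procedure; the vector values and variable names are invented for the example.

```python
import numpy as np

def gender_direction(pairs):
    # Estimate a "gender" direction from definitional pairs
    # (e.g. she/he, woman/man) by averaging their differences.
    diffs = [a - b for a, b in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def neutralize(v, direction):
    # Remove the component of v along the bias direction,
    # leaving the word with no gender-specific component.
    return v - np.dot(v, direction) * direction

# Toy vectors standing in for real word embeddings.
she = np.array([1.0, 0.2, 0.0])
he = np.array([-1.0, 0.2, 0.0])
programmer = np.array([0.4, 0.9, 0.3])  # hypothetically gender-biased

d = gender_direction([(she, he)])
debiased = neutralize(programmer, d)
print(np.dot(debiased, d))  # ~0: the gender component is gone
```

In the toy example the gender direction collapses to a single axis, so neutralizing simply zeroes that coordinate; with real embeddings the direction is estimated from many pairs and the paper additionally "equalizes" gendered pairs so they sit symmetrically around neutral words.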

When this was done, they found that the machine learning algorithm no longer exhibited blatant gender stereotypes. The team is now also investigating applying related ideas to remove other types of biases, such as racial or cultural stereotypes.

This seems like really valuable work, although it doesn't absolve companies developing applications of their responsibility to guard against bias. It also doesn't fix the lack of diversity in the machine learning industry itself. Professor Fei-Fei Li, one of the comparatively small number of female stars in the AI world, who has recently moved to Google, brought this point home at a recent Stanford forum on AI, arguing that whether AI can bring about the "hope we have for tomorrow" depends in part on broadening gender diversity in the AI world.

As Melinda Gates noted during this year’s Code Conference, “When I graduated 34% of undergraduates in computer science were women… we’re now down to 17%.” In the UK just 15.8% of engineering and technology undergraduates are female.

With such an imbalance in the sheer numbers of male and female engineers, and with competition for AI candidates so intense, why is it so hard to bring more women into engineering roles, even with gender-neutral and inclusive recruitment practices?

Yes, there are nonprofits and initiatives dedicated to preparing more female students for engineering careers. But AI, a technology that is shaping our present and future, is getting built right now. Can this host of female engineers get into the workforce soon enough?

We need to encourage women who have left the industry, after having children or for other reasons, to retrain and come back, and we need to show all female engineers that machine learning companies are great places to work. Supportive policies should be built in, and we must lead from the top.

This is just as important in start-ups like Graphcore as well as in technology giants. We all have responsibility for increasing diversity and working to remove bias in machine learning.