Progress in deep learning is exploding. The last three years have seen improvements in technology that have allowed researchers to do things that people could have only dreamed of just a decade prior.
We've seen fully computer-generated faces, AI-composed songs, and cutting-edge language models whose output is nearly indistinguishable from human writing. But our growing reliance on these tools also means a growing reliance on potentially biased data, and this leads to an important question: how do we avoid or minimize bias in machine learning applications?
First and foremost, AI is a tool. The unfortunate reality is that it is up to us as humans to decide how that tool should be used. To fully appreciate the scope of the problem, we need to consider the impact of bias in data, in algorithms, and ultimately how these affect the decisions made by machines.
A well-known example of bias in machine learning data comes from Microsoft, which was forced to apologize after Twitter users "taught" its AI chatbot Tay some blatantly racist ideas within a day of its launch. Tay was designed to learn conversational behavior from human interlocutors on Twitter. Unfortunately, it learned from some of the platform's most unsavory users instead.
The researchers at Microsoft had seeded Tay with an anonymized data set of public conversations, a "corpus," but the bot was also built to keep learning from live Twitter interactions, and that incoming stream was not adequately filtered for offensive content. There was no narrow goal for the experiment beyond teaching Tay conversational skills, which is why it was given open access to conversations among human users on Twitter.
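Microsoft has never published its exact data pipeline, so the following is only a minimal sketch of what pre-filtering a training corpus can look like in principle. The blocklist, the helper function, and the sample tweets are all hypothetical placeholders; real systems typically rely on trained toxicity classifiers rather than keyword lists.

```python
# A minimal sketch of pre-filtering a training corpus with a keyword blocklist.
# The terms and data below are hypothetical stand-ins, not Microsoft's method.

BLOCKLIST = {"slur_1", "slur_2"}  # placeholders for actual offensive terms

def is_clean(text: str) -> bool:
    """Return True if the text contains no blocklisted terms."""
    tokens = set(text.lower().split())
    return tokens.isdisjoint(BLOCKLIST)

raw_tweets = [
    "had a great day at the park",
    "this contains slur_1 and should be dropped",
]

training_corpus = [t for t in raw_tweets if is_clean(t)]
print(training_corpus)  # only the first tweet survives the filter
```

Even a crude gate like this, applied to both the seed corpus and the live input stream, changes what a conversational model can pick up from its users.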
By contrast, Google's DeepMind built AlphaGo by first studying a large corpus of games played by human experts and then refining its play through self-play reinforcement learning, rather than starting with generic knowledge of the game and seeking out specific examples of play afterwards (see DeepMind's Nature paper).
In addition to flawed data, sometimes the algorithm itself is biased in a particular way. This is especially important where there is a procedural, hand-designed component to how information is generated, that is, in systems that aren't purely deep-learning based.
In these situations, human bias can lead to procedural decisions that consistently make the wrong call. For example, recommendation algorithms (like those used by Facebook, Instagram, and Netflix) typically try to surface products or users that resemble the products or users a person has already engaged with.
They use what are called 'predicted group traits'; thousands of subsets of the population are automatically generated, and individuals are placed into one or more of these groups.
These groups then receive different product or user recommendations, and much of the time these recommendations reinforce broad ethnic, gender, socio-economic, or racial stereotypes, as in reports that Black Facebook account owners were over 50% more likely to have their accounts automatically disabled.
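"Predicted group traits" is not a single standard algorithm, but the bucketing step is conceptually similar to clustering users on behavioral features. Here is a rough illustrative sketch using scikit-learn's KMeans; the feature names, cluster count, and data are invented for illustration and bear no resemblance to any production recommender.

```python
# A rough sketch of bucketing users into behavioral groups via clustering.
# Features, cluster count, and values are invented for illustration only.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one user: [hours_watched, comedy_share, news_share]
user_features = np.array([
    [2.0, 0.9, 0.1],
    [1.5, 0.8, 0.2],
    [8.0, 0.1, 0.9],
    [7.5, 0.2, 0.8],
])

groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(user_features)
print(groups)  # e.g. [0 0 1 1]; recommendations are then keyed off the group label
```

The trouble is that once the group label drives what people see, any correlation between the clusters and protected attributes quietly turns into differential treatment.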
It's important for companies building artificial intelligence systems not just to create them but also to monitor how they function over time and intervene when clear examples of preferential or biased behavior emerge. Here are three ways to help avoid bias in machine learning:
First, anomalous behavior should be investigated as soon as it appears. Even if a system isn't yet performing well enough for commercial use, it can serve as a test case for studying bias-related issues before they carry over into future iterations of the algorithm. Investigating early will certainly slow deployment, but on the flip side, it also leads to higher-quality products with better market value and a longer lifespan.
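What counts as "anomalous" depends on the system, but one simple illustration is comparing each user group's error rate against the overall average and flagging outliers for manual review. The group names, rates, and threshold below are entirely hypothetical.

```python
# A minimal sketch of surfacing anomalous behavior for review:
# compare each group's error rate to the overall average and flag outliers.
# Group names, rates, and the threshold are hypothetical.

error_rates = {"group_a": 0.04, "group_b": 0.05, "group_c": 0.19}
overall = sum(error_rates.values()) / len(error_rates)
THRESHOLD = 2.0  # flag groups whose error rate exceeds 2x the average

flagged = {g: r for g, r in error_rates.items() if r > THRESHOLD * overall}
print(flagged)  # {'group_c': 0.19} -> queue this slice for manual investigation
```

The point is less the specific threshold than having some automated tripwire that forces a human to look at the slice before the model ships.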
The second way we can prevent machine learning systems from becoming biased is transparency about what has been learned. This means clear, accountable documentation, open-sourcing when appropriate, and the ability to distinguish what a model learned through supervised training (carefully hand-picked samples) from what it learned through unsupervised training (mass training on unaltered datasets).
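One lightweight way to make that distinction auditable is to record data provenance alongside the model itself, in the spirit of a model card. The fields and file names below are a hypothetical minimum, not any particular company's format.

```python
# A hypothetical minimal "model card" recording what the model learned from,
# so supervised (hand-curated) and unsupervised (bulk, unaltered) sources
# can be told apart after the fact. All names and values are placeholders.
import json

model_card = {
    "model": "recommendation-ranker-v3",              # hypothetical model name
    "supervised_data": ["curated_labels_2023.csv"],    # hand-picked, reviewed samples
    "unsupervised_data": ["clickstream_raw_dump"],      # mass, unaltered logs
    "known_filtering": "profanity blocklist only",
    "last_bias_audit": "2024-01-15",
}

with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```

Keeping this record next to the trained weights makes it far easier to answer, months later, where a troubling behavior might have come from.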
A side note: a few years ago, Google began publishing research about this distinction on its DeepMind blog after finding that its systems were picking up some troubling biases simply through exposure to an enormous amount of unsupervised content on YouTube. With over 500 hours of video uploaded every minute, you're bound to get a few bad apples.
The third way we can avoid bias involves increasing social responsibility on the part of artificial intelligence researchers and engineers who are building these systems.
It's not enough for the businesses and consumers who later use these products to understand the nuance; if the quality of future models is to improve, that ethical responsibility has to begin with the research itself.
Additionally, we need to make an effort not only to build unbiased AI but also to continuously monitor deployed systems and make sure they are used appropriately once they are rolled out commercially or adopted into existing environments.
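One concrete check that can run on a schedule after deployment is an adaptation of the "four-fifths" (80%) disparate impact rule: compare how often an automated action, such as disabling an account, hits each group. This is only a sketch under that assumption, and every figure below is invented.

```python
# A sketch of a scheduled fairness check: an adaptation of the four-fifths
# (80%) rule applied to an adverse automated action such as account disabling.
# All group names and rates are invented for illustration.

action_rate = {"group_a": 0.010, "group_b": 0.016}  # share of each group affected

favored = min(action_rate, key=action_rate.get)  # group least affected
for group, rate in action_rate.items():
    ratio = action_rate[favored] / rate if rate else float("inf")
    if ratio < 0.8:
        print(f"ALERT: {group} is hit {rate / action_rate[favored]:.1f}x "
              f"more often than {favored}; escalate for review")
```

Run regularly against production logs, a check like this turns "monitor how the system is used" from a slogan into a standing alert.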
That's a wrap! We sincerely hope you enjoyed our guide on how to avoid bias in machine learning. See you next time!