I took AI Ethics at Georgia Tech and here is what I learned.

Until three months ago, I evangelized AI and machine learning as the holy grail of all technology. Then I took AI Ethics at Georgia Tech this past fall, and it changed my perspective on AI/ML.

To put this in context, here are some innovations related to AI/ML:

  • NLP: Gmail can autocomplete the emails you are typing. Siri can autocomplete your texts and understand your voice, and the same goes for Alexa. Amazon or any search engine tries to predict what you are going to type.
  • Computer Vision: Security cameras can identify suspects, as in the Boston Marathon bombing investigation. Autonomous vehicles can drive themselves by recognizing objects on the road. Your funny TikTok video can distinguish between your face and your dog’s face.
  • And many more applications, like recommendation systems, fraud detection, supply chain optimization, etc.

My first attempt at understanding ML was taking Andrew Ng’s Coursera course. I took the first three modules, up to logistic regression, tried a second time, and didn’t finish it again. Then I enrolled in the MS in CS at Georgia Tech to specialize in ML, but I soon realized some classes were beyond my grey matter. I didn’t give up and took Data Analytics, which was heavy on programming and had some ML homework. In the meantime, I was working at a company that was growing its data science teams.

This all came to a sudden halt after taking AI Ethics. Here is the long story, made short.

Short Story

In products where humans are the end users, most datasets used to train AI/ML algorithms are built around the majority of the demographic population.

So far, this doesn’t sound so bad. However, let’s look at some demographic statistics. United States demographic data from the Census Bureau website and Wikipedia show that, between 2020 and 2021, the US population broke down by race roughly as follows:

  • 57.8% White
  • 18.7% Hispanic
  • 12.1% Black
  • 5.9% Asian
  • 4.1% Two or more races
  • 0.7% Native American
  • Others

This means that most AI/ML datasets are built to serve the 57.8% White majority with high sensitivity (true positive rate). I’m not an ML expert, but what I understand about these metrics is that anything above 50% predicts better than flipping a coin.

Why does this matter? If it has an 80% true positive rate for 57.8% of the US population, then it’s good enough, right?

The problem is when you are not part of that demographic. If you are Hispanic, Black, Asian, or another group, then the true positive rate is not nearly as good; it can be about as good as flipping a coin.

Why does this matter?

Let’s say the product is in the computer vision domain: security cameras that flag criminals in the population. Suppose the algorithm has more than a 90% chance of identifying a criminal from the White population but only a 50% chance of identifying a criminal from the Hispanic population.

If you are Hispanic, the algorithm has a 50/50 chance of making a mistake. That’s about the same as flipping a coin.

What if you are Black in the US? You are also not part of the majority population the algorithm was built for, so you might get about the same result as the Hispanic population: a 50/50 chance that the algorithm makes a mistake.

In AI/ML, an algorithm with a 50% true positive rate is about as good as flipping a coin.
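To make that concrete, here is a minimal sketch of how one could check the true positive rate per demographic group. The group names, labels, and predictions below are made up for illustration; they don’t come from any real system.

```python
# Hypothetical illustration: true positive rate (recall) per demographic group.
# The group tags, labels, and predictions are invented for this sketch.
from collections import defaultdict

# Each record: (group, true_label, predicted_label); 1 = positive case.
records = [
    ("white",    1, 1), ("white",    1, 1), ("white",    1, 0), ("white",    0, 0),
    ("hispanic", 1, 1), ("hispanic", 1, 0), ("hispanic", 1, 0), ("hispanic", 0, 0),
]

positives = defaultdict(int)
true_positives = defaultdict(int)
for group, y_true, y_pred in records:
    if y_true == 1:
        positives[group] += 1
        if y_pred == 1:
            true_positives[group] += 1

for group in positives:
    print(f"{group}: TPR = {true_positives[group] / positives[group]:.2f}")
# A model can look fine on the majority group and still sit near
# coin-flip territory (TPR ~ 0.5) for an underrepresented group.
```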

This can’t be true

There is just no way that these algorithms work like that. That would mean that algorithms are racist. This can’t be true.

I learned that AI ethics is not only about race demographics. Algorithms also affect regulated domains and legally recognized protected classes.

Regulated domains

  • Credit
  • Education
  • Employment
  • Housing

Legally recognized protected classes

  • Race, Color, Sex
  • Religion, National origin, citizenship
  • Age, pregnancy, family status, disability status
  • Veteran status, genetic information

I read many research papers and explored datasets, and I discovered that in the majority of surveys the respondents are white males. When an algorithm trained on that data is deployed into a product, it mainly identifies and benefits that population. Such an algorithm would affect these protected classes: race and color (non-white), sex (female), and probably national origin and citizenship, pregnancy, and family status.

There is just no way. What about all those big tech companies innovating with Machine Learning?

Who owns this information?

  • Your pictures on Facebook and Instagram
  • The videos you post on TikTok
  • Your search data on Google or Amazon or any search engine
  • Your browsing history
  • Your tweets

You would assume that you own this data, right? They are your pictures, videos, tweets, and all your interactions with apps and websites. Wrong. Read the terms of service of any of the popular apps you use. This information is delivered to an algorithm to learn from, and that algorithm might have intentional or unintentional biases.

Misuse and unfair use of AI/ML algorithms

Taking AI Ethics was my ‘I wonder what’s behind this curtain of machine learning? Whoa!’ moment.

There is an application called COMPAS used to predict whether a defendant will commit another crime (aka recidivism), a sort of Minority Report app, although not as sci-fi. Black defendants who did not reoffend were misclassified as high risk about twice as often as white defendants.
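To see what ‘misclassified about twice as often’ means, here is a toy calculation of the false positive rate (non-reoffenders flagged as high risk) per group. The counts are invented for illustration and are not the actual COMPAS numbers.

```python
# Toy counts (not the real COMPAS data) to show what "misclassified twice
# as often" means: the false positive rate among people who did NOT reoffend.
groups = {
    # group: (non_reoffenders_flagged_high_risk, total_non_reoffenders)
    "black": (450, 1000),
    "white": (230, 1000),
}

for name, (flagged, total) in groups.items():
    print(f"{name}: false positive rate = {flagged / total:.2f}")
# 0.45 vs 0.23: roughly twice the rate of being wrongly labeled high risk.
```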

A Google image search for ‘CEO’ used to show 89% male pictures. This has since been corrected and now returns a more balanced result. However, at one point Google’s image tagging also labeled photos of Black people as gorillas.

Amazon tested a recruiting tool to automate what most candidates already assume is automated: screening job applications. The product was shut down after the algorithm rated women as less favorable candidates. The training dataset was built from historical hiring data in which the applicants were mostly male.

Pre-existing biases in datasets

There is no governing board overseeing how datasets are generated, and there aren’t many regulations for building them. If you upload videos to TikTok, they are most likely using your data to feed the algorithm, and you have no idea what that algorithm does beyond showing you similar videos to like and share.

In my class, someone asked a question about the homework and had categorized the demographics as White, Black, and Mexican. It’s not the first time I’ve seen someone lump all Hispanic people under the ‘Mexican’ category.

In the case of Amazon’s AI recruiting tool, I would assume the bias came from the dataset itself: the resumes the engineers parsed came overwhelmingly from male applicants, or the team simply never realized how unbalanced the demographics were.

An article from the Bay Area described a Black couple who put their house up for sale, and the realtor estimated the price at well below market value. They asked a White friend to pretend to be the homeowner and consulted another realtor, who priced the house at market value. Why would this happen? Research has shown that owner-occupied homes are undervalued in neighborhoods that are at least 50% Black.

It’s easy to mislead with shiny objects

The public is often misled by disinformation, shiny visualizations, and a lack of knowledge of statistics or simple math.

Biases in datasets can start with sampling. Let’s say the study is about the behavior of teenagers in public parks. The researcher goes to one park to collect all the data and creates a dataset, but fails to realize this park is in an affluent part of the city. The research is then published as ‘behavior of teenagers in parks’ instead of ‘behavior of affluent teenagers in parks’.

Biases can also come from poor analysis. NFL quarterbacks were surveyed about their vaccination status, and respondents said they were vaccinated; one quarterback answered that he was ‘immunized’. What is the definition of ‘immunized’? In another case, Fox News published an article using Department of Labor data in which people were asked whether they were currently working. The survey had about 60K observations, which was extrapolated to imply that 147M people are employed. However, those who answered ‘Yes, I am working’ didn’t specify whether they were working full time, part time, volunteering, or whether they were even part of the labor force.
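Here is a back-of-the-envelope sketch of how a survey that size gets scaled up to a national figure. The respondent counts and population total below are placeholders chosen only to show the arithmetic; the point is that the headline number inherits every ambiguity in the original question.

```python
# Back-of-the-envelope sketch: scaling a ~60K-person survey up to a national
# employment estimate. All of these numbers are placeholders, not DOL data.
sample_size = 60_000             # survey respondents
sample_employed = 35_000         # answered "yes, I am working" (made up)
adult_population = 252_000_000   # rough US adult population (assumption)

employment_share = sample_employed / sample_size
estimated_employed = employment_share * adult_population
print(f"Estimated employed: {estimated_employed / 1e6:.0f} million")
# The estimate is only as precise as the question: "yes" lumps together
# full-time, part-time, and anything else the respondent meant by "working".
```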

Readers can also be misled by shiny visualizations. Any chart can mislead if it has these problems:

  • The Y-axis doesn’t start at zero. For example, a chart can show what looks like a big difference among observations, but once you zoom out to a zero baseline, the observations all look about the same (see the matplotlib sketch below).
  • The X and Y axes are swapped, perhaps to better fit the graph on a website or to make it look cooler, which can lead the reader to a wrong conclusion.

These visualizations don’t show up only on Fox News. I have read research papers from respected organizations that contained similar graphs, and they show up all over social media to attract visitors. They generate the ‘oh wow, really?’ reaction, but they mislead by counting on the reader’s ignorance.
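Here is a quick matplotlib sketch of the truncated Y-axis effect, using made-up values: the left panel exaggerates a roughly one percent difference, while the right panel plots the same bars on a zero baseline.

```python
# Same made-up data, two y-axis choices: a truncated axis exaggerates a ~1%
# gap, while a zero-based axis shows how similar the values really are.
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
values = [100.2, 100.9, 101.4, 100.5]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(labels, values)
ax1.set_ylim(100, 101.5)      # truncated: differences look dramatic
ax1.set_title("Y-axis starts at 100")
ax2.bar(labels, values)
ax2.set_ylim(0, 110)          # zero baseline: differences nearly vanish
ax2.set_title("Y-axis starts at 0")
plt.tight_layout()
plt.show()
```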

Median vs Mean

I did an analysis of the popular Stack Overflow developer survey to understand fair compensation across demographics and to reduce bias with an ML algorithm. Exploring the dataset with bar charts in matplotlib always showed a very high mean compensation. A popular social media site could publish this as ‘highly compensated engineers get paid beyond $300,000’. However, the dataset contained some extreme outliers, and using the median instead of the mean cut the headline compensation figure roughly in half.
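A small numpy example of the effect, with made-up salary numbers rather than the actual survey data: a couple of extreme outliers pull the mean far above the median.

```python
# Made-up compensation data (not the Stack Overflow survey) showing how a
# few extreme outliers drag the mean far above the median.
import numpy as np

salaries = np.array([85_000, 95_000, 110_000, 120_000, 130_000,
                     140_000, 150_000, 160_000, 2_000_000, 3_500_000])
print(f"Mean:   ${salaries.mean():,.0f}")      # ~$649,000
print(f"Median: ${np.median(salaries):,.0f}")  # $135,000
# The median is the safer headline number for heavily skewed data.
```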

Simpson’s Paradox

It’s not named after Homer Simpson. Simpson’s paradox is when a trend appears in individual groups but reverses when the groups are combined.

It’s as if an analysis showed COVID cases increasing in individual states but decreasing for the US overall; a publication could then mislead people by saying that COVID is decreasing across the country.

Stanford’s example is a bit more technical. Pooled data suggested that coffee intake increases IQ test performance, but within individual groups it decreased performance: ‘The reason for the confounding is the causal impact of the hidden covariate, education level, on both coffee consumption and performance’. In plain English, the combined analysis can mislead you into thinking that drinking coffee makes you smarter while taking a test, while the effect reverses once you split the data into groups by education level.
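Here is a small pandas sketch of the same pattern with synthetic numbers: within each education level the coffee-versus-score trend is negative, but the pooled correlation comes out positive.

```python
# Synthetic illustration of Simpson's paradox: within each education level,
# more coffee goes with lower scores, but pooling the groups reverses the
# trend because the high-education group drinks more coffee AND scores higher.
import pandas as pd

df = pd.DataFrame({
    "education":   ["low"] * 4 + ["high"] * 4,
    "coffee_cups": [0, 1, 2, 3, 3, 4, 5, 6],
    "score":       [60, 58, 56, 54, 85, 83, 81, 79],
})

for level, group in df.groupby("education"):
    corr = group["coffee_cups"].corr(group["score"])
    print(f"{level:>4} education: correlation = {corr:.2f}")   # both negative

pooled = df["coffee_cups"].corr(df["score"])
print(f"      pooled: correlation = {pooled:.2f}")             # positive
```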

Correlation vs Causation

This is the classic statistics topic: correlation doesn’t imply causation. Umbrellas protect you from rain, but buying a lot of umbrellas doesn’t mean it will rain.

Here is one related to recruiting.

Recruiting is a numbers game.

Candidates do find jobs through recruiter emails, but that doesn’t imply that sending hundreds or thousands of messages will help candidates find jobs. As seen on LinkedIn, a software engineer posted a screenshot of a recruiter message asking them to join a company; the engineer said it was the third email they had received and that they had already told the recruiter they weren’t interested.

Is word2vec evil?

Word2vec is an NLP algorithm created at Google, and it is used by most companies that run a social media, search engine, or e-commerce app for sentiment analysis, understanding user search queries, and recommendation systems.

Why is it evil?

It doesn’t play well with these demographics:

  • Hispanic
  • Black
  • Asian
  • Women

Has the algorithm been fixed for bias since being published? No, although Google Scholar shows about 3,800 results for the query ‘word2vec bias’ and 5,800 results without the word ‘bias’.

Why is word2vec biased in the first place? If the corpus (the whole dataset of sentences and words) mostly pairs the word engineer with male pronouns, the model will associate that word with men. Try reading The Mythical Man-Month: most of the book pairs the word engineer with a male pronoun and rarely uses ‘she’. If this book were fed into an NLP algorithm, it would learn to associate engineers with being male.
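Here is a sketch of how one might probe a pretrained word2vec model for that kind of association using gensim. The model is a large download, and the exact neighbors depend on which vectors you load, so treat the output as illustrative rather than definitive.

```python
# Probe a pretrained word2vec model for gendered associations via the classic
# analogy trick: engineer - he + she -> nearest neighbors.
# Requires gensim; api.load downloads a large (~1.6 GB) model the first time.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")

for word, score in model.most_similar(positive=["engineer", "she"],
                                      negative=["he"], topn=5):
    print(f"{word}: {score:.3f}")
# If the training corpus mostly pairs "engineer" with male pronouns, the
# neighbors returned here tend to reflect that skew.
```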

Is LFW evil?

LFW (Labeled Faces in the Wild) is a dataset of face pictures used for facial recognition research. This technology can be used in any app that handles pictures or videos: the same apps where you like to post your pictures and videos.

Why is it evil?

It’s not evil. It actually works well. Unless you are part of these demographics:

  • Hispanic
  • Black
  • Asian

A query for LFW in Google Scholar shows 5,000 results for ‘LFW bias’ and 31,000 results without the word ‘bias’.

Have they fixed the dataset for bias? No.
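Auditing a face dataset’s demographic balance before training is simple in principle. Here is a sketch that assumes a hypothetical labels.csv with one row per image and race and gender columns; LFW itself does not ship with such labels, so in practice they come from separate attribute annotations or manual labeling.

```python
# Sketch of auditing a face dataset's demographic balance before training.
# Assumes a hypothetical labels.csv with one row per image and columns
# named "race" and "gender"; these labels are not part of LFW itself.
import pandas as pd

labels = pd.read_csv("labels.csv")   # hypothetical annotation file

for column in ["race", "gender"]:
    print(labels[column].value_counts(normalize=True).round(2))
# If one group dominates the counts, accuracy averaged over the whole
# dataset mostly reflects performance on that group.
```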

I tested a popular face recognition web app. It identified Jason Momoa as Hispanic and John Legend as Asian.

Want to know if your pictures were used in biometric surveillance research? Try searching your name or a hashtag on Flickr, if you ever used Flickr as far back as 2004. This is completely unregulated, with no real data privacy or data management practices. Oh, and you can’t remove your photos from these datasets.

Main source of bias is human

My conclusion is that the main source of bias is human. As they say: garbage in, garbage out. The bias is human because the data is collected about humans, through surveys and through users interacting with apps.

AI ethics proposes methods for explainable AI and for fairness in AI and machine learning algorithms. However, I see that the current reality is far from this dream.

Three months ago I was hopeful about AI/ML, but now I am a bit afraid of it.

Ask me anything on LinkedIn