The world needs more soldiers of the truth to combat misleading statistics and disinformation.

Studying ethical AI, I had an aha moment learning about sampling bias. Last year I learned to make data visualizations using D3 in JavaScript, and I also picked up Power BI and Tableau. It’s rather easy to wow people with nice graphics without questioning the data behind them.

In the book “How to Lie with Statistics”, two newspapers surveyed the British population, asking whether people knew about the metric system. One reported 33% and the other 98%. Why the difference? The respondents behind the 98% figure self-selected and were not representative of the general population. This is similar to sending out a survey asking “Do you love the PS5?” Who is most likely to respond? Those who own one. The survey could then conclude that 98% of the population loves the PS5.
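
To make self-selection concrete, here is a minimal Python sketch. All of the numbers are made up for illustration: assume 20% of people own a PS5, owners are far more likely to answer the survey than non-owners, and (as a simplification) only owners say they love it.

```python
import random

random.seed(42)

POPULATION = 100_000
OWNERSHIP_RATE = 0.20           # assumed: 20% of people own a PS5
RESPONSE_RATE_OWNER = 0.30      # assumed: owners are eager to respond
RESPONSE_RATE_NON_OWNER = 0.01  # assumed: non-owners mostly ignore the survey

responses = []
for _ in range(POPULATION):
    owns_ps5 = random.random() < OWNERSHIP_RATE
    # Whether someone responds at all depends on whether they own one.
    response_rate = RESPONSE_RATE_OWNER if owns_ps5 else RESPONSE_RATE_NON_OWNER
    if random.random() < response_rate:
        # Simplification: owners love it, non-owners don't.
        responses.append(owns_ps5)

print(f"Population that loves the PS5:       {OWNERSHIP_RATE:.0%}")
print(f"Survey respondents who love the PS5: {sum(responses) / len(responses):.0%}")
```

With these made-up rates, the survey reports that roughly nine in ten people love the PS5, even though only one in five people in the simulated population owns one. The data isn’t fake; the sample is.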

Another bias to watch for is social desirability bias, where people answer in the way they believe will be viewed most favorably. A hiring panel can fall into it if, after candidates complete their interviews, feedback is collected in an open forum where participants are aware of the organizational hierarchy in the room (or on Zoom). If the most senior person or the leader gives negative feedback, what is the rest of the group most likely to say? Even if that person doesn’t speak first, their presence alone might push someone to change their feedback to appear socially desirable.

Are there evil scientists?

Who approves the studies that have sampling biases or misleading graphs?

A misleading graph could be one whose Y-axis does not start at zero. There is a blog out there dedicated to hunting down misleading graphs.
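
To show why the Y-axis matters, here is a small sketch using matplotlib with made-up quarterly sales figures. The left chart truncates the Y-axis so a roughly 2% change looks dramatic; the right chart starts at zero and shows the change at its true scale.

```python
import matplotlib.pyplot as plt

# Made-up quarterly figures: the real change is only about 2%.
quarters = ["Q1", "Q2", "Q3", "Q4"]
sales = [980, 985, 990, 1000]

fig, (truncated, from_zero) = plt.subplots(1, 2, figsize=(10, 4))

# Left: the Y-axis starts just below the data, so a ~2% change looks huge.
truncated.bar(quarters, sales)
truncated.set_ylim(975, 1005)
truncated.set_title("Y-axis truncated (misleading)")

# Right: the Y-axis starts at zero, showing the change at its true scale.
from_zero.bar(quarters, sales)
from_zero.set_ylim(0, 1100)
from_zero.set_title("Y-axis from zero")

fig.tight_layout()
plt.show()
```

Both charts plot exactly the same numbers; only the axis changes the story.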

What I understand from studying ethical AI is that not all studies with sampling biases are made by evil scientists. Most studies mean well and aren’t created to mislead the public. The point of studying ethical AI is to elevate our skills in AI and ML so that we build explainable and fair products.

Ask me anything on LinkedIn