Thursday, March 7, 2019

Reasoning with diagrams

Last night, economist and quant JD Long posed a challenge:

The tweet is related to a paper discussed in this article by Calling Bull. Imagine that such an algorithm were applied to the real world. What is the probability that a person is a criminal if the algorithm says so?

JD provided two data points:

• We assume that criminals are 0.5% of the population
• The accuracy of the algorithm is 90% (I read this as the share of actual criminals that the algorithm correctly tags)

In the thread there are responses that answer the question using conditional probability formulas. Here's a little secret: I loathe formulas, particularly when I can reason with an image instead. I may one day write a book about that, although Math with Bad Drawings is already out there, and you should get a copy. I prefer quick heuristics and visuals.

For my quick back-of-the-napkin-and-not-that-precise exercise on conditional probability I needed two other figures:

• The population: let's assume that we're in the U.S., so it's 325 million (you may not need the population if you use formulas, but it's useful for the diagram).
• The false positive rate: how often the algorithm tags a person as a criminal even if that person is not a criminal. After reading this I guessed a false positive rate of around 6%.

Here's the resulting tree diagram; the probability of your being a criminal if the algorithm tags you as such is only about 7%:


The figures underlined at the bottom of the diagram are 19,402,500 (people who aren't criminals but are still wrongly tagged by the algorithm) and 1,462,500 (people who are criminals and are correctly identified by the algorithm). Adding up those two figures you get the total number of people tagged as criminals, regardless of whether they are indeed criminals or not: 20,865,000.

Of those, 93% (the 19,402,500) aren't criminals. False positives swamp true positives: if a photo is tagged as depicting a criminal, more than 9 times out of 10 that person won't be a criminal at all.
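For readers who would rather check the branches in code than on a napkin, here is a minimal Python sketch of the same tree. The variable names are mine, the 6% false positive rate is the guess described above, and the 90% "accuracy" is treated as the share of actual criminals who get tagged, which is how the figures above use it.

```python
# Inputs from the post. The false positive rate is a guess, not a measured value.
population = 325_000_000
base_rate = 0.005            # criminals as a share of the population
hit_rate = 0.90              # share of criminals the algorithm tags (assumed reading of "accuracy")
false_positive_rate = 0.06   # share of non-criminals the algorithm tags anyway

# First split of the tree: criminals vs. everyone else.
criminals = round(population * base_rate)
non_criminals = population - criminals

# Second split: who gets tagged on each branch.
true_positives = round(criminals * hit_rate)
false_positives = round(non_criminals * false_positive_rate)

# Everyone tagged, and the share of them who really are criminals.
tagged = true_positives + false_positives
p_criminal_given_tagged = true_positives / tagged

print(f"Tagged and criminal:     {true_positives:,}")
print(f"Tagged but not criminal: {false_positives:,}")
print(f"Total tagged:            {tagged:,}")
print(f"P(criminal | tagged):    {p_criminal_given_tagged:.1%}")
```

Changing `false_positive_rate` is a quick way to see how sensitive the 7% figure is to that guess.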

I double-checked the calculation using round numbers, beginning with a sample of 10,000 people; quants in the room, please let me know if I missed something:
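The round-number check can also be sketched in Python. The intermediate counts below are derived from the rates stated earlier (0.5%, 90%, and my guessed 6%) rather than copied from a diagram, and the last step compares the counting approach with the standard Bayes formula that the thread's replies relied on. Exact fractions are used so the comparison isn't blurred by floating-point rounding.

```python
from fractions import Fraction

sample = 10_000
base_rate = Fraction(5, 1000)            # 0.5% are criminals
hit_rate = Fraction(90, 100)             # assumed reading of "90% accuracy"
false_positive_rate = Fraction(6, 100)   # guessed, as above

# Counting approach: follow the people down each branch.
criminals = sample * base_rate
tagged_criminals = criminals * hit_rate
non_criminals = sample - criminals
tagged_non_criminals = non_criminals * false_positive_rate
share_by_counting = tagged_criminals / (tagged_criminals + tagged_non_criminals)

# Formula approach: Bayes' rule with the same three rates, no population needed.
share_by_formula = (hit_rate * base_rate) / (
    hit_rate * base_rate + false_positive_rate * (1 - base_rate)
)

print(f"Criminals tagged:     {tagged_criminals}")
print(f"Non-criminals tagged: {tagged_non_criminals}")
print(f"Share by counting:    {float(share_by_counting):.1%}")
print(f"Share by formula:     {float(share_by_formula):.1%}")
```

Both routes land on the same fraction, which is why the population size only matters for drawing the diagram.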


There are plenty of websites and books that cover heuristics and diagrams for reasoning. To get started, I'd recommend Gerd Gigerenzer's Risk Savvy or Judea Pearl's The Book of Why. If you are a journalist or a graphic designer, I strongly recommend books like these, as we often get probability wrong. You'll enjoy them.