Wednesday, July 4, 2018

Visualizing amalgamation paradoxes and ecological fallacies

I'm spending the summer writing my first popular science book about charts for the general public, to be published in 2019, and I've been searching for examples of amalgamation paradoxes and ecological fallacies. An amalgamation paradox occurs when patterns appear or disappear depending on how you subset your data; an ecological fallacy consists of inferring characteristics of individuals from the features of the groups they belong to.

I was inspired by a recent article and talk by Heather Krause, and decided to recreate her charts with more recent data. Here is a strong positive correlation (0.51!) between cigarette consumption per person per year and life expectancy; each dot is a country:

I made the chart in iNZight (tutorial), and the data, which comes from the WHO and Gapminder, is here (CSV), in case you want to play with it.
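If you do play with the CSV, the 0.51 figure is presumably a Pearson correlation coefficient, the usual default in charting and statistics tools (that's my assumption; the chart doesn't say which coefficient it uses). Here is a minimal Python sketch of how Pearson's r is computed. The function name `pearson_r` is my own, not part of iNZight or any library:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; not from iNZight or the WHO/Gapminder data.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations from the means (the covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # r ranges from -1 (perfect negative line) to +1 (perfect positive line)
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # points on a rising line
print(pearson_r([1, 2, 3], [3, 2, 1]))  # points on a falling line
```

A value of 0.51 means the dots drift upward together fairly consistently, but r says nothing about *why* they do.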

This is an obvious case of spurious causal inference (don't miss this hilarious website), as we can think of other variables that affect life expectancy at the national level, such as wealth. Using the same data, I color-coded the countries by income group. In general, rich countries have high life expectancies, and poor countries are at the bottom of the Y-axis:

Here you have one plot per income group:

Moreover, the positive relationship between the two variables disappears once we subdivide countries by region...

...and it reverses if we split the data further, down to the individual level, as smoking does shrink your life expectancy. Here's a chart from Heather's article:
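The mechanism behind this kind of reversal can be sketched with a toy example. The numbers below are invented for illustration and are not the WHO/Gapminder data. Think of `x` loosely as cigarette consumption, `y` as life expectancy, and the three hypothetical groups as income groups. Inside each group, more `x` goes with less `y`. But richer groups sit higher on both axes, so pooling everything produces a strong positive correlation:

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up toy rows: (group, x, y). NOT real data.
ROWS = [
    ("low", 1, 53), ("low", 2, 52), ("low", 3, 51),
    ("middle", 4, 63), ("middle", 5, 62), ("middle", 6, 61),
    ("high", 7, 73), ("high", 8, 72), ("high", 9, 71),
]

def pooled_and_grouped(rows):
    """Return the correlation over all rows and the correlation within each group."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    pooled = pearson_r(xs, ys)
    groups = defaultdict(lambda: ([], []))
    for g, x, y in rows:
        groups[g][0].append(x)
        groups[g][1].append(y)
    within = {g: pearson_r(gx, gy) for g, (gx, gy) in groups.items()}
    return pooled, within

pooled, within = pooled_and_grouped(ROWS)
print(f"pooled r = {pooled:.2f}")
for g, r in within.items():
    print(f"{g:>6}: r = {r:.2f}")
```

With these invented numbers, the pooled correlation is about 0.91, while the correlation inside every group is -1. The group-level trend and the within-group trend point in opposite directions, which is exactly why reading individual behavior off the pooled chart would be an ecological fallacy.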

