Tuesday, May 13, 2014

When plotting data, ask yourself: Compared to what, to whom, to where, and to when?

Two controversial maps by FiveThirtyEight
Let me begin with an aside: I believe that the life of "data" journalists and news graphics designers was easier before the World Wide Web and social media were invented. It isn't easy anymore. It's a scary world out there.

Consider what has just happened to FiveThirtyEight. Its audience seems to be much more numerate than the general population. As a consequence, reporters and editors cannot walk away quietly when they make a mistake, such as mapping kidnappings in Nigeria apparently based only on news stories about kidnappings in Nigeria, as aggregated by the Global Database of Events, Language, and Tone (GDELT).

Why is this wrong? Think about it this way: if you're writing about salaries since 1950, you never use nominal values; you adjust for inflation. If you are analyzing crime statistics, you adjust for population and show rates per 100,000 or per million people. And, recalling the Vox.com story I wrote about last month, if you want to compare health care prices in different countries... well, I'll leave the rest to you.
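The two adjustments mentioned above, rates per 100,000 people and inflation-adjusted money, boil down to simple arithmetic. Here is a minimal sketch in Python; the function names and all the figures (counts, populations, price-index values) are hypothetical, made up only to show the calculation.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Events per 100,000 residents (multiply first to keep the result exact)."""
    return count * 100_000 / population


def to_real(nominal: float, cpi_then: float, cpi_now: float) -> float:
    """Express a past nominal amount in today's money using a price index."""
    return nominal * cpi_now / cpi_then


# Hypothetical figures for illustration only.
big_city = rate_per_100k(500, 2_000_000)   # more events in total...
small_town = rate_per_100k(50, 100_000)    # ...but a lower rate than here
print(big_city, small_town)                # prints "25.0 50.0"

# Hypothetical index values, not real CPI data.
salary_today = to_real(3_000, cpi_then=20.0, cpi_now=200.0)
print(salary_today)                        # prints "30000.0"
```

The raw counts suggest the big city has ten times the problem of the small town; the rates show the opposite. Comparing to whom (the population) changes the story.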

"News stories about kidnappings" is a poor proxy variable for actual kidnappings, as Erin Simpson has explained on Twitter, in one of the most informative takedowns I've read recently:
So the second map that FiveThirtyEight made is likely wrong, too. Rather than showing "kidnappings per 100,000 people since 1982," it may just be displaying news stories about kidnappings per 100,000 people since 1982. That's why we see such a high rate in the capital region.
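Dividing by population does not rescue a biased proxy. The sketch below assumes two hypothetical regions with the same population and the same number of actual kidnappings, but with different amounts of news coverage per kidnapping (the `coverage` factor is an invented parameter, standing in for the larger press presence one might expect near a capital). All numbers are made up.

```python
def per_100k(count: float, population: int) -> float:
    """Count per 100,000 residents."""
    return count * 100_000 / population


def compare_rates(regions: dict) -> dict:
    """Return (true rate, news-story rate) per 100,000 for each region."""
    result = {}
    for name, (population, true_count, coverage) in regions.items():
        stories = true_count * coverage  # hypothetical: stories per kidnapping
        result[name] = (per_100k(true_count, population),
                        per_100k(stories, population))
    return result


# Hypothetical regions: identical populations and true kidnapping counts.
regions = {
    # name: (population, actual kidnappings, news stories per kidnapping)
    "capital": (1_000_000, 20, 5.0),
    "rural": (1_000_000, 20, 0.5),
}

for name, (true_rate, story_rate) in compare_rates(regions).items():
    print(name, true_rate, story_rate)
# capital 2.0 10.0
# rural 2.0 1.0
```

The true rates are identical, yet a map of news stories per 100,000 people would make the capital look ten times as dangerous as the rural region.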

We cannot repeat this to ourselves enough*: a number on its own means nothing. A single variable on its own doesn't usually mean much, either. When writing about it or designing a graphic to show it, we need to consider the factors that may help explain its variation. Context matters.

(*I constantly repeat this to myself, by the way, because I'm prone to jumping to conclusions and to making mistakes like the one described here.)

More tweets by Simpson: