Tuesday, September 24, 2019

On the dangers of aggregates and the beauty of variance: Christopher Ingraham's If You Lived Here You'd Be Home By Now

The best book about data I've read in the past few months isn't about data. It's about people. Or, better said, about the myriad individual experiences and quotidian facts that data often fails to capture.

On August 17, 2015, Christopher Ingraham, a reporter at The Washington Post, wrote a story and designed a map under the title 'Every county in America, ranked by scenery and climate':
In the late 1990s the federal government devised a measure of the best and worst places to live in America, from the standpoint of scenery and climate. The "natural amenities index" is intended as "a measure of the physical characteristics of a county area that enhance the location as a place to live."
According to this measure, Ventura County, California, is the “best” place in the U.S., and Red Lake County, Minnesota, is the “worst”.

Ingraham's If You Lived Here You'd Be Home By Now chronicles what happened after. He not only got plenty of pushback from Minnesotans, but he eventually decided to move to Red Lake County with his family.

If You Lived Here... is a delightful collection of anecdotes—some of you may remember Ingraham's hilarious cricketpocalypse—connected by an underlying theme: data about human beings sometimes illuminates, but sometimes it also obfuscates, particularly when we assume that aggregates are a reflection of individuals, or when we don't grasp what it is that we are measuring.

I could write extensively about this problem (I have, actually, in How Charts Lie and here), but I'll just quote from Ingraham's book:
As somebody whose job is to write about data writ large, I'm a big believer in its power—better living through quantification. But my relocation to Red Lake Falls has been a humbling reminder of the limitations of numbers. It has opened my eyes to all the things that get lost when you abstract people, places, and points in time down to a single number on a computer screen. 
[...] 
One of the big dangers of our glorious, new, quantified world is the emergence of a type of numeric stereotyping—of insights hardened into dogma by the weight of a thousand data sets. 
We “know”, for instance, that Mississippi is poor, that New York City is expensive, that Chicago is violent, and that Red Lake County is ugly. These things are, of course, true in the aggregate sense, or in comparison with other places. 
But each of these numbers and rankings masks infinite nuance behind their finite limits. They overlook the thriving communities in Mississippi, the inspiring stories of tenacity and triumph in Manhattan, and the people quietly working to make Chicago's streets safer.
Don't miss this book. It'll make you laugh. And think.