Wednesday, February 19, 2014

The incredible map that shows that half of the U.S. population produces half of the GDP

The map on the left (sources: 1, 2) is making the rounds in social media today thanks to the enthusiasm of some designers and journalists who should know better. For some reason, they think that it's surprising that large U.S. cities are responsible for generating 50% of the GDP.

So what? Is that insightful at all? According to the U.S. Census Bureau, more than 80% of the population lives in urban areas, and it seems that 42% lives in the largest metropolitan regions (source), so this map is just revealing population density, among other sins. Just compare it to this population map, or to this other one, which is visually fancier, or even to this one, displaying the concentration of college graduates. We should all work a bit harder on developing our 'numbersense' (h/t Kaiser Fung), shouldn't we?

Side note 1: Perhaps this could have worked better as a cartogram?

Side note 2: This is a good time to remember this fantastic spoof by xkcd.

Update (02/20): Andy Kirk has Storified the Twitter conversation and written a post about the map. I disagree with him (no sword-wielding either, Andy). If the purpose of the map is what he describes, why wouldn't we just plot population instead of using GDP generation as a proxy? And, by the way, just to give you an idea of how tricky maps like this can be, it may be promoting a nasty urban vs. rural narrative: "Hey, we are the innovators, the creators, the producers, you redneck moochers!"

If you don't know much about the U.S., the map is even more misleading. If you're unaware of how large the urban population in this country is, you may end up thinking that people living in very large cities produce a disproportionate amount of wealth per capita.

All this said, Andy's post includes a very quotable passage: "One person’s ‘interesting’ is another person’s ‘knew it’." Good one.
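
To make the per capita point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the areas highlighted on the map hold 42% of the population and produce 50% of the GDP; pairing those two figures is an assumption, and `per_capita_index` is a hypothetical helper, not anything used by the map's authors.

```python
def per_capita_index(output_share: float, population_share: float) -> float:
    """Ratio of an area's share of output to its share of population.

    A value of 1.0 means output is exactly proportional to population.
    """
    if not 0 < population_share <= 1:
        raise ValueError("population_share must be in (0, 1]")
    return output_share / population_share


# Illustrative shares (assumed, see the note above): 50% of GDP from 42% of people.
metro = per_capita_index(0.50, 0.42)
rest = per_capita_index(1 - 0.50, 1 - 0.42)

print(f"highlighted areas: {metro:.2f}x their population share")
print(f"everywhere else:   {rest:.2f}x their population share")
print(f"per capita ratio:  {metro / rest:.2f}")
```

Under those assumed shares, the highlighted areas would produce roughly 1.2 times what their population alone predicts, which is a real gap but far less dramatic than the map suggests at first glance.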

13 comments:

  1. Unfortunately there already is a "nasty urban vs. rural narrative" but it comes from right-wing politicians who talk about "Real America", as though some parts are less real than others. I see this map as helping to push back against that message.

  2. Someone's biased and misleading narrative should never be fought with another biased and misleading narrative, but with data, evidence, logic, and reason.

  3. – This map shows exactly what it says, it's not ambiguous nor misleading.

    – Notice that xkcd criticizes how maps like this one, depicting indexes strongly correlated with the presence of people, are interpreted, not the maps themselves.

    – Also notice that, for the sake of his argument, xkcd shows 3 maps that are extremely similar, thus emphasizing the idea of indexes being too strongly correlated with population… but he totally exaggerates: it's unlikely that Martha Stewart Living subscribers are so evenly distributed among the population, or that their uneven distribution matches the uneven distribution of site visitors (that would mean something).

    – Yes, wealth and population are strongly correlated spatially (more people, more money), but they are not the same, and the correlation is not 1.

    – Different cities have different people with different concentrations of wealth. Also, there are rural and urban differences in the average wealth per person. So the resulting map is not only different from a map of population density cut at 50%, it's also different in ways that are difficult to imagine or predict.

    – Is this map useful? Imagine you want to decide where to place branches of your restaurant chain: you might find it interesting to know exactly where 50% of the country's wealth lies. This map, although extremely simple (also one of its virtues), says exactly that. You could use a map of population density, but then you'd be overlooking the fact that people in different places hold different amounts of money; so the combined index (wealth per person x pop. density) contains, in this case, more relevant information than pop. density alone.

    – In more general terms, even if two indexes A and B are strongly correlated, so that adding B to A does not add a lot of new information, it does add something, and that something might be relevant, even critical.

    – Is it surprising? That might be the core of this discussion: the fact that lots of people expressed surprise… and yes, in many cases this surprise probably arises because people are not aware of how strongly uneven the population distribution is (they haven't seen maps depicting this, for instance). But I don't think that alone explains the surprise. Even someone acquainted with the concept (and with the xkcd joke), such as me, can find the map extremely interesting and to some extent revealing. Because it's not only that the spots surround (certain) cities, it's also how tight the spots are, how small. I'm convinced many statistically educated people would associate this map with a cut at the last decile rather than at the median.

    – This map could be "improved", for instance by accompanying it with a similar map but just for population, so people could compare and spot the differences. A good designer could even fuse both maps so it would be easy to see points lying in A and B, A alone, B alone, or neither (a sort of geo Venn diagram). And there are many other options… but I wrote "improved" and not improved because, as I mentioned before, simplicity in this map is a virtue, and undoubtedly part of its deserved success.

  4. Discussion! I love those! Let's go:

    -----This map shows exactly what it says, it's not ambiguous nor misleading-----

    It may not be ambiguous but it is, indeed, misleading. And you’re even helping explain why.

    ---wealth and population are strongly correlated spatially (more people, more money), but they are not the same, and the correlation is not 1----

    Well, that's one of the things that I’d like the map to show me. That's another level of detail that is missing, and that is necessary to tell a more interesting story. Perhaps we could try to see the relationship between population density at the county level, and average wages. That'd be more meaningful. I don’t have the time to collect the data myself right now, but if a reader does, I’ll be happy to publish the results.

    Besides, 40-50 are not the same, but they are very close (and don't forget the 80% figure that I mentioned, which is also relevant for this discussion.)

    ----although extremely simple (also one of its virtues)-----

    Extreme simplicity is very, very risky. Let me use an analogy. Compare this to a graphic that shows JUST the average test scores of all school districts in the US without reporting the standard deviation (or median and quartiles, I don't mind), as well. You'd call me a liar, and for good reason. I'd be hiding very relevant information by degrading the data.


    ----Imagine you want to decide where to place branches of your restaurant chain: you might find it interesting to know exactly where 50% of the country's wealth lies-----

    I wouldn't rely on this map. I'd need the extra information and detail and depth mentioned above.



    ----so the combined index (wealth per person x pop. density) contains, in this case, more relevant information than pop. density alone-----

    OK, but where's that clearly shown, exactly? Where does the map say "hey, I'm showing you that population density is highly correlated with GDP production"? It doesn't. Therefore, it's misleading if you don't know much about the U.S., which was one of my points above.


    ----it's not only that the spots surround (certain) cities, it's also how tight the spots are, how small----

    They are not that small. Scale is misleading you here. And even if they were, well, these are cities, after all! Population density is high, therefore surface may be relatively small.



    ---This map could be "improved", for instance by accompanying it with a similar map but just for population, so people could compare and spot the differences----

    For sure.


    ----simplicity in this map is a virtue, and undoubtedly part of its deserved success---

    Simplicity is not a virtue. Clarity is. If by striving for simplicity you sacrifice data that are necessary to put the information into a proper context (and you point out what those data could be), you're doing it wrong.

  5. Another analogy: Simplistic graphics like this (only one or two data points; no nuances, exceptions, details) are the equivalent of writing just a headline when you should be writing that headline PLUS a complete news story to provide background information.

  6. In our modern world of news aggregators, few people read beyond the headlines. Knowing a few sound bites and bullet points is what passes for being informed. Few take time to think beyond a superficial level. Most producers of infographics encourage this through their designs, in part because they embody this in their own thinking.

  7. Minor grammatical correction to Stephen's comment: should say "Few *takes* time to think beyond a superficial level."
    For which we are grateful.

  8. I am copying Robert Kosara's comment on Andy Kirk's blog (source: http://www.visualisingdata.com/index.php/2014/02/defending-the-incredible-gdp-map/)

    "The problem here is not that it could be interesting to see population density, but that the claim is that something other than population density is revealed, which is simply not true. Why not make a chart of population density instead? This incredible map shows you where 50% of the people in the U.S. live!

    If this were really about GDP, it would be per capita. That would be interesting. Income per capita is certainly higher in New York City than in Dallas, for example. But how do NYC and L.A. compare? What about other areas? And how does income compare to cost of living? Etc.

    The reason this is getting any attention at all is because it’s a map. If it were a bar chart or similar, people would just ignore it. But no matter how simple or obvious your data, once it’s shown on a map, people find it interesting."

  9. Robert: your claim seems to imply that wealth is uniformly distributed across the U.S. population; I don't think you believe that.

  10. Fascinating debate all. See if this exploratory dashboard helps resolve some of the questions.

    Some notes: Unfortunately, the U.S. doesn't publish GDP at a county level, only by state and metro area, so I used IRS income tax data instead to answer the question "are financial maps just population maps?" Spoiler: yes, unless you look at per capita figures (hat tip to Kosara).

    A caveat: What's interesting to me about the original project that stirred the debate isn't the absolute concentration of the source of GDP. We all knew it would center on NYC, LA, Dallas, Chicago, etc. What's interesting is the question of concentration of GDP relative to concentration of population. Is the GDP "over-indexed" for these places or not? If we ask this question of the IRS income data, we can see there is a small degree of over-indexing: the 100 or so counties that account for 50% of the Adjusted Gross Income in 2011 are home to 42% of the country's overall population. This cumulative distribution chart helps answer that question.

    Hope this was helpful. Fun times, in any case. ;)

  11. A podiatrist friend once asked me to walk "normally" (in order to assess whether I had problems with my feet). Obviously, if you're asked to walk normally you do the opposite. She laughed and explained to me that that was hypercorrection. The concept was familiar to me, though from a quite different context: linguistics. Wikipedia's definition: "In linguistics or usage, hypercorrection is a non-standard usage that results from the over-application of a perceived rule of grammar or a usage prescription. A speaker or writer who produces a hypercorrection generally believes that the form is correct through misunderstanding of these rules, often combined with a desire to appear formal or educated."

    I believe that's what's happening here. The rule would be: any map depicting an index correlated with presence of people should be read with caution, because it's highly biased by population density. But the way the rule is being understood and applied, by hypercorrection, is something like this: any map visualizing an index correlated with human presence is virtually identical to a map visualizing population density.

    Xkcd tried to popularize the first rule but ended up popularizing the hypercorrected version. It is in large part his fault: the maps he uses in the cartoon are unrealistically similar. The cartoon basically says that the probability of a person subscribing to Martha Stewart Living is the same regardless of where she or he lives (except for Alaska). I don't know if that's the case with this decoration magazine, but in general it is very unlikely for a cultural index to be almost perfectly evenly distributed, immune to cultural diversity among the country's population.

    Only indexes with mild distributions (such as the normal) would produce maps virtually identical to the population density one. In that case the power-law distribution of population density acts as a strong envelope for the normal distribution, making the latter almost disappear, visible only in the tiny details. But normal distributions are actually rare. The production of GDP per capita is definitely not one: it also follows a power law. And in the combination of two power laws neither is the envelope of the other, which makes the outcome difficult to predict.

    I wish I had the time and the data to prove (or disprove) my claims for this particular case.

    (meanwhile, things I learnt while writing this comment: "Martha Stewart dated Sir Anthony Hopkins, but ended the relationship after she saw The Silence of the Lambs. She stated she was unable to avoid associating Hopkins with the character of Hannibal Lecter", from Wikipedia)

  12. Not a fan of arguments from authority, but Kaiser knows much more about statistics than I do, so here it goes: http://junkcharts.typepad.com/numbersruleyourworld/2014/02/numbersense-and-true-lies.html

  13. This map appears to be using Urbanized Area geography to display data collected at the Metropolitan Statistical Area level (based on county boundaries). So, any rural areas that were part of these MSAs have been excluded from the map.

    For an idea of how much those two geographies can vary in size: http://upload.wikimedia.org/wikipedia/commons/6/6c/Metropolitan_and_Micropolitan_Statistical_Areas_of_the_United_States_and_Puerto_Rico.gif

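
The "over-indexing" question raised in comment 10 (what share of the population lives in the areas that account for half of the income?) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five areas and their figures are invented toy data, `population_share_of_top_areas` is a hypothetical helper, and ranking areas by total income is one reasonable reading of that comment, not a description of how the dashboard was actually built.

```python
from typing import List, Tuple

Area = Tuple[str, float, float]  # (name, total income, population)


def population_share_of_top_areas(
    areas: List[Area], income_threshold: float = 0.5
) -> Tuple[List[str], float, float]:
    """Pick the highest-income areas until they cover `income_threshold`
    of total income; return their names, income share, and population share."""
    total_income = sum(income for _, income, _ in areas)
    total_population = sum(pop for _, _, pop in areas)

    # Rank by total income, largest first (assumed ranking rule).
    ranked = sorted(areas, key=lambda area: area[1], reverse=True)

    selected: List[str] = []
    cum_income = 0.0
    cum_population = 0.0
    for name, income, pop in ranked:
        if cum_income >= income_threshold * total_income:
            break
        selected.append(name)
        cum_income += income
        cum_population += pop

    return selected, cum_income / total_income, cum_population / total_population


# Toy data, invented for illustration only (not IRS or BEA figures).
toy_areas: List[Area] = [
    ("A", 300, 200),
    ("B", 200, 220),
    ("C", 150, 180),
    ("D", 100, 150),
    ("E", 250, 250),
]

names, income_share, population_share = population_share_of_top_areas(toy_areas)
print(names, f"income share={income_share:.2f}", f"population share={population_share:.2f}")
```

If the population share comes out well below the income share, the selected areas are over-indexed; if the two are close, the financial map is essentially a population map.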