Monday, August 19, 2019

Smart interaction in visualization

I'm always happy to see good charts being produced by national statistical offices. Thanks to Xan Gregg I've discovered these interactive visualizations by Rob Fry's team at the UK Office for National Statistics. They are sequences of density plots of the ages of people who committed suicide every year.

As the annotation layer on the charts themselves points out, there were fewer suicides in the 1990s than today or in the 1980s, and they were more common among younger people. In later years the highest points of the curve move to the right, indicating an increase in suicides among middle-aged people. My other favorite smart feature of this piece is that if you hover over any year, you can see each density plot without it being obscured by other curves.

Thursday, August 15, 2019

The increasing popularity of ternary plots

The Guardian's Josh Holder visualizes ‘How a no-deal Brexit threatens your weekly food shop’. It's a neat scrollytelling experience (just an aside: I just remembered that Robert Kosara criticized this narrative technique, as he prefers steppers). The core element in Holder's story is this series of animated maps of imports and exports:

However, the most intriguing part for me is the following interactive ternary plot:

I'm a fan of expanding readers' visual vocabulary by exposing them to novel and unusual graphic forms. That said, if we know that many people still struggle with two-variable scatter plots, I wonder how they'd react to the graphic above, and also whether textual clarifications may help in cases like this, or if they'd be too redundant.

UPDATE: Plotly's Nicolas Kruchten has shared this article about how ternary plots work.
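
For readers who want the geometry behind a ternary plot, here is a minimal sketch in Python. It assumes the usual convention of an equilateral triangle with corners at (0, 0), (1, 0), and (0.5, √3/2); the function name `ternary_to_xy` and the corner assignment are illustrative choices, not taken from The Guardian's piece or from Kruchten's article.

```python
import math

def ternary_to_xy(a, b, c):
    """Map three non-negative parts to a point inside an equilateral triangle.

    Assumed corner layout: a -> (0, 0), b -> (1, 0), c -> (0.5, sqrt(3)/2).
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("the three parts must add up to a positive number")
    # Normalize so the three shares sum to 1
    a, b, c = a / total, b / total, c / total
    # Weighted average of the three corner positions (barycentric coordinates)
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y

# A composition made of equal thirds lands at the center of the triangle
print(ternary_to_xy(1, 1, 1))
```

Each corner stands for 100% of one component, so a point's closeness to a corner reflects that component's share of the whole.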

Wednesday, August 14, 2019

Getting ready for a new data and visualization MOOC

On Monday I visited the Knight Center in Austin, Texas, to record videos for a new free MOOC (“massive open online course”). Simon Rogers and I hinted that we're working on it in this article for the Data Visualization Society. The MOOC, which will be announced soon, is titled ‘Data Journalism and Visualization with Free Tools’ (DJVFT).

I've done plenty of free online training—see the materials I prepared for the latest one—but the upcoming MOOC is different, and far more ambitious. First, just one of its six modules covers visualization, with an entire section about How Charts Lie; the other five modules talk about how to get data, clean it and prepare it, explore it, use Machine Learning, and build narratives based on numbers. Second, the course will be offered in three languages simultaneously: English, Spanish, and Portuguese.

The third difference is the instructors. I did most of the previous trainings alone or in collaboration with just one person (Scott Murray, Heather Krause). We'll have many instructors in DJVFT: myself, Simon Rogers, The Pudding's Jan Diehm, Google's Dale Markowitz, Minhaz Kazi, Marco Túlio Pires, and Juan Manuel Lucero, Datavized's Debra Anderson, and the team at Kiln, the creators of Flourish.

There isn't a launch date yet, but the MOOC will happen before the end of this year, so keep an eye on this website or Twitter.

Tuesday, August 13, 2019

Data are people

In a sobering visualization project from 2017, Reuters Graphics gives an idea of the scale of the Rohingya exodus from Myanmar to Bangladesh. Instead of having abstract shapes—bars, points, lines—represent people, each person is represented by a 3D figure. The result is breathtaking:

Enrico Bertini has called this type of visualization 'anthropographics'. In a 2017 article about them, after mentioning an interview with Paul Slovic on the notion of statistical numbing, and a classic piece by Jacob Harris, Enrico wondered whether anthropographics can elicit empathy. I won't spoil the results of his experiments, but they seem to be in line with the cautious skepticism I still espouse—and that I'm willing to abandon: I don't think that visualization alone can cause a feeling of empathy in the strict sense of that term; what it can—and should—do is to provoke thinking, concern, and compassion.

Coincidentally, this past weekend The New York Times published a story about a Rohingya teacher that describes the beginnings and impacts of the humanitarian crisis. Combined, the Reuters visualization and the NYT profile remind us that, in cases like this, each data point is a person.

Monday, August 12, 2019

Forking paths visualization

The New York Times's Sahil Chinoy explores which factors—education, religion, race/ethnicity—are more predictive of party affiliation. The first visual is a short quiz that reveals the political leanings of people like you. Men like me—white, straight, with a college degree, and no religion—favor the Democratic party by a margin of 48 points:

Chinoy's project is another example of the “me” layer at work: if you want to interest readers, open up your story by showing them how the numbers relate to them, why they matter, or let each of them see where they are in the data. Then proceed to provide detail and context.

The large diagram below summarizes the forking paths; the story points out that recent polarization has been accompanied by an increasing demographic homogenization of the parties.

Interesting fact: If you're white, it doesn't matter much which religion you profess, but how religious you are:

Chinoy has mentioned where he got inspiration from: this project by The Economist, and this decision tree by Amanda Cox. Well done.

(Update: the print version of the piece appeared yesterday; click to expand):

Friday, August 9, 2019

Yes, Nate Silver knows how to read charts

Nate Silver is getting pushback because of this tweet from Tuesday:
Some responses in that same thread are a reflection of the extremely polarized times we live in. Splinter's Naomi LaChance even wrote a hot take titled “Numbers Guy Nate Silver Struggling to Read a Chart”.

Does Silver struggle to read charts? Come on. According to the Washington Post story, Bernie Sanders has the highest proportion of supporters who didn't donate to any other candidate in the Democratic primary (more than 80%); other major candidates range between ~50-70%. Does that mean that Sanders supporters only like him, and that only they like him, as Silver wrote? Of course not, but Silver's tweet is clearly hyperbolic, or at least I read it that way the first time I saw it.

And there's truth behind it: Sanders's base is the most loyal in the Democratic primary by a noticeable margin. Let's go beyond the percentage—which is very high in itself—and think of counts: Sanders's donor base is enormous (750,000, source): 80% of 750,000 is 600,000 people who donated just to him; that's impressive. The two candidates who have a number of donors closest to Sanders, Elizabeth Warren (420,000) and Pete Buttigieg (390,000), have much lower percentages of exclusive donors: ~55% and ~62%. Knowing this beforehand, my understanding of the chart above changed quite a bit.
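
The same back-of-the-envelope arithmetic can be extended to the other two candidates. This is a rough sketch in Python that uses only the donor totals and approximate percentages quoted above; the percentages are rounded ("more than 80%", "~55%", "~62%"), so the resulting counts are estimates, and the function name `exclusive_donors` is just an illustrative label.

```python
# Donor totals and approximate exclusive-donor percentages quoted in the post
donors = {
    "Sanders": (750_000, 80),
    "Warren": (420_000, 55),
    "Buttigieg": (390_000, 62),
}

def exclusive_donors(total, percent):
    """Estimate how many donors gave to this candidate only."""
    return total * percent // 100

exclusive = {name: exclusive_donors(total, pct) for name, (total, pct) in donors.items()}

for name, count in exclusive.items():
    print(f"{name}: about {count:,} exclusive donors")
```

Under these rounded inputs, the estimate for Sanders is roughly two and a half times the estimate for either of the other two candidates.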

In any case, if there's something I've learned in the past few years, it's that it's great to be critical, but we should also strive to interpret whatever we see or read more charitably. Books such as Arthur C. Brooks's Love Your Enemies or Todd May's A Decent Life may help with that.

Thursday, August 8, 2019

First reactions to 'How Charts Lie'

How Charts Lie is two months away, and some media organizations and people who write about design, visualization, statistics, data science, etc., have already received copies of the softcover galley proofs (the actual book will be hardcover) and are reacting to it. Washington Post's Christopher Ingraham has tweeted this:

Kaiser Fung has published the first review of 'How Charts Lie'. I'm happy that people are reading beyond the intentionally provocative title, and getting what the book is truly about: it's certainly about how charts can deceive and how we lie to ourselves with them, but it's more about how anyone can become a more attentive and informed chart reader:
Few of us learned how to create charts from first principles. No one taught us about axes, tick marks, gridlines, or color coding in science or math class. There is a famous book in our field called The Grammar of Graphics, by Leland Wilkinson, but it’s not a For Dummies book. This void is now filled by Alberto Cairo’s soon-to-appear new book, titled How Charts Lie: Getting Smarter about Visual Information.
Finally, here are some early blurbs by best-selling authors:

Friday, August 2, 2019

Mixed feelings about a series of lovely maps

I know, I know, I've complained before—and will keep complaining—about thematic maps that lack legends—a trend that seems to be gaining traction in news visualization—but allow me to recommend this recent NYT story about donors to Democratic primary candidates, which showcases plenty of them.

I must admit the maps look lovely, and they do help spot regional patterns so, to a certain extent, they work. It's just that, as a reader, I feel very uneasy when designers don't tell me what range of quantities shades of color represent. Moreover, as a visualization designer myself, I think that maps without legends are similar to graphs without X/Y-scales, something that may undermine not only the effectiveness of the graphic—you can see more and less, but not get even the faintest idea of how much more or less—but also trust.

My favorite maps in the story are the smaller ones at the bottom of it:

Thursday, August 1, 2019

Yes, the chart is misleading, but the numbers are still alarming

For some reason I missed the discussion about the following graph published by The New York Times. It's a great example of how misleading a chart can be if we show just the most extreme data points and ignore the rest (you'll see some examples of this in How Charts Lie.) Here's a detailed critique.

Georgetown University's Erik Voeten explains that the problem is that the researchers who generated the data asked people to rate the importance of democracy on a 10-point scale. The NYT graph depicts just the percentage of people who chose 10 (“absolutely important”). Voeten suggests charting the average scores instead, which is a good idea; there is a clear decline, but it isn't as steep:

Jeff Guo proposed this alternative graph, showing the percentage of U.S. respondents born in different decades who chose each of the scores:

Guo's and Voeten's charts are better than the NYT one, as they provide a more complete overview of the survey, but all of them alarm me greatly: just over one third of people in the U.S. born between the '70s and '90s say that democracy is “absolutely important” and the percentage who are neutral or lukewarm about it has risen sharply.
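
To see why the two summaries can tell different stories, here is a small sketch in Python. The two response lists are invented for illustration only and are not the survey's actual data; the point is just that the share of people choosing 10 can fall much faster than the average score.

```python
from statistics import mean

# HYPOTHETICAL answers on a 1-10 scale for two made-up cohorts of ten people each
older = [10] * 7 + [9] * 2 + [8]
younger = [10] * 3 + [9] * 3 + [8] * 2 + [7, 5]

def share_of_tens(responses):
    """Fraction of respondents who chose the top score, 10."""
    return sum(1 for r in responses if r == 10) / len(responses)

for label, responses in (("older", older), ("younger", younger)):
    print(f"{label}: {share_of_tens(responses):.0%} chose 10, mean score {mean(responses):.1f}")
```

In this made-up case the share choosing 10 drops by more than half between cohorts, while the mean score falls by only about a point.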

UPDATE: Kieran Healy has just reminded me that he discussed this example in his book.

Wednesday, July 31, 2019

The Mercator projection isn't “a monstrosity”

I've been revisiting some critical readings about graphics lately, such as Ruben Pater's little book The Politics of Design. It's a fine collection of design successes and failures accompanied by concise commentary. You should read it; it has a chapter about information graphics I largely agree with.

With one exception: the page about the Mercator cylindrical projection. Pater writes that it's “a monstrosity” that “gives us a [colonial] sixteenth century world view”. But Mercator's projection isn't a “monstrosity”, regardless of what you may have heard in that famous (and funny) scene from The West Wing.* This is a persistent myth.

Poor Gerardus Mercator. His projection is perfectly acceptable—if you know why it was designed and what it's useful for: sailing and depicting small regions, not the world, as Pater himself mentions; he next wisely proposes the Winkel tripel projection as an alternative for world maps (it's National Geographic's choice.)

We designers must be wary of the possible misuses of the artifacts we create, and do what we can to avoid them. Mercator is indeed often misused; think of Web Mercator, for instance. However, if something is correctly designed—and the Mercator map is—both designers and viewers have a responsibility to pay attention, make an effort to understand it, and use it just for the purposes it's intended for. This is the theme of the last chapter of How Charts Lie.

(Update: M.F. Hartmann points out a possible mistake on the page: I didn't notice it, but that projection doesn't seem to be Mercator's, but the Gall Stereographic projection, which also distorts area, but not as badly.)

* Side note: as Mark Monmonier chronicled in his masterful Rhumb Lines and Map Wars: A Social History of the Mercator Projection, the Gall-Peters projection is sometimes proposed as an alternative to Mercator. I agree with Monmonier when he writes that Arno Peters, one of the creators of that projection, was an effective self-publicist, and that his projection is a true monstrosity, a twisted and ugly one. Instead, choose a compromise projection when designing world maps.

Tuesday, July 30, 2019

Correlation, causation, and deceitful charts

Enrico calls my attention to a tweet by notorious grifter Charlie Kirk. Kirk wants you to see a “correlation” in the fact that “America's worst run cities” haven't had Republican mayors in decades. That's bullshit.

Bullshit was already pervasive when Harry G. Frankfurt wrote his classic, and its presence in the public arena has only increased. I suspect that Kirk intuits there's something fishy about his “correlation” and doesn't care. That's the very definition of bullshit: facts can be true or not and still be bullshit because of how you employ them. It all depends on whether you care about the truth, and not about how you want to be perceived. Here's Frankfurt:
Someone who lies and someone who tells the truth are playing on opposite sides, so to speak, in the same game. Each responds to the facts as he understands them, although the response of the one is guided by the authority of the truth, while the response of the other defies that authority and refuses to meet its demands. The bullshitter ignores these demands altogether. He does not reject the authority of the truth, as the liar does, and oppose himself to it. He pays no attention to it at all. By virtue of this, bullshit is a greater enemy of the truth than lies are.
(Side note: this interview with Frankfurt is great.)

There's so much wrong with Kirk's tweet that it's hard to decide where to begin. For instance: Are cities run like schools or prisons, with principals or wardens that exert absolute control? Of course not. Cities are organic, chaotic entities. Are those cities on the list really the “worst” in the U.S.? What do you mean by “worst”? How do you measure that? After all, many small towns and rural regions have dreary health and life quality metrics.

I happen to know all those cities; they do face challenges (crime, poverty, homelessness), but they are also wealth, science, culture, and creativity hubs. When a human group increases in size, extremes—both positive and negative—become more likely and visible. That's a matter of probability. Moreover, does having Democratic mayors for decades lead to being “worst”, or does the fact that these are big cities—with tons of racial and ethnic diversity, and plenty of highly educated people—lead to electing Democratic mayors? Causality is complicated.

How Charts Lie is devoted to explaining how we lie to ourselves with charts—numerical tables like Kirk's, graphs, maps, infographics—that may or may not be designed to deceive. The intentions behind a chart are secondary in comparison to the fact that we are all prone to projecting what we want to believe onto whatever we see or read. We can do better. Judging by the responses to Kirk's tweet, many from people persuaded by his “correlation”, we still have lots of work to do.

Saturday, July 27, 2019

Another grossly incompetent lying chart by climate deniers

Climate deniers are a reliable source of grossly incompetent lying charts. Just remember this one, which you'll see in How Charts Lie. Nigel Hawtin has just sent me a link to the gem below by The Global Warming Policy Forum, a British organization that says that the scientific consensus is climate “alarmism”, and then engages in some actual alarmism by warning us about something called “climate communism”.

The GWPF director is Benny Peiser, whose academic specialty is analyzing sports and physical activity. That means that he is as qualified as I am to discuss climate science with rigor (disclosure: neither of us is qualified at all.) However, and unfortunately for the GWPF, I can discuss charts, and the one above is a blatant example of cherry-picking by conveniently cropping the X-axis to select the time frame that “proves” what you want to believe.

It's perhaps not a coincidence that the GWPF chose 2014 as the starting point for the chart; 2014-2016 were El Niño years, and these tend to be warmer. Moreover, you can see the temperature slightly picking up by the beginning of 2019. NASA has a comprehensive article about why the Earth isn't really cooling, and it discusses the GWPF chart.

Anyway, if we want to discuss temperature variation and anthropogenic impact on it we need to go back in time to display long-term trends. Here's an example from the same thread.

Chart by Datagraver

(Another trick liars use—but not in this case, this is a warning about other charts—is to select their data carefully before making any graphic. Climate deniers don't just crop chart axes, but also favor nominal temperatures, often from specific regions. Real scientists recommend showing temperature anomalies instead. Anomalies are differences from a baseline temperature that is often the average of decades of data. This is a good explanation of how anomalies work and why they are a better choice if the goal is to have an informed conversation, not to lie.)
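
The definition of an anomaly above can be sketched in a few lines of Python. The temperatures below are invented for illustration and are not real measurements, and the three-year baseline is only to keep the example short; as noted above, a real baseline is often the average of decades of data.

```python
from statistics import mean

# HYPOTHETICAL annual mean temperatures in degrees Celsius (made up for this sketch)
temps_by_year = {1951: 13.9, 1952: 14.0, 1953: 14.1, 2017: 14.8, 2018: 14.7}

def anomalies(temps, baseline_years):
    """Return each year's difference from the average over the baseline years."""
    baseline = mean(temps[year] for year in baseline_years)
    return {year: round(temp - baseline, 2) for year, temp in temps.items()}

# Baseline: the average of 1951-1953 in this toy dataset
result = anomalies(temps_by_year, range(1951, 1954))
for year, value in result.items():
    print(year, f"{value:+.2f}")
```

Because every value is measured against the same baseline, series from different places or instruments become easier to compare than raw temperatures.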

Friday, July 26, 2019

A new infographics book for all ages

I learned about The Infographic Energy Transition Coloring Book (TIETCB) when Ellery Studio's Bernd Riedel and I were on the jury for the Malofiej Infographics awards this year. Bernd showed us a limited edition of the book, made by his team, and we all loved it. It's a fine collection of hand-drawn infographics for all ages.

Ellery is now planning to reach a larger audience, so they've created a Kickstarter campaign to fund a new printing of TIETCB. If you share my love for books that could get younger generations interested in the visual display of data (here's another example) you should take a look at the sample pages and read this article by Fast Company.

Thursday, July 25, 2019

Making data concrete with 'photovisualization'

The New York Times' Alexandra Stevenson and Jin Wu provide context to the recent protests in Hong Kong by explaining the dire living standards many people in the city endure. The piece contains some fine graphics such as this map. . .

. . . this comparison of housing affordability. . .

. . . and this pictorial diagram of the average living space per person in different large cities, which is quite striking:

My favorite visual, though, is the opening one, a great example of what Nicholas Felton calls photovisualization, or “photoviz”, in his most recent book. We know that many people have a hard time bridging the gap between the unavoidable abstraction of visualizations and the realities that they depict or describe, so something like this is both informative and persuasive, particularly if we pair it with more orthodox graphs and maps, as is the case in the NYT story:

Wednesday, July 24, 2019

'What If?' visualization

FiveThirtyEight's Ella Koeze wonders How 13 Rejected States Would Have Changed The Electoral College. The result is an example of what we could call 'What If? visualization', which is related to what I wrote in this post about the 'me' layer. Koeze explains:
Our current state borders are fairly arbitrary. Throughout American history, people have been proposing new states, but most don't appear on the map today, either because they once existed but were later redrawn, or because they simply never caught on. But what if some of these would-be states were around today? Would moving those state borders, without changing any votes, change our political reality?

FiveThirtyEight is quite fond of these kinds of alternative-scenario visualizations and simulators. These are some of my favorites: The Atlas of Redistricting, Should You Get Married (or Divorced) for Tax Reasons, and Hack Your Way To Scientific Glory, which allows you to p-hack like a pro.

Tuesday, July 23, 2019

Bringing visualization to the masses

Illustration by Antoine Orand
As I've explained here already, How Charts Lie is my first book for the general public and—I think—the first visualization book published by a major publisher as a hardcover not aimed exclusively at designers, scientists, or analysts.

Despite its title, How Charts Lie has a positive tone: it certainly discusses how people use charts to lie to others, but most of it is devoted to explaining how we lie to ourselves with charts, and how to become better readers —and, by extension, creators—of visualizations.

The reason is that I believe that visualization is a bit like writing: anybody can take advantage of it by learning how to reason about numbers and graphics, and then choosing a tool or two. The more we lower the barrier to entry, the more voices we may have.

This is the theme of an article Google News Initiative's Simon Rogers and I have just written for Nightingale, the journal of the Data Visualization Society. Simon and I have been collaborating for years on a series of visualization projects and tools that are intended to be useful to anyone. In the article we explain our goals and hint at what our next steps are (hint: a few more free tools I can't tell you much about yet and a new Massive Open Online Course.)

Monday, July 22, 2019

Absolute or relative values?

In visualization sometimes the simplest choices are the hardest ones to make. My favorite example is whether to show absolute or relative figures. Take this map by the Urban Institute. It displays what would happen if the Affordable Care Act were repealed. It's the graphic chosen to promote this report on the Web and in social media.

Montana (+177%), West Virginia (+176%), and Maine (+165%) would witness the largest increases, but that's because their populations are small (1.0, 1.8, and 1.3 million). Not surprisingly, the largest absolute increases would happen in more populous states: California (+3.8 million uninsured), Texas (+1.7 million), and Florida (+1.6 million). You can see the data here. These are the top 10 states in absolute terms:

What's the right choice, total counts or relative values? This is always a decision I struggle with, and my answer is often both. On one hand, it's advisable to use adjusted data—percentages, rates—when designing a choropleth map, but is that map alone enough? Why are 83,000 Mainers and 112,000 Montanans represented by a darker color than 3.8 million Californians or 1.6 million Floridians?

Moreover, what's more informative to someone interested in a topic like this, the relative change or the total number of people who would be left without health insurance if the ACA disappeared? I'd choose the latter, but we should never assume that our preferences are representative of a majority of viewers.
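
The tension between the two views can be made concrete with the figures quoted above. The following Python sketch assumes that each percentage is the increase relative to the state's current number of uninsured people, which is my reading of the map rather than something the report is quoted as saying here; the function names are illustrative, and the populations are the rounded ones mentioned earlier.

```python
# Figures quoted in the post, as (additional uninsured, percent increase, population)
quoted = {
    "Montana": (112_000, 177, 1_000_000),
    "Maine": (83_000, 165, 1_300_000),
}
CALIFORNIA_INCREASE = 3_800_000  # absolute increase quoted for California

def implied_baseline(increase, percent_increase):
    """Current uninsured count implied by an absolute and a percent increase (assumption)."""
    return increase / (percent_increase / 100)

def share_of_population(increase, population):
    """Additional uninsured people as a fraction of the state's population."""
    return increase / population

for state, (increase, pct, population) in quoted.items():
    print(
        f"{state}: +{increase:,} people (+{pct}%), "
        f"implied baseline of about {implied_baseline(increase, pct):,.0f}, "
        f"{share_of_population(increase, population):.1%} of residents"
    )

# California's absolute increase compared with Montana's
print(f"California's increase is about {CALIFORNIA_INCREASE / 112_000:.0f} times Montana's")
```

A large percentage on top of a small baseline and a large count in a populous state are both true at once, which is one argument for showing the two measures side by side.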

Wednesday, July 17, 2019

Keep those legends

There was a debate today around maps from a Washington Post investigation about the opioid epidemic.

Some people praised them because they lack legends (other maps in the same story do have them). They argued that legends sometimes aren't that necessary because most viewers don't look at them anyway, as they just focus on the overall more-less message of the chart, not on its details. An article used in the debate even suggests removing sources.

I'm writing this on my phone, as I'm not in front of my computer today, so allow me to be brief: please never remove sources, and consider keeping your legends, particularly if you can make them unobtrusive (small, or placed behind an on/off button) and simple. Think twice before getting rid of them.

Legends aren't optional add-ons. They are an integral part of visualizations. I agree that a legend in a graphic aimed at a general audience shouldn't be overly detailed, but removing it completely may be going too far in most cases (I'll concede, though, that it might be acceptable in certain specific situations, including the map in the discussion, or these).

It turns out that I'm reviewing the results of some controlled observations I was part of, and I've seen that (a) many viewers can and do ignore legends, but (b) some others refer to them when reading a visualization, and feel frustrated and distrust the graphic if they can't find what they need. Contrary to what some of you may believe, this isn't dependent on education: people in this latter group didn't necessarily have college degrees. If you can place a legend on the side or at the bottom of your graphic, why wouldn't you serve them? Let's be cautious.

Saturday, July 13, 2019

Nonsensical diagrams

I have a soft spot for nonsensical diagrams. I don't cover them in-depth in How Charts Lie, but I have a small collection on my computer that I return to when I'm in need of a good laugh. This one is from Sebastian Gorka's PhD dissertation:

Here's my favorite from the Gorka subfolder; I'm no expert in geopolitics, but I'm quite certain that the “mechanics of terrorism” are more complicated than this:

Some of Gorka's visualizations are puzzlingly minimalistic, making me wonder whether they are necessary at all. See this beauty:

Gorka writes about this one that it's “frighteningly complex” and that it defies “many conventional wisdoms.” It does indeed!

This is the structure of Al Qaeda; I wonder whether a PhD dissertation shouldn't be a tiny bit more specific:

It's easy to make fun of Gorka, who is just a boor and a grifter, but nonsensical diagrams also appear in best-selling books. In 2016 Rolling Stone's Matt Taibbi launched a contest to ‘Make the Most Meaningless Thomas Friedman Graph’, which has its own Twitter hashtag, #FriedmanGraphs (don't miss it). Taibbi was spoofing Friedman's book Thank You for Being Late, which showcases graphics like this:

My favorite from Taibbi's contest isn't the winner, but this one:

Self-help guru Jordan Peterson is famous for his YouTube lectures and his doorstop 12 Rules for Life. I suspect many of his fans haven't read his previous Maps of Meaning. The diagrams in it were described on my Twitter feed as Dungeons & Dragons campaign maps:

(FYI: I try to design far better maps when I play Dungeons & Dragons).

Nathan J. Robinson called Peterson's graphics “masterpieces of unprovable gibberish”. He has a point:

Some of Peterson's diagrams are pretty heavy metal:

Footnote: A while ago I made fun of Peterson's diagrams and one of his fans replied that it was unfair to critique them without reading them in context. I agree, but it turns out that I did read Maps of Meaning, and can understand its philosophical references. This doesn't make the diagrams better, but even funnier.

Thursday, July 11, 2019

Updated website and calendar of talks

My collaborator and student Yuan Fang has been quietly redesigning my website. It's now more focused on the books and consulting, and its style is simpler. We've added a calendar of talks that I'll update regularly; I also have a Google sheet with the same information.

You'll see many events related to How Charts Lie close to its publication date, October 15. I'll be visiting New York, DC, Boston, and many other cities (there'll be book signings at some of these places):

Also, How Charts Lie is just three months away, so my publisher, W.W. Norton, is already promoting it. They've created a humorous flyer that reflects the book's lighthearted tone, and that has already appeared on its Amazon page.