Uncertainty, graphicacy, and the power of statistics

Power from Statistics is an initiative by Eurostat and the European Political Strategy Centre. Their conference in Brussels begins today, so they've just launched their report. I attended one of the roundtables that led to this document, so they asked me whether I'd write an article for it.

The result is “Uncertainty and graphicacy: How should statisticians, journalists, and designers reveal uncertainty in graphics for public consumption?” (PDF), which consists of some miscellaneous thoughts about why the public —including journalists!— don't grasp uncertainty, how people even misread common charts, graphs, and maps, and what we could do about it. If you have attended any of the Visual Trumpery lectures, a few of these musings may sound familiar.

This is an essay in the literal sense: Writing intended to help myself think and play with some ideas, such as the timeline of “Golden Ages” and “Dark Ages” of visualization (below,) so take everything with a grain of salt. I'm more than willing to change my mind on anything I wrote. Enjoy.

(This same article will appear soon —abbreviated— in a book about visualization in the news. Consider ordering it. It looks fantastic.)

Interview at Stats & Stories

A few weeks ago I had a fun conversation with John Bailer, Richard Campbell, and Rosemary Pennington, the people behind the Stats and Stories podcast, also available at NPR.

We talked about graphicacy, Visual Trumpery, how numbers mislead, and many other topics. Listen to it if you have time.

Ain't Data Truth?

My University of Miami colleague Hiram Henríquez has created an infographics non-profit organization called Ain't Data Truth, which he presented at the Malofiej Infographics Summit this year. Its main output is a series of enormous data posters about current events. Take a look at the “topics” menu in the upper-right corner of his website. You can also send him your feedback and suggestions.

Hiram worked in news media for many years (Miami Herald, National Geographic magazine, etc.,) and now, besides teaching at our School of Communication, keeps doing freelance work and side projects like this. Here's the latest poster he has published, about climate change (hi-res PDF here):

Visualizing gender and race inequality in newsrooms

Our latest project in the collaboration with Google News Lab —read about it here and see all projects here— is an exploration of gender and race in U.S. news publications. It was designed by Polygraph based on data from the American Society of News Editors (ASNE,) which has also published an article about it.

One of the reasons I love this interactive visualization so much is the multiple ways in which the data is visualized —bubble plots, scatter plots, dot plots, tables, etc.— and also the animated transitions between them. Don't miss it.

An update on Visual Trumpery: New cities and dates

I've already delivered the Visual Trumpery talk in Mexico DF, Barcelona, and Atlanta. Speros Kokenes, who organized my visit to Atlanta, has written an article about it.

In October I'll be visiting Portland, Berkeley, Redlands, New York City (where nearly 300 people have already signed up!) and Ithaca (Cornell University.) The November schedule is also quite packed.

I'll soon add new places and dates confirmed for 2018, including Washington DC, Baltimore, Miami, Chicago, Columbus and Athens (OH), Syracuse (NY), Auckland (NZ), London (UK), and a few cities in Spain, Canada and Poland.

As I explained a while ago, this is what you need to bring Visual Trumpery to you:

1. A flight (coach is fine) from Miami and a place to stay (I'm not picky.)

2. I won't charge anything, but I'll need some minor expenses covered, such as taxis to and from the airport, meals, etc.

3. You must arrange a venue and announce broadly. I can help with promotion through social media, of course. The talk cannot be part of a paid-for event or conference. It must be open to the public.

4. I can present in Spanish, Portuguese, and English. If the audience speaks any other language, we may need an interpreter.

Low-tech visualization: How much space newspaper front pages used to cover hurricanes

Many of my students get a bit overwhelmed at the beginning of each semester by the amount and variety of tools we use in class. I decided to show them that sometimes you can create pretty neat visualizations with rather pedestrian techniques, such as drawing basic shapes in programs like InkScape or Adobe Illustrator.

I spent a couple of hours today designing the two graphics below. They depict the space devoted to the threat and consequences of hurricanes Harvey, Irma, and Maria in the past month on the front pages of The New York Times and The Washington Post. I first downloaded all cover images from both websites, and then drew the rectangles over them. Colors correspond to the region or regions that are most prominently mentioned in each story.

To see these in high resolution (ai, pdf, svg,) go to this folder. Feel free to use them.

UPDATE: Lynn Cherny and Moritz Stefaner have just told me that designer Krisztina Szűcs did something similar a while ago. It looks great. Check it out.

Updated Tutorials & Resources section

You're probably aware that this website has a Tutorials & Resources section where I post videos that I or other people have recorded. I've just updated that section with a short and quite informal tutorial about RAWGraphs, a great tool by the Density Design lab.

These are materials I use in my classes at the journalism and interactive media Master's programs at the University of Miami. I'm a fan of “flipped classrooms,” so I don't devote precious class time to basic software training, just to answering questions about the tools, or to explaining advanced techniques if students need them. Tutorials take care of software basics, so we can use class time for lectures, discussions, critiques, and feedback on exercises.

Here's a diagram of what we use each tool for:

Before you ask: Yes, Tableau and PowerBI are part of my classes, but later in the semester.

A conversation about designing better visualizations —and spotting misleading ones

I talked to the good folks at Discourse a while ago. If you enjoy in-depth journalism, you could consider following them. They've just published an edited version of our conversation, keeping just the good parts.

We discussed several takeaways from my Visual Trumpery lecture series and some of my go-to sources for great news visualization. Here's the most timely part, considering that ProPublica has just announced their partnership with several data scientists:

In visualization, captions are as important as graphics themselves

(Updated on September 8 and 9. Go to the bottom of the post)

Data visualization isn't just about visualizing data, but also about writing headlines, intros, captions, explainers, and footnotes. I'm right now closely following the news about hurricane Irma —I live in Miami!— and feeling both amazed and terrified by the many great graphics news organizations and independent designers are publishing. As I've just tweeted, beauty is sometimes correlated with terror.

Anyway, I've just read a very good graphics-driven story in The Washington Post. This is its first map:

This is its caption:

I'm no expert in weather forecasting, but I believe that this is inaccurate. To learn why, go to minute 14:30 in my keynote at Microsoft's Data Insights summit. Here's some of what I said there:

Maps based on cones of uncertainty are quite problematic, as this article by Jen Christiansen, and this other by Robert Kosara explain. Among other reasons, some people don't see in that cone the possible range of paths the center of the hurricane can take, but the size of the hurricane itself.

This happens even to those who, like me, do know how to read this kind of map. I need to consciously struggle with my brain's inclination to see a physical object, and not a probability range. Why? I don't know for sure, but I'll make a conjecture: it's because the representation looks pictorial. The rounded shape of the tip of the cone roughly resembles the shape of a hurricane.

This map is made even more confusing if a black line is placed in the middle of the cone. Just read tweets like this. People may see that line not as a visual aid to emphasize the center of the cone (right), but as the most probable path (wrong).

Going back to the caption, the reason why it sounds wrong to me is related to something most of you probably aren't aware of: the cone of uncertainty doesn't represent the range of all possible paths the hurricane could follow, based on simulations. This excellent paper explains that the most common cone, the one by NHC, “accurately predicts the ultimate path of the tropical cyclone’s center about 2/3 of the time (J. Franklin 2005, personal communication). In other words, one out of three storm centers directly impact areas outside of the cone.” That's a 66%-33% chance.

Therefore, the caption could say something like this: “Based on predictive simulations of past hurricanes, there are 2 out of 3 chances that the path of the center of the hurricane could be anywhere within this cone, and a 1 out of 3 chance it will be outside of it.” This is longer and clunkier —I'm sure that any copy editor in the audience can improve it!— but truer to reality.

This other map shows the actual uncertainty of predictive simulations quite well; notice the faded lines, corresponding to less probable (but still possible) paths:

UPDATE: It seems that NOAA is listening. See the explanation that they have been tweeting. It ought to be published next to every single cone of uncertainty map out there:

UPDATE 2: The map below, by meteorologist Ryan Maue, is far better than any cone of uncertainty map if your goal is to inform the general public about the risks posed by wind. See it animated. The scale is predicted maximum wind speed in mph.

How to have fun with visualization

I'm often amazed by the power of visualization and infographics to unveil truths hiding behind complexity. Surprise and wonder are emotions a good graphic may cause. However, I don't remember ever laughing out loud when exploring a data visualization.

This changed a few months ago, when my friend Xaquín G.V. (Twitter) began working on a project that's part of my ongoing collaboration with Google News Lab. As you may remember, I'm art-directing a series of experimental visualizations by top designers from all over the world. You can see them all here.

Xaquín has been head of visuals at The Guardian, and a graphics editor at National Geographic magazine, The New York Times, Newsweek, and El Mundo, in Spain, where we worked together between 2001 and 2005. His virtues are many, but the one that you'll first perceive if you ever meet him in person is his wild sense of humor. Xaquín is a genuinely funny fellow, and that shows in his style. Just take a look at his project for us, titled How to Fix a Toilet, and the article explaining how he made it.

And, yes, as Xaquín wrote in that article, this is my favorite animation by far:


A single data point is often meaningless without its context

(Chart updated with a suggestion by Andrew Losowsky)

In case you haven't heard, we're bracing for a monster hurricane down here in Miami. While praying to the gods of uncertainty and chance to push it a bit to the East, back into the Atlantic Ocean, I decided to relax for 15 minutes from installing shutters and getting supplies by designing a quick chart. I'm offering it for free to Breitbart News.

This morning, Breitbart published a story by reporter John Binder with this alarming headline: “2,139 DACA Recipients Convicted or Accused of Crimes Against Americans.” This is its lede:

“As Attorney General Jeff Sessions announced the end of the Obama-created Deferred Action for Childhood Arrivals (DACA), from which more than 800,000 un-vetted young illegal aliens have been given protected status and work permits, the number of them who are convicted criminals, gang members, or suspects in crimes remains staggering.”

It's a staggering number indeed. Staggeringly low. If there are roughly 800,000 DACA recipients, 2,139 of them are just 0.27%, or roughly 3 out of 1,000. The numbers in the story suggest that undocumented youngsters protected by DACA commit proportionally far fewer crimes than American citizens do. This is, of course, if the data are reliable. Breitbart mentions its source, USCIS, but doesn't link to the specific report this comes from.
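
For readers who want to check the arithmetic, here is a minimal Python sketch. It uses only the two figures quoted above (2,139 and roughly 800,000); the function name is just an illustration, and the result says nothing about whether the underlying data are reliable.

```python
def rate_per_thousand(count, population):
    """Express a count as a rate per 1,000 members of a population."""
    return 1000 * count / population

# Figures quoted in the Breitbart story (the 800,000 is approximate).
daca_recipients = 800_000
convicted_or_accused = 2_139

share = convicted_or_accused / daca_recipients
per_thousand = rate_per_thousand(convicted_or_accused, daca_recipients)

print(f"{share:.2%}")                   # 0.27%
print(f"{per_thousand:.1f} per 1,000")  # 2.7 per 1,000
```

The raw count only becomes meaningful once it is divided by the size of the group it comes from, which is the whole point of the comparison.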

To put these data points in context I made the charts below; sources are this, this, this, and this. If you have other figures that would make for a better comparison, let me know. For instance, to be more accurate we'd need to get the felony rate just of Americans who were under the age of 31 as of June 15, 2012. This was one of the requirements to apply for DACA. Also, it might be the case (I don't know) that a conviction doesn't immediately lead to your DACA status being revoked.

Finally, we'd need to consider the rate of Americans who have been convicted of “significant misdemeanors” or are affiliated with gangs, as the second graphic only plots felony —but not misdemeanor— convicts up to 2010. This might make the difference even larger:

Improving Kid Rock's map t-shirt

A couple of days ago, on Thursday, I gave the first Visual Trumpery talk in the United States. The audience in Atlanta was wonderful, and the comments during and after the lecture were very inspiring.

People seemed to have fun with the discussion about when to use state-level maps, county maps, or cartograms to show results of presidential elections. The most celebrated moment was the constructive critique of a t-shirt sold by Kid Rock. My suggestion was to use a county map, as the borders between the U.S. and Dumbfuckistan aren't accurately depicted at the state level. You can see the slide below.

(Read more about the Visual Trumpery tour here. Go beyond the talk title, and read its description, as the title is intended to trick you. If you want to sponsor a talk —I won't charge salary,— here's how.)

Visualizing the German elections

The latest project in our ongoing Google News Lab visualization series is based on searches for candidates in the German federal election. This is the second time we've collaborated with Moritz Stefaner, after his very successful The Rhythm of Food and, as in that case, my role as art director of the series was limited to a few suggestions here and there.

This interview with Moritz explains how the project was done.

To learn more about the goals of this series of experimental visualizations, read this article at FastCo Design. You can see all our previous projects, articles, and hangouts here.

Shock and precision in visualization

Comparisons and context are at the core of data visualization. We humans have a hard time grasping large numbers, such as “9 trillion gallons of rain,” so transforming that magnitude into a pictorial illustration may help. Here's a graphic by The Washington Post; it's part of this story:

John Grimwade has a great collection of this kind of side-by-side pictorial comparisons. They surprise and illuminate, but they are quite limited. Sometimes it may be preferable to present data in a more abstract and precise manner, like this (h/t Sam Lillo):

Which graphics are better, the pictorial or the abstract? There is no better. It all depends on what your goals are. The former graphic is shocking, but it doesn't enable any analysis. Its power and its limitations reside in its bluntness. The latter lacks visual punch, but it's rich in detail.

No visualization is ever perfect, useful in every case and for every purpose, so the key to a successful data presentation is often to not limit yourself to just one graph, map or diagram, but to combine different kinds. The shocking ones work as a headline, pulling readers into the narrative by intriguing them; then, the more detailed ones provide some needed context and depth.

(The Washington Post has other graphics stories that are also worth your attention.)

UPDATE: Felix Salmon has just told me about a mistake in the copy of the first graphic. It reads “four miles square” when “four square miles” (the area of the base of the cube, 2*2 = 4) is more accurate. Felix adds: “just to be unambiguous: "four miles square" means a square with four-mile edges, i.e. 16 square miles.” In visualization, words are as important as visuals!

UPDATE 2: The Post has just corrected the graphic (see comment below)

Counties aren't citizens

Conspiracy theorist and far right personality Jack Posobiec is about to re-launch a self-published book titled Citizens for Trump. He has just shared the new cover on Twitter.

The map he chose —which I've already critiqued, so I won't bore you anymore— contradicts the title. He should change either of them. As part of my pro bono efforts to spread the word about good visualization practices, I decided to suggest a small edit. This is a slide in my Visual Trumpery talk series this coming Fall and Spring semesters:

UPDATE: Mike Cisneros proposes this. I love it!

More articles about the dangers of maps: 1, 2, and 3.

Echoes of Minard: Axios maps the flow of goods between states

Axios Visuals' Chris Canipe's The Flow of Goods Between States is a little prodigy of simplicity, reminiscent of Charles Joseph Minard's flow maps. Notice the “how to read it” paragraph. This kind of unobtrusive annotation, with many historical precedents, greatly increases the chances of readers comprehending unusual graphic forms. The bullet point list of takeaways at the bottom of the piece is also nice.

I'd like to make a suggestion: Wouldn't it be great to add some kind of table or graph showing the top five —or ten— points of origin and the top five points of destination of each category of products, along with the quantities freighted? I'm a fan of maps but, as I've pointed out elsewhere, multiple representations may yield different insights from the data.

h/t Samuel Arbesman

Stack and unstack

My friend Geoff McGhee, who works for Stanford University's Bill Lane Center for the American West, has just published a nice series of interactive graphs and maps about California's move toward renewable energies. One of them caught my attention. Here's an animated GIF of it:

This is a nice example of a principle I explain in The Truthful Art: Stacked graphs show both the total and its components, but they emphasize the former, not the latter. When the total is more relevant than the parts, a stacked graph may be an appropriate choice.

However, what if seeing the variation of each portion with great accuracy is as important as the total? Then you need to let readers unstack the graph, or see each portion separately. Otherwise, estimating the variation of the parts not sitting on the horizontal baseline is hard.

Here's another example, a little classic by The New York Times.

Brit Hume misreads a graph and unintentionally parrots a White House talking point

Fox News pundit Brit Hume complained about this graph from the Congressional Budget Office's report on the Senate Better Care Reconciliation Act (BCRA):

Here's Hume's tweet, replying to an article by Jonathan Cohn claiming that the BCRA would reduce Medicaid by 26% in 2026 and by 35% in 2036, which is exactly what the CBO suggests in its report:

The graph isn't misleading at all if you bother to read its title, X-axis labels, and source. What Hume is doing here is to unintentionally parrot a White House talking point, calling the future shrinkage of Medicaid not a “cut”, but a “reduction in the rate of increase.”

The way we use words frames discussions, so I tweeted at Hume with the following analogy, which all pundits can grasp (I certainly can!): Imagine that I hire you for the next 20 years, and our contract limits your yearly salary “increase” to just 1% a year. Then, we let inflation play its magic. After two decades, when comparing your purchasing power in 2037 to the one in 2017, wouldn't you call that a “cut”? You'd certainly experience it as such, if inflation stays at its current level.
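
The salary analogy can be turned into a quick back-of-the-envelope calculation. The sketch below is only illustrative: the 2% yearly inflation rate is an assumption made for the example (the analogy doesn't name a figure), and the starting salary of 100 is an arbitrary index.

```python
def real_salary(initial, raise_rate, inflation_rate, years):
    """Salary after `years`, expressed in the purchasing power of year zero."""
    nominal = initial * (1 + raise_rate) ** years
    price_level = (1 + inflation_rate) ** years
    return nominal / price_level

initial = 100.0          # arbitrary index for the 2017 salary
raise_rate = 0.01        # the contractual 1% yearly "increase"
inflation_rate = 0.02    # ASSUMED for illustration only
years = 20               # 2017 to 2037

nominal_2037 = initial * (1 + raise_rate) ** years
real_2037 = real_salary(initial, raise_rate, inflation_rate, years)

print(f"Nominal salary in 2037: {nominal_2037:.1f}")
print(f"Purchasing power in 2037 (2017 money): {real_2037:.1f}")
print(f"Real change: {real_2037 / initial - 1:.1%}")
```

Under these assumed rates the paycheck grows every year in nominal terms, yet it buys roughly 18% less after two decades. With a different inflation rate the size of the gap changes, but the distinction between nominal and real change stays the same.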

This isn't a great analogy —for one thing, I don't know if Medicaid increases take into account the rising costs of healthcare. However, I was just trying to illustrate the difference between absolute change and relative change, and the fact that “increases” are always relative to variables like needs. In the case of Medicaid it's not just inflation (if it isn't built in already) that may have pernicious effects in the future, but also predicted variations in population size, its composition, economic growth, number of Medicaid recipients (gross expense vs. per capita expense,) societal expectations, etc. I'm no expert, but that is what the CBO said.

Hume then replied to my tweet:

His response unleashed a barrage of abuse from the troll horde that follows him, which also attacked Cohn. Some accused me of engaging in “liberal Math” and many xenophobic goons told me to go learn some English. Not a bad suggestion, I'll admit, but, as I'm busy, I decided not to waste time replying to each of them. I erased my tweet, and decided to write this post instead.

What the White House and its propaganda machine are doing —and I'm not claiming that Hume is part of it; he strikes me as a polite and professional fellow— is to twist the English language the aforementioned trolls claim to cherish. They do it to sugarcoat their message, as they know that cuts to the safety net —present and projected— are unpopular.

Contrary to what Hume claims, an increase that doesn't keep pace with projected inflation and other relevant factors, like the number of people predicted to need aid, is not an increase, unless you believe that the piece of paper below has value on its own, independent of the economy:

That isn't how money works. To quote Charles Wheelan's wonderful Naked Money, “it is a piece of paper with no intrinsic value.” Here's how I reason about this: Let's say that I give my kid $10 a week, which he uses exclusively to pay for his favorite hobby, ice skating. I drive him to skate with his friends every Friday evening.

Then, at the end of 2017 I decide to increase my kid's 2018 ice skating allotment to $11. Is that a real increase? It depends. In isolation, it sounds like it. My kid will be receiving 11 wrinkled sheets of paper instead of 10, after all.

But if the cost of an evening of ice skating changes to $12 and I can't drive him anymore, so he needs to pay $1 for the bus, my kid will lose purchasing power. And here comes the critical part, which is what companies and governments do all the time to control future expenses: If somehow I can predict —with the always unavoidable uncertainty attached to any long-term forecast— that ice skating will get that expensive next year, and that my kid will need to take the bus, I'll be doing something akin to cutting his allotment by adding just $1 to it.

(Imagine then that I also predict that my younger daughter will also become a fan of ice skating soon, and will really need to practice it. However, I decide in advance not to give her an allotment, but to split the current one in half, and keep increasing it just $1 a year.)

That's why many are talking about future “cuts” to Medicaid, even people who —unlike me— are native English speakers. See this explainer by the Washington Post, this critique of a White House graph by, and Cohn's own tweetstorm.

That said, if Hume's problem is with the word “cut” itself, I'll agree to stop using it. In return, I'd ask him and the White House to abandon the term “rate of increase” in this case, as it may be equally misleading. Let's call Medicaid's future reduction a “covfefe” instead. The underlying reality won't change, as numbers are stubborn creatures: Medicaid will be roughly 1/4 smaller in 2026 and more than 1/3 smaller in 2036 in comparison to the original baseline. Exactly what the chart represents.

You may argue that this'd be a good thing —I happen to be a moderate European-style fiscal conservative— or a bad one, but a covfefe is a covfefe.

UPDATE: My friend, economist Jon Schwabish, author of Better Presentations, has sent me the following suggestions for the example:
—You tell your son that you are going to give him an extra 2 dollars every month to keep up with rising costs of ice skating (skate rental, ice time, whatever). He will get $10 in January, $12 in February, $14 in March, and so on. Over the course of the year, he will have received $252 in total allowance.
—You then decide you are not going to give him as much and instead you increase his allowance by $1 per month: He'll get $10 in January, $11 in February, $12 in March, and so on. Over the course of the year under this scenario, he will have received $186 in total allowance.
—Under the new world in which he gets a $1 increase per month instead of $2, his allowance is still growing over the period (you could even make those increases rise with inflation, so there is no actual loss of purchasing power), but it's less than the original promise; in CBO-speak, the original is the baseline and the new world —where he gets the $1 increase— is the new policy.
—That's what's happening in the Medicaid debate. Under BCRA, Medicaid spending grows more slowly than the baseline. I can see Hume's point of not calling that a cut, because there is a connotation that the word “cut” implies a literal decline, which is not what happens in BCRA. That being said, there is a cut relative to the rate of Medicaid growth relative to the baseline, but not absolute dollars.
That is, I guess, if health care cost growth is indeed factored in already, and other aforementioned variables —possible population size, age composition, Medicaid recipients, etc.— are considered correctly. In any case, as I said, let's just find a new term we can all agree on, and call this a covfefe.
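
Jon's two scenarios are easy to reproduce. Here is a minimal Python sketch that uses only the dollar amounts from his example; the function and variable names are made up for illustration.

```python
def monthly_allowances(start, monthly_increase, months=12):
    """Allowance paid each month, starting at `start` and growing by a fixed amount."""
    return [start + monthly_increase * month for month in range(months)]

baseline = monthly_allowances(10, 2)    # original promise: $2 more each month
new_policy = monthly_allowances(10, 1)  # revised plan: $1 more each month

baseline_total = sum(baseline)
new_policy_total = sum(new_policy)
shortfall = baseline_total - new_policy_total

print(baseline_total)     # 252
print(new_policy_total)   # 186
print(shortfall)          # 66
print(f"{shortfall / baseline_total:.1%}")  # 26.2%

# The allowance still grows in absolute dollars under the new policy.
print(new_policy[-1] > new_policy[0])  # True
```

Both readings show up in the same numbers: the allowance rises every month under the new policy, and it is also $66 short of what the baseline promised. Which of the two gets called a “cut” depends on the reference point.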

New tutorials and resources section

I've completely revamped the Tutorials & Resources section of this website. Check it out. Here are the main changes:

1. I've added Maarten Lambrechts's wonderful Data Cleaning With Excel tutorial to the list, besides a couple of short videos about pivot tables.

2. I've recorded a brand new tutorial about INZight, an easy to use but powerful free tool for data exploration through visualization.

3. The order of the Illustrator tutorials has changed a bit.

Next time I have some time —crossing my fingers— I'll try to do a tutorial about time-series analysis with INZight, and also grab a few free tutorials about Tableau and PowerBI.

These are all videos that I use in my classes at the University of Miami, in the Journalism and the Interactive Media programs. The textbooks of my courses are both The Functional Art and The Truthful Art, besides a long series of articles.

Here are some screenshots from the INZight tutorials:

Student work: Visualizing forest loss in the Amazon

I want to draw your attention to Silent Forest, a large news documentary project that maps forest degradation and destruction in the Brazilian Amazon. The Guardian has a good story about it, and the About section itself can help you understand what it covers, so I won't bore you with details. I just wish to highlight the visualizations and illustrations in it, designed by my student Laura Kurtzberg.

Laura is a student in our MFA in Interactive Media. She came to the program with experience in coding and mapping, and her skills have only gotten better in this first year with us. Silent Forest can give you an idea of what she's capable of: From interactive data visualizations to pictorial illustrations, like the ones you'll see in the section about bird species. She can also model and animate in 3D, but she didn't show off here.

(Full disclosure: I'm listed as a “special advisor in data visualization” but my participation was limited to giving some feedback on the graphics.)