Sunday, March 29, 2020

'How Charts Lie': a few clarifications and edits

(Update 01/08/2020: you can now download most figures from the book in high resolution and in two different color schemes.)


First clarification, page 44: where it says “people became richer or poorer” or “people in those countries contaminated more or less,” I'd add a qualifier such as “on average,” as these are per capita numbers.
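To see why “on average” matters, here is a minimal Python sketch with made-up incomes (hypothetical numbers, not data from the book). A per capita figure is a total divided by the population, so it can rise even when most individuals end up poorer.

```python
# Hypothetical incomes for a five-person "country", before and after.
before = [10, 10, 10, 10, 60]
after = [9, 9, 9, 9, 80]

def per_capita(values):
    """Per capita figure: the total divided by the number of people."""
    return sum(values) / len(values)

# How many individuals actually lost income between the two periods.
poorer = sum(1 for b, a in zip(before, after) if a < b)

print(per_capita(before), per_capita(after))
print(f"{poorer} of {len(before)} people got poorer")
```

The per capita number goes up while four of the five people lose income, which is why the qualifier is worth adding.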

Second clarification, page 73: the Richter scale is based on tenfold increments of wave amplitude. That's what I mean by “stronger”: I'm referring to amplitude, not to the energy released, which increases at a much higher rate.
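A quick way to see the difference is to compare ratios. This Python sketch assumes the commonly cited textbook relations: each whole step on the scale multiplies wave amplitude by 10, while the energy released grows by roughly 10^1.5, about 32 times, per step. Treat the energy figure as an approximation.

```python
def amplitude_ratio(m_small: float, m_large: float) -> float:
    """How many times larger the wave amplitude is: tenfold per magnitude step."""
    return 10 ** (m_large - m_small)

def energy_ratio(m_small: float, m_large: float) -> float:
    """Approximate ratio of energy released: about 10**1.5 (~31.6x) per step."""
    return 10 ** (1.5 * (m_large - m_small))

for m1, m2 in [(5, 6), (5, 7)]:
    print(f"M{m1} -> M{m2}: amplitude x{amplitude_ratio(m1, m2):,.0f}, "
          f"energy x{energy_ratio(m1, m2):,.0f}")
```

Two magnitude steps mean 100 times the amplitude but roughly 1,000 times the energy.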

Third clarification, page 74: in the fictional example about gerbil population growth, I assumed that the parents die shortly after giving birth; otherwise, by the second generation we wouldn't have double the gerbils (8 children) but triple, a total of 12: 8 children and their 4 parents.

Fourth clarification: whenever you see Kaplan-Meier charts in the book, assume that lines have been smoothed; actual Kaplan-Meier estimators create lines that look like staircases.
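For readers curious about where the staircase comes from, here is a minimal Python sketch of a generic textbook Kaplan-Meier estimator, using invented follow-up data. The estimated survival changes only at times when an event is observed, dropping by the factor (1 - deaths / people at risk), and stays flat in between; censored subjects leave the risk set without causing a drop.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return the (time, survival) points where the Kaplan-Meier curve steps down.

    durations: follow-up time for each subject
    events: 1 if the event (e.g., death) was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every subject leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    steps = [(0, 1.0)]
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the estimated probability of surviving past time t.
            survival *= 1 - d / at_risk
            steps.append((t, survival))
        at_risk -= leaving[t]
    return steps

# Invented example: five subjects, two of them censored (event = 0).
durations = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(durations, events):
    print(f"t={t}: S={s:.2f}")
```

Between consecutive points the curve is horizontal, and at each point it drops vertically; drawn faithfully, that's the staircase.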


If you read the first print edition of How Charts Lie you may notice a few printing and layout errors. These should already have been corrected in the e-book, and will also be corrected in the paperback version, to be released in October 2020. If you spot anything else that looks strange, please let me know.

I've printed out cards containing the main corrections. If you want to receive one and keep it inside your copy of the print book, contact me.

On page 24 the transparency effects that should emphasize or hide parts of the charts disappeared between the galleys—where the graphic was perfect—and the final printing. Mysteries. Here's how that graphic should look:

On page 45 the line corresponding to the United States didn't show when printed. Here's the graphic:

There's a minor issue with the gradient on the second bar of the chart on page 142: it doesn't fade to white. It should look like this:

A chart on page 116 is slightly misplaced.

On page 49, the second paragraph should read: “Imagine that a district's circle sits on the +20 line above the baseline. This means that Republicans lost 10 percentage points, which went to Democrats, for a total of +20 percentage point change in their favor (there weren't third-party candidates, I guess.)”
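The arithmetic behind that +20 can be sketched in a few lines of Python. The vote shares are hypothetical, and, as in the corrected paragraph, the sketch assumes a two-party race, so every point one party loses goes to the other.

```python
def dem_margin(dem_share: float, rep_share: float) -> float:
    """Democratic lead in percentage points (negative means Republicans lead)."""
    return dem_share - rep_share

# Hypothetical two-party district: Republicans win 55-45 in the first election...
before = dem_margin(45, 55)
# ...then lose 10 points, all of which go to the Democrats.
after = dem_margin(45 + 10, 55 - 10)

print(before, after, after - before)  # -10 10 20
```

A 10-point shift between two parties moves the margin by twice as much, because one side loses what the other gains.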

On page 92 there's a needless “is” in a sentence that should read “Unless a crime is premeditated...” On page 104 there's an “s” missing at the end of “assess” (this one made me giggle.) At the bottom of page 128 there's an “as” missing before “Assange”. And on page 157 there's a tiny label that should read MS instead of MI. The last label on the Y-scale of the chart on page 172 should read “600” instead of “490”.

Saturday, March 28, 2020

Rates of change are tricky

Let me begin by saying that (a) we should all appreciate the effort that so many journalists are making to keep the public informed about the coronavirus pandemic; shifts of 10, 12, 14 hours and more are common (subscribe to your favorite news publications, people; be responsible!), and (b) commenting on graphics is easier than making those graphics. As a designer myself, I know how hard it is to navigate the many challenges and trade-offs visualization poses.

That said, I often ponder how we can make visualizations more approachable and understandable. Take the following graph from this New York Times story:

The vertical position of the points on the line represents the percentage change in confirmed cases over the previous 7 days. There are other ways to show change (think of bars with arrowheads pointing up), but they are clunkier. This graph, if you know how to read it, works fine: the goal is to bring those lines down to the +0% baseline, or close to it. This point is explained in the body of the story.

However, imagine the following realistic scenario: someone takes a screenshot of this graph and publishes it on social media, adding some personal comments, or wild inferences (1, 2). I wonder whether graphs like this, when isolated from what surrounded them originally, might make some readers reach dubious conclusions or feel too optimistic and confident (“most lines are going down! You're all overreacting! Time to stop worrying and go back to work!”)

Those readers would be missing a crucial point: a 33% increase (line is low) is indeed, in general, better than an 80% one (line is high), but we need to know more. Prior conditions matter. At the beginning of the chart, the curves are pretty high, probably because those are the early stages of each outbreak, when few cases had been detected. If a city begins with 10 confirmed cases and later detects 8 more, for a total of 18, that's an 80% increase.

But if we already have many confirmed cases, for instance 1,000, and later we end up with 1,330, we've experienced an increase of +33%. It is better to have +33% than +80%, as it might mean* we're stretching the time it takes cumulative confirmed cases to double or triple—we're flattening the curve, as we say these days—but readers shouldn't ignore other facts. Even a “tiny” 10% increase, if experienced when you're already dealing with tens of thousands of infections, may be catastrophic: hospitals could be even more overwhelmed, leading to more deaths. Think of the situation in Lombardy.
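Here is a rough Python sketch of the arithmetic in the last two paragraphs. The 30,000-case figure is hypothetical, and the doubling-time formula assumes the weekly growth rate stays constant, which real outbreaks rarely do.

```python
import math

def percent_change(before: float, after: float) -> float:
    """Percentage change between two cumulative counts."""
    return (after - before) / before * 100

def new_cases(current: float, weekly_pct: float) -> float:
    """Absolute number of new cases implied by a weekly percentage increase."""
    return current * weekly_pct / 100

def doubling_days(weekly_pct: float) -> float:
    """Days for cumulative cases to double if the weekly growth rate stayed constant."""
    return 7 * math.log(2) / math.log(1 + weekly_pct / 100)

print(percent_change(10, 18))   # the 10 -> 18 example from the text
print(new_cases(10, 80))        # an 80% rise on 10 cases: 8 new cases
print(new_cases(30_000, 10))    # a "tiny" 10% rise on 30,000 cases: 3,000 new cases
for pct in (80, 33, 10):
    print(f"+{pct}% per week -> doubles every {doubling_days(pct):.1f} days")
```

Lower percentages do stretch the doubling time, but the absolute number of new cases can still be far larger when the starting count is high.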

The NYT story contains another graphic comparing rates of change with confirmed cases per thousand people but, as the Times journalists themselves acknowledge, it's “hard to read”:

What to do? It's tricky. Maybe show more and explain more, as I've suggested before? The New York Times is doing a good job. The body of the story thoroughly explains the pros and cons of these graphics, what they show and what they don't show.

What I fear, though, is that it's too easy to read charts like these while ignoring their footnotes, or to detach the charts from their context. I wonder whether we should produce animated explanations or have presenters explain our visualizations more often, so readers won't be able to separate visuals from their context and annotations. Mediators play an important role.

(* I wrote “might” because confirmed cases aren't total cases. In the U.S. at least, these charts might be showing, at least in part, the increasing availability of testing. Also, in this pet example I'm not considering other factors, such as the number of recoveries.)

Thursday, March 26, 2020

Fourth edition of a classic, plus my favorite visualization books

This morning I replied to a post on LinkedIn asking for favorite books about data visualization. You can see my answer below, in case you're curious.

I googled Colin Ware's Information Visualization: Perception for Design, to add a link to it, and discovered that its fourth edition is being released tomorrow, March 27, at least on Amazon. What a coincidence! I just ordered it.

I have many favorite books, but here's my answer to the LinkedIn post:

I try to read all books about visualization that I find, so I have many favorites. 
Because it was so illuminating to me more than a decade ago, I love the 1st edition of Thematic Cartography and Visualization by Terry Slocum. Used copies are $7-8 these days, which is great. 
Colin Ware's Information Visualization: Perception for Design is an absolute classic. It's in its 4th edition already. 
Isabel Meirelles' Design for Information brings the perspective of a visual designer.
Tamara Munzner's Visualization Analysis and Design.
For business graphics, Stephen Few's Show Me the Numbers.
Finally, William Cleveland's pair The Elements of Graphing Data and Visualizing Data, which deserve to be much more popular than they are. 
Oh, and any book by Howard Wainer. He has many, several of them compiling his articles. 
I could go on and on. There's a lot of good stuff out there these days. We're lucky.

Friday, March 20, 2020

Why not leave data visualization aside for a few hours to design an explanation graphic?

I began my career in 1997 designing not data visualizations, but visual explanations—we used to call them “infographics”—using illustrations, 3D models, animations, etc. Here's an old example.

I still enjoy that type of work, and in the past few years I've repeatedly lamented its decline in news media—see 1, 2, 3, and my own dissertation. Nowadays, most news graphics desks, at least in the English-speaking world, are focused almost exclusively on data visualizations. I love data visualization, of course, but we shouldn't ignore illustration-based explanation graphics. They are powerful and useful.

Here's a good example: Our World in Data has just partnered with the German animation studio Kurzgesagt, which has a popular YouTube science video channel, to design an animated infographic about how COVID-19 works. It's really good (I know it's good because my attention-challenged teenager watched it until the end and learned a lot):

Thursday, March 19, 2020

The most-read story ever published by the Washington Post online is a visualization (and other reasons why your organization should invest in a graphics team)

Poynter reports that the most-read piece ever published on the Washington Post's website is a visualization-driven story, the now-famous coronavirus simulator by Harry Stevens.

(Poynter's story is very good; see also this tweet by WaPo's media columnist Paul Farhi, confirming the news.)

Here are a few more factoids for you, without trying to be exhaustive:

In 2013 the most-read piece in The New York Times online was the dialect map, How Y’all, Youse, and You Guys Talk, which is still “one of the most popular in The Times’s digital history.” And remember Snow Fall?

ProPublica's Scott Klein has just told me that “about half of our traffic that goes to journalism on our site is to news apps,” which are databases and visualizations. Back in 2010, the Texas Tribune wrote that their applications account for “a third of the site's overall traffic.”

The Financial Times's graphs and maps about the coronavirus are becoming wildly popular, and for good reason: they are excellent, overall.

I predict that the flatten-the-curve visual explanation—read about it here and here—will become the most iconic image of 2020, and one of the most influential graphics ever made.

I could go on and on.

It's puzzling to me, then, that so many organizations—not just news organizations—are reluctant to invest in a data and graphics team, or to give it the power, resources, and autonomy it needs to thrive. What are you thinking?

(Also, Pulitzer Prize Board, it's about time to create a category for this type of work, don't you think?)

Tuesday, March 17, 2020

Linear or non-linear scales? Why not both?

The coverage of coronavirus has rekindled the debate about whether most readers understand non-linear scales. In How Charts Lie I have a cute fictional example of when this type of scale is necessary: imagine that you own four gerbils, two males and two females.

The four gerbils mate and each couple gives birth to four little ones (eight little gerbils in total.) For the sake of argument, let's imagine that the parents die shortly after giving birth. The gerbils keep reproducing at this constant rate, so each generation is double the size of the previous one.

If you plot this exponential growth on an arithmetic Y scale, the line remains very close to the 0 baseline for ~25 generations. Therefore, it'd be impossible to estimate the rate at which you need to increase the amount of food to purchase for your adorable critters:

However, if you use a non-linear scale, the exponential growth of gerbil population becomes clearer. By the 32nd generation there'll be more gerbils in your backyard than people in the world:
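The gerbil arithmetic is easy to check. This Python sketch counts the original four gerbils as generation 1, assumes each generation doubles, and uses an assumed figure of roughly 7.8 billion for the 2020 world population. It also shows why a non-linear (logarithmic) scale helps: on a linear axis topped at generation 32, generation 25 reaches less than 1% of the chart's height, while on a log axis every doubling climbs the same distance.

```python
import math

WORLD_POPULATION = 7_800_000_000  # assumed rough 2020 estimate

def gerbils(generation: int, founders: int = 4) -> int:
    """Gerbils alive in a generation, counting the founders as generation 1.

    Assumes each generation doubles and parents die shortly after giving birth.
    """
    return founders * 2 ** (generation - 1)

def first_generation_exceeding(threshold: int, founders: int = 4) -> int:
    """First generation whose population is larger than `threshold`."""
    generation = 1
    while gerbils(generation, founders) <= threshold:
        generation += 1
    return generation

peak = gerbils(32)
for g in (1, 2, 10, 25, 32):
    n = gerbils(g)
    # Linear axis: height as a share of the top of the chart.
    # Log axis: log10(n) rises by the same ~0.3 for every doubling.
    print(f"gen {g:>2}: {n:>13,} gerbils | linear height {n / peak:.4f} | log10 {math.log10(n):.2f}")

print(first_generation_exceeding(WORLD_POPULATION))  # -> 32
```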

When doing graphics about pandemics, we do need non-linear scales because contagion is also non-linear: if you are infected, you likely won't infect just another person, but two, three, or more every n days. That's why community mitigation strategies such as staying at home and washing your hands are so important.

But it's true that most of us have a hard time wrapping our heads around non-linear scales. What to do? Well, we can explain them. As I've said in recent talks, the impulse of too many editors when they think that readers won't understand a visualization is to avoid that visualization. That's self-defeating and wrong. If you never use a type of graphic or scale, how are your readers ever going to learn how to read it?

Another solution is to take advantage of interaction. Showing data on a linear scale is also valuable: it's not just more dramatic than a non-linear scale; it also gives readers an additional view of the data. Why not let people switch between a linear and a non-linear scale? That's exactly what Spain's El País did in this visualization.

Our World in Data's coronavirus page has a similar feature, although it's harder to see where to click to switch between scales.

Monday, March 16, 2020

The ethics of counting

The coronavirus pandemic is being covered widely, deeply—and not always correctly. There've been plenty of instances of innumeracy and dubious visualizations, as Amanda Makulec said in an article about ethics and good practices. The other day I made some suggestions myself: never publish anything without consulting experts, for instance.

By a happy coincidence, over the weekend I read the manuscript of a timely book that will come out in October this year. Its title is Counting: How We Use Numbers to Decide What Matters by Deborah Stone, a professor emerita at Brandeis University. If you follow this blog or liked How Charts Lie and The Truthful Art, you'll enjoy her book as well.

I've looked into Stone's previous work and found an intriguing article of hers, 'The Ethics of Counting', that anticipates some of the themes that appear in Counting. It's a transcript of her acceptance speech for the 2017 James Madison Award, and it's organized into five parts:
What does it mean to count?
How do numbers get their meaning?
How do numbers get their authority?
How can counting change hearts and minds?
Are there some things we shouldn’t count?
Don't miss it.

Sunday, March 15, 2020

Before showing any data, explain how your visualization works

When discussing how to make the public more graphically literate (“graphicate”) in recent talks about How Charts Lie, I've been advocating for explaining how unfamiliar visualizations work before we reveal any data. I use this famous Hans Rosling video as an example. I described it in my article for IEEE, too, emphasizing that the part at the beginning, when Rosling talks about the encodings (horizontal and vertical position, bubble size, color), is crucial. You can see one of those talks here; jump to minute 15.

That's why I'm happy to see Lazaro Gamio's most recent visualization at The New York Times, about the possible impacts of the coronavirus. It contains a bubble scatter plot, and it applies Rosling's technique. It's a great use of the annotation layer: (a) “Each bubble on this chart represents an occupation. The bigger the bubble, the more people do that job,” (b) “the vertical position of each bubble is a measure of how often workers in a given profession are exposed to disease and infection,” (c) “the horizontal position is a measure of how close people are to others during their workdays.” Well done.
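Rosling-style explanations work because the encodings are stated explicitly. As a minimal sketch (hypothetical occupations and scores, not the Times's data or code), this Python snippet maps each record to the three encodings the annotations describe. One design choice worth noting: if “bigger” is read as area, the radius should grow with the square root of the number of workers, so a job with four times as many people gets a bubble with twice the radius.

```python
import math
from dataclasses import dataclass

@dataclass
class Occupation:
    name: str
    proximity: float  # hypothetical score: how close workers are to others (x)
    exposure: float   # hypothetical score: how often workers face disease (y)
    workers: int      # how many people do the job (bubble size)

def encode(job: Occupation, max_workers: int, max_radius: float = 30.0) -> dict:
    """Map one occupation to position and size; bubble AREA is proportional to workers."""
    return {
        "label": job.name,
        "x": job.proximity,
        "y": job.exposure,
        "r": max_radius * math.sqrt(job.workers / max_workers),
    }

# Hypothetical records, for illustration only.
jobs = [
    Occupation("Occupation A", proximity=90, exposure=95, workers=1_000_000),
    Occupation("Occupation B", proximity=40, exposure=20, workers=250_000),
]
largest = max(j.workers for j in jobs)
for bubble in (encode(j, largest) for j in jobs):
    print(bubble)
```

The annotations a reader sees map one-to-one onto the x, y, and r fields here, which is what makes the chart easy to explain before any data is shown.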

Saturday, March 14, 2020

Trump sent his followers a signed chart; I'm doing the same with mine

It seems that President Trump is (mis)using the copy of How Charts Lie I sent him months ago to thank him for the free publicity he's given me—the review in The Economist mentioned this fact in its last paragraph.

According to CNN, Trump sent the following chart to his fans and followers. His point is that the markets experienced a rapid recovery after his press conference about the coronavirus yesterday, Friday 13. The vertical black line on the right-hand side of the chart marks the time of that press conference:

It's hard to see, but the chart shows fluctuations of the Dow Jones Industrial Average on Friday 13 only, which I don't think is sufficient information. This is a case of convenient cherry-picking. I decided to send my followers my own signed chart, this one showing the variation since January 1, 2020, as Chinese authorities informed the World Health Organization of the first cases of coronavirus at the end of December 2019. I also added some annotations:

I sincerely want to thank President Trump again for all his efforts to promote How Charts Lie. Whenever he tweets a map or a chart, I think that sales increase.

Friday, March 13, 2020

Explaining and simulating the coronavirus

The other day a reporter asked me about my favorite visualizations about the coronavirus. I've been hesitant about the quality of many graphics I've seen (this, this, this, and this), so I chose the “flatten the curve” abstract diagram (many of its versions can't be called data visualizations, as they don't encode actual data), particularly when someone adds a verbal annotation layer to it, as CNN's Brian Stelter did. Stelter acted like the “mediators” I discussed in my recent article for IEEE. We shouldn't just show information to viewers or readers; we ought to explain it.

Nicholas Kristof and Stuart Thompson have just released another intriguing piece. This one lets you simulate how the curve would change depending on how early or late you intervene to stop the spread of the virus, or how mild or aggressive your actions are:

In any case, here's my take about visualizing anything related to the coronavirus—or anything at all, for that matter: don't mindlessly apply your generic statistical or visualization skills to data downloaded from public sources when covering serious topics; you likely lack domain-specific knowledge, which is essential to getting things right. Always consult with an expert or two. Seek the help of epidemiologists, biostatisticians, or public health specialists. And, when in doubt, err on the side of caution and don't publish anything.

UPDATE: I'd revisit this 2017 article by Steve Wexler and Jeff Shaffer: “Publishing bogus findings undermines our credibility. It suggests we value style over substance, that we don’t know enough to relentlessly question our data sources.”

Wednesday, March 4, 2020

An opinion article for IEEE

IEEE Computer Graphics and Applications has just published my opinion article “If Anything on This Graphic Causes Confusion, Discard the Entire Product.” If you're wondering where that quirky title comes from, see the last part of this article for Nightingale from a few months ago. “If Anything...” also deals with Sharpiegate, but its focus is a bit different: I talk about the role that people who mediate between a visualization and its intended audience—a TV presenter explaining how to read a graphic, for instance—may play.

The article is paywalled, but here's an early and unedited draft.

Wednesday, February 26, 2020

All talks from the Data Intersections symposium

The four talks we had at our data and design ethics conference, Data Intersections, are already online in the Institute for Data Science and Computing's YouTube channel (here.) Otávio Bueno, Heather Krause, Yeshi Milner, and Mike Monteiro covered a lot of ground, and their talks were great. See them below:

Thursday, February 13, 2020

Design can destroy the world—but it can also make it better. My opening remarks for the 2020 Data Intersections conference

Data Intersections, the University of Miami's conference about the ethics of data, design, and technology, is this afternoon. Some of you have asked me in private whether we'll record it, and the answer is yes. We'll make all talks available in a few days, as soon as the videos are edited.

In case you're interested, here's the draft of the remarks I'll offer at the beginning of the conference (spoiler alert: there'll be a book about some of this in 2021):

Hello, welcome to the Data Intersections conference. First I’d like to thank you for being here this afternoon.  
For those of you who don’t know me, I’m Alberto Cairo, the Knight Chair in Visual Journalism at the School of Communication of the University of Miami. I’m also director of visualization and information design at our Center for Computational Science, CCS, one of the sponsors of Data Intersections. CCS is about to become UM’s Institute for Data Science and Computing, or iDSC, as you’ll learn in a minute from our Provost, Jeff Duerk. I'll be the director of iDSC's Center for Visualization, Data Communication, and Information Design, so that's exciting news...
But before that, let me briefly explain how the 2020 edition of the Data Intersections conference came to be. It all began in 2019, when Mike Monteiro, one of our speakers today, sent me an early copy of his book Ruined by Design: How Designers Destroyed the World, and What We Can Do to Fix It. 
You see, I’m a journalist and also a data visualization and infographics designer, so I was interested in learning about how and why I was destroying the world. 
I liked Mike’s book so much that I ended up writing a blurb for its back cover. I said about Mike’s book, and I quote, that it is “Victor Papanek’s Design for the Real World updated for the 21st Century—and with much more swearing.” 
If you are a designer or technologist, you probably know Victor Papanek’s famous book. If you’ve never heard of it, go get a copy. As with Mike’s book, Papanek's Design for the Real World is a passionate discussion of how design and technology can go wrong, and what we can do to get them right instead.
The reason Mike’s book had such an impact on me is that I’ve always been fascinated by numbers, design, and science. That's why I make data visualizations and infographics, and also teach how to design them. At the same time, I’ve always been interested in thinking about how we, the creators of those numbers, designs, and technologies, can make good and informed choices, not ignorant or even destructive ones. 
This is related to a third interest: moral philosophy. Since I was in high school, I’ve been reading informally, as a proud amateur, into the literature of ethical thinking, so I'm somewhat familiar with the major debates and schools in the philosophy of ethics: virtue ethics, deontology, consequentialism, and their multiple variants. 
That literature might help us answer questions that we all face: what is the difference between what we can do and what we ought to do? How can we train our moral intuitions, and our sixth sense of what is right or wrong? How can we weigh the possible consequences of our actions and creations? What is appropriate, or not appropriate, to do with the data and technologies we design? 
Ultimately, I’d argue that the key question we should try to answer is: how can we use data, design, science, and technology to help human beings have better, happier, and wiser lives, and to make our societies flourish—instead of destroying them? 
This is what Data Intersections is about. Our four speakers today, Otávio Bueno, Heather Krause, Yeshi Milner, and Mike Monteiro, will surely inspire us to be more ethical data scientists, designers, journalists, and technologists. Or, in general, better human beings. 
Once again, thanks so much for being here this afternoon. I hope you’ll enjoy our great speakers and the reception at the end of the day. Now, I’d like to introduce the Provost of the University of Miami, Jeff Duerk. Jeff, thanks for being here. The stage is yours.

Wednesday, February 12, 2020

Visualization often puts stories in perspective

Following an election or primary race too closely is entertaining, but also stressful. Media organizations often read too much into individual polls—I try to stick to the mantra “every single poll is noise; what might matter is the weighted average of all polls”—and pundits love to read tea leaves, that is, to spot and discuss signals in randomness.
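As a toy illustration of that mantra, here is a Python sketch that averages several polls, weighting each by its sample size. The poll numbers are invented, and weighting by sample size alone is just one simple choice; real poll aggregators also adjust for factors such as recency and pollster quality.

```python
def weighted_average(polls):
    """Average of poll results, each weighted by its sample size.

    polls: list of (support_percentage, sample_size) tuples.
    """
    total_weight = sum(n for _, n in polls)
    return sum(pct * n for pct, n in polls) / total_weight

# Hypothetical polls for one candidate: (percent support, sample size).
polls = [(24, 600), (30, 400), (21, 1000)]
print(f"{weighted_average(polls):.1f}%")
```

The single 30% poll, the kind a headline might call a surge, moves the weighted average far less than reading it on its own would suggest.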

The recent Democratic primaries in Iowa and New Hampshire are big news these days, but how much do they really matter? I don't know; I'm no political analyst, but the following visualization by WSJ's Brian McGill puts things in perspective. Maybe those primaries matter less than we want them to, as there's still such a long way to go?

Brian designed that cartogram manually, painstakingly copying and pasting nearly four thousand squares. I sympathize with his effort. I've made cartograms like that myself in the past.

Monday, January 27, 2020

A nice short video about misleading graphs

One of my undergraduate students, who was reading How Charts Lie, told me that TED-Ed published a short animation (paired with a lesson containing extra materials) about misleading graphs in 2017. The author is Lea Gaslowitz. It's quite nice:

Thursday, January 23, 2020

An example of how to annotate a visualization, by The Financial Times

The Financial Times visualizes how Britons spend their time at weekends vs. weekdays, a graphic that is part of a story about how weekend working affects families.

The array of graphs is elegantly designed, as is often the case with the FT, which has developed a massive library of high-quality visualizations and even has a column about data journalism. That said, what really does the trick for me is the detailed annotations.

Several members of the FT data and graphics team have repeatedly expressed their belief in the power of the annotation layer. John Burn-Murdoch, for instance, has said: “I and my colleagues here at the FT, we really do think one of the most valuable things we can do as data visualization practitioners is add this expert annotation layer.” I agree. Explanatory visualizations shouldn't consist of visualizing data alone, but also of adding words to put the data in context, highlight the most relevant facts, or dispel possible misinterpretations.

Monday, January 13, 2020

Announcing Data Intersections, a free conference about the ethics of data, design, and technology

In the afternoon of February 13 I'm hosting Data Intersections, a free half-day conference about the ethics of data, design, and technology.

Our speakers are Mike Monteiro, author of Ruined by Design; Yeshimabeit Milner, founder of Data for Black Lives; Heather Krause, founder of We All Count; and Otávio Bueno, philosopher of science, math, and technology.

If you wish to attend, sign up for free, and see more information here, including the schedule.

Saturday, January 11, 2020

El País wisely uses animated transitions

Spain's El País recently got a new graphics director and stepped up its visualization game by strengthening its graphics desk with some other new hires. The results are showing already. Their latest data-driven piece is an analysis of the European countries currently governed by multi-party coalitions. The designers wisely used a restrained color palette and elegant, smooth animated transitions to connect each section of the story to the following one.

The print version of this piece also looks great:

Wednesday, January 8, 2020

All graphics from 'How Charts Lie' freely available in two color schemes (for now)

I'm making most figures—not all, due to technical and copyright issues—from How Charts Lie freely available. Find them in this Dropbox folder. I also decided to create different versions using alternative color schemes. The first one is blue and red instead of gray and red, and you can download the graphics here. (Also, don't forget the edits page.)

Here's a comparison between two versions of the same graphic, the one that appears in the book and the one with a red-blue color scheme:

Thursday, December 26, 2019

All The Economist Graphic Detail visualizations in one convenient PDF

Alex Selby-Boothroyd, head of Data Journalism at the Economist, has just announced a special gift for their readers: they've collected more than 60 articles/visualizations from their Graphic Detail section as a PDF. To download it you need to sign up (it's free) or subscribe to the magazine.

The document contains some of the finest, most elegant, and most tightly edited print graphics I've seen this year. You can pair it with this article, which explains what Alex and his team learned by publishing a weekly data-driven article for more than a year. If you teach visualization, both the document and the article are great teaching resources.