Tuesday, October 15, 2019

A dubious chart that confuses bias with trustworthiness

How Charts Lie is out today in the United States (see a few early corrections here,) and I guess that there's no better way to celebrate it than discussing a dubious chart, the one on the right. According to reporter Dana Liebelson it's being used in libraries to educate kids about which news organizations to avoid—those on the far left and far right columns.

There's so much wrong with this chart that it's difficult to decide what to begin with. I guess I'd first point out that having an ideological bias isn't bad per se, as the source of the chart, AllSides, implies in its motto (“don't be fooled by media bias and fake news”.) Lacking a clear ideological bias doesn't automatically make you more trustworthy.

On the contrary, you can have a clear ideological slant and still be trustworthy because your reporting and verification methods are solid, and because you strive to be honest and fair. Think of the New Yorker or Mother Jones magazines, for instance. On the right, Fox News online is pretty decent, National Review is a mixed bag, but it still has good columnists, and I fondly remember The Weekly Standard, which was clearly neoconservative while holding itself to strong professional standards, particularly when Stephen Hayes was in charge (Steve recently launched a new media venture, by the way; it's called The Dispatch.)

But the main reason this chart is so deceptive is that it compares things that aren't comparable. Come on, Breitbart or The Federalist rags at the same level of “bias” as Vox? The Washington Examiner at the same level as NPR? Those aren't equal. Neither in terms of trustworthiness, nor in terms of ideological bias. And The Hill (The Hill!) isn't a “centrist” publication. I could go on and on, and I'm sure you will have your own pet peeves.

UPDATE: Laura Ana Maria Bostan suggests this scatter plot as an alternative:

Monday, October 14, 2019

'How Charts Lie': a few corrections

If you read the print edition of How Charts Lie you may notice a few errors that flew under the radar of everyone who reviewed drafts and galleys—more than 20 people, including myself! Small typos and errors are an inevitable curse in book publishing, I guess. These should have been corrected in the e-book already, or will be soon (if you detect anything that looks strange other than these, please let me know!)

Here they are:

On page 24 the transparency effects that should emphasize or hide parts of the charts have disappeared. Here's what that graphic should look like:

On page 45 the line corresponding to the United States doesn't show. Here's the real graphic:

The last label on the Y-scale of the chart on page 172 should read "600" instead of "490". This is a corrected version of that chart:

In The Washington Post and The Economist

How Charts Lie will be released tomorrow in the United States, and there has been some buzz around it in the past few days. This week's The Economist includes a review by Alex Selby-Boothroyd (it's on page 87 of the print magazine), and The Washington Post has just published this interview written by Christopher Ingraham.

I love the title that The Economist chose: ‘Axes of evil: Lies, damn lies and charts,’ and also the fact that the review ends with the following note:

“Mr Cairo has sent a copy to the White House.”

(I did!)

You can find other recent interviews in Storytelling With Data (Cole has also written a review,) BIBrainz, Present Beyond Measure, and Data Viz Today. Época, the Brazilian magazine I used to work for years ago, has also published a review.

If you want to get a sense of the tone and content of the book, read its first 20+ pages for free.

Thursday, October 10, 2019

Sometimes it's the simpler visualizations that matter the most

The New York Times has just published a nice piece about automobile CO2 emissions in the U.S. It opens with a beautiful map where you can search for any metro area. Here's the Miami metro area:

It may be because I've seen a lot of visualizations in the past decades but, even if I liked the map a lot, I didn't find it that impressive or insightful. It's a bit like a population map, after all, although I'll admit it helps bring attention to the story.

I was more interested in the variation of emissions and their sources. Fortunately, Nadja Popovich and Denise Lu, the authors, also cover that in the graphs that follow the map. There's a lot of attention to detail in the design of these. Notice the annotations and the careful use of color:

Wednesday, October 9, 2019

Decisions in visualization should be based on reasons

The New York Times's David Leonhardt has a column titled The Rich Really Do Pay Lower Taxes Than You, meaning a lower effective tax rate. The column is based on a recent book, and it contains a graph that reminded me of something I wrote a while ago about line charts not being appropriate just for time-series data.

The graph displays the flattening of tax rates in the past seven decades. In the 1950s households in the lower income groups paid an effective rate of ~20%, while the richest were hit by a total tax bill of ~70%.

Today the picture is quite different: the line is almost flat, and the top 400 households pay ~23% while the lowest decile of households pays a higher rate of (I think) ~27%.

Responses to Leonhardt's social media postings about the article have been intense: people on the left claim that the flattening of the tax rate curve is unjust (Leonhardt agrees,) while conservatives say that the current tax rate distribution is fair, as everyone should pay more or less the same percentage of their income as taxes. In How Charts Lie I wrote that charts are rarely the last word in discussions about relevant issues, but they can certainly inform them. This chart is a great example of that.

A few words about the design of the graph itself: RJ Andrews, author of Info We Trust, sent me some intriguing suggestions. He worries about the evenly spaced tick marks on the X-axis of the graph. RJ proposes to subdivide the graph into three sections to separate the top 1% and the top 400 households from the other 99%. Here's a quick sketch he drew:

I liked the original NYT graphic, and it didn't bother me much that the X-axis tick marks are evenly spaced even if the last three tick marks don't correspond to income deciles. But I also like RJ's idea. Stuart Thompson reminded me that they tried an alternative design for a similar type of graph in a previous article—and I also like it!

I see good reasons to justify different design solutions in a case like this, so I'm torn. Perhaps this is a good topic for a class discussion or even for a little research experiment: does it matter more to be geometrically accurate? Or is it better to preserve continuity and maybe increase clarity, as the NYT did, by showing all data in a single chart that magnifies the upper echelons of the income distribution? Or could it be that none of this matters, and that all these design alternatives are equally effective and persuasive? Are we facing a distinction without a difference?
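To get a feel for what "geometrically accurate" would imply, here's a rough sketch that compares how much horizontal space each income group gets when tick marks are evenly spaced versus when widths are proportional to the number of households. The group breakdown and the total household count below are hypothetical placeholders of mine, not the figures behind the NYT graph.

```python
# Hypothetical round figure for the number of households; an assumption,
# not a number taken from the NYT graph or the book it is based on.
TOTAL_HOUSEHOLDS = 128_000_000

top_400_share = 400 / TOTAL_HOUSEHOLDS

# Hypothetical groups: nine deciles, then three groups at the top
# that are much narrower than a decile.
groups = [(f"decile {i}", 0.10) for i in range(1, 10)]
groups += [
    ("90th-99th percentile", 0.09),
    ("top 1% (minus top 400)", 0.01 - top_400_share),
    ("top 400", top_400_share),
]


def even_widths(groups):
    """Every group gets the same share of the X-axis."""
    return [1 / len(groups)] * len(groups)


def proportional_widths(groups):
    """Each group's width matches its share of households."""
    total = sum(share for _, share in groups)
    return [share / total for _, share in groups]


even = even_widths(groups)
prop = proportional_widths(groups)

for (name, _), e, p in zip(groups, even, prop):
    print(f"{name:<24} even: {e:7.2%}   proportional: {p:10.5%}")
```

Under these assumptions the top 400 households get the same width as a full decile with even spacing, and a sliver far too thin to draw with proportional spacing, which is the trade-off between magnifying the top and staying geometrically faithful.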

Tuesday, October 8, 2019

Read the first 20+ pages of 'How Charts Lie' for free

How Charts Lie will be available in the United States exactly one week from now—and a bit later in the UK. Here are the links to pre-order it through different bookstores and receive it exactly on the release date, October 15:

IndieBound (independent bookstores.)



My publisher, W.W. Norton, has agreed to make the book's introduction publicly available and free to download. If you want to read the first 20+ pages of the book in advance, download them from Google Drive or Dropbox.

I hope you'll enjoy them!

Monday, October 7, 2019

The Unwelcomed: mapping an ongoing tragedy

Mohamad A. Waked's The Unwelcomed, longlisted at the Information is Beautiful Awards, combines true storytelling with data visualization to map the thousands of migrants and refugees who, according to the International Organization for Migration, lost their lives or disappeared between 2014 and 2019.

I just wrote “true storytelling” because the project consists of two parts, and one is a fictional narration inspired by real events, followed by a summary of the main numbers.

The other part of the project is a big visualization (you'll need to see it on a large screen) that shows times and places where migrants perished or went missing. It's a haunting combination of a proportional symbol map, a strip plot, bright lines that connect the former to the latter, and controls that allow you to modify the display.

Friday, October 4, 2019

Why I require students to blog about visualization

Comic by PhDComics.com, showcased in a blog post by one of my students
One of the main changes I've made in my classes in the last year is to ask students to launch their own blogs and write in them at least once a week. Here's why:

(A) Too many students don't read their textbooks and other recommended readings unless they know they'll be tested.

I hate tests, quizzes, and exams; they encourage students to do their readings all at once at the very last minute, instead of spreading them out and developing a disciplined reading routine. I want students to learn, not to pass an exam.

(B) A blog can be a great addition to anyone's portfolio. It's a place where potential employers can assess whether a student is not just a good designer or coder, but also a thinker. I ask students to write while imagining that it won't be me who's reading, but a professional peer, their boss, or the visualization community in general.

(C) You can't be a good visualization designer without being a decent writer. This is a hill I'm willing to die on. Writing aids thinking.

I teach two classes at the University of Miami. One is an introduction to data visualization and infographics, and the other is called Data Visualization Studio, which is more advanced. Students in the intro class must read 2-3 book chapters from The Truthful Art or other visualization books per week and write a blog post (a) summarizing what they've read, (b) connecting it to examples of visualizations they've seen.

Students in my advanced class are also required to do some readings every week, but I give them more freedom to choose what to write about in their blogs. This class is for students who've already been through a couple of graphics courses: my intro course, our interactive data visualization class (which focuses on JavaScript and d3.js), or other coding, GIS, and 3D modeling and animation classes we also offer. Here are their blogs (you can also find them under the #AlbertoCairoDataVizClass hashtag on Twitter):

Qinyu Ding; her most recent post is about empathy in visualization.

Alyssa Fowers; don't miss the first part of her 2-part series about minimalism in visualization.

Yuan Fang; she's interested in immersive technologies in visualization.

Deb Pang Davis; her latest blog post discusses several visualizations about bird extinction.

Leila Thompson; in a recent post she describes the project she's planning to design for the class.

Grace Snyder; she's a marine science PhD candidate. In her blog she's chronicling her efforts to learn code and visualization. She has very interesting thoughts about teaching styles.

Yutong Han; her writings are often gentle critiques of existing visualizations.

Jinqi Li; she's planning to make a project about the Marvel Universe. Her blog reads like a long making-of article in progress, and it includes sketches, early prototypes, etc.

Zihao Zhong; he's planning to design an animated infographic about soccer. In his blog he's posting references that inspire him.

Thursday, October 3, 2019

Upcoming talks in Boston, New York, Washington DC, and South Hadley

Right after How Charts Lie is published in the United States, on October 15, I'll be giving public talks in several cities; you can consult my public calendar. The first ones after the launch will be in Boston, New York City, Washington DC, and South Hadley, among others. Registration is free for all of them. Just click on the previous links to sign up (space is limited.) The poster on the right was designed by students at Northeastern University. Isn't it lovely?

I'll also deliver talks in Spain, Mexico, and Denmark before the end of 2019.

I'll sign books at some of these events, as indicated in the calendar. If you haven't pre-ordered How Charts Lie yet, you'll have the opportunity to buy it on site. If you own a copy already, just bring it with you and I'll be happy to sign it.

(If you want me to visit your town or city in 2020 to give a free talk, just contact me.)

Wednesday, October 2, 2019

RawGraphs 2.0 is coming. Please support it

I think that most readers of this blog are familiar with RawGraphs, the beloved free and open source tool designed by Density Design years ago. If you aren't, give it a try. It's really good. The RawGraphs team is now planning to develop a second version of the tool, and they are seeking funding. I've contributed €100 myself to their crowdfunding campaign; if you've used RawGraphs 1.0 and you can afford it, I recommend you help them. What they plan to do sounds very promising.

Tuesday, October 1, 2019

Don't just visualize uncertainty; explain it and don't let captions contradict it

MIT Election Lab's Alexander Agadjanian has a nice piece in The New York Times about how people react when reading that the Democratic party may be shifting left.

The graphs in the article include error bars which, considering that they are based on a survey experiment, I guess correspond to a 95% confidence interval. I had to guess because it's not explained anywhere.

I'm in favor of disclosing uncertainty in visualizations. I've written about it repeatedly in books and blog posts. However, I also think that uncertainty should never go unexplained, particularly if we present it to readers who may not understand what the whiskers on either side of the point estimate dots mean.

I also think that we journalists shouldn't let what we write contradict what we visualize. On the first graph the error bars for Independents—the only data point that seems statistically significant—are very wide, but the caption reads “independents in a survey were six percentage points less likely to say they would vote for a Democrat in 2020, compared to a control group” (that's the 0 baseline.)

That isn't wrong per se, but I conjecture that in the mind of many readers it makes the point estimates sound much more precise than they really are. If we display uncertainty, we should convey uncertainty also through our annotations, so instead of writing “six percentage points less likely,” I'd suggest “significantly” or “considerably less likely,” without assigning any specific value to the difference.
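To make the point concrete, here's a minimal sketch of how wide a 95% confidence interval around a six-point difference can be. The proportions and sample sizes are hypothetical numbers I made up for illustration; they are not taken from the article or the survey experiment, and the normal approximation is just one common way of computing such an interval.

```python
import math


def diff_in_proportions_ci(p_treat, n_treat, p_ctrl, n_ctrl, z=1.96):
    """Normal-approximation confidence interval for a difference
    between two independent proportions (z=1.96 gives about 95%)."""
    diff = p_treat - p_ctrl
    standard_error = math.sqrt(
        p_treat * (1 - p_treat) / n_treat
        + p_ctrl * (1 - p_ctrl) / n_ctrl
    )
    margin = z * standard_error
    return diff, diff - margin, diff + margin


# Hypothetical values, NOT from the article: 34% of 600 treated
# respondents versus 40% of 600 control respondents.
diff, low, high = diff_in_proportions_ci(0.34, 600, 0.40, 600)

print(f"point estimate: {diff * 100:+.1f} points")
print(f"95% interval:   {low * 100:+.1f} to {high * 100:+.1f} points")
```

With these made-up numbers the interval excludes zero, so the difference would count as statistically significant, yet it spans roughly eleven percentage points. Quoting only the "six points" in a caption hides all of that.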

UPDATE: Kaiser Fung has written about this visualization. Don't miss it.

Sunday, September 29, 2019

That damn map again

Donald Trump's daughter-in-law Lara has tweeted the following map.

As this map is the very first case I analyze in How Charts Lie, I wrote a short thread on Twitter:

The last tweet in the thread contains some additional readings:

Pre-orders of How Charts Lie seem to have increased substantially because of the thread above and Sharpiegate, which I wrote about here. To thank the President, I'm sending him a complimentary copy of the book.

Thursday, September 26, 2019

What you call chaos I call a method

This tongue-in-cheek tweet got a lot of attention the other day. It's true: I still take my notes by hand, quite often on the margins of papers and books, although if I deem them important I transfer them to notebooks I keep at my home office.

It's a kludgy system. For a while I followed the advice of much more experienced (and much more organized) authors such as Steven Johnson, and tried DEVONthink, Evernote, and other tools. But I gave up.

There's something about handwritten notes and imperfect node diagrams with tons of scribbled arrows connecting concepts and ideas that has always worked for me. I think it's somehow related to the fact that I remember better what I read in print than what I read on screens.

In some past talks I've explained that I inherited this idiosyncratic way of studying from my father, who told me to always keep a pen and some paper handy when reading. My dad also taught me to draw mnemonic diagrams, although he didn't call them that; there's one in The Functional Art that still makes sense to me as a memory aid but that is probably incomprehensible to everyone else. I can't think if I don't draw.

Wednesday, September 25, 2019

Early reviews, recent interviews, and upcoming talks

We're exactly three weeks away from the release of How Charts Lie in the United States, and there's been some media buzz already. A few weeks ago I chatted with Jon Schwabish and the conversation ended up in his podcast. We went way beyond the book and talked about how to teach visualization.

Jon and the Urban Institute will host a public lecture about How Charts Lie on October 24; you can see this talk and others in my calendar. If you live in Boston, Washington DC, NYC, Atlanta, Columbus, etc., I'll see you soon.

I've also been in The Damage Report, one of the shows of The Young Turks network; the interview begins at approximately the 10 minute mark.

Publishers Weekly has a starred review of How Charts Lie. The reviewer captured some core messages of the book:
With the use of such graphics throughout media only increasing, Cairo insists, persuasively, that “just looking at charts, as if they were mere illustrations,” is not enough; “we must learn to read them and interpret them correctly.” [...] After offering a guide to different kinds of charts, Cairo presents the different ways they can mislead, including by using the wrong data or concealing uncertainty. 
And here's Kirkus Reviews:
As this entertaining addition demonstrates, the “how to lie with statistics” genre is alive and well. In a cheerful introductory chapter, the author explains that, while writing was invented about 5,000 years ago and charts weren’t used until the late 1700s, both are encoded forms of communication with a structure and vocabulary. Readers receive well-researched information about the makeup of a chart along with the warning that this knowledge, like rules of grammar, is necessary but not sufficient. It’s essential to pay attention. [...] An ingenious tool for detecting flaws in charts, which nowadays seem mostly deliberate.
Kaiser Fung also wrote a review of an early draft of How Charts Lie. I hope everyone will like the book as much as these early reviewers did!

Tuesday, September 24, 2019

On the dangers of aggregates and the beauty of variance: Christopher Ingraham's If You Lived Here You'd Be Home By Now

The best book about data I've read in the past few months isn't about data. It's about people. Or, better said, about the myriad of individual experiences and quotidian facts that often aren't captured by data.

On August 17, 2015, Christopher Ingraham, a reporter at The Washington Post, wrote a story and designed a map under the title 'Every county in America, ranked by scenery and climate':
In the late 1990s the federal government devised a measure of the best and worst places to live in America, from the standpoint of scenery and climate. The "natural amenities index" is intended as "a measure of the physical characteristics of a county area that enhance the location as a place to live."
According to this measure, Ventura County, California, is the “best” place in the U.S., and Red Lake County, Minnesota, is the “worst”.

Ingraham's If You Lived Here You'd Be Home By Now chronicles what happened after. He not only got plenty of pushback from Minnesotans, but he eventually decided to move to Red Lake County with his family.

If You Lived Here... is a delightful collection of anecdotes—some of you may remember Ingraham's hilarious cricketpocalypse—connected by an underlying theme: data about human beings sometimes illuminates, but sometimes it also obfuscates, particularly when we assume that aggregates are a reflection of individuals, or when we don't grasp what it is that we are measuring.

I could write extensively about this problem (I have, actually, in How Charts Lie and here), but I'll just quote from Ingraham's book:
As somebody whose job is to write about data writ large, I'm a big believer in its power—better living through quantification. But my relocation to Red Lake Falls has been a humbling reminder of the limitations of numbers. It has opened my eyes to all the things that get lost when you abstract people, places, and points in time down to a single number on a computer screen. 
One of the big dangers of our glorious, new, quantified world is the emergence of a type of numeric stereotyping—of insights hardened into dogma by the weight of a thousand data sets. 
We “know”, for instance, that Mississippi is poor, that New York City is expensive, that Chicago is violent, and that Red Lake County is ugly. These things are, of course, true in the aggregate sense, or in comparison with other places. 
But each of these numbers and rankings masks infinite nuance behind their finite limits. They overlook the thriving communities in Mississippi, the inspiring stories of tenacity and triumph in Manhattan, and the people quietly working to make Chicago's streets safer.
Don't miss this book. It'll make you laugh. And think.

Monday, September 23, 2019

How to build a data narrative

On Friday September 20, the Wall Street Journal published the following infographic by Aaron Zitner, Dante Chinni, Jessica Wang, Lindsay Huth, and Danny Dougherty (the online version is paywalled):

In the past few weeks I've been talking to students about how to structure data and visualizations into narratives to produce infographics like this. Let's use this piece as an example.

We begin with a title or headline that summarizes the main point of the piece or anticipates what you're about to see:

The first section provides big figures and highlights key takeaways to spark interest:

After that, we dig deeper into details. The WSJ infographic contains several vertical dot histograms like the one below. Each dot is a district, and the color corresponds to whether each of them leans Democrat or Republican. The lines on top of the dots display the 2008 distributions; there are stark differences between that year and the present: richer districts have become more Democratic and poorer ones more Republican:

Notice also that last horizontal graph; the authors could have made a simple dot plot or lollipop graph, but decided instead to emphasize the change with arcs. I like it.

Next, we move on to explain the reasons for the polarization: different industries, population density, education, and others:

And then a nice conclusion about why all this matters: because the less people have in common with one another, the harder it is to recognize problems that don't affect us:

Friday, September 20, 2019

New data journalism and visualization MOOC

I've been organizing Massive Open Online Courses (MOOCs) for many years, but the latest one is the most ambitious to date.

Titled 'Data Journalism and Visualization with Free Tools', it's the product of a collaboration between myself, the Google News Initiative, the Knight Center at the University of Texas, and a large group of instructors (Simon Rogers, Debra Anderson, Duncan Clark, Jan Diehm, Minhaz Kazi, Dale Markowitz, Marco Túlio Pires, and Katherine Riley), each in charge of a module.

I say it's the most ambitious because it's the first time we've offered a course like this in three languages (English, Spanish, and Portuguese), and also because it covers a wide variety of topics: finding and scraping data; wrangling and cleaning it; exploring it to find insights; visualizing it; and building narratives. It also has an entire module on Machine Learning and AI. The MOOC doesn't require any prior knowledge or purchasing any software tool. It's intended to get you started in all those areas.

On top of that, if you sign up you'll be able to read the first 30+ pages of How Charts Lie in advance of its publication. All for free. The MOOC begins on October 14; see you there!

Thursday, September 19, 2019

Why Sharpiegate matters: It's not just about the graphics

I've just published another article in Nightingale, the online magazine of the Data Visualization Society. It's titled 'The Day I Thought I Misled the President of the United States: A Visualization Tragicomedy'.

The article deals with my reaction to 'Sharpiegate', and what I think it tells us about visualizations and the importance of protecting the integrity of truth-telling institutions that are supposed to be neutral, such as the Census Bureau or the National Weather Service. In the piece I recommend Michael Lewis's book The Fifth Risk. It's a must-read nowadays.

Monday, September 16, 2019

What is killing us?

Chelsea Bruce-Lockhart and John Burn-Murdoch from the Financial Times' data team have charted “How life, death and disease have changed over the past 180 years”. John has a nice thread on Twitter with some extra information about the project.

The authors explain:
Since the turn of the 21st century, progress made in preventing the spread of infections and parasitic diseases around the world has led to three million fewer deaths each year. A further one million fewer deaths have reportedly been caused by neonatal conditions. Yet these improvements have been entirely offset by the rise in deaths caused by cardiovascular diseases and diabetes. On top of that, there are an additional two million people each year dying of cancer.
The piece contains two visualizations. First, a bubble scatter plot with four variables: disability-adjusted life years on the vertical axis (read the caption for an explanation of what this is); change in the number of deaths per year on the horizontal axis; type of death (color); and death rates (bubble size):

Cancer and coronary heart disease have increased a lot on both axes, while maladies that historically killed millions (and still kill too many), such as malaria, tuberculosis, or HIV, have decreased.

My favorite graphic is this time series heat map of mortality rates in England and Wales which reveals a steady decline interrupted by the Spanish flu and the two world wars. I wish it were possible to switch the annotations on and off; they are necessary, but also quite obtrusive:

Thursday, September 12, 2019

Visualizing risk challenge results

Back in May, the World Bank’s Global Facility for Disaster Reduction and Recovery (GFDRR) launched the VizRisk 2019 challenge in collaboration with the Data Visualization Society, the Understanding Risk Community, and Mapbox. GFDRR has just announced the winners.

The grand prize went to Riesgo, a visualization of flood hazard in the Philippines by Briane Paul V. Samson and Unisse C. Chua, who have a nice write-up with tons of details. It's a delightful project built with ReactJS and Mapbox. I even like the 3D map in the opening because the height of the bars isn't the only encoding to show elevation (that'd make it confusing); Samson and Chua also used color shade, so the map is quite clear.

You can see all other winners and submissions here. Living in Florida, I was particularly interested in this visualization of the impacts of hurricane Maria in Dominica (article). Also, this story about the aftermath of the 2015 earthquake in Nepal is quite well done.