Thursday, March 7, 2019

New book and new public lecture

My new book, How Charts Lie: Getting Smarter About Visual Information, is already available for pre-order through W.W. Norton (my new publisher), Amazon, and IndieBound (independent bookstores), and will soon appear at other retailers. The publication date is October 15.

Pre-orders matter A LOT for the success of a book, so if you like the work that I've been doing in the past years (free tools, tutorials, online courses, or my previous books), I'd like to ask for your support.

How Charts Lie is my first book for the general public, an explanation of how anyone, regardless of education or professional background, can become a more informed reader of graphs, maps, diagrams, and infographics.

For those who have asked: no, it's not the follow-up to The Truthful Art. As I mentioned in the Epilogue of The Truthful Art, the third volume in the Art series will probably be titled The Insightful Art, but it'll need to wait for at least another year or two, if not more.

Instead, How Charts Lie is a standalone book that, if you work with data and visualization, you can give as a gift to that friend or relative who doesn't understand what you do. I hope that it'll help the public to approach numbers and their visual representations more critically, but also with more interest, appreciation, and care.

That's why I once toyed with a different subtitle —and How They Make Us Smarter, because good charts that are correctly read may have that effect. Despite its title, the tone of the book is positive: it's not that charts lie per se, but that we tend to lie to ourselves with them, even when they are well designed. But we can learn.

Some other authors have already read How Charts Lie and provided early blurbs:

Cathy O'Neil, author of Weapons of Math Destruction: “What can I say? I'm a sucker for statistics explained in funny, engaging, and mathematically correct ways, especially when every now and then a line like "charts lie to us because we are prone to lying to ourselves" is thrown in with good humor. A must read for anyone who wants to stay informed.”

Tim Harford, author of The Undercover Economist and presenter of More or Less on the BBC: “Alberto Cairo has written a wise, witty and utterly beautiful book. You couldn't hope for a better teacher to improve your graphical literacy.”

Kaiser Fung, author of Numbersense, Numbers Rule Your World, and the JunkCharts weblog: “This book will open your eyes to how everyone uses visuals to push agendas. A master visual designer, Alberto Cairo shows you how to read charts and decode design. After this book, you can’t look at charts with a straight face!”


I'll continue delivering my public lecture wherever I'm invited. The only major change besides its content —which will be closer to the new book— is the title: instead of Visual Trumpery it'll be How Charts Lie. Requirements remain the same:

• Send me an e-mail so we can chat about location and dates: alberto DOT cairo AT gmail DOT com.
• I won't take salary for the public talk.
• I only need you to cover a flight (economy is fine), hotel (I'm an easy guest), and minor expenses such as taxis and meals.
• Attendance at the talk should be free and open to anybody.

I'll announce these talks in my calendar, which I haven't updated in a while. I'll also post them on the upcoming website of the new book.

Reasoning with diagrams

Last night, economist and quant JD Long posed a challenge:

The tweet is related to a paper discussed in this article by Calling Bull. Imagine that such an algorithm were applied to the real world. What is the probability that a person is a criminal if the algorithm says so?

JD provided two data points:

• We assume that criminals are 0.5% of the population
• The accuracy of the algorithm is 90%

In the thread there are responses that answer the question using conditional probability formulas. Here's a little secret: I loathe formulas. Particularly when I can reason with an image instead; I may one day write a book about that, although Math with Bad Drawings is already out there, and you should get a copy. I prefer quick heuristics and visuals.

For my quick back-of-the-napkin-and-not-that-precise exercise on conditional probability I needed two other figures:

• The population: let's assume that we're in the U.S., so it's 325 million (you may not need the population if you use formulas, but it's useful for the diagram).
• The false positive rate: how often the algorithm tags a person as a criminal even if that person is not a criminal. After reading this I guessed a false positive rate of around 6%.

Here's the resulting tree diagram; the probability of your being a criminal if the algorithm tags you as such is roughly only 7%:

The figures underlined at the bottom of the diagram are 19,402,500 (people who aren't criminals but are still wrongly tagged by the algorithm) and 1,462,500 (people who are criminals and are correctly identified by the algorithm). Adding up those two figures you get the total number of people tagged as criminals, regardless of whether they are indeed criminals or not: 20,865,000.

Of those, 93% (the 19,402,500) aren't criminals. The chance of false positives is enormous: if a photo is tagged as depicting a criminal, 9 out of 10 times that person won't be a criminal at all.
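The branches of the tree can also be walked in a few lines of Python. This is only a sketch of the arithmetic described above, not the original diagram: the function and variable names are made up for illustration, the 90% accuracy is read as the share of actual criminals that the algorithm tags correctly (which is how the 1,462,500 figure arises), and the 6% false positive rate is the guess mentioned earlier.

```python
def tagged_breakdown(population, base_rate, true_positive_rate, false_positive_rate):
    """Follow the tree diagram with head counts instead of probability formulas."""
    criminals = population * base_rate
    non_criminals = population - criminals
    # Criminals who are correctly tagged (assumption: "accuracy" = true positive rate)
    true_positives = criminals * true_positive_rate
    # Non-criminals who are wrongly tagged
    false_positives = non_criminals * false_positive_rate
    tagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "tagged": tagged,
        "p_criminal_given_tagged": true_positives / tagged,
    }


# Figures from the post: 325 million people, 0.5% criminals, 90% accuracy,
# and a guessed 6% false positive rate.
us = tagged_breakdown(325_000_000, 0.005, 0.90, 0.06)
# The same rates applied to a round sample of 10,000 people.
sample = tagged_breakdown(10_000, 0.005, 0.90, 0.06)

for label, result in (("U.S.", us), ("10,000 people", sample)):
    print(
        f"{label}: {result['tagged']:,.0f} tagged, "
        f"{result['p_criminal_given_tagged']:.1%} of them are criminals"
    )
```

Because only the rates matter, the share of tagged people who are criminals comes out the same for the whole population and for the small sample; the population size just makes the counts in the diagram concrete.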

I double-checked the calculation using round numbers, beginning with a sample of 10,000 people; quants in the room, please let me know if I missed something:

There are plenty of websites and books that cover heuristics and diagrams for reasoning. To get started, I'd recommend Gerd Gigerenzer's Risk Savvy or Judea Pearl's The Book of Why. If you are a journalist or a graphic designer, I strongly recommend books like these, as we often get probability wrong. You'll enjoy them.

Wednesday, March 6, 2019

When your own data contradicts your headline

This morning I got my coffee ruined by an alarming headline in The New York Times:

Two friends I talked to on my way to work were also alarmed —because they didn't read the entire thing. If they had, they would have noticed that the body of the story and the data the Times itself shows contradict its own headline:

Where's the “record”? One of the most common tricks used to lie with data is to crop a time series in convenient places. If you read the body of the story you'll notice that the supposed “record” means an “11-year high” of migrants crossing the border. That sounds like a pretty arbitrary baseline to me. Why not talk about the seasonality of the data and the fact that current levels are way below historical highs? Both are revealed in the chart and the reporter herself discusses real records:

Just to corroborate that this may be another case of terrible editorial judgment by The New York Times when writing headlines, here's the real emergency: it's not related to an imaginary invasion of migrants, but to a big increase in people with children fleeing from violence in their home countries. There's not a record of migrants; there's a record of migrant families —and even that is not clear, as the chart showing that number (see below) only goes back to January 2016:

The online version of the story is a bit better, as it showcases a critical chart not published in the print version (I'd like to see this data going back to the George W. Bush presidency, too):

It's a shame that my new book is in production already. This would've been another example to include —not of how charts lie, but of how charts tell truths that are ignored to write a clickbaity, sloppy, and irresponsible headline. Very often a headline is the only thing people read, so let's be careful.

Tuesday, March 5, 2019

Sonifying data with TwoTone

Today we're making some noise (no pun intended) by releasing TwoTone, a free, open source, browser-based tool to sonify data. Simply upload any data set of up to 20MB or 2,000 rows, select the variable or variables you want to transform into sounds, choose instruments, playing speed and other features —and listen to your numbers! Here's Maarten Lambrechts, who made global warming data into a tune:

TwoTone is the product of a collaboration between Datavized and Google News Initiative; my role, as usual in this series of data projects and tools, was to give some feedback during development. To learn more about how TwoTone was created, read this write-up by Datavized's Hugh McGrory.

Data sonification has a long history and plenty of tools exist; TwoTone's goal isn't to surpass any of them in terms of power, complexity or features, but to be simple enough for anyone to learn in a few minutes and a couple of tutorials and begin experimenting right away:

If you prefer to read tutorials, see this document. TwoTone's website contains many examples to give you some inspiration. Finally, if you're a developer and want to tweak the tool or expand on it, consult its GitHub page and scroll down to find the instructions. You can also contact Datavized if you have suggestions or comments.

Not surprisingly, I've been playing with TwoTone myself. Here's the sound of the U.S. unemployment rate between January 2009 and December 2018 (here's the data set and here's the mp3 for better sound quality):

You may notice that there's a second variable in the data set linked above. What would happen if we added a second instrument to the mix? Would it sound harmonious? I encourage you to try, or use your own data. Have fun!
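For readers curious about what turning numbers into sounds can mean in practice, here is a minimal sketch of one common approach: scaling each value linearly onto a range of musical notes. It is not TwoTone's actual implementation, which may work quite differently; the function names, the note range, and the short series of values are all made up for illustration.

```python
def to_midi_notes(values, low_note=48, high_note=84):
    """Scale values linearly onto MIDI note numbers (48 = C3, 84 = C6)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A flat series maps to a single note in the middle of the range
        return [(low_note + high_note) // 2 for _ in values]
    span = high_note - low_note
    return [
        low_note + round((value - lowest) / (highest - lowest) * span)
        for value in values
    ]


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency, with A4 (note 69) at 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Hypothetical values, NOT the unemployment data set linked above.
made_up_series = [9.8, 9.0, 8.1, 7.2, 6.0, 5.1, 4.6, 4.0]
notes = to_midi_notes(made_up_series)

for value, note in zip(made_up_series, notes):
    print(f"{value:5.1f} -> note {note} ({midi_to_hz(note):.1f} Hz)")
```

Higher values become higher pitches, so a falling series sounds like a descending melody. Adding a second variable would amount to running the same mapping again with a different instrument or note range.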

Wednesday, February 20, 2019

A virtual Galton quincunx

In The Truthful Art I describe the Galton quincunx, a fun device to explain basic probability and distributions. There are plenty of video demonstrations of how it works; here's one:

Some people have even designed 3D animations of it and I got curious whether I could do one myself using Autodesk Maya's gravity fields (the little arrow on the animation below) and rigid body capabilities. It turns out it works!—with some glitches, and with limitations due to each computer's processing power. In my case, the falling spheres began producing a normal distribution, but once more than five or six were on screen, weird things started to happen and the computer began running really slowly:

3D animation is fun; I used to teach a class about it and I'm hoping I'll be able to resume doing so at some point!
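If a physics simulation is too heavy for your computer, the logic of the quincunx can be sketched without any 3D at all. The snippet below is a simplified model, not a reproduction of the Maya scene: it assumes that every peg sends a ball left or right with equal probability, and the function name, the number of balls, and the number of rows are arbitrary choices for illustration.

```python
import random
from collections import Counter


def drop_balls(num_balls, num_rows, seed=None):
    """Simulate an idealized quincunx and return the ball count for each slot."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(num_balls):
        # At each row the ball bounces right (counted as 1) or left (0);
        # the final slot is simply the number of bounces to the right.
        slot = sum(rng.random() < 0.5 for _ in range(num_rows))
        bins[slot] += 1
    return [bins[slot] for slot in range(num_rows + 1)]


counts = drop_balls(num_balls=2000, num_rows=10, seed=42)

# Crude text histogram: one '#' per 20 balls
for slot, count in enumerate(counts):
    print(f"{slot:2d} | {'#' * (count // 20)}")
```

With enough balls the slots in the middle fill up much faster than those at the edges, which is the bell-shaped pile that the physical device produces.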

Wednesday, February 13, 2019

New tutorial: Data Illustrator

I've just added a new free tutorial to the Tutorials & Resources section of this website. It's about Data Illustrator, a tool that greatly expands the charting capabilities of tools such as Adobe Illustrator, Sketch, or Inkscape, similarly to Charticulator.

The tutorial is by Data Illustrator's co-creator John Thompson. You can see all video clips and data sets in this Dropbox folder. I recommend you download these files; if you play the videos in the browser, they may get cut off.

I've also updated the tools diagram in the Tutorials & Resources section. These are the tools that I expose students to in my introduction to infographics and data visualization classes at the University of Miami:

Monday, January 28, 2019

New project: The shape of news in Google searches

We've just launched a new project, The Lifespan of News Stories, designed by Schema in partnership with Axios and the Google News Initiative. Axios wrote a story of their own.

My role, as usual, was to art-direct a bit and bug the Schema folks every couple of weeks with design suggestions and tweaks. See our previous projects here.

From Schema's press release:

The Lifespan of News Stories is a collaboration between Schema, Google News Initiative, Alberto Cairo and Axios. The project analyzes the shape of search interest data from the top news stories of 2018. By visualizing and categorizing the shape types into four groups, we start seeing patterns. For example, stories that are skewed to the right are usually unexpected events such as a celebrity death or a natural catastrophe. Stories with multiple peaks are normally longer in duration due to longer exposure in the media, such as the Brett Kavanaugh confirmation. Stories with broad national interest can have long tails or long ramp ups, such as the midterm elections. Finally, it is possible to visualize that news stories come and go with a certain rhythm throughout the year with the exception of a few gaps, notably during the summer school break and winter holidays.

The data has its limitations, needless to say, as it's just Google search interest and doesn't capture conversations in social media platforms, but it's still revealing, I think:

The shapes of the lifespan of news stories —suggested by Axios— are organized in six categories, which can be explored at will:

My favorite part is this simple explainer of shapes:

Anyway, enjoy!

Wednesday, January 9, 2019

New book in the Fall; new public talk in 2019

If you follow me on Twitter you know that I have a new book coming in the Fall this year. It'll be my first for the general public and, unfortunately, it's not part of the “art” series; yes, that means you'll need to wait until 2020 or 2021 for The Insightful Art, the closing of the trilogy. The new book is the reason I've been silent for so long on this blog.

The book is titled How Charts Lie: Getting Smarter About Visual Information, and it'll be launched by W.W. Norton, which also publishes authors I greatly admire such as Michael Lewis (The Fifth Risk) and Charles Wheelan (Naked Statistics). No pressure, I guess. How Charts Lie can't be pre-ordered yet, but I'll let you know as soon as it can. The domain already exists, but I haven't added any content yet.

If you have attended one of the Visual Trumpery public lectures in the past couple of years, be aware that the talk is a concise trailer for How Charts Lie, as it roughly replicates its structure. How Charts Lie teaches general audiences how to read graphs and maps correctly, and how to use them to improve understanding.

Some other authors are reading the draft of How Charts Lie and will blurb it. Here is, for instance, Tim Harford (The Undercover Economist and Messy): “Alberto Cairo has written a wise, witty and utterly beautiful book. You couldn't hope for a better teacher to improve your graphical literacy.” I love Tim's books —go check them out— so his endorsement means a lot.

In 2019 I'm revamping my public talk, and renaming it to match the title of the book. In the first semester this year I'll be presenting in Miami (today!), Mexico DF, Denver, Vancouver, Providence, Knoxville, Detroit, Pamplona, and probably Amsterdam, Milan, and Calgary. I'll continue giving public lectures once the book is launched in the Fall.

The content of the new version of the public talk will likely be organized around the biases I warn against in How Charts Lie —which, by the way, could have been titled “How We Lie To Ourselves With Charts —and How They Can Make Us Smarter Instead”, although that'd be way too clunky. Here you have some draft slides I'm working on right now:

Friday, October 19, 2018

Building Hopes: design your own virtual statues

Yesterday we launched a new project, Building Hopes. The application, designed by Accurat, has a desktop version, and also iPhone and Android ones that have some extra capabilities, such as augmented reality —we wanted to experiment with it a bit.

Building Hopes consists of designing virtual statues based on things you feel hopeful for. At the beginning you'll be given several choices and, through a slider, you can indicate your level of hopefulness for them. After that, you can name your statues and place them in the real world. If you use the smartphone applications, you can also see them over your surroundings (see images and animation below). You can also click on the pebbles of your own statue or of statues designed by other people near you to get more detail, such as Google search interest for those terms.

To learn more about this playful, experimental project read the press release, Accurat's article, and Simon Rogers's post.

Monday, September 24, 2018

New tool: Morph for abstract data art

We're launching a new tool today. After Flourish (for interactive news data visualization; see a tutorial), Tilegrams (for cartograms), and some others, Morph goes in a different direction: it generates abstract images based on data.

Designed by Datavized in collaboration with Google, Morph lets you design traditional charts and graphs, and then randomize transformations through an evolutionary algorithm. I've recorded a short tutorial highlighting Morph's main features, and you can also read about it in the official press release, in this making-of article, and in this post by Google's Simon Rogers.

Morph's documentation and code are available on GitHub. Play with it, send your best work, and if you have suggestions for additions or improvements, let Datavized know.

(See all our previous projects and read more about our collaboration.)

Monday, September 17, 2018

Visualizing the Brazilian elections

The Brazilian presidential election will take place on October 7th, and we have just launched a project that visualizes the search interest for the candidates. This is part of my ongoing collaboration with the Google News Initiative (read more about it here and see all previous projects), and it was developed by Carol Cavaleiro, Thais Viana, and Tainá Simões. Jair Bolsonaro, an authoritarian, misogynistic, and anti-LGBTQ candidate who leads in the polls (but who likely won't win in the first round), is the most searched-for candidate:

Other graphics in this project display related terms and themes in searches. Enjoy.

Monday, September 3, 2018

To learn visualization, write about visualization

The best way to understand something well is to force yourself to explain it to others. That's why, beginning this semester, I'm requiring all my students at the University of Miami to read 2-3 chapters from visualization books every week, and then write about them in a weblog. I'm giving students in my advanced class access to the manuscript of my new book, which will be published in 2019, so if you want to get a sneak peek, read below.

Here are some posts I liked:

Shiqi Wang used the weekly readings as a pretext for an essay about the nature of visualization. Her opening paragraph is: “I think data visualization is not simply about turning data into charts. It's about looking at the world through data. In other words, the object of data visualization is data, but what we want is actually — data vision, data as a tool, visualization as a means to describe reality and explore the world.”

Alyssa Fowers got a book by Howard Wainer and wrote an interesting post about the dangers of small samples and extreme values.

Mackenzie Miller wrote both about The Truthful Art and the first few chapters of my new book. Scroll down, as she also has posts about R and ggplot2.

Shiyue Qian wrote a nice summary of the first few chapters of The Truthful Art.

Adam Clarke disagreed with me—I like that!

Brendan McBreen critiqued a faulty graphic about federal spending. 

Wednesday, August 22, 2018

Visualization MOOC materials available

In the past six years I've done numerous Massive Open Online Courses (MOOCs). The latest one was this Summer, in partnership again with the Knight Center at the University of Texas. My MOOCs are quick and broad introductions to data visualization and infographics. They won't make you a professional visualization designer (that requires years of self-teaching, or applying to programs like our MFA in Interactive Media at the University of Miami), but they offer some foundations.

We've decided to create a public repository of all video tutorials and readings from the course. Enjoy and feel free to use them in any way you want (just give us credit!)

Coincidentally, and talking about our interactive media and journalism programs at the University of Miami, I've been compiling a list of the courses that we offer. Here are a few (I'm not even including classes from our MS in Business Analytics, as I still need to learn which ones are focused on visualization.)

Friday, August 10, 2018

A preview of my 2019 book

At this point you all know that I'm writing a new book, and that it'll be published in 2019. I just finished the draft, which is now under review by several stats friends such as Heather Krause, Diego Kuonen, Walter Sosa, Nick Cox, Frédéric Schütz, Jon Schwabish, etc. This conversation I just had with The Customer Equity Accelerator Podcast works as a decent preview of what I have to say.

Wednesday, July 4, 2018

Visualizing amalgamation paradoxes and ecological fallacies

I'm spending the Summer writing my first popular science book about charts for the general public, to be published in 2019, and I've been searching for examples of amalgamation paradoxes and ecological fallacies. An amalgamation paradox occurs when patterns appear or disappear depending on how you subset your data, and an ecological fallacy consists of inferring characteristics of individuals based on the features of the groups they belong to.
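To see how an amalgamation paradox can arise, here is a tiny sketch with invented numbers (they are not the WHO or Gapminder figures, and the group names are hypothetical). Within each group the relationship between the two variables is negative, yet the pooled data shows a strong positive correlation, simply because the groups sit at different levels on both axes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up data: (x values, y values) per group. Inside every group,
# y goes DOWN as x goes up, but groups with higher x also have higher y.
groups = {
    "group_a": ([1, 2, 3], [51, 50, 49]),
    "group_b": ([4, 5, 6], [66, 65, 64]),
    "group_c": ([7, 8, 9], [81, 80, 79]),
}

all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]

overall = pearson(all_x, all_y)
within = {name: pearson(xs, ys) for name, (xs, ys) in groups.items()}

print(f"pooled correlation: {overall:+.2f}")
for name, r in within.items():
    print(f"{name}: {r:+.2f}")
```

Reading the pooled correlation as a statement about what happens inside each group, or to each individual, would be exactly the kind of mistake described above.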

I was inspired by a recent article and talk by Heather Krause, and decided to recreate her charts with more recent data. Here you have a strong positive correlation (0.51!) between cigarette consumption per person and year, and life expectancy; each dot is a country:

I made the chart in INZight (tutorial) and the data, which comes from the WHO and Gapminder, is here (CSV), in case you want to play with it.

This is an obvious case of spurious causal inference (don't miss this hilarious website), as we could think of other variables that affect life expectancy at the national level, such as wealth. Using the same data, I color-coded the countries by income group. In general, rich countries have high life expectancies, and poor countries are at the bottom of the Y-axis:

Here you have one plot per income group:

Moreover, the positive relationship between the two variables disappears once we subdivide countries by region...

...and it reverses if we split the data further, down to the individual level, as smoking does shrink your life expectancy. Here's a chart from Heather's article:

Monday, June 11, 2018

Transitions in visualization

I'm a fan of animation in data visualization when used for meaningful transitions. A good example is the latest project in the ongoing collaboration with the Google News Initiative (read more about it in the previous post,) which visualizes search interest in the upcoming Mexican elections. It was designed by Yosune Chamizo and Gilberto León.

Here's a transition between a geographical map and a semi-equal-area cartogram:

And here's a transition between a map and a chart (we're still working on making the proportions 100% accurate):

Finally, I'd like to mention this animated histogram by the Financial Times, which I'll probably showcase in my upcoming book:

Thursday, June 7, 2018

Soccer, tools, a course, and a new book

News about my ongoing collaboration with the Google News Initiative (see all projects, and an article about them) slowed down a bit in the first half of 2018, but it is about to get more frequent. First, this coming week I begin a brand-new Massive Open Online Course, a 4-week intro to elementary principles of visualization. The course is aimed at absolute beginners, and you can still sign up; more than 4,000 people from more than 100 countries have already done so.

We're also preparing to launch visualizations about the elections in Mexico and Brazil, the U.S. midterms, and a few free software tools that I think you'll really like. I can't say much about them for now, other than they are similar to Flourish in the sense that they let you generate visuals, interactives, and animations that up to this point you could only make through code.

(This is all happening, by the way, while I work on my new book. I need to turn in the draft to the publisher before September, and the book will be launched in 2019. No, it's not The Insightful Art —that'll likely come next— but my first non-fiction hardcover for the general public. I'm excited. See a sneak peek of one page.)

Anyway, the latest visualization coming out of the Google News Initiative explores the World Cup. It was designed by Polygraph, and it lets you see which players and teams are most popular all over the world. I really like its simplicity, its intentionally Googlish color palette, and the animated players:

Tuesday, May 29, 2018

A conversation with Cole Nussbaumer

The other day I chatted with Cole Nussbaumer for her visualization podcast. Cole is the author of the book Storytelling With Data, which has quickly become a best-seller. We planned for 40 minutes but we ended up talking for more than an hour. I mentioned my 2019 book —I'm working on it right now; I need to turn it in by September— and why I don't think I can contribute much to the visualization field itself, but may be more helpful popularizing graphicacy outside of it.

Tuesday, May 22, 2018

New MOOC(s)

This year I'm doing two separate Massive Open Online Courses, one in English, which begins in June —registration is already open, so sign up— and another one in Spanish (more information soon; this one will likely be around October.)

In these free courses I cover the basics of data visualization and teach a few tools I love, such as INZight and Flourish. I have some tutorials about them here, as you may remember, in case you're interested. In the course I assume you have no knowledge of visualization, so recommend it to friends and acquaintances who you know want or need to learn it.

Here's a screenshot of the video tutorials of the MOOC in English; I felt we needed something other than me and my computer in them, so I put some books from my home office shelves on the desk:

Wednesday, April 25, 2018

Visualization myths: Henry Beck and the London Underground map

Human nature dictates that whenever we group, we start devising a shared identity, bonding around imaginary heroes, myths, and legends. Visualization, infographics, and data journalism aren't exceptions. Years ago, I wrote about the myths surrounding John Snow's undeniable achievements, and I often need to point out that most visualizations that look very innovative have precedents. It happened just yesterday with one of my graphics. Perhaps it's because I've always been skeptical of nationalisms and other strong identities that I prefer my myths and heroes to exist exclusively in the movies and novels I enjoy watching and reading.

This morning I discovered another possible myth. I guess you're all familiar with Henry Beck's 1933 London Underground map. We've learned that it's a landmark in the history of information design thanks to books, articles, and talks (including mine), but it turns out that the story is, as often happens, much more complicated and enthralling, as information designer Douglas Rose reminds us in this article. Rose is quite convincing when arguing that Beck was likely inspired by George Dow, an employee of the London & North Eastern Railway.

Here's a 1929 map by Dow which straightens out the lines and evens out distances between stations:

Another paper I've found, which also analyzes Beck's diagram, suggests that not even the idea of expanding the area of central London was entirely his, but was based on the work of designer F.H. Stingemore, who drew some early underground maps. Facts like these don't diminish the importance of Beck's diagram —he did bring together several influences, and came up with ideas of his own— but they put it in context.

Douglas Rose recommends a book by George Dow's son, Andrew. It's titled Telling the Passenger Where to Get Off: George Dow and the Development of the Diagrammatic Railway Map; I've just ordered it: