Friday, September 20, 2019

New data journalism and visualization MOOC

I've been organizing Massive Open Online Courses (MOOCs) for many years, but the latest one is the most ambitious to date.

Titled 'Data Journalism and Visualization with Free Tools', it's the product of a collaboration between myself, the Google News Initiative, the Knight Center at the University of Texas, and a large group of instructors (Simon Rogers, Debra Anderson, Duncan Clark, Jan Diehm, Minhaz Kazi, Dale Markowitz, Marco Túlio Pires, and Katherine Riley), each in charge of a module.

I say it's the most ambitious because it's the first time we've offered a course like this in three languages (English, Spanish, and Portuguese), and also because it covers a wide variety of topics: finding and scraping data; wrangling and cleaning it; exploring it to find insights; visualizing it; and building narratives. It also has an entire module on Machine Learning and AI. The MOOC doesn't require any prior knowledge or the purchase of any software. It's intended to get you started in all those areas.

On top of that, if you sign up you'll be able to read the first 30+ pages of How Charts Lie in advance of its publication. All for free. The MOOC begins on October 14; see you there!

Thursday, September 19, 2019

Why Sharpiegate matters: It's not just about the graphics

I've just published another article in Nightingale, the online magazine of the Data Visualization Society. It's titled 'The Day I Thought I Misled the President of the United States: A Visualization Tragicomedy'.

The article deals with my reaction to 'Sharpiegate', and what I think it tells us about visualizations and the importance of protecting the integrity of truth-telling institutions that are supposed to be neutral, such as the Census Bureau or the National Weather Service. In the piece I recommend Michael Lewis's book The Fifth Risk. It's a must-read nowadays.

Monday, September 16, 2019

What is killing us?

Chelsea Bruce-Lockhart and John Burn-Murdoch from the Financial Times' data team have charted “How life, death and disease have changed over the past 180 years”. John has a nice thread on Twitter with some extra information about the project.

The authors explain:
Since the turn of the 21st century, progress made in preventing the spread of infections and parasitic diseases around the world has led to three million fewer deaths each year. A further one million fewer deaths have reportedly been caused by neonatal conditions. Yet these improvements have been entirely offset by the rise in deaths caused by cardiovascular diseases and diabetes. On top of that, there are an additional two million people each year dying of cancer.
The piece contains two visualizations. First, a bubble scatter plot with four variables: disability-adjusted life years on the vertical axis (read the caption for an explanation of what this is); change in the number of deaths per year on the horizontal axis; type of death (color); and death rates (bubble size):

Cancer and coronary heart disease have increased a lot on both axes, while maladies that historically killed millions (and still kill too many), such as malaria, tuberculosis, or HIV, have decreased.

My favorite graphic is this time series heat map of mortality rates in England and Wales which reveals a steady decline interrupted by the Spanish flu and the two world wars. I wish it were possible to switch the annotations on and off; they are necessary, but also quite obtrusive:

Thursday, September 12, 2019

Visualizing risk challenge results

Back in May, the World Bank’s Global Facility for Disaster Reduction and Recovery (GFDRR) launched the VizRisk 2019 challenge in collaboration with the Data Visualization Society, the Understanding Risk Community, and Mapbox. GFDRR has just announced the winners.

The grand prize went to Riesgo, a visualization of flood hazard in the Philippines by Briane Paul V. Samson and Unisse C. Chua, who have a nice write-up with tons of details. It's a delightful project built with ReactJS and Mapbox. I even like the 3D map in the opening because the height of the bars isn't the only encoding to show elevation (that'd make it confusing); Samson and Chua also used color shade, so the map is quite clear.

You can see all other winners and submissions here. Living in Florida, I was particularly interested in this visualization of the impacts of hurricane Maria in Dominica (article). Also, this story about the aftermath of the 2015 earthquake in Nepal is quite well done.

Wednesday, September 11, 2019

Mark Monmonier, author of 'How to Lie With Maps', talks about Sharpiegate

At this point I assume you've all heard of 'Sharpiegate': Trump illegally manipulated an official forecast map to “prove” that he was right when he tweeted on September 1st that hurricane Dorian was about to hit Alabama “(much) harder than anticipated”. Well, he wasn't right. He should have issued a correction right away.

Rather, he doubled, tripled, quadrupled, and quintupled down, sometimes using maps I doubt he can interpret properly.

What could have been just a misunderstanding has devolved into a scandal that threatens to undermine the credibility of agencies that lives depend on. That's serious, and we all ought to be concerned regardless of ideological leanings.

Anyway, Mark Monmonier has chimed in. If you've never heard of Monmonier you have homework to do. He's the author of books such as Cartographies of Danger, Mapping it Out, or Rhumb Lines and Map Wars, a favorite of mine that I'd recommend to those who keep insisting that the Mercator projection is a “bad projection”—it's not, as I've said before.

Monmonier's most popular book is How to Lie With Maps, an absolute classic now in its 3rd edition. Needless to say, it was an inspiration for How Charts Lie. It's one of the best intros to cartography I've ever read.

All this is a preamble to say: please don't miss the interview. This is a quote that captures Monmonier's indignation: “This guy shows absolutely no subtlety at all. And then people try to make excuses for him. I have never seen anything like this.”

I fear we'll see even worse.

Tuesday, September 10, 2019

Drowning in plastic

A while back I praised a piece by Reuters showing the scale of the Rohingya humanitarian crisis in Myanmar and Bangladesh. A few days ago, Reuters graphics launched a similar project revealing “the world’s addiction to plastic bottles”.

Marco Hernández, one of the authors, told me that he modeled a plastic bottle in Cinema 4D, and then transformed it into a particle that the software could duplicate at the required pace hundreds of thousands of times.

3D software such as Cinema 4D lets you simulate real-world forces such as wind or gravity. Maya, the tool I use, has similar capabilities; that's how I designed this virtual Galton quincunx a while ago. Marco said that it took nearly a week to render the opening animation.

I really like the pictorial comparisons in the piece. This is what would happen if we could pile up the 4 trillion bottles of water consumed worldwide in the past decade:

The story says:
The plastic bottles sold worldwide since 2009 would tower above New York’s Manhattan Island. Data from Euromonitor International shows that more than 480 billion of these bottles were sold last year alone. The 2018 annual figure of almost 482 billion is up more than 50% since 2009. The pile visualised below is around 2.4 km high and dwarfs the glittering skyscrapers of the Financial District at the tip of Lower Manhattan.

Monday, September 9, 2019

Get a free copy of 'How Charts Lie' before it's released

We're nearly one month away from the release of How Charts Lie (October 15), although the book has been available for pre-order for quite some time.

My publisher, W.W. Norton, has announced a giveaway in the U.S. Just sign up here before September 11 at 10AM Eastern Time, and you may win one of the 10 copies of How Charts Lie that Norton is offering. You'll have the opportunity to read the book weeks before its publication date. I hope you'll enjoy it!

Friday, September 6, 2019

Infographics gallery by Chile's El Mercurio

El Mercurio, one of the leading national newspapers in Chile, collects some of its infographics in this gallery. They seem to be mostly adapted from their print edition, so interaction is minimal. Interestingly, I think that this plays in their favor, not against them: the amount of information each project presents isn't overwhelming.

Some infographics are pieces about local fauna or culture, but the most appealing are the ones tied to news events, such as the creation of a new province in the country or those about subjects of local relevance, such as a ranking of universities or the expansion of the international airport. I think it's easy to find information about, say, a famous historical painter on Wikipedia, so what visual journalists could try to do instead is to figure out what worries their audiences, or matters the most to them, and cover it. Particularly nowadays, when newsroom resources are decreasing and readers' attention is scattered.

Regardless, there's good work here. And some of these projects look even better in their original print form. See some pages.

Wednesday, September 4, 2019

Upcoming public talks (Calgary this Friday!)

I'm about to resume my public talks discussing topics in How Charts Lie, and also presenting some ideas that may lead to another book in the near future—a proposal for it is already written.

If you want to know what places I'll visit in the next few months, consult my public schedule. This Friday I'll be at the University of Calgary, and next Friday I'll visit Grand Rapids. I'll visit Boston, New York, and DC right after How Charts Lie is launched; there may be book signings at some of those events.

Remember that you can contact me to schedule a free talk. Scroll down in this article to see how.

Tuesday, September 3, 2019

3D explainer by Folha de São Paulo

On September 2, 2018, a fire destroyed part of Brazil's National Museum. Júlia Barbon, Marcelo Pliger, and Simon Ducroquet, from the daily Folha de São Paulo, have designed a 3D explanatory piece about the efforts to recover the building and its collections. It's in Portuguese but you can run it through Google Translate.

Simon explained on Twitter that he was inspired by other pieces, such as this one by The New York Times about Notre Dame. He also mentioned the tools he taught himself to make the web-friendly 3D: three.js, Cinema 4D, and a glTF exporter, which allows you to transfer objects from 3D programs to JavaScript.

I used to make 3D animations myself with 3D-Max and Adobe Flash many years ago, and I still teach an intro to 3D class every now and then—see this little experiment of mine with physics in Maya—so I found Simon's explanation particularly compelling.

Sunday, September 1, 2019

Explaining visualizations in The New York Times, NPR, and the BBC

I've published an op-ed in The New York Times about how to read the National Hurricane Center's cone of uncertainty map. It adapts part of the chapter about uncertainty in How Charts Lie.

Tala Schlossberg made the nice animations. It was a pleasure to work on this project with her and Stuart Thompson for the past few days. I've discussed the piece in these interviews at NPR and BBC Mundo (Spanish).


I started writing about hurricane maps and uncertainty in visualization exactly two years ago. If you're interested in research about these maps, follow Lace Padilla and Le Liu (his dissertation is available online). A difference between this research and the one we're doing at UM is that we're mainly focusing on populations who are very vulnerable to hurricane risks, such as minorities, people without higher education, etc.

In the past two years I've become even more alarmed about how we misread charts like the one above and, more importantly, about how journalists, particularly TV anchors, explain these maps incorrectly to their audiences. Just yesterday Noah Pransky complained about the sloppiness of TV journalists, and for good reason. They still misinterpret hurricane forecast maps in the way I explained. That's irresponsible.


The reactions to the op-ed have been interesting. Some people asked why we don't design better visualizations. That's easier said than done. The cone is sometimes misinterpreted as an either-or display (“if I'm inside, I'm in danger; if I'm outside, I may be fine”), but spaghetti maps and other alternatives have their own problems, such as making viewers focus too much on the center line of forecast models and neglect the uncertainty that surrounds them.

Also, knowing where the storm center may go tells us little about the risks we may be exposed to: wind is just one of the threats during a hurricane; storm surge, flooding, and heavy rain can also be deadly, and they may affect you if you live far from the cone center or even outside of it. That's why the NHC and other entities design additional visualizations.

The responsibility of journalists and readers

This is just a personal conjecture, but I don't think we'll be able to design maps that everyone will interpret correctly, at least in the short term. This is related to a complaint some people expressed about the NYT article: it seems that I blame readers, and not the scientists or designers who create the maps. That isn't so, but readers do have a responsibility. And also journalists, who are the translators, or mediators, between scientists and the public.

Scientists and designers certainly ought to create visualizations that are as transparent as possible, but readers often have the unrealistic expectation that any graphic should be understandable without effort. We designers have fed that expectation, and we must stop. I go into a lot of detail about this problem in How Charts Lie. I think that it's related to several common myths, such as the popular “a picture is worth a thousand words”.

We need to accept—and help others accept—that a visualization isn't a picture or an illustration. It's an argument, or a text in the semiotic sense. You need to make an effort to grasp what it means. No graphic, no matter how well designed it is, can resist the test of an inattentive or careless audience.

Another mantra designers love to repeat is “show, don't tell”. Ideally that'd be the case, but when a visual is abstract and complex because it represents complex information, we need to show and tell.

Audiences often lack knowledge of graphic symbols, grammar, and conventions, and therefore may misread certain visualizations. My conjecture is that this is what leads to some misinterpretations of the cone of uncertainty. The cone seems to be an area under threat because it resembles the visual conventions often used to represent areas of relevance or interest on maps, such as distinct colors or sharp boundaries. To create a different mental schema of how to read the cone in readers' brains we need to calmly explain not only what the chart is saying, but also how to read this type of chart in general. Words matter.

Remember Hans Rosling's classic talks and documentaries. The first time he showed one of his now famous bubble scatter plots he didn't jump to its content. He first explained the scaffolding of the chart and its encodings: “position on the X-axis means such and such, position on the Y-axis...” Rosling did what TV news anchors should do, and what I tried to do myself in The New York Times article and in How Charts Lie: help increase readers' graphicacy.

Friday, August 30, 2019

Why we don't visualize uncertainty

There's a whole chapter about uncertainty in How Charts Lie (I've just adapted part of it for The New York Times), and I mentioned techniques to visualize it in The Truthful Art, so it should be clear I've been a bit obsessed with the topic for a while.

That's why I was so happy when Jessica Hullman told me she's written a paper to explore why designers, journalists, and even scientists are often reluctant to reveal uncertainty in their graphics.

The results of her survey are relevant, but I found the responses themselves even more interesting:
Multiple interviewees and survey respondents implied that most viewers who they created visualizations for did not require specific information about process or uncertainty to trust that a signal is valid [...] Some authors described trust as a pervasive default in visualization-based communication. As one industry interviewee described it “There’s a participation in trust with the system produced by the information, so wherever that comes from. Most people will trust the doctor, not necessarily because the information itself was trustworthy, but because the doctor was” (I8). In contrast to the seemingly rational expectation that uncertainty would play a role in fostering trust, the same interviewee described how a priori trust is instead a necessary precondition to presenting uncertainty: “I would say that you want trust established before you show uncertainty... My hypothesis would be that it [uncertainty communication] may have no effect for trust development.”
Showing uncertainty “may have no effect for trust development”. That's something waiting to be empirically tested.

Anyway, please do read Jessica's paper, and don't miss her talk at Chi Data Viz (YouTube). Also, Jessica will be the main speaker in my own VizUM Symposium, on December 12. Attendance is free!

Wednesday, August 28, 2019

Mapping water stress risk

Bonnie Berkowitz and Adrian Blanco from The Washington Post map the places in the U.S. that are at risk of draining their water supplies. The piece also contains a scatter plot comparing water stress risk to water consumption state by state, and a beeswarm plot of country risks. Here's an explanation of the water stress scale:
The WRI’s Aqueduct Water Risk Atlas researchers used hydrological models and more than 50 years of data to estimate the typical water supply of 189 countries compared to their demand. The result was a scale of “water stress” — how close a country comes to draining its annual water stores in a typical year.
Living in Florida, I found this particularly worrying:
Florida demonstrates that a state surrounded by seas and perforated by lakes and rivers can still have a water problem. Desalinization of saltwater is expensive and often not practical. The enormous Floridan aquifer provides most of the area’s freshwater, but demand is high. Florida uses the fourth-most water of any state.

Tuesday, August 27, 2019

Yes, charts can and do lie

I've started receiving some mild, reasonable, and predictable pushback to the title of How Charts Lie (example), so allow me to explain my rationale to choose it.

First, some friends have complained that the title sounds too negative, and that it may create a bad impression of data visualization among the public at large. Well, charts can indeed lie. We shouldn't be afraid of saying so. After doing that in How Charts Lie I adopt a cheery and positive tone, as the book is really a manual about becoming better chart readers. It also praises the power of good visualizations.

Another objection I've seen is the claim that “charts don't lie, people lie with charts”. This is true under one strict definition of what lying is: telling someone something you know is not true with the intention to deceive. Lying requires a deliberate decision by a conscious actor (a chart isn't conscious) to convey something untruthful. By this definition, lying is a bit different from deceiving (you can deceive without lying*), misleading, or bullshitting. More about this later.

My first response is that “charts don't lie, people lie with charts” sounds similar to “guns don't kill people, people kill other people with guns,” a common contention in gun policy debates. I find little merit in this argument. If you're wounded by a bullet shot by someone, it's appropriate to say that the bullet wounded you.

Moreover, the bullet—and the gun—is an artifact designed with the specific goal of hurting. The argument above reminds me of those who claim that artifacts and technology are neutral, and that they only become good or bad when someone uses them. Those arguments and assumptions have been discussed and challenged for decades by philosophers of technology. There's also a growing critical literature about digital tech that you may want to consult: Cathy O'Neil, Meredith Broussard, Virginia Eubanks, or Safiya Noble. I'm not saying you're wrong, just that you shouldn't assume that you can take artifact neutrality for granted without thorough inspection.

Back to the strict definition of lying. Think about cases in which you called a statement a lie. I'm sure you've done so more than once. We all have. Did you read the mind of the actor or actors who made the statement? Or did they explicitly disclose their intentions? Or did you have solid evidence that (a) those actors knew that their statement was false and that (b) they were consciously trying to conceal what they knew to be true with the intention to deceive? Because that's the standard required to call something a lie if you're too nitpicky with words. This standard is what led The New York Times to be (unnecessarily) cautious about calling anything a lie for too long.

How Charts Lie contains examples that I think qualify as lies in this sense, but I cannot tell for certain because liars are usually unwilling to disclose their goals. I'm happy calling those charts lies regardless.

Many other cases in the book are not lies, but instances in which we “lie to ourselves” with a chart. This is the informal way of saying that we often use perfectly fine evidence such as charts in a twisted or inappropriate manner, and we end up extracting faulty inferences or seeing what we want to see. “Lying to ourselves” isn't an entirely accurate term for such a situation, but I believe that everyone understands what it means. Strictly speaking, you can't “lie to yourself without knowing it”, as this article claims. That's a contradiction in terms, but common language isn't as rigorous as philosophical discourse, so I think that the phrase is fine.

I admit that to be 100% accurate I shouldn't have titled the book How Charts Lie, but How Charts Lie, Deceive, and Mislead, and How We Lie with Them, Misread Them, and Use Them to Confirm What We Want to Believe. But the goal of the book isn't to please you, language nerds. It's to attract audiences who don't know much about visualization, get them excited but also a bit skeptical about it, and then show them how it's supposed to work.

Therefore, I'll go with the concise, provocative, and punchy title I chose following Hans Rosling's advice: “You have to be like the worst tabloid newspaper in the front and the Academy of Science in the back.”

*For instance, if I told you something that is true while knowing you're going to get it wrong I wouldn't be telling a lie, but I'd be deceiving you.

Saturday, August 24, 2019

Data 'essays', not data 'stories'

Eric William Lin favors the term “data essay”, rather than “data story”, to refer to data and visualization-driven narrative and explanatory pieces, particularly those in news media or in publications such as The Pudding, which indeed calls its projects “essays”. In the same conversation on Twitter, Kim Rees replied that she prefers “data documentary”; she has a talk about it.

Data journalist and statistician Harkanwal Singh added:
Story is vague and in journalism tradition, are considered definite statements with no alternate interpretations. Data vis essay is more honest, it is one attempt at an explanation.
I agree. “Essay” sounds appropriate and more precise than “story”, as Lin himself wrote. I hope it'll catch on, although I doubt it will, considering how ingrained “storytelling” is in journalism, marketing, and even business analytics. “Essay” suggests a principled, reasoned, but also limited or even personal piece of argumentation that doesn't attempt to be the last word about anything, but to be part of an ongoing conversation. I like that.

(This is a good time to remind everyone of Lorenzo Amabili's recent article for Nightingale, Joshua Smith's piece, and Jon Schwabish's 5-part series about ‘storytelling’ in visualization.)

Thursday, August 22, 2019

The Amazon is burning

Álvaro Valiño is surprised by the scarcity of infographics about the fires in the Amazon—it's a huge story—and calls attention to this piece by QZ. The article says that these are not wildfires:
The Amazon rainforest is burning at an unprecedented rate, and the fires are unlikely starting themselves. Rather they may be set by people in an attempt to clear land for cattle ranching.
The article contains annotated satellite images and a worrying map of active fires in a period of 24 hours on August 21:

UPDATE: Bloomberg and other news organizations have published visual pieces about the fires in the Amazon.

Wednesday, August 21, 2019

The emotional journey of designing a great visualization

Classes have begun at the University of Miami. This semester I'm teaching my intro to data visualization and infographics class, and an advanced one called Data Visualization and Infographics Studio, where I ask students to develop a presence in the visualization community—more about that soon—and also a very ambitious data-driven essay similar to The Pudding's.

One of my grad students, Deb P. Davis, tweeted this funny diagram by Celestine Chua:

I repurposed Chua's graphic for my class schedule; the dates here correspond to the project proposal (September 10-17), presenting the first prototype (October 10), coming back from Fall recess (October 22), presenting the second prototype (November 12), and the final critique session (December 3):

Tuesday, August 20, 2019

My first article for Scientific American

The September issue of Scientific American magazine is titled ‘Truth, Lies & Uncertainty: Searching for Reality in Unreal Times’. It contains articles about how deception works (and not only among humans!), how dishonesty spreads, why we trust lies, and how we make decisions when having incomplete information.

My favorite article is by Jessica Hullman. Perhaps not surprisingly, she writes about visualizing uncertainty. This article alone is worth the price of the magazine, so buy a copy if you can.

I wrote the article on the last page. It's titled ‘Does Obesity Shorten Lives? Misreading data visualizations can reinforce biased perceptions’. It explains the ecological fallacy, amalgamation paradoxes and, above all, focuses on how easy it is to misunderstand a chart if we describe its content sloppily, either to ourselves (mentally) or to others (verbally or textually). The example I used is similar to one that I borrowed from Heather Krause (who also fact-checked the SciAm article!) and that I showcase in How Charts Lie.

As I wrote in the article, it's tempting to describe that first chart as “the more obese we are, the longer we live”, when it doesn't really show that. It just shows that, nation by nation, obesity rates and life expectancy are positively associated.

How do I know it's easy to come up with rushed and sloppy verbal descriptions of charts like that? Because I heard them repeatedly in newsrooms when I worked as a visual journalist. I even inadvertently made this mistake in one of my books! On pages 123-128 of The Functional Art there are several graphics about the strong negative association between obesity rates and educational attainment in the U.S. at the state level.

I carelessly described these charts as showing “that better educated people are less likely to be obese”(1). That statement is true—there's no amalgamation paradox here: I checked at the time, and the negative association between education and obesity is obviously weaker person by person, but still pretty substantial. But my maps and graphs don't really corroborate that on their own, something I should have warned readers about; we'd need data about individuals, not states, for that (2). The graphics themselves aren't the problem, though (there may be reasons to design a chart to see whether obesity and education are associated state by state or country by country); the problem is whether we describe what they show accurately.
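A toy simulation may make the gap between state-by-state and person-by-person associations more concrete. The sketch below uses entirely made-up numbers, not the data behind the graphics in The Functional Art or the Scientific American article: twenty hypothetical states whose typical education level ranges from 10 to 16 years, with a noisy negative relationship between each person's education and an invented outcome score. The function names and every parameter are assumptions chosen for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_states=20, people_per_state=500, seed=1):
    """Return (state-level r, person-level r) for synthetic data."""
    rng = random.Random(seed)
    people = []       # (education, outcome) for every individual
    state_means = []  # (mean education, mean outcome) for every state
    for s in range(n_states):
        # Hypothetical: typical years of education, spread from 10 to 16.
        typical_education = 10 + 6 * s / (n_states - 1)
        edu = [rng.gauss(typical_education, 3) for _ in range(people_per_state)]
        # Invented outcome: falls with education, plus a lot of personal noise.
        out = [40 - e + rng.gauss(0, 6) for e in edu]
        people.extend(zip(edu, out))
        state_means.append((mean(edu), mean(out)))
    r_states = pearson(*zip(*state_means))
    r_people = pearson(*zip(*people))
    return r_states, r_people


r_states, r_people = simulate()
print(f"state by state:   r = {r_states:.2f}")
print(f"person by person: r = {r_people:.2f}")
```

Averaging within each state cancels most of the individual noise, so the state-level correlation should come out much closer to -1 than the person-level one, even though both are computed from the same simulated people. A chart of the state averages alone can't tell you how strong the association is among individuals.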

(1) There are also conspicuous non-weighted means in those graphics; what was I thinking? I should think of doing a second edition of that book one day.

(2) Thanks to Hicham Bou Habib for pointing out some embarrassing lapses in his thorough critique of The Functional Art.
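On the non-weighted means mentioned in note (1): a small sketch, with invented populations and rates for three hypothetical states, of how far a simple average of state rates can drift from a population-weighted one. None of these figures come from the book.

```python
# Hypothetical figures, for illustration only: (population, rate in %).
states = {
    "State A": (30_000_000, 24.0),
    "State B": (1_000_000, 34.0),
    "State C": (2_000_000, 32.0),
}


def unweighted_mean(data):
    """Average of the state rates, treating every state as equal."""
    rates = [rate for _, rate in data.values()]
    return sum(rates) / len(rates)


def weighted_mean(data):
    """Average of the state rates, weighted by each state's population."""
    total_pop = sum(pop for pop, _ in data.values())
    return sum(pop * rate for pop, rate in data.values()) / total_pop


print(f"unweighted: {unweighted_mean(states):.1f}%")
print(f"weighted:   {weighted_mean(states):.1f}%")
```

With these made-up inputs the unweighted mean is 30.0% while the weighted mean is about 24.8%, because the one large state dominates once population is taken into account.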

Monday, August 19, 2019

Smart interaction in visualization

I'm always happy to see good charts being produced by national statistical offices. Thanks to Xan Gregg I've discovered these interactive visualizations by Rob Fry's team at the UK Office for National Statistics. They are sequences of density plots of the ages of people who committed suicide every year.

As the annotation layer on the charts themselves points out, there were fewer suicides in the 1990s than today or in the 1980s, and they were more common among younger people. In later years the highest points of the curve move to the right, indicating an increase in suicides among middle-aged people. My other favorite smart feature of this piece is that if you hover over any year, you can see each density plot without it being obscured by other curves.

Thursday, August 15, 2019

The increasing popularity of ternary plots

The Guardian's Josh Holder visualizes ‘How a no-deal Brexit threatens your weekly food shop’. It's a neat scrollytelling experience (just an aside: I just remembered that Robert Kosara criticized this narrative technique, as he prefers steppers). The core element in Holder's story is this series of animated maps of imports and exports:

However, the most intriguing part for me is the following interactive ternary plot:

I'm a fan of expanding readers' visual vocabulary by exposing them to novel and unusual graphic forms. That said, if we know that many people still struggle with two-variable scatter plots, I wonder how they'd react to the graphic above, and also whether textual clarifications may help in cases like this, or if they'd be too redundant.

UPDATE: Plotly's Nicolas Kruchten has shared this article about how ternary plots work.
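For readers curious about the geometry, here is a minimal sketch of how a ternary plot can place a point: three parts of a whole are normalized so they add up to 1 and then treated as weights on the three corners of an equilateral triangle. The function name and the corner layout (first part at bottom left, second at bottom right, third at the top) are assumptions for illustration; this is not the code behind The Guardian's graphic.

```python
import math


def ternary_to_xy(a, b, c):
    """Map three non-negative parts of a whole to (x, y) inside a triangle.

    Assumed corner layout: a -> (0, 0), b -> (1, 0), c -> (0.5, sqrt(3) / 2).
    """
    if min(a, b, c) < 0:
        raise ValueError("parts must be non-negative")
    total = a + b + c
    if total <= 0:
        raise ValueError("the three parts must add up to a positive total")
    # Normalize so the parts are shares that sum to 1.
    a, b, c = a / total, b / total, c / total
    # Weighted average of the three corner positions (barycentric coordinates).
    x = b + c / 2
    y = c * math.sqrt(3) / 2
    return x, y


# Made-up shares: 50% of the first part, 30% of the second, 20% of the third.
print(ternary_to_xy(50, 30, 20))
```

A point made entirely of one part lands on that part's corner, and an even three-way split lands at the center of the triangle. Because the three shares always sum to 1, two coordinates are enough to show all three, which is what makes the form compact and also what makes it hard to read at first.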