Monday, August 29, 2016

Download the Datasaurus: Never trust summary statistics alone; always visualize your data

This tweet is quickly becoming the most popular I've ever written. I drew that dinosaur with this fantastic tool created by Robert Grant, a statistician and visualization designer. It lets you plot any points on a scatter plot and then download the corresponding data.

In case you want to use the Datasaurus in your classes or talks to illustrate how important it is to visualize data while analyzing it, feel free to download the data set from this Dropbox link. It'll be fun to first show your audience just the figures and the summary statistics, and then ask them to make the chart:
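For anyone preparing that classroom exercise, here is a minimal sketch of the two steps: print the summary statistics first, then draw the points. It assumes the data has already been loaded as a list of `(x, y)` pairs; the function names `summarize` and `ascii_scatter` and the tiny sample points are hypothetical, not part of the downloadable data set.

```python
from statistics import mean, stdev


def summarize(points):
    """Return the usual summary statistics for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(points)
    # Sample covariance, then Pearson correlation.
    cov = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return {"n": n, "mean_x": mx, "mean_y": my,
            "sd_x": sx, "sd_y": sy, "r": cov / (sx * sy)}


def ascii_scatter(points, width=40, height=15):
    """Draw a crude text scatter plot, with larger y values at the top."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    span_x = (max_x - min_x) or 1  # avoid dividing by zero
    span_y = (max_y - min_y) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - min_x) / span_x * (width - 1))
        row = round((max_y - y) / span_y * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical sample points, only to show the calls.
sample = [(1, 2), (2, 4), (3, 6)]
print(summarize(sample))
print(ascii_scatter(sample, width=10, height=5))
```

Showing the output of `summarize` before the plot reproduces the order suggested above: numbers first, picture second.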

Saturday, August 27, 2016

The news graphics designer skill set

I continue working on the-project-that-must-not-be-named. I'm currently analyzing several interviews with news graphics professionals and I keep finding many quotes worth saving. Here's Washington Post's Kat Downs on the skills she looks for when hiring for her department:
“Our team is really multidisciplinary, a lot of our projects are team projects. There are fundamentally three skill sets that I am looking for when I hire people. One is reporting, a storytelling skill set. Another is design and that would include things like data visualization, or drawing skill, illustrating skill, modeling skill, strong aesthetic or UI design skill. And the last would be development skill, so that would include data analysis, front end development, full stack development. Typically all of our hires have two and sometimes three of those skill sets. But maybe one is their main and then they’ve got a second or third, but those are the main things that we are looking for. So we have across the team based on that artists, designers who are very focused on usability, visual design, reporters, data focused reporters, developers from junior to people with CS degrees who are extremely, extremely competent, sort of groundbreaking computer science people.”
Take note, students —and professors.

And this is a portion of the raw transcript of the interview with NPR's Brian Boyer, a journalist who has a background in computer science. It made me cheer out loud several times (my kids are witnesses):

“I think that, yeah, if you want a journalist who's an experienced software developer but a novice reporter, yeah, teach your programmer how to be a reporter. I'm certainly not going to claim that I'm a great reporter, and I'm still learning about being a pretty good editor. But I would say that I believe the Computer Science —fuck Computer Science, right? I have a four-year degree that is not actually that useful at our day-to-day work. The kind of software that we're building in these rooms, the kind of software that most people are building as consultants, or working for PricewaterhouseCoopers, working for IBM, working for Facebook, —. there's a small subset of people who are doing hard computer science problems, but the vast majority of us are writing code to make webpages, and writing code to make webpages is not that hard. There are certainly some learning curves. There's some bumps in the road you've got to get over, but I really, truly believe that coding is something that anyone can do with practice.  
The analogy I use is it's like learning to cook, right? Anyone can make themselves a grilled cheese sandwich. Anyone can make themselves macaroni and cheese for dinner, and most of the programming we do is macaroni and cheese. Now, there's a certain subset of people who are obsessed with food or obsessed with programming, and then go on and they learn to do much more complicated things, but the difference between you and I and a great chef, is there's a little bit of inspiration, but it's mostly practice. It's mostly just doing it over and over and over again, and that's how you become a great chef. It helps to have good taste, but that's how you become a great chef, and that's how you become a good programmer.  
There's a lot of words we use in the software world like "wizard," and "ninja," and "rock star," and "unicorn," and all those fucking words are bullshit. They create a notion that this kind of work is magic, that it can only be conducted by freaks, and that you don't disturb the programmers; they're special. And that's horse shit. It's not magic. It's just practice, and when we use words like that, we further the idea, we promote the idea that this is fundamentally different than other work that only certain people can do, and that is bad for the field. That's bad for journalism. It keeps people out, and we shouldn't use words like that because we shouldn't be keeping people out. We should be as inclusive as possible. All right. That's my soapbox speech. I think it's really important.”

Saturday, August 20, 2016

Visualization office hours with Google

A month ago I announced a monthly live “office hours” feature with the Google News Lab. We've done two already. You can see them here: 1, 2. This is a series of informal conversations in which I talk about visualization, infographics, and data journalism projects I saw during the previous month.

(Full disclosure, also mentioned in the videos above: I'm doing some consulting work for Google Trends/Google News Lab)

Here are the links that I mentioned:



The Guardian Olympics graphics: 1, 2, 3, 4

Type for user interfaces: 1, 2

Friday, August 19, 2016

Miami Herald's Zika virus tracker

When writing about visualization, infographics, and data journalism it's easy to highlight just special projects by large organizations that take weeks or months to complete, and forget about the bread-and-butter ones about current topics, which are often much less flashy, but also much more relevant and useful to people.

Miami Herald's Zika virus tracker belongs to the second category. It's a straightforward series of graphics —a large map plus some graphs— produced by a tiny group of earnest professionals. We should give this kind of team more credit and attention than they usually get. They haven't forgotten that journalism is, above all, service, not entertainment.

Tuesday, August 16, 2016

Inspiring visualizations by Sam Petulla

There's so much inspiring data journalism, visualizations, and infographics around nowadays that it's easy to miss plenty of great projects. The work of NBC News's visualization editor Sam Petulla had slipped under my radar for a while for some reason. What a shame. I've just discovered this fantastic long-form story published in June this year, which describes who Donald Trump's supporters are. It's an example of how to effectively blend a classic written narrative with photographs, interactive graphs, maps, and animated diagrams.

(This Friday at 3 p.m. EST I'm doing another public hangout with the Google News Lab folks; I'll likely mention this piece, among many others about the Olympics.)

Saturday, August 13, 2016

Visualization in comic books

Jonathan Hickman is one of the most interesting comic book authors nowadays. His non-superhero work is consistently innovative, ambitious —some say “pretentious”— and often disorienting, as he loves to play with story structures and layout. One of his earliest books, The Nightly News, is full of graphs, charts, diagrams, and maps. Just take a look:

Hickman has just launched a lovely new series, The Black Monday Murders, which chronicles a vast conspiracy behind the global financial system. Here's a graph from the last pages of the first issue:

I wonder what the scale of that thing is, and where the data came from!

Wednesday, August 3, 2016

Signing copies of "The Truthful Art" for the Digital Humanities + Data Journalism Symposium

One of the giveaways we'll hand out at the upcoming Digital Humanities + Data Journalism Symposium, from September 29 to October 2, is a signed copy of my most recent book, The Truthful Art. I'm working on that right now, as you can see in the picture. If you are planning to attend, make sure you have enough room in your luggage.

Saturday, July 30, 2016

Trumpian data visualization

Donald Trump just tweeted this chart, designed, I guess, by the graphics folks at Fox News:

I've learned the hard way that averages are dangerous, particularly if time periods differ so much, so I went to the source, the Bureau of Economic Analysis, downloaded the quarterly GDP percent change, and quickly made the following time-series graphic:

The 2007-2009 crisis caused a drastic plunge between the end of George W. Bush's tenure and Barack Obama's first months in office. Right after that, quarterly GDP variation under Obama's presidency doesn't look that different from previous years. Here's the annual data (source):

Another way to approach this story could be to average annual GDP growth under each president. The data is here, and here's the chart (note: I've updated Obama's figure):
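The averaging step itself is simple. Here is a minimal sketch, assuming the annual figures have been arranged as `(year, president, percent_change)` rows; the function name and the numbers for the two fictional presidents are made up for illustration and are not the real GDP series.

```python
from collections import defaultdict


def average_growth_by_president(rows):
    """Group annual GDP percent changes by president.

    rows: iterable of (year, president, pct_change) tuples.
    Returns {president: (mean_pct_change, number_of_years)}.
    """
    groups = defaultdict(list)
    for _year, president, pct_change in rows:
        groups[president].append(pct_change)
    return {p: (sum(v) / len(v), len(v)) for p, v in groups.items()}


# Made-up values for two fictional presidents, NOT real data.
rows = [
    (2001, "President A", 1.0),
    (2002, "President A", 3.0),
    (2003, "President B", 2.0),
    (2004, "President B", -4.0),
    (2005, "President B", 5.0),
]

for president, (avg, years) in average_growth_by_president(rows).items():
    print(f"{president}: {avg:.1f}% average over {years} years")
```

Returning the number of years next to each average keeps the differing time periods visible, which is the reason averages like these need care.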

The picture becomes a bit clearer now. It shows the recent stagnation described in Robert Gordon's book, summarized in this article (more here and here). To be brief: economic growth is becoming much harder to achieve, so it is dubious to compare Obama and Bush Jr. to Johnson or Clinton, not to mention to the average of all previous presidents since 1950. And this is just if you accept that GDP is a good measure of economic development, as Trump does in his tweet. We could argue that other metrics, like the unemployment rate or wage growth, are equally relevant.

The following chart —its source is a good read— provides another depiction of the steady slowdown of economic growth: the shorter the time period you calculate the average from, the smaller the GDP variation is:

Needless to say, I am no economist, so please chime in below if you wish.

UPDATE: Xan Gregg offers this other chart. And here's Catherine Mulbrandon's proposal, which uses GDP per capita.

Tuesday, July 19, 2016

Visualization office hours

Tomorrow I'm beginning a new monthly feature, the News Lab Data Visualization Round Up, a public hangout in which I'll discuss recent news graphics with Jennifer Lee and Nicholas Whitaker.

The conversation will take place at 12 PM (Eastern time). If you want to listen to it, sign up here. It'll be fun.

Full disclosure: One of my ongoing consulting gigs is with Google News Lab, working with Simon Rogers and several very popular designers (more about this soon) to create visualizations based on Google Trends data.

Wednesday, July 13, 2016

Talking about visualization with John Burn-Murdoch

I keep working on my PhD dissertation, for which my students and I are interviewing a lot of news graphics professionals. The latest one is the Financial Times's John Burn-Murdoch (follow him on Twitter).

I'll release most, if not all, of these interviews, along with the dissertation itself and some quantitative data, by mid-2017 through the project website (under construction). However, the conversation with John was so compelling that I asked him if I could make it public right away. Listen to it here, or below.

(Note: A small portion of the chat didn't get recorded. I asked John to make some predictions about the future of visualization, and he mentioned a larger role for annotation, good headlines, etc.)

Here are links to some of the projects John mentioned:

Monday, July 11, 2016

Free video tutorials to supplement "The Truthful Art"

At the beginning of The Truthful Art I wrote that I was going to release tutorials explaining how the charts and maps in the book were made. It's taken me a while to get started, but I've just uploaded the first batch of seven videos. They deal with the example of elementary data exploration I describe in the Preface of the book (you can download the first 40 pages for free here). I used iNZight, an R-based free tool that is very easy to learn.

To see all videos, visit the Tutorials & Resources section on the upper menu, or go to my YouTube channel. Keep an eye on either. I'll continue adding tutorials on a regular basis in the next couple of months, as I'll use them in my classes this coming semester.

Friday, July 8, 2016

The first Arabic data journalism book

It's always a pleasure to witness data journalism, infographics, and visualization gaining popularity worldwide. Egyptian business journalist Amr Eleraqi has just published the first Arabic book about our field (see it in Google Books).

Amr is the founder of InfoTimes, a firm that has produced a good amount of information graphics for local companies. He is an enthusiastic and tireless data evangelist, so consider the short interview below my shameless attempt to promote his work.

How did you get interested in data journalism, infographics, data visualization, etc?

I have more than ten years of experience as a business editor. I’ve always loved to work with data. I deal with numbers all the time. Nowadays we have a lot of leaks, a lot of data everywhere. With very simple tools we can find stories inside it. I love this part, finding stories inside data and making them readable and shareable.

Tell me a bit about your project.

At the end of 2012 I was participating in a boot camp organized by ICFJ in Amman. One of the sessions was about infographics. This session inspired me to create a small studio to visualize data. We work for clients like Yahoo Maktoob, Akhbar Elyoum, and Petra, the Jordan news agency. We were shortlisted by GEN’s data journalism awards this year.

We are a small team: 3 journalists, 2 graphic designers, 1 developer, and 1 animator. Sometimes I design, but you can't call me a graphic designer. I'm a journalist who can use graphic design software to present a story in an effective and attractive manner, but my main role is managing the team, besides analyzing data and transforming it into stories.

I also train journalists. I'm working with BBC Media Action, Internews and Free Press Unlimited. And right now I'm learning to code. I believe that learning how to code is as relevant for a journalist as learning how to conduct an interview or write a story.

You are busy! And besides all that, you wrote a book about data journalism. How did that happen?

It took me two years to write the book. There are no books in Arabic about data journalism, visualization, etc., besides the translation of the Data Journalism Handbook, which is good, but that is not designed for Arab audiences. So it was an obvious opportunity.

That's surprising. Are newsrooms in Egypt and its neighboring countries ready to embrace data journalism, infographics, etc?

Well, we've done more than 100 entry-level workshops just here, in Egypt, and some in other countries like Algeria, Turkey, and Jordan. There is great interest, but very little knowledge.

We do two kinds of workshops. One is about data-driven journalism, and it covers topics like how to find data, scrape it, and use spreadsheets to analyze it. The other one is about visualization. It deals with how to select the best graph or map for your data, color, and then how to use online tools like Piktochart and

Tell me about the book, its contents, structure, etc.

The book has three chapters. The first one is an introduction to data journalism. The second deals with how to find, scrape, clean, and analyze data. The third is about visualization. It can be ordered online from all Arabic-speaking countries, besides having a presence in book fairs.

The second chapter is a relevant one. Getting government or official data in Egypt isn't easy. We don't have an equivalent to FOIA requests here. You can ask official sources for data but you are never sure if they will give it to you or not. Besides, data is never machine readable, as it's always in PDF format.

I'm working with several partners to change the situation here. For instance, I've run two workshops for employees in several ministries. I gave them a series of recommendations. One of them was not to use PDF! Also, we've done a data-for-good event in collaboration with the International Development Research Centre.

Let's talk about freedom of the press in Egypt. Do you face pressure, or are you limited in any way?

The situation is very hard in Egypt, under the current regime. Egypt has turned into an Iran-like country. The government is surrounded by a virtual red wall, and it's very risky for any journalist to trespass. To be safe you have to work and focus on social topics, not political ones. In Egypt right now there is only one tune, and you are required to sing along.

You need to be outside of Egypt to be able to freely write about Egypt. When our friend Hossam Bahgat reported about the corruption inside the military, he was arrested.

How can data journalism, which is intrinsically linked to investigative reporting, thrive in an environment like that?

We are trying to work on that. Sometimes we need to report on wrongdoing in an indirect way. For instance, we cannot say that the government is mismanaging expenditures. So we created a calculator. Readers can input their monthly salaries, and the application shows them which portion of their taxes gets directed to different areas like education, healthcare, etc. Then, maybe they can make an inference.
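A calculator like the one Amr describes can be sketched in a few lines. This is only an illustration of the idea, not InfoTimes's application: the flat tax rate, the budget shares, and the function name `tax_breakdown` are all hypothetical placeholders.

```python
def tax_breakdown(monthly_salary, tax_rate, budget_shares):
    """Split the tax paid on a monthly salary across budget areas.

    tax_rate: fraction of the salary paid in tax (hypothetical flat rate).
    budget_shares: {area: fraction of total spending}; must sum to 1.
    """
    if abs(sum(budget_shares.values()) - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to 1")
    monthly_tax = monthly_salary * tax_rate
    return {area: round(monthly_tax * share, 2)
            for area, share in budget_shares.items()}


# Hypothetical figures, NOT Egypt's real tax rate or budget.
shares = {"education": 0.25, "healthcare": 0.25, "everything else": 0.50}
for area, amount in tax_breakdown(4000, 0.10, shares).items():
    print(f"{area}: {amount:.2f} per month")
```

The design choice is the one described in the interview: the tool only shows the reader the amounts and leaves the inference to them.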

Tuesday, July 5, 2016

Global Sharknado Threat and other adventures in mapping

I've been a fan of cartographer John Nelson's for a few years now. I featured some of his work in The Truthful Art and now, thanks to Jonathan Crowe, I've discovered that he has a blog with detailed mapping tutorials. In it, John explains how he made the peculiar projection of his now famous historical map of hurricanes, or this map of Global Sharknado Threat, while dropping nerdy asides here and there (Star Wars as a stylistic influence!).

Thursday, June 30, 2016

FiveThirtyEight's 2016 Election Forecast is a visualization delight

Perhaps this is not that surprising, but FiveThirtyEight's new Election Forecast is an interactive visualization delight that combines choropleth maps, time-series line charts, box plots, histograms, cartograms, and numerical tables. It's unusual that a single graphic can tell a complex story; this project is good proof of that.

Yesterday, Steve Wexler tweeted “Nothing short of amazed at the creativity and innovation I'm seeing in #dataviz. I think we're just entering a golden age.” I'm inclined to agree despite being a fan of gloomy manifestos. None of the graphic forms used in this project is really novel. They have existed for decades or even centuries, but some of them were used only in specialized publications. FiveThirtyEight is a news website, not a scientific journal. Isn't it encouraging to see that journalists at so many organizations are losing their fear of “confusing” readers with “complex” graphics like histograms? Or of using proper statistical terminology?

Wednesday, June 29, 2016

VR and interactive 3D for infographics and visualization

In a recent piece at the Smithsonian website I said that I am very intrigued by the potential that virtual reality and interactive 3D technologies can have if applied to explanatory infographics and data visualizations. Thanks to Carlos Gámez Kindelán I've discovered Sketchfab, a website that collects tons of examples. I'm embedding two:

The time when I designed children's books

Everybody has a past. On a recent trip to Pamplona, Spain, to attend the Malofiej infographics summit, I stopped by a bookstore and saw something that looked familiar. There, by the cash register, were two children's activity books I designed in the late 90s, when I was beginning my career. This and this.

At that time I was a junior infographics journalist at a newspaper called Diario16, and I was making a small salary. Madrid is quite an expensive city, so I took a second job as a freelancer for a company called DPI Comunicación, a pioneer in the design of information graphics for the Spanish press.

DPI didn't do just infographics, though, but also all sorts of odd jobs for companies that didn't have anything to do with journalism. One of them was Susaeta, a firm that publishes mostly children's and educational books. Between 1998 and 2000 I designed books and board game boxes for Susaeta through DPI. It was fun and the money was sorely needed, but the results were, I'd have to admit, pretty terrible. Just take a look:

Tuesday, June 28, 2016

Defying conventions in visualization: Should time always be on the horizontal axis?

The main picture on the first page of today's The New York Times is a very nice time-series line graph by Alicia Parlapiano. Notice that time is on the Y-axis. You've probably heard or read that time in statistical graphics like this should always be on the horizontal axis because it feels natural, and that if you do otherwise, readers will be confused.

Could this be a cultural convention? In Western societies the passage of time maps onto a virtual, generally horizontal linear scale: before-after translates into “behind me” and “ahead of me”, and this scale has a left-to-right orientation. Other cultures and languages (see 1, 2, 3) use both horizontal and vertical metaphors to think and talk about time. It'd be great to do some experiments and see if this has an effect on how people read charts.

As for the objection that readers —mostly Western ones here, I guess— will be confused, well, people aren't stupid. They may be puzzled in the first 5 seconds, but only until they take a quick look at the axis labels. When reading graphics, attention overrides preconceived notions.

Hunches aside, I usually recommend following conventions unless there's a good reason not to, and this chart has one. There's a true cultural metaphor at play in this chart: the more liberal-more conservative spectrum, which translates into a left-right scale. If we put time on the horizontal axis, and the left-right scale on the vertical one, the latter would map as higher-lower (update: this is how it shows in the online version, h/t Nathaniel Lash).

As a final note, here's a prediction: as a majority of readers are accessing their news through smartphones (the latest figure I heard from a major news organization in Miami is 80%), which are usually held upright and navigated by scrolling vertically, vertical time-series charts with time on the Y-axis will become more common in the next few years. Will we witness a new visual convention being born?

Update: On Twitter, Álvaro Valiño has shared this ISOTYPE chart.

Sunday, June 26, 2016

Red-green color schemes in visualization are tricky

I'm following the results of the Spanish general elections and have just seen this map of participation. It uses a diverging color scheme to show the difference in comparison to the previous election.

Red and green color palettes are tricky. Color blind people with deuteranopia have a hard time with them. I ran the map through this application and here are the results:

I've made this mistake myself in the past (see here). Please, always check your color schemes. There are multiple places (1, 2, 3, 4...) where you can read about safe palettes for visualization. Use them.
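One quick automated check, sketched here as a rough proxy, is to compare the lightness of the two ends of a diverging palette. When hue differences are hard to see, lightness is the main cue left. The sketch uses the WCAG relative luminance and contrast ratio formulas; it is NOT a deuteranopia simulation and not what the application mentioned above does, and the two example colors are hypothetical, not sampled from the map.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """WCAG relative luminance of an (r, g, b) color, from 0 to 1."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio: 1 means same lightness, 21 is black on white."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical palette ends: a pure red and a mid green.
red, green = (255, 0, 0), (0, 128, 0)
print(f"red vs. green contrast ratio: {contrast_ratio(red, green):.2f}")
```

A ratio close to 1 means the two colors differ mostly in hue, which is exactly where red-green palettes get into trouble. A proper check still requires a color blindness simulator and a palette designed to be safe.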

Diverging color schemes: Showing good data isn't enough; you need to show it well

Fraser Nelson, editor of The Spectator, claims that his map of Brexit is better than a diverging color scheme one. See his comparison:

I beg to disagree. Nelson's map is misleading and far from being “real”, although it does show accurate data. This is yet another example of how to build a dubious visualization using legitimate numbers. A much more truthful depiction of the results appeared in The New York Times (below). It improves on the imperfect binary Brexit map by adding shades of color, which is a great idea. Good data isn't the only component of visualization; the way you depict it matters a lot:

The Guardian used fewer shades of color, but it transformed the map into a cartogram. This emphasizes the relative weight of different regions of the country:

Actually, by taking a look at the maps by the Times and The Guardian, I'd argue that Nelson's article and map obscure the fact that some highly populated areas of Scotland were strongly in favor of remaining in the EU.

UPDATE: On Twitter, Neil Richards wrote: “Made long comment on your blog but it got swallowed up! One point: red/blue not perfect because of political connotations. But yellow also indicates third party SNP these days. Any divergent palettes that don't include red, blue or yellow? 2/2.” I'd refer to ColorBrewer.

Friday, June 24, 2016

Len De Groot on news graphics, data journalism, and caring about your audience

I'm busy analyzing the nearly 40 interviews conducted for my dissertation, which deals with how news graphics have changed in the past decade or decade and a half. I can't resist sharing some bits from the raw transcript of the interview with Len De Groot, director of data visualization at the LA Times. Len has a long career in the industry and has always been a visionary. Enjoy.

Asked about what visual and data journalists have in common, and if it has changed at all:
I think there is, and generally, it's curiosity about the world. I think that's what draws people to journalism, is there's a curiosity about the world. A sense of wanting to describe the world to other people. And I think that's sort of a core value that I don't think has ever really changed. People may lose that sense of mission, and fall out of journalism, or decide to leave journalism, but the people who are in it, and stay in it, kind of have that need, right. It's not something that's facile, it's something that's deeply held within people. So really that's sort of the thing that I look at. 
Even then, compared to now, people who were doing good work were doing good work because they were creative. And they were willing to look at the world in different ways and try to explain it. That hasn't changed either. 
What's changed is there are some important proficiencies. Statistical proficiencies. They are much more widespread. It used to be, journalism was the field you went into because you didn't want to do math. There were newsrooms full of people that could vouch to you that either they said it, or someone they know said that. And that's no longer the case. I think there's a growing understanding in most universities that students have to come out being proficient in statistics, if not being able to question. That's really, in an age where graphics and data are so prevalent, if you can't question the data and question the source in a way that is smart, you're going to be misled. That's just the end of the day. People will mislead with graphics. And constantly do mislead with graphics. Some intentional, some out of ineptitude. But it's our job as journalists to know the difference. So, that's really a core skill that I think has been a really strong change for the better.
About the audience:
You know, I really don't know if the audience has changed. I think we've started to care about the audience, and that's a change. We used to care about the audience in a different way. We used to care that we were telling them the things that they needed to know, or things that might interest them. But now, we really have to, because there's so much competition, do it in a way that people, we're telling people stories in ways that they're interested in getting them. And that really does mean a whole host of different techniques. 
When it comes down to it at the end of the day, it's really not about us. And it can't be about us. The moment that it's about us, we lose our audience. I think it's one of the reason we've seen really smart startups do well. Because they realize an audience wanted something and they gave it to them. Whereas journalists would say, well we don't do that. We don't put numbers in headlines. We have a style of writing a headline that contains some gravitas and is very important. And meanwhile no one reads it. I don't know that that's changed. Does human nature change that rapidly? I don't know that it does. I think what's happened is we've started listening, and I think that that's sort of an important thing. 
I don't know. I don't know if the core values of human nature changed, or how frequently they change. My suspicion is that people will go to whatever is the best for them. There's an equilibrium in the world that doesn't involve us. And it's up to us. That equilibrium may be in a screen, or in a phone, or in something else. And it's up to us to understand where that equilibrium is, and be able to tell a story there. And that doesn't mean we don't do the other things too. We do. But we have to be effective at communicating where they are, and where they want to get information. And it may not be a phone. It may be something completely different. In fact, I will bet, that it's going to be something completely different that we don't know of yet. That in 10 years… Actually I'll make this promise: if in 15 years, if things aren't different, I'll retire. Because frankly, I'll be bored. I really do think it's going to change a lot. If it's not changing, then it's probably due to our faults, not the audience.
About storytelling:
Yeah, sort of that idea of storytelling being compression. The idea that when we're doing acts of journalism, we're going into people's lives, and we're taking of their lives, and portions of other people's lives, and we're compressing them into a story, or a way of communicating important information. And that compression started on papers that were this big, and then the papers got this big, and then we were on desktop monitors, and then we're on laptops, and then we're on phones. And so there's been this shrinking of the box in which we tell our story. 
And one of the things for me that's so exciting about immersive storytelling is that box, that little window, opens things back up. Because now we're peering through that little window, and we can't step forward into it, but we can sort of project our intelligence into it. And we have space in which to investigate, in which we can explore as people. And I think that's sort of human nature, to want to explore, to want to find things. 
The question is, is what do we do to tell a story there, in a way that makes people willing to strap something on their face. That's a lot to ask for someone to do. That's a lot to say. Will you put on this clunky headset? The responsibility lies with us, to come up with ways of telling stories that are engaging, and that will make people want to experience things like that. The medium in itself is not any greater or worse than any other medium that's come before, be it radio, print, TV, whatever. 
But it has this expansion, this actual expansion of space, is something that we can try to use, and we can try to leverage. I think I've said this a few times this week that one of the white whales, my white whale, is campaign finance datavis. And doing it in a way that people can actually understand it. And it's so hard to do. Where you end up usually, is charts. We end up with charts that are simplified to show overall trends. Or we end up with trends or searchable data. But neither of those things help people understand the data. And we've had this phrase called of following the money. And it's really something that reporters did, that we did as journalists. We followed the money and tell people the result. But I think that we can help people understand important issues by letting them have the journey. And either guiding them through part of it, or letting them explore and discover. Yes.