Tuesday, July 19, 2016

Visualization office hours

Tomorrow I'm beginning a new monthly feature, the News Lab Data Visualization Round Up, a public hangout in which I'll discuss recent news graphics with Jennifer Lee and Nicholas Whitaker.

The conversation will take place at 12 PM (Eastern time.) If you want to listen to it, sign up here. It'll be fun.

Full disclosure: One of my ongoing consulting gigs is with Google News Lab, working with Simon Rogers and several very popular designers (more about this soon) to create visualizations based on Google Trends data.

Wednesday, July 13, 2016

Talking about visualization with John Burn-Murdoch

I keep working on my PhD dissertation, for which my students and I are interviewing a lot of news graphics professionals. The latest one is the Financial Times's John Burn-Murdoch (follow him on Twitter).

I'll release most —if not all— of these interviews, along with the dissertation itself and some quantitative data, by mid-2017 through the project website, www.nerdjournalism.com (under construction.) However, the conversation with John was so compelling that I asked him if I could make it public right away. Listen to it here, or below.

(Note: A small portion of the chat didn't get recorded. I asked John to make some predictions about the future of visualization, and he mentioned a larger role for annotation, good headlines, etc.)

Here are links to some of the projects John mentioned:

Monday, July 11, 2016

Free video tutorials to supplement "The Truthful Art"

At the beginning of The Truthful Art I wrote that I was going to release tutorials explaining how the charts and maps in the book were made. It's taken me a while to get started, but I've just uploaded the first batch of seven videos. They deal with the example of elementary data exploration I describe in the Preface of the book (you can download the first 40 pages for free here.) I used iNZight, an R-based free tool that is very easy to learn.

To see all videos, visit the Tutorials & Resources section on the upper menu, or go to my YouTube channel. Keep an eye on either. I'll continue adding tutorials on a regular basis in the next couple of months, as I'll use them in my classes this coming semester.

Friday, July 8, 2016

The first Arabic data journalism book

It's always a pleasure to witness data journalism, infographics, and visualization gaining popularity worldwide. Egyptian business journalist Amr Eleraqi has just published the first Arabic book about our field (see it in Google Books).

Amr is the founder of InfoTimes, a firm that has produced a good amount of information graphics for local companies. He is an enthusiastic and tireless data evangelist, so consider the short interview below my shameless attempt to promote his work.

How did you get interested in data journalism, infographics, data visualization, etc?

I have more than ten years of experience as a business editor. I’ve always loved to work with data. I deal with numbers all the time. Nowadays we have a lot of leaks, a lot of data everywhere. With very simple tools we can find stories inside it. I love this part, finding stories inside data and making them readable and shareable.

Tell me a bit about your project, InfoTimes.org

At the end of 2012 I was participating in a boot camp organized by ICFJ in Amman. One of the sessions was about infographics. This session inspired me to create a small studio to visualize data. We work for clients like Yahoo Maktoob, Akhbar Elyoum, and Petra, the Jordan news agency. We were shortlisted by GEN’s data journalism awards this year.

We are a small team: 3 journalists, 2 graphic designers, 1 developer, and 1 animator. Sometimes I design, but you can't call me a graphic designer. I'm a journalist who can use graphic design software to present a story in an effective and attractive manner, but my main role is managing the team, besides analyzing data and transforming it into stories.

I also train journalists. I'm working with BBC Media Action, Internews and Free Press Unlimited. And right now I'm learning to code. I believe that learning how to code is as relevant for a journalist as learning how to conduct an interview or write a story.

You are busy! And besides all that, you wrote a book about data journalism. How did that happen?

It took me two years to write the book. There are no books in Arabic about data journalism, visualization, etc., besides the translation of the Data Journalism Handbook, which is good, but that is not designed for Arab audiences. So it was an obvious opportunity.

That's surprising. Are newsrooms in Egypt and its neighboring countries ready to embrace data journalism, infographics, etc?

Well, we've done more than 100 entry-level workshops just here, in Egypt, and some in other countries like Algeria, Turkey, and Jordan. There is great interest, but very little knowledge.

We do two kinds of workshops. One is about data-driven journalism, and it covers topics like how to find data, scrape it, use spreadsheets to analyze it, etc. The other one is about visualization. It deals with how to select the best graph or map for your data, color, and then how to use online tools like Piktochart and Infogr.am.

Tell me about the book, its contents, structure, etc.

The book has three chapters. The first one is an introduction to data journalism. The second deals with how to find, scrape, clean, and analyze data. The third is about visualization. It can be ordered online from all Arabic-speaking countries, besides having a presence in book fairs.

The second chapter is a relevant one. Getting government or official data in Egypt isn't easy. We don't have an equivalent to FOIA requests here. You can ask official sources for data but you are never sure if they will give it to you or not. Besides, data is never machine readable, as it's always in PDF format.

I'm working with several partners to change the situation here. For instance, I've run two workshops for employees in several ministries. I gave them a series of recommendations. One of them was not to use PDF! Also, we've done a data for good event in collaboration with the International Development Research Centre.

Let's talk about freedom of the press in Egypt. Do you face pressure, or are you limited in any way?

The situation is very hard in Egypt, under the current regime. Egypt has turned into an Iran-like country. The government is surrounded by a virtual red wall, and it's very risky for any journalist to trespass. To be safe you have to work and focus on social topics, not political ones. In Egypt right now there is only one tune, and you are required to sing along.

You need to be outside of Egypt to be able to freely write about Egypt. When our friend Hossam Bahgat reported about the corruption inside the military, he was arrested.

How can data journalism, which is intrinsically linked to investigative reporting, thrive in an environment like that?

We are trying to work on that. Sometimes we need to report on wrongdoing in an indirect way. For instance, we cannot say that the government is mismanaging expenditures. So we created a calculator. Readers can input their monthly salaries, and the application shows them which portion of their taxes gets directed to different areas like education, healthcare, etc. Then, maybe they can make an inference.
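For readers curious about what a calculator like the one Amr describes involves, here is a minimal Python sketch. It is not InfoTimes's code: the flat tax rate, the budget shares, and the function name `tax_breakdown` are placeholders I made up, chosen only to show the arithmetic (tax owed multiplied by each area's share of spending).

```python
# Hypothetical figures for illustration only; NOT real Egyptian tax or budget data.
FLAT_TAX_RATE = 0.20  # assumed flat income tax rate
BUDGET_SHARES = {     # assumed shares of public spending; they must sum to 1
    "education": 0.25,
    "healthcare": 0.15,
    "defense": 0.20,
    "debt service": 0.30,
    "other": 0.10,
}


def tax_breakdown(monthly_salary, tax_rate=FLAT_TAX_RATE, shares=BUDGET_SHARES):
    """Return how much of one month's tax goes to each spending area."""
    if monthly_salary < 0:
        raise ValueError("salary cannot be negative")
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to 1")
    monthly_tax = monthly_salary * tax_rate
    # Split the tax across areas in proportion to each area's budget share.
    return {area: round(monthly_tax * share, 2) for area, share in shares.items()}


if __name__ == "__main__":
    for area, amount in tax_breakdown(5000).items():
        print(f"{area:>12}: {amount:8.2f}")
```

A real version would swap in the actual tax brackets and published budget figures. The point of the design is that the application only reports the split and leaves the inference to the reader.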

Tuesday, July 5, 2016

Global Sharknado Threat and other adventures in mapping

I've been a fan of cartographer John Nelson's for a few years now. I featured some of his work in The Truthful Art and now, thanks to Jonathan Crowe, I've discovered that he has a blog with detailed mapping tutorials. In it, John explains how he made the peculiar projection of his now famous historical map of hurricanes, or this map of Global Sharknado Threat, while dropping nerdy asides here and there (Star Wars as a stylistic influence!)

Thursday, June 30, 2016

FiveThirtyEight's 2016 Election Forecast is a visualization delight

Perhaps this is not that surprising, but FiveThirtyEight's new Election Forecast is an interactive visualization delight that combines choropleth maps, time-series line charts, box plots, histograms, cartograms, and numerical tables. It's rare for a single graphic to tell a complex story on its own; this project is good proof of that.

Yesterday, Steve Wexler tweeted “Nothing short of amazed at the creativity and innovation I'm seeing in #dataviz. I think we're just entering a golden age.” I'm inclined to agree despite being a fan of gloomy manifestos. None of the graphic forms used in this project is really novel. They have existed for decades or even centuries, but some of them were used only in specialized publications. FiveThirtyEight is a news website, not a scientific journal. Isn't it encouraging to see that journalists at so many organizations are losing their fear of “confusing” readers with “complex” graphics like histograms? Or of using proper statistical terminology?

Wednesday, June 29, 2016

VR and interactive 3D for infographics and visualization

In a recent piece at the Smithsonian website I said that I am very intrigued by the potential that virtual reality and interactive 3D technologies can have if applied to explanation infographics and data visualizations. Thanks to Carlos Gámez Kindelán I've discovered Sketchfab, a website that collects tons of examples. I'm embedding two:

The time when I designed children's books

Everybody has a past. On a recent trip to Pamplona, Spain, to attend the Malofiej infographics summit, I stopped by a bookstore and saw something that looked familiar. There, by the cash register, were two children's activity books I designed in the late 90s, when I was beginning my career. This and this.

At that time I was a junior infographics journalist at a newspaper called Diario16, and I was making a small salary. Madrid is a quite expensive city, so I took a second job as a freelancer for a company called DPI Comunicación, a pioneer in the design of information graphics for the Spanish press.

DPI didn't do just infographics, though, but also all sorts of odd jobs for companies that didn't have anything to do with journalism. One of them was Susaeta, a firm that publishes mostly children's and educational books. Between 1998 and 2000 I designed books and board game boxes for Susaeta through DPI. It was fun and the money was sorely needed, but the results were, I'd have to admit, pretty terrible. Just take a look:

Tuesday, June 28, 2016

Defying conventions in visualization: Should time always be on the horizontal axis?

The main picture on the first page of today's The New York Times is a very nice time-series line graph by Alicia Parlapiano. Notice that time is on the Y-axis. You've probably heard or read that time in statistical graphics like this should always be on the horizontal axis because it feels natural, and that if you do otherwise, readers will be confused.

Could this be a cultural convention? In Western societies the passage of time maps onto a virtual, generally horizontal linear scale: before-after translates into “behind me” and “ahead of me”, and this scale has a left-to-right orientation. Other cultures and languages (see 1, 2, 3) use both horizontal and vertical metaphors to think and talk about time. It'd be great to do some experiments and see if this has an effect on how people read charts.

As for the objection that readers —mostly Western ones here, I guess— will be confused, well, people aren't stupid. They may be puzzled in the first 5 seconds, but only until they take a quick look at the axis labels. When reading graphics, attention overrides preconceived notions.

Hunches aside, I usually recommend following conventions unless there's a good reason not to. This is one of those cases. There's a true cultural metaphor at play in this chart: the more liberal-more conservative spectrum, which translates into a left-right scale. If we put time on the horizontal axis, and the left-right scale on the vertical one, the latter would map as higher-lower (update: this is how it shows in the online version, h/t Nathaniel Lash.)

As a final note, here's a prediction: as a majority of readers are accessing their news through smartphones (the latest figure I heard from a major news organization in Miami is 80%), which are usually held upright and navigated by scrolling vertically, vertical time-series charts with time on the Y-axis will become more common in the next few years. Will we witness a new visual convention being born?

Update: On Twitter, Álvaro Valiño has shared this ISOTYPE chart.

Sunday, June 26, 2016

Red-green color schemes in visualization are tricky

I'm following the results of the Spanish general election and have just seen this map of participation. It uses a diverging color scheme to show the difference in comparison to the previous election.

Red and green color palettes are tricky. Color blind people with deuteranopia have a hard time with them. I ran the map through this application and here are the results:

I've made this mistake myself in the past (see here). Please, always check your color schemes. There are multiple places (1, 2, 3, 4...) where you can read about safe palettes for visualization. Use them.

Diverging color schemes: Showing good data isn't enough; you need to show it well

Fraser Nelson, editor of The Spectator, claims that his map of Brexit is better than a diverging color scheme one. See his comparison:

I beg to disagree. Nelson's map is misleading and far from being “real”, although it does show accurate data. This is yet another example of how to build a dubious visualization using legitimate numbers. A much more truthful depiction of the results appeared in The New York Times (below). It improves on the imperfect binary Brexit map by adding shades of color, which is a great idea. Good data isn't the only component of visualization; the way you depict it matters a lot:

The Guardian used fewer shades of color, but it transformed the map into a cartogram. This emphasizes the relative weight of different regions of the country:

Actually, by taking a look at the maps by the Times and The Guardian, I'd argue that Nelson's article and map obscure the fact that some highly populated areas of Scotland were strongly in favor of remaining in the EU.

UPDATE: On Twitter, Neil Richards wrote: “Made long comment on your blog but it got swallowed up! One point: red/blue not perfect because of political connotations. But yellow also indicates third party SNP these days. Any divergent palettes that don't include red, blue or yellow? 2/2.” I'd refer to ColorBrewer.

Friday, June 24, 2016

Len De Groot on news graphics, data journalism, and caring about your audience

I'm busy analyzing the nearly 40 interviews conducted for my dissertation, which deals with how news graphics have changed in the past decade or decade and a half. I can't resist sharing some bits from the raw transcript of the interview with Len De Groot, director of data visualization at the LA Times. Len has a long career in the industry and has always been a visionary. Enjoy.

Asked about what visual and data journalists have in common, and if it has changed at all:
I think there is, and generally, it's curiosity about the world. I think that's what draws people to journalism, is there's a curiosity about the world. A sense of wanting to describe the world to other people. And I think that's sort of a core value that I don't think has ever really changed. People may lose that sense of mission, and fall out of journalism, or decide to leave journalism, but the people who are in it, and stay in it, kind of have that need, right. It's not something that's facile, it's something that's deeply held within people. So really that's sort of the thing that I look at. 
Even then, compared to now, people who were doing good work were doing good work because they were creative. And they were willing to look at the world in different ways and try to explain it. That hasn't changed either. 
What's changed is there are some important proficiencies. Statistical proficiencies. They are much more widespread. It used to be, journalism was the field you went into because you didn't want to do math. There were newsrooms full of people that could vouch to you that either they said it, or someone they know said that. And that's no longer the case. I think there's a growing understanding in most universities that students have to come out being proficient in statistics, if not being able to question. That's really, in an age where graphics and data are so prevalent, if you can't question the data and question the source in a way that is smart, you're going to be misled. That's just the end of the day. People will mislead with graphics. And constantly do mislead with graphics. Some intentional, some out of ineptitude. But it's our job as journalists to know the difference. So, that's really a core skill that I think has been a really strong change for the better.
About the audience:
You know, I really don't know if the audience has changed. I think we've started to care about the audience, and that's a change. We used to care about the audience in a different way. We used to care that we were telling them the things that they needed to know, or things that might interest them. But now, we really have to, because there's so much competition, do it in a way that people, we're telling people stories in ways that they're interested in getting them. And that really does mean a whole host of different techniques. 
When it comes down to it at the end of the day, it's really not about us. And it can't be about us. The moment that it's about us, we lose our audience. I think it's one of the reasons we've seen really smart startups do well. Because they realize an audience wanted something and they gave it to them. Whereas journalists would say, well we don't do that. We don't put numbers in headlines. We have a style of writing a headline that contains some gravitas and is very important. And meanwhile no one reads it. I don't know that that's changed. Does human nature change that rapidly? I don't know that it does. I think what's happened is we've started listening, and I think that that's sort of an important thing.
I don't know. I don't know if the core values of human nature changed, or how frequently they change. My suspicion is that people will go to whatever is the best for them. There's an equilibrium in the world that doesn't involve us. And it's up to us. That equilibrium may be in a screen, or in a phone, or in something else. And it's up to us to understand where that equilibrium is, and be able to tell a story there. And that doesn't mean we don't do the other things too. We do. But we have to be effective at communicating where they are, and where they want to get information. And it may not be a phone. It may be something completely different. In fact, I will bet, that it's going to be something completely different that we don't know of yet. That in 10 years… Actually I'll make this promise: if in 15 years, if things aren't different, I'll retire. Because frankly, I'll be bored. I really do think it's going to change a lot. If it's not changing, then it's probably due to our faults, not the audience.
About storytelling:
Yeah, sort of that idea of storytelling being compression. The idea that when we're doing acts of journalism, we're going into people's lives, and we're taking of their lives, and portions of other people's lives, and we're compressing them into a story, or a way of communicating important information. And that compression started on papers that were this big, and then the papers got this big, and then we were on desktop monitors, and then we're on laptops, and then we're on phones. And so there's been this shrinking of the box in which we tell our story. 
And one of the things for me that's so exciting about immersive storytelling is that box, that little window, opens things back up. Because now we're peering through that little window, and we can't step forward into it, but we can sort of project our intelligence into it. And we have space in which to investigate, in which we can explore as people. And I think that's sort of human nature, to want to explore, to want to find things. 
The question is, is what do we do to tell a story there, in a way that makes people willing to strap something on their face. That's a lot to ask for someone to do. That's a lot to say. Will you put on this clunky headset? The responsibility lies with us, to come up with ways of telling stories that are engaging, and that will make people want to experience things like that. The medium in itself is not any greater or worse than any other medium that's come before, be it radio, print, TV, whatever. 
But it has this expansion, this actual expansion of space, is something that we can try to use, and we can try to leverage. I think I've said this a few times this week that one of the white whales, my white whale, is campaign finance datavis. And doing it in a way that people can actually understand it. And it's so hard to do. Where you end up usually, is charts. We end up with charts that are simplified to show overall trends. Or we end up with trends or searchable data. But neither of those things help people understand the data. And we've had this phrase, “following the money.” And it's really something that reporters did, that we did as journalists. We followed the money and told people the result. But I think that we can help people understand important issues by letting them have the journey. And either guiding them through part of it, or letting them explore and discover. Yes.

Thursday, June 23, 2016

There is no “perfect” visualization, but some are more appropriate than others

Steve Wexler has just published a post titled “There is no perfect chart and there is no perfect dashboard.” Go ahead and read it.

Back already? I agree with Steve, but I'd like to stress, as he suggests, that it's possible to sort visualization types according to how effective they are at conveying a specific message, or at enabling certain tasks. The decision of how to transform data into spatial properties (that's what visualization consists of) is largely subjective, but it can't be based just on personal aesthetic preferences. There is some science behind it, no matter how imperfect some of it may be. Context and purposes also matter.

Let me give you an example, this visualization by Anna Vital (don't miss her posters!) for Google Trends*. It's an interesting, attractive, and elegant graphic, but I think that a few changes could make it even better.

See the screenshot below. The fact that the lines are shaped as curves makes 3h 9min (close to the center of the circle) look only slightly longer than 1h 27min; and 1h 27min looks shorter than 51min, just because the latter is in the outermost ring. Besides, the little clock at the bottom may be a misleading cue, as these lines don't depict times of the day, but time after each of the attacks happened. I think that in this case some of the visual appeal of the graphic ought to be sacrificed in order to increase clarity: lines could be straightened out, or there could be a circle per attack, so proportions would be respected.
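A bit of arithmetic shows why curved lines can distort proportions. The Python sketch below rests on two assumptions of mine, neither measured from the graphic: that each duration is encoded as an angle on a 12-hour dial, and that the three arcs sit on rings of radius 1, 2, and 4 (values I picked so the numbers reproduce the effect described above). The drawn length of an arc is its radius times its angle in radians, so the same duration looks longer the farther it is from the center.

```python
import math


def arc_length(minutes, radius, minutes_per_turn=720):
    """Drawn length of an arc sweeping minutes/minutes_per_turn of a full circle."""
    angle = 2 * math.pi * minutes / minutes_per_turn  # angle in radians
    return radius * angle


# (label, duration in minutes, assumed ring radius); the radii are hypothetical.
arcs = [
    ("3h 9min", 189, 1.0),   # ring closest to the center
    ("1h 27min", 87, 2.0),   # middle ring
    ("51min", 51, 4.0),      # outermost ring
]

for label, minutes, radius in arcs:
    # Compare each duration's true ratio to the shortest one with its drawn length.
    print(f"{label:>9}: true ratio to 51min = {minutes / 51:.2f}, "
          f"drawn length = {arc_length(minutes, radius):.2f}")
```

Under these assumptions the 51min arc comes out longest on the page even though it is the shortest duration. With straight lines, or one circle per attack, drawn length would be proportional to duration again.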

I discussed some of these issues (and many more, related to other visualizations) in this recent conversation with the Google News Lab folks.

*Full disclosure: I'm a consultant for the Google News Lab, working mainly with Simon Rogers.

Monday, June 20, 2016

Bring back those histograms and box plots

This data story isn't recent, but I've just been reminded of it by Owen Youngman. Back in 2014, FiveThirtyEight's Allison McCann (who will be at the DH+DH Symposium) and Nate Silver visualized the ages of people with a given name. They did it with several distribution charts, graphic forms that I wish would become more common in news media.

One year later, NYT's Amanda Cox declared 2015 “the year of the histogram.” As she said in that podcast, most people learn to make histograms in Kindergarten, but then many of us forget them for some reason. Let's bring them back!

Tuesday, June 14, 2016

Student visualizes homelessness with d3.js

Luís Melgar is one of our data journalism, visualization, and infographics graduates. For his Master's thesis he decided to do a data-driven story about student homelessness in Florida. This is a problem that is often overlooked by the media.

Many figures presented here are impressive. The number of homeless* students more than doubled between 2005 and 2014. In some counties, one out of five students lacks a stable home.

The charts and maps were done mostly with d3.js, and the project looks good on mobile (I'm making my classes mobile-first beginning in the Fall this year, by the way.) Take a look at it.

*A homeless student is not a kid who lives on the street. According to the project, ‘the public education system says that a student is homeless when he or she lacks "a fixed, regular, and adequate nighttime residence," or if they share a home with other people "due to loss of housing" or "economic hardship."’ As the story shows, this may have serious consequences for a student's performance.

Monday, June 13, 2016

The first Golden Age of data visualization

I'm immersed in my PhD dissertation, so I keep gathering academic books and articles. One I'd like to recommend is Michael Friendly's “The Golden Age of Statistical Graphics” (2008), a breathtaking history of 19th century graphics. This paper ought to be required reading for those who still believe that innovative visualization is the product of modern computers.

Friendly is a Professor of Psychology, Chair of the graduate program in Quantitative Methods at York University, Canada, and creator of the Milestones Project, a timeline of data visualization, thematic cartography, and statistical charts. He also seems to have a good sense of humor, judging by this license plate on his personal website.

In the Golden Age paper, Friendly says:
Collection of data on population and economic conditions (imports, exports, etc.) became widespread in European countries by the beginning of the 19th century. However, there were few data relating to social issues. This was to change dramatically in the period from about 1820 to 1830, and would impel the application of graphical methods to important social issues.
He also explains that the Golden Age was followed by the Dark Ages, the beginning of the 20th century:
The attention and enthusiasm of both theoretical and applied statisticians (turned) away from graphic displays back to numbers and tables, with a rise of quantification that would supplant visualization. The statistical theory that had started with games of chance and the calculus of astronomical observations developed into the first ideas of statistical models, starting with correlation and regression, due to Galton, Pearson and others. By 1908, W. S. Gosset developed the t-test, and between 1918–1925, R. A. Fisher elaborated the ideas of ANOVA, experimental design, likelihood, sampling distributions and so forth. Numbers, parameter estimates—particularly those with standard errors—came to be viewed as precise. Pictures of data became considered—well, just pictures: pretty or evocative perhaps, but incapable of stating a “fact” to three or more decimals. At least it began to seem this way to many statisticians and practitioners.
It was only in the 1960s and 1970s that visualization became popular again among statisticians, thanks to the work of Jacques Bertin, John Tukey, and many others.

Sunday, June 12, 2016

ProPublica visualizes seasonality in dividend payments

It's always nice to see a news organization taking a shot at time-series analysis. A month ago ProPublica launched this visualization of tax avoidance in Germany (here's the rest of the story); the seasonality it reveals is very consistent. Here's the explanation:
German companies typically pay one large dividend in the spring of each year. Shareholders on the official “dividend record date” (abbreviated above as "dividend date") are entitled to the payments, which are usually made the next day. To avoid taxes on the dividends, banks and non-German investors structure short-term loans around these record dates – what’s called “dividend arbitrage.” Stocks are typically loaned over two to 14 days to German investment funds and banks that pay no dividend tax or can claim refunds. This is why demand to borrow shares spikes around the record date.
Notice that the authors, Cezary Podkul and Lena Groeger, took some time to explain the methodology behind the story. I believe this ought to become common practice. Just one question: Isn't it weird that the chart doesn't have a scale on the Y-axis? It's not the first time I've seen this, and it makes me feel uneasy.

UPDATE (h/t Miska Knapek): Germany has closed the tax loophole that enabled these practices. Who said that investigative reporting and visualization have minimal impact on society?

UPDATE 2: Bob Rudis has replied to my question about missing scales with this excellent article. Please do make some time to read it.

Saturday, June 11, 2016

Chart: On Twitter, Senate Democrats read more science than Republicans

The Economist has published a bubble chart comparing the ideological leaning of U.S. senators (X-axis) to the number of science-related Twitter accounts they follow (Y-axis). I immediately spotted Jim Inhofe, a conservative who got famous for calling climate change a “hoax”.

In one of those sad paradoxes of modern American politics, Inhofe chairs the Senate Committee on Environment and Public Works. He is one of the most strident anti-science leaders these days, which is quite an accomplishment considering how anti-science the political landscape is already. Inhofe is one of those folks who aren't happy with just not knowing; his ilk* goes a step beyond that by refusing even the possibility of learning. I guess that they are happy with the post-fact presidential nominee they have fostered: Three quarters –give or take– of his statements are blatant lies.

Rants aside, The Economist chart is based on this paper, which contains many other graphics: bar charts, tables, network diagrams, etc. The researchers made their data available, in case you want to check it or transform this into a class project.

CORRECTION: On Twitter, Randy Olson points out that following doesn't equal reading, and he's right. The title of this post should use that verb instead. Still, the pattern the chart reveals is surprisingly consistent.

*Just to clarify, this is independent of ideology.

Friday, June 10, 2016

Ben Schmidt: Visualization in the Digital Humanities

One of the speakers at our upcoming Digital Humanities + Data Journalism Symposium is Ben Schmidt, a professor at Northeastern University. Ben is a historian, and he uses data and visualization broadly and smartly. A while back he got a lot of media attention for this project about gendered language in RateMyProfessor.com reviews.

My favorite among his visualizations is the map of shipping paths shown below. It's pure graphic poetry. You shouldn't miss the other projects showcased in his personal gallery.

Thursday, June 9, 2016

In infographics, the pencil is still the mightiest weapon

Even if I have mostly focused on data visualization in the past few years, I've kept an eye on the careers of many infographics designers, as I want to write a book about pictorial explanations at some point. Adolfo Arranz is one of those designers. His work is consistently impressive. He has just published a post comparing his sketches with the pieces that finally got published. Wow.