Sunday, June 26, 2016

Red-green color schemes in visualization are tricky

I'm following the results of the Spanish general election and have just seen this map of voter turnout. It uses a diverging color scheme to show the change compared to the previous election.

Red-green color palettes are tricky. People with deuteranopia, a type of color blindness, have a hard time telling red from green. I ran the map through this application and here are the results:

I've made this mistake myself in the past (see here). Please always check your color schemes. There are multiple places (1, 2, 3, 4...) where you can read about safe palettes for visualization. Use them.
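One way to check a palette is to simulate how it looks to someone with deuteranopia. Below is a minimal sketch, assuming a simplified linear projection in linear RGB of the kind many colorblindness simulators use. The matrix coefficients are approximate and only illustrative, not a calibrated model, and the two hex colors are hypothetical endpoints of a red-green diverging palette.

```python
from __future__ import annotations

import math

# Approximate linear-RGB projection for deuteranopia. These coefficients are
# illustrative, not a calibrated model. The first two rows are identical, so
# reds and greens collapse onto a single yellow-blue axis.
DEUTAN = (
    (0.29031, 0.70969, 0.0),
    (0.29031, 0.70969, 0.0),
    (-0.02197, 0.02197, 1.0),
)


def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Parse '#rrggbb' into three 0-255 integers."""
    color = color.lstrip("#")
    r, g, b = (int(color[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def to_linear(channel: int) -> float:
    """Undo the sRGB gamma curve: 0-255 value to 0.0-1.0 linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def to_srgb(value: float) -> int:
    """Re-apply the sRGB gamma curve, clipping values outside the gamut."""
    v = min(max(value, 0.0), 1.0)
    c = 12.92 * v if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055
    return round(c * 255)


def simulate_deuteranopia(color: str) -> tuple[int, int, int]:
    """Approximate how a hex color looks to someone with deuteranopia."""
    linear = [to_linear(c) for c in hex_to_rgb(color)]
    r, g, b = (to_srgb(sum(m * c for m, c in zip(row, linear))) for row in DEUTAN)
    return r, g, b


# Hypothetical endpoints of a red-green diverging palette.
red, green = "#d73027", "#1a9850"
# Euclidean distance in sRGB is only a crude proxy for perceived difference.
before = math.dist(hex_to_rgb(red), hex_to_rgb(green))
after = math.dist(simulate_deuteranopia(red), simulate_deuteranopia(green))
print(f"Distance between endpoints: {before:.0f} before, {after:.0f} after simulation")
```

Both endpoints end up as similar olive tones that differ mostly in their blue component, so a palette that looked strongly diverging loses much of its contrast.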

Diverging color schemes: Showing good data isn't enough; you need to show it well

Fraser Nelson, editor of The Spectator, claims that his map of Brexit is better than one that uses a diverging color scheme. See his comparison:

I beg to disagree. Nelson's map is misleading and far from being “real”, although it does show accurate data. This is yet another example of how to build a dubious visualization using legitimate numbers. A much more truthful depiction of the results appeared in The New York Times (below). It improves on the imperfect binary Brexit map by adding shades of color, which is a great idea. Good data isn't the only component of visualization; the way you depict it matters a lot:

The Guardian used fewer shades of color, but it transformed the map into a cartogram. This emphasizes the relative weight of different regions of the country:

Actually, after looking at the maps by the Times and The Guardian, I'd argue that Nelson's article and map obscure the fact that some highly populated areas of Scotland were strongly in favor of remaining in the EU.
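The difference between a binary map and a shaded one comes down to how each area's vote share is mapped to a color. Here is a minimal sketch, assuming the Leave share is given in percent. The endpoint colors and the 25-point margin at which colors reach full saturation are hypothetical choices. Interpolating in sRGB is a simplification, and a perceptually uniform color space or a ready-made ColorBrewer palette would be a better choice in practice.

```python
from __future__ import annotations

# Hypothetical colors; any two hues diverging from a light neutral would work.
LEAVE = (140, 81, 10)
REMAIN = (1, 102, 94)
NEUTRAL = (245, 245, 245)


def binary_color(leave_share: float) -> tuple[int, int, int]:
    """Winner takes all: 50.1% and 90% Leave get exactly the same color."""
    return LEAVE if leave_share > 50 else REMAIN


def diverging_color(leave_share: float, saturate_at: float = 25.0) -> tuple[int, ...]:
    """Shade by margin: colors fade toward NEUTRAL as a result approaches 50/50.

    `saturate_at` (an assumption) is the margin, in percentage points,
    at which the full endpoint color is reached.
    """
    margin = leave_share - 50
    t = min(abs(margin) / saturate_at, 1.0)  # 0 at a tie, 1 at a wide margin
    end = LEAVE if margin > 0 else REMAIN
    # Linear interpolation in sRGB: simple, but not perceptually uniform.
    return tuple(round(n + (e - n) * t) for n, e in zip(NEUTRAL, end))


for share in (30, 48, 52, 75):
    print(share, binary_color(share), diverging_color(share))
```

On a binary map, a 52% area and a 75% area look identical. With shading, narrow results stay close to the neutral midpoint.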

UPDATE: On Twitter, Neil Richards wrote: “Made long comment on your blog but it got swallowed up! One point: red/blue not perfect because of political connotations. But yellow also indicates third party SNP these days. Any divergent palettes that don't include red, blue or yellow? 2/2.” My suggestion: check ColorBrewer.

Friday, June 24, 2016

Len De Groot on news graphics, data journalism, and caring about your audience

I'm busy analyzing the nearly 40 interviews conducted for my dissertation, which deals with how news graphics have changed in the past decade to decade and a half. I can't resist sharing some bits from the raw transcript of the interview with Len De Groot, director of data visualization at the LA Times. Len has had a long career in the industry and has always been a visionary. Enjoy.

Asked about what visual and data journalists have in common, and if it has changed at all:
I think there is, and generally, it's curiosity about the world. I think that's what draws people to journalism, is there's a curiosity about the world. A sense of wanting to describe the world to other people. And I think that's sort of a core value that I don't think has ever really changed. People may lose that sense of mission, and fall out of journalism, or decide to leave journalism, but the people who are in it, and stay in it, kind of have that need, right. It's not something that's facile, it's something that's deeply held within people. So really that's sort of the thing that I look at. 
Even then, compared to now, people who were doing good work were doing good work because they were creative. And they were willing to look at the world in different ways and try to explain it. That hasn't changed either. 
What's changed is there are some important proficiencies. Statistical proficiencies. They are much more widespread. It used to be, journalism was the field you went into because you didn't want to do math. There were newsrooms full of people that could vouch to you that either they said it, or someone they know said that. And that's no longer the case. I think there's a growing understanding in most universities that students have to come out being proficient in statistics, if not being able to question. That's really, in an age where graphics and data are so prevalent, if you can't question the data and question the source in a way that is smart, you're going to be misled. That's just the end of the day. People will mislead with graphics. And constantly do mislead with graphics. Some intentional, some out of ineptitude. But it's our job as journalists to know the difference. So, that's really a core skill that I think has been a really strong change for the better.
About the audience:
You know, I really don't know if the audience has changed. I think we've started to care about the audience, and that's a change. We used to care about the audience in a different way. We used to care that we were telling them the things that they needed to know, or things that might interest them. But now, we really have to, because there's so much competition, do it in a way that people, we're telling people stories in ways that they're interested in getting them. And that really does mean a whole host of different techniques. 
When it comes down to it at the end of the day, it's really not about us. And it can't be about us. The moment that it's about us, we lose our audience. I think it's one of the reasons we've seen really smart startups do well. Because they realize an audience wanted something and they gave it to them. Whereas journalists would say, well we don't do that. We don't put numbers in headlines. We have a style of writing a headline that contains some gravitas and is very important. And meanwhile no one reads it. I don't know that that's changed. Does human nature change that rapidly? I don't know that it does. I think what's happened is we've started listening, and I think that that's sort of an important thing. 
I don't know. I don't know if the core values of human nature changed, or how frequently they change. My suspicion is that people will go to whatever is the best for them. There's an equilibrium in the world that doesn't involve us. And it's up to us. That equilibrium may be in a screen, or in a phone, or in something else. And it's up to us to understand where that equilibrium is, and be able to tell a story there. And that doesn't mean we don't do the other things too. We do. But we have to be effective at communicating where they are, and where they want to get information. And it may not be a phone. It may be something completely different. In fact, I will bet, that it's going to be something completely different that we don't know of yet. That in 10 years… Actually I'll make this promise: if in 15 years, if things aren't different, I'll retire. Because frankly, I'll be bored. I really do think it's going to change a lot. If it's not changing, then it's probably due to our faults, not the audience.
About storytelling:
Yeah, sort of that idea of storytelling being compression. The idea that when we're doing acts of journalism, we're going into people's lives, and we're taking of their lives, and portions of other people's lives, and we're compressing them into a story, or a way of communicating important information. And that compression started on papers that were this big, and then the papers got this big, and then we were on desktop monitors, and then we're on laptops, and then we're on phones. And so there's been this shrinking of the box in which we tell our story. 
And one of the things for me that's so exciting about immersive storytelling is that box, that little window, opens things back up. Because now we're peering through that little window, and we can't step forward into it, but we can sort of project our intelligence into it. And we have space in which to investigate, in which we can explore as people. And I think that's sort of human nature, to want to explore, to want to find things. 
The question is, is what do we do to tell a story there, in a way that makes people willing to strap something on their face. That's a lot to ask for someone to do. That's a lot to say. Will you put on this clunky headset? The responsibility lies with us, to come up with ways of telling stories that are engaging, and that will make people want to experience things like that. The medium in itself is not any greater or worse than any other medium that's come before, be it radio, print, TV, whatever. 
But it has this expansion, this actual expansion of space, is something that we can try to use, and we can try to leverage. I think I've said this a few times this week that one of the white whales, my white whale, is campaign finance datavis. And doing it in a way that people can actually understand it. And it's so hard to do. Where you end up usually, is charts. We end up with charts that are simplified to show overall trends. Or we end up with trends or searchable data. But neither of those things help people understand the data. And we've had this phrase called of following the money. And it's really something that reporters did, that we did as journalists. We followed the money and tell people the result. But I think that we can help people understand important issues by letting them have the journey. And either guiding them through part of it, or letting them explore and discover. Yes.

Thursday, June 23, 2016

There is no “perfect” visualization, but some are more appropriate than others

Steve Wexler has just published a post titled “There is no perfect chart and there is no perfect dashboard.” Go ahead and read it.

Back already? I agree with Steve, but I'd like to stress, as he suggests, that it's possible to sort visualization types according to how effective they are at conveying a specific message, or at enabling certain tasks. The decision of how to transform data into spatial properties (that's what visualization consists of) is largely subjective, but it can't be based just on personal aesthetic preferences. There is some science behind it, no matter how imperfect some of it may be. Context and purpose also matter.

Let me give you an example: this visualization by Anna Vital (don't miss her posters!) for Google Trends*. It's an interesting, attractive, and elegant graphic, but I think that a few changes could make it even better.

See the screenshot below. Because the lines are shaped as curves, 3h 9min, close to the center of the circle, looks only slightly longer than 1h 27min; and 1h 27min looks shorter than 51min, just because the latter is in the outermost ring. Besides, the little clock at the bottom may be a misleading cue, as these lines don't depict times of the day, but the time elapsed after each of the attacks. I think that in this case some of the visual appeal of the graphic ought to be sacrificed to increase clarity: lines could be straightened out, or there could be a circle per attack, so proportions would be respected.
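The distortion follows from basic geometry. A curved line's length is its angle times the radius of its ring, so the same duration draws a longer line the farther it is from the center. Here is a minimal sketch under two assumptions that are not taken from the graphic itself: one full turn stands for 12 hours, and the three lines sit on rings of radius 1, 2, and 4.

```python
import math

# Assumptions for illustration only: one full turn stands for 12 hours
# (720 minutes), as the clock icon suggests, and the three lines sit on
# hypothetical rings of radius 1, 2, and 4.
MINUTES_PER_TURN = 720


def arc_length(minutes: float, radius: float) -> float:
    """Length of a curved line: its angle in radians times the ring's radius."""
    angle = 2 * math.pi * minutes / MINUTES_PER_TURN
    return radius * angle


lines = {"3h 9min": (189, 1), "1h 27min": (87, 2), "51min": (51, 4)}
for label, (minutes, radius) in lines.items():
    # A straightened line would simply be proportional to `minutes`.
    print(f"{label:>8}: arc length {arc_length(minutes, radius):.2f}")
```

With these assumptions, the 51-minute arc comes out longer than the 1h 27min one, and 3h 9min is barely longer than 1h 27min. Straight bars would keep lengths proportional to the durations.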

I discussed some of these issues (and many more, related to other visualizations) in this recent conversation with the Google News Lab folks.

*Full disclosure: I'm a consultant for the Google News Lab, working mainly with Simon Rogers.

Monday, June 20, 2016

Bring back those histograms and box plots

This data story isn't recent, but I've just been reminded of it by Owen Youngman. Back in 2014, FiveThirtyEight's Allison McCann (who will be at the DH+DH Symposium) and Nate Silver visualized the ages of people with a given name. They did it with several distribution charts, graphic forms that I wish would become more common in news media.

One year later, NYT's Amanda Cox declared 2015 “the year of the histogram.” As she said in that podcast, most people learn to make histograms in kindergarten, but then many of us forget them for some reason. Let's bring them back!
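Neither chart is hard to compute. A histogram counts how many values fall into each bin, and a box plot draws five numbers: the minimum, the three quartiles, and the maximum. Below is a minimal sketch with made-up ages, assuming 10-year bins. Quartile conventions vary between tools. This uses the default ("exclusive") method of Python's `statistics.quantiles`, so other software may give slightly different quartiles.

```python
from __future__ import annotations

from collections import Counter
import statistics

BIN_WIDTH = 10  # years per histogram bar

# Toy ages, made up for illustration (not FiveThirtyEight's data).
ages = [23, 27, 31, 34, 35, 38, 41, 44, 45, 47, 52, 58, 61, 66, 72]


def histogram(values: list[int], width: int = BIN_WIDTH) -> dict[int, int]:
    """Count values per bin [start, start + width), keyed by bin start."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


def five_number_summary(values: list[int]) -> tuple:
    """Minimum, quartiles, and maximum: the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


for start, count in histogram(ages).items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")
print("box plot:", five_number_summary(ages))
```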

Tuesday, June 14, 2016

Student visualizes homelessness with d3.js

Luís Melgar is one of our data journalism, visualization, and infographics graduates. For his master's thesis he decided to do a data-driven story about student homelessness in Florida, a problem that is often overlooked by the media.

Many of the figures presented here are striking. The number of homeless* students more than doubled between 2005 and 2014. In some counties, one out of five students lacks a stable home.

The charts and maps were done mostly with d3.js, and the project looks good on mobile (by the way, I'm making my classes mobile-first beginning this fall). Take a look at it.

*A homeless student is not a kid who lives on the street. According to the project, ‘the public education system says that a student is homeless when he or she lacks "a fixed, regular, and adequate nighttime residence," or if they share a home with other people "due to loss of housing" or "economic hardship."’ As the story shows, this may have serious consequences for a student's performance.

Monday, June 13, 2016

The first Golden Age of data visualization

I'm immersed in my PhD dissertation, so I keep gathering academic books and articles. One I'd like to recommend is Michael Friendly's “The Golden Age of Statistical Graphics” (2008), a breathtaking history of 19th-century graphics. This paper ought to be required reading for those who still believe that innovative visualization is the product of modern computers.

Friendly is a Professor of Psychology, Chair of the graduate program in Quantitative Methods at York University, Canada, and creator of the Milestones Project, a timeline of data visualization, thematic cartography, and statistical charts. He also seems to have a good sense of humor, judging by this license plate on his personal website.

In the Golden Age paper, Friendly says:
Collection of data on population and economic conditions (imports, exports, etc.) became widespread in European countries by the beginning of the 19th century. However, there were few data relating to social issues. This was to change dramatically in the period from about 1820 to 1830, and would impel the application of graphical methods to important social issues.
He also explains that the Golden Age was followed by the Dark Ages, at the beginning of the 20th century:
The attention and enthusiasm of both theoretical and applied statisticians (turned) away from graphic displays back to numbers and tables, with a rise of quantification that would supplant visualization. The statistical theory that had started with games of chance and the calculus of astronomical observations developed into the first ideas of statistical models, starting with correlation and regression, due to Galton, Pearson and others. By 1908, W. S. Gosset developed the t-test, and between 1918–1925, R. A. Fisher elaborated the ideas of ANOVA, experimental design, likelihood, sampling distributions and so forth. Numbers, parameter estimates—particularly those with standard errors—came to be viewed as precise. Pictures of data became considered—well, just pictures: pretty or evocative perhaps, but incapable of stating a “fact” to three or more decimals. At least it began to seem this way to many statisticians and practitioners.
It was only in the 1960s and 1970s that visualization became popular again among statisticians, thanks to the work of Jacques Bertin, John Tukey, and many others.

Sunday, June 12, 2016

ProPublica visualizes seasonality in dividend payments

It's always nice to see a news organization taking a shot at time-series analysis. A month ago ProPublica launched this visualization of tax avoidance in Germany (here's the rest of the story); the seasonality it reveals is very consistent. Here's the explanation:
German companies typically pay one large dividend in the spring of each year. Shareholders on the official “dividend record date” (abbreviated above as "dividend date") are entitled to the payments, which are usually made the next day. To avoid taxes on the dividends, banks and non-German investors structure short-term loans around these record dates – what’s called “dividend arbitrage.” Stocks are typically loaned over two to 14 days to German investment funds and banks that pay no dividend tax or can claim refunds. This is why demand to borrow shares spikes around the record date.
Notice that the authors, Cezary Podkul and Lena Groeger, took some time to explain the methodology behind the story. I believe this ought to become common practice. Just one question: isn't it weird that the chart doesn't have a scale on the Y-axis? It's not the first time I've seen this, and it makes me feel uneasy.
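One simple way to quantify a spike like the one described in the quote is to compare lending in a short window around the record date with a preceding baseline period. The sketch below is minimal and uses synthetic numbers. The window and baseline lengths are assumptions, and this is not ProPublica's methodology. Expressing the result as a ratio to a baseline is also one way to give a vertical axis a scale readers can interpret.

```python
from __future__ import annotations

from datetime import date, timedelta
from statistics import mean


def spike_ratio(lending: dict[date, float], record_date: date,
                window: int = 2, baseline_days: int = 30) -> float:
    """Mean lending within +/- `window` days of the record date, divided by
    the mean over the `baseline_days` that immediately precede that window."""
    near = [lending[record_date + timedelta(days=d)]
            for d in range(-window, window + 1)]
    window_start = record_date - timedelta(days=window)
    base = [lending[window_start - timedelta(days=d)]
            for d in range(1, baseline_days + 1)]
    return mean(near) / mean(base)


# Synthetic daily volumes: flat at 100, tripled around a made-up record date.
record = date(2016, 5, 10)
lending = {record + timedelta(days=d): 100.0 for d in range(-40, 41)}
for d in range(-2, 3):
    lending[record + timedelta(days=d)] = 300.0

print(f"Lending near the record date: {spike_ratio(lending, record):.1f}x baseline")
```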

UPDATE (h/t Miska Knapek): Germany has closed the tax loophole that enabled these practices. Who said that investigative reporting and visualization have minimal impact on society?

UPDATE 2: Bob Rudis has replied to my question about missing scales with this excellent article. Please do make some time to read it.

Saturday, June 11, 2016

Chart: On Twitter, Senate Democrats read more science than Republicans

The Economist has published a bubble chart comparing the ideological leaning of U.S. senators (X-axis) to the number of science-related Twitter accounts they follow (Y-axis). I immediately spotted Jim Inhofe, a conservative who became famous for calling climate change a “hoax”.

In one of those sad paradoxes of modern American politics, Inhofe chairs the Senate Committee on Environment and Public Works. He is one of the most strident anti-science leaders these days, which is quite an accomplishment considering how anti-science the political landscape is already. Inhofe is one of those folks who aren't happy with just not knowing; his ilk* goes a step beyond that by refusing even the possibility of learning. I guess that they are happy with the post-fact presidential nominee they have fostered: three quarters, give or take, of his statements are blatant lies.

Rants aside, The Economist chart is based on this paper, which contains many other graphics: bar charts, tables, network diagrams, etc. The researchers made their data available, in case you want to check it or turn it into a class project.

CORRECTION: On Twitter, Randy Olson points out that following doesn't equal reading, and he's right. The title of this post should use that verb instead. Still, the pattern the chart reveals is surprisingly consistent.

*Just to clarify, this is independent of ideology.

Friday, June 10, 2016

Ben Schmidt: Visualization in the Digital Humanities

One of the speakers at our upcoming Digital Humanities + Data Journalism Symposium is Ben Schmidt, a professor at Northeastern University. Ben is a historian, and he uses data and visualization broadly and smartly. A while back he got a lot of media attention for this project about gendered language in reviews.

My favorite among his visualizations is the map of shipping paths shown below. It's pure graphic poetry. You shouldn't miss the other projects showcased in his personal gallery.

Thursday, June 9, 2016

In infographics, the pencil is still the mightiest weapon

Even though I have mostly focused on data visualization in the past few years, I've kept an eye on the careers of many infographics designers, as I want to write a book about pictorial explanations at some point. Adolfo Arranz is one of those designers. His work is consistently impressive. He has just published a post comparing his sketches with the pieces that finally got published. Wow.

Wednesday, June 8, 2016

Talking about visualization books

This past Monday, Stephanie Evergreen, Jorge Camões, Andy Kirk and I had a Skype conversation about our most recent books. We covered a little bit of everything, including how we discipline ourselves to sit down and write the damn thing, or how these books complement each other. You can listen to the conversation (see below) or download it here. Here are the books:

• Andy's Data Visualisation: A Handbook for Data Driven Design

• Stephanie's Effective Data Visualization: The Right Chart for the Right Data

• Jorge's Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel

• And my own The Truthful Art: Data, Charts, and Maps for Communication

Tuesday, June 7, 2016

Just this week: Buy The Truthful Art with a 40% discount

My publisher is offering The Truthful Art with a discount of 40% until Friday, June 10th at midnight (EST). To purchase it, follow this link and use the code TRUTHFUL when paying.

UPDATE: It seems that the offer is valid only in the U.S.

Saturday, June 4, 2016

Plagiarism: When professors are to blame

I’ve just discovered that it's not just certain students who don’t understand what constitutes plagiarism. It's much worse: It’s some professors and even deans.

A while ago Miguel Alcíbar, a professor at the University of Seville, Spain, called my attention to a master's thesis that had just appeared online. Miguel pointed out that the author had copied several paragraphs of Infografía 2.0, my first book about infographics, published only in Spain back in 2008. The student submitted this thesis as part of his master's studies at the school of design of the University of Rosario, Argentina.

I took a quick look at the thesis and recognized my words in one paragraph. As I saw my name cited as a source in the bibliography and in one chapter, I thought that the student had just been sloppy and had forgotten to add quotation marks to that paragraph. No big deal. Anybody can be absent-minded. I know I can be (see here), so I tend to be a bit lax with everybody else (see here), certainly much more than most of my colleagues. A simple correction suffices in a case like this, I thought.

But Miguel insisted: he suggested opening Infografía 2.0, putting it side by side with the document, and comparing both. Miguel was right. At the bottom of this post you can see the evidence for yourselves. On the left side of each of those images I put the original words from Infografía 2.0; on the right, I put the pages from the thesis. I didn’t scrutinize the whole thing. I just picked pages from one or two of its chapters.

I then contacted the professors who were part of the committee, Javier Armentano, Horacio Gorodischer, Gabriela Nazario, Damián Vezzani, and the dean of the school, Olga Corna. They promised to look into it.

The first response I got from the dean was disheartening. She claimed that, upon preliminary evaluation, they hadn’t detected true plagiarism. She explained that students from her school don’t always follow APA citation standards. My reply was that this is not a matter of following any kind of citation guidelines, but of good manners. I’d be fine with the student using a couple of sentences without quotation marks, but full pages? How can a dean justify that?

Well, I’ve just received the evaluation of the entire committee, and it's not any better. It begins by saying that the student will correct all pages where paragraphs without attribution appear. Fine.

But what comes next is bad: the professors double down on their claim that this is not a case of plagiarism because the student mentioned me here and there as a source. This is outrageous. If you read the images below carefully you'll see that my words are mixed with his. On none of these pages is it clear which words were extracted verbatim from a book and which were written by the student himself. I guess that the next step should be to send all the evidence and the committee's response to the Ministry of Education of the province of Santa Fe, where Rosario is.

You know the worst part? I'm not even upset by this. Not with the student, at least. I feel ashamed for some reason, perhaps because I fear that this kind of blindness to even the most elementary rules of academic decorum is becoming widespread. Not just among students, but among damn professors, as well.

NOTE 1: I worry that I am not the only one who has been plagiarized in this thesis. I was about to put a link to it so other authors mentioned in it could check it, but it has been erased.

NOTE 2: Infografía 2.0 isn't available as an e-book, so either the student typed all this stuff (which is absurd; if you are willing to put so much effort into that, why not write your own words?) or got an illegal PDF copy on the Internet.

Wednesday, June 1, 2016

The first Data Journalism and Digital Humanities Symposium

This Fall (Sept. 30-Oct. 1) the University of Miami is hosting the first Digital Humanities+Data Journalism Symposium.

I'm part of the organization of the event, which intends to bring together two disciplines that use similar tools and techniques, and have them talk to and learn from each other. Besides the official schedule there will be plenty of time to engage in informal conversations.

The symposium was inspired by this article by Dan Cohen, from the Digital Public Library of America. Here's how we envision it:
Digital humanists and data journalists face common challenges, opportunities, and goals, such as how to communicate effectively with the public. They use similar software tools, programming languages, and techniques, and they can learn from each other. Join us for lectures and tutorials about shared data types, visualization methods, and data communication — including text visualization, network diagrams, maps, databases and data wrangling. In addition to the scheduled content, there will be opportunities for casual conversation and networking. The DH+DJ Symposium will take place in the Newman Alumni Center at the University of Miami (Coral Gables Campus).
The symposium will be a small gathering: Registration is limited to 150 people, so be quick. The speaker list (more names will be added soon) looks quite nice, I believe. I plan to sit in on all these talks and classes myself! Also, the symposium hotel is the Sonesta, which is really nice. To book a room at a reduced rate, follow the instructions on the website.

Finally, in case you need more reasons to attend, Miami is warm and nice in that time of the year. Just sayin'...

Friday, May 27, 2016

Visualizing chess

Ootro Estudio is a firm based in Alicante, Spain, that offers graphic design, furniture design, and 3D services. Their latest project is an alluring piece of data art titled Arbor Ludi, which portrays the game trees of eight top chess players from the last century. Here's a description:
The selected players are José Raúl Capablanca, Mikhail Tal, Tigran Petrosian, Bobby Fischer, Anatoly Karpov, Garry Kasparov, Viswanathan Anand and current champion, Magnus Carlsen. We chose players with different styles, so their game trees would display the contrast between them.
To generate the game tree of each champion we used a database of more than 10,000 games. The number of games changes significantly for each player: Capablanca was the less active (596 games) and Karpov was the most active (3,374 games.) This difference in the number of games affects the final result of the representations. 
In order to transform the data into a tree shaped object we used an algorithm programmed with parametric design tools. This algorithm deciphers the topological diagram of all games of each player while it builds a three-dimensional tree whose growth reproduce the said diagram. 
Following this topological diagram, the tree starts with a trunk that represents the total amount of games of each player. From it, the principal branches of the tree emerge, and each one correspond to the first move of all the games, being the thick of each branch proportional to the number of times this moves have been done. In each node, it appears written the move corresponding to the previous branch. The criterion recurs with the next moves, so that new branches emerge which are the result of the different paths that the players have taken in all their games. Each step adds a level of complexity to the topological diagram.
A limitation of the project is that branches that represent identical moves are not positioned identically on the different trees. Therefore, comparisons between players are hard, if not impossible, even if you zoom in to read the labels. Regardless, this is quite an impressive effort, and it'd certainly look beautiful framed and hung on your office walls!
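The algorithm described in the quote is essentially a prefix tree (trie) of move sequences, with a counter on every branch. Here is a minimal sketch, assuming each game is a list of moves in algebraic notation. The sample games are made up, and this is not Ootro Estudio's parametric-design implementation. Sorting children alphabetically is one hypothetical way to put identical moves in the same position in every player's tree, which would make side-by-side comparison easier.

```python
from __future__ import annotations

from collections import defaultdict


class Branch:
    """A node of the game tree: games that reached this move, plus next moves."""

    def __init__(self) -> None:
        self.games = 0
        self.children = defaultdict(Branch)  # move -> Branch, created on demand


def build_tree(games: list[list[str]]) -> Branch:
    """Count how many games pass through every move sequence (a prefix tree)."""
    root = Branch()
    for moves in games:
        node = root
        node.games += 1  # the trunk: total number of games
        for move in moves:
            node = node.children[move]
            node.games += 1  # branch thickness would be proportional to this
    return root


def show(node: Branch, label: str = "all games", depth: int = 0, max_depth: int = 2) -> None:
    """Print branches in alphabetical order, so the same move always lands in
    the same position no matter whose tree is drawn."""
    print(f"{'  ' * depth}{label} ({node.games})")
    if depth < max_depth:
        for move in sorted(node.children):
            show(node.children[move], move, depth + 1, max_depth)


# Made-up opening sequences, for illustration only.
games = [["e4", "e5", "Nf3"], ["e4", "c5"], ["d4", "Nf6"], ["e4", "e5", "Bc4"]]
show(build_tree(games))
```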

Thursday, May 26, 2016

That time when I made readers click 50 times in an infographic

Robert Kosara has written a nice rant (don't miss the comments) against scrollytelling in visualization. He's an advocate for steppers, those graphics that divide the information into sequential screens which can be navigated through numbered or labeled buttons.

During a conversation on Twitter, Knight Foundation's Shazna Nessa said that she likes scrolling when using a mobile device and step-by-step graphics when on a computer. I agree with that, but we're moving toward a mobile-first world (if we're not there already), aren't we? And scrollable visualizations can be done really well; here's some advice from Mike Bostock.

Anyway, this debate reminded me of this monster of an infographic (Flash warning! This won't work on an Apple tablet or phone!) that I designed in 2003. I'm still fond of the 3D models and the vector animations, but I certainly don't think that it's a good idea to make people click more than 50 times to get to the end of the presentation!

We all have a past, I guess. Here you have some screenshots of that awful thing:

Wednesday, May 25, 2016

Simulating the lives of hunter-gatherers with animation and visualization (and good humor)

Simulpast is a large interdisciplinary project organized by the Barcelona Supercomputing Center (BSC) intended to model past human behavior. Its latest effort is Simulados, a simulation of the lives of prehistoric hunter-gatherers in the region of modern Gujarat, India. From the technical description:
This region of India has a strong seasonality and one of the most unpredictable climates in the world. The main goal is to build an agent based model (ABM), through which we can study the management of resources and the decision making process of hunter-gatherer groups that inhabited the region between 10000 BC and 2000 BC. We are interested in analyzing their capacity for resilience to the extreme variability of the environment, as well as their interaction with agro-pastoral groups.
The project is based on a tool called Pandora:
Pandora is an agent based modeling tool designed to run complex simulation models in a high performance environment. The agents represent individuals or groups of people (a family in the case of Simulados) with a complex artificial intelligence algorithm that gives them power to make their own choices and act upon them, thereby interacting between them and with the environment in a totally autonomous way. Pandora is able to simulate millions of agents in large and detailed terrains.
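At its core, an agent-based model is a loop in which autonomous agents make decisions and, in doing so, change a shared environment. Below is a toy sketch in plain Python. It is not Pandora's API or the Simulados model. The decision rule, the seasonal regrowth, and every parameter are assumptions chosen for illustration.

```python
from __future__ import annotations

import random


class Family:
    """An agent: a hunter-gatherer family foraging on one patch of land."""

    def __init__(self, patch: int) -> None:
        self.patch = patch
        self.food = 0.0

    def step(self, patches: list[float], rng: random.Random) -> None:
        # Decision rule (an assumption): compare the current patch with one
        # randomly scouted patch, move if the other one is richer, then
        # harvest half of what is there.
        scouted = rng.randrange(len(patches))
        if patches[scouted] > patches[self.patch]:
            self.patch = scouted
        harvest = patches[self.patch] / 2
        patches[self.patch] -= harvest
        self.food += harvest


def simulate(n_families: int = 5, n_patches: int = 10, steps: int = 24,
             seed: int = 42) -> tuple[list[Family], list[float]]:
    """Run the toy model; each step is a 'month', and resources regrow only
    in a hypothetical four-month wet season, by a random amount."""
    rng = random.Random(seed)
    patches = [10.0] * n_patches
    families = [Family(rng.randrange(n_patches)) for _ in range(n_families)]
    for month in range(steps):
        if month % 12 < 4:  # wet season: unpredictable regrowth
            for i in range(n_patches):
                patches[i] += rng.uniform(0, 5)
        for family in families:  # agents act autonomously, in turn
            family.step(patches, rng)
    return families, patches


families, patches = simulate()
print("food gathered per family:", [round(f.food, 1) for f in families])
print("resources left on the land:", round(sum(patches), 1))
```

A fixed seed makes each run reproducible, which helps when comparing scenarios. A real model like the one described would run many seeds and far more agents.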
If this sounds too geeky, watch this well-paced and fun animation combining 3D characters, charts, and maps that the BSC put together. It explains the science behind the project quite nicely. This video may inspire those struggling to explain complex ideas to the general public.

H/T Fernando Cucchietti

Tuesday, May 24, 2016

Stacked bar graphs and small multiples

Stacked bar graphs are tricky, particularly when you design more than one and arrange them in a sequence: only the bottom and top segments can be compared easily across bars, as they sit on common baselines. However, there are cases when this graphic form is appropriate. See this elegant small multiple array just published by The New York Times.
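The reason only the outer segments compare easily becomes clear when you compute where each segment starts. Below is a minimal sketch of a 100% stacked bar with made-up shares. The bottom segment always starts at 0 and the top segment always ends at 100. The segments in between start wherever the previous ones happen to end.

```python
from __future__ import annotations

from itertools import accumulate


def segment_positions(shares: list[float]) -> list[tuple[float, float]]:
    """(start, end) of each segment of one stacked bar, from bottom to top."""
    ends = list(accumulate(shares))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Made-up seat shares (percent) for four parties in two countries.
bars = {"Country A": [20, 35, 30, 15], "Country B": [10, 40, 25, 25]}
for country, shares in bars.items():
    print(country, segment_positions(shares))
```

The second segment starts at 20 in one bar and at 10 in the other, so readers have to compare lengths that don't share a baseline.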

What matters in this graphic is not to compare all parties, but to emphasize the hard-right ones, and then to compare them to all other parties as a whole. Therefore, I think that the decision to color all center-right and center-left parties identically makes sense: it's red versus white and gray.

Sunday, May 22, 2016

Hiram Henríquez's "The Importance of Explanatory Infographics in Journalism"

This year I'm busy with my PhD dissertation (see its title here), so expect some posts about sources that I'm planning to quote. The first one is Hiram Henríquez's “The Importance of Explanatory Infographics in Journalism” (PDF). Hiram is a colleague of mine at the School of Communication of the University of Miami, where he landed after a long career in the news.

The aforementioned document is the thesis Hiram wrote for his MFA at Savannah College of Art and Design, and it's worth your time. It describes the demise of the traditional news graphics department, and the rise of web visualization and news animation. Its tone is somber, as Hiram believes that the disappearance of the large print infographics that newspapers embraced in the 1980s and 1990s is a negative phenomenon.

As you'll see when I make my own dissertation public, likely by March 2017, my view of the changes news infographics have experienced in the past decade is more optimistic, no matter how much I love big graphics myself!