Monday, July 21, 2014

Thank you for a fun roundtable, Periscopic

I've just landed from a trip to Portland. Kennedy Elliott, Sarah Slobin, Scott Murray, and I were invited by Periscopic to participate in a roundtable as part of their 10th anniversary celebrations. It was a lot of fun, and I even had some time to interview Kim Rees and Dino Citraro for my next book.

The Periscopic crowd has just launched a new version of their website/portfolio. Their work, as they say, is both art and science: Emotional data visualization.

Here's an accurate summary of our discussion. I'm sorry we couldn't be more helpful!
And here you have some photographs:







Tuesday, July 15, 2014

Gorgeous infographics from Fortune magazine (March 1938 issue)

Map by Richard Edes Harrison for Fortune (March 1938)


Last month I went to Steven Heller's moving sale in NYC with Scott Klein, Eric Sagara, and Kaiser Fung. I was not planning to buy anything but, unfortunately (for my wallet), Heller was selling several issues of Fortune magazine from the 30s, 40s, and 50s. I walked out with 12 under my arm. That wasn't a trivial feat, as each old Fortune is as big as a tabloid newspaper and as thick and heavy as a hardcover book. Good workout!

It's impossible to talk about the history of news graphics and visualization without mentioning Fortune. A simple Google search (“fortune infographics”) will reveal several posts and articles that collect beautiful examples: 1, 2, 3, 4. However, let me assure you that seeing those graphics on a computer screen is not the same as holding them in your hands, printed on high-quality paper. They are glorious.

It's taking me hours to go over each of the issues I got, so for now I'll just show you a few pictures from the March 1938 one.

Thursday, July 10, 2014

Two new articles: Data journalism is about the journalism, not about the data — and why visualization matters

I guess that at this point you've all read the article about data journalism on Harvard's Nieman Lab website. I'd just like to recommend that you also see Zeynep Tufekci's analysis of FiveThirtyEight's predictive models for the World Cup, Mark Berman's hilarious spoof ‘President Obama and the horse mask person: An investigation involving data and charts,’ Derrick Harris' ‘Data journalism could use a jolt of data science,’ and this conversation and article about what J-schools could be teaching. Oh, and this collection of maps (h/t David Shiffman.)

The latest issue of the German magazine Message, which covers topics related to journalism and the news media, focuses on information graphics and visualization. I wrote a short piece for it, titled ‘Why Visualizing Information Matters.’

Finally, after the Nieman thing went live, something funny happened on Twitter between me, Aron Pilhofer, and Andrew Losowsky. I really do my best to avoid being grumpy (the article ends on a positive note!), but I guess that it's all part of the linear model:




Wednesday, July 9, 2014

More readings on data, visualization, and infographics (4)

(See the previous here's-tons-of-stuff posts here, here, and here. And my main reading list.)

Alex Howard has been interviewed by Mediashift about his recent Tow report on data journalism. It's a meaty conversation. Excerpts that are music to my ears:
My observation when comparing and contrasting the humanities and sciences — I was a double major — is that you often are using data in the context of experiments, in trying to understand the why of something and what data exists to help us understand. That gives you a different kind of pursuit of knowledge than the classic journalist approach, whereby you go out and talk to people and then you have a story, but it might not give you a broader perspective about the number of times this issue happens or where — in other words, context. 
It’s critical that data journalism, even though it’s a hot, new topic, not be divorced from decades of computer-assisted reporting or investigative journalism. These are new tools, new techniques, new opportunities and there are new risks that go along with them, but the ethics of creating knowledge from data aren’t fundamentally divorced from the ethics of creating knowledge from talking to people as sources. You may need to protect the data — its prominence or its sourcing. You may need to secure it if it’s sensitive data. 
 It’s not just about the hard skills. It’s also about computational thinking — thinking about data as a strategic resource. There’s a real challenge around digital literacy in traditional print journalism. If you look around journalism schools, there are a lot of people who are from that side or from broadcast journalism, but they may not have the grounding in how to go about doing these kind of data stories. 
In the same way that a young person is expected to use a computer, they’ll also need to open up a spreadsheet and do basic statistical analysis. They’ll need to be able to understand the end value of a study and to know what someone is talking about with R-values and regression. They’ll need to have some literacy around maps and charts and infographics and ways to present information and visualize data. Just in the same way young journalists are learning how to create basic webpages, how to take pictures, how to use mobile devices, shoot video and create basic apps, these are tools that are going to become part of the ways that 21st century journalists practice their craft. To not use one of the tools is to be unable to practice part of the craft, as it is currently being defined and expanded.
Paul Bradshaw is surprised by the fact that “over 1,000 journalists are now exploring scraping techniques” thanks to his excellent e-book. Actually, the surprising thing for me is that ONLY 1,000 journalists are doing so.

• 9 free platforms for journalists to learn how to code. You actually won't learn how to code if you just use those. You'll need to do a lot of real work on your own. But they'll be helpful anyway if you've never written a line of Python and still believe that coding is hard (it is, but not as much as you may think.)

• The Facebook mood manipulation experiment has been in the news for more than a week. Two articles to put the controversy in context: With Big Data Comes Big Responsibility, and The Test We Can—and Should—Run on Facebook. They reminded me of when, a while ago, some fellow nerds laughed at my liking Evgeny Morozov's and Jaron Lanier's latest books (1, 2).

• What about that study linking biking and prostate cancer that you've probably heard about? Here's what StatsChat's Thomas Lumley has to say: “There’s borderline evidence from a weak study design for a sensational finding that isn’t supported by any prior evidence. This is fine as research, but it shouldn’t be in the headlines.” I haven't read the study myself, though.

Sunday, July 6, 2014

El Mundo's infographics get a nice shout-out

I've just finished Making News at The New York Times, an ethnography in which GWU's Nikki Usher describes a newsroom struggling to change and adapt. If you've read the NYT’s famous (and leaked) innovation report and this other one from Duke University, the main themes in Usher's book will sound familiar. Among them is the increasing relevance and visibility of graphics, interactive, multimedia, and data projects.

On a personal note, I was happy to see that my ex-colleagues at El Mundo's online infographics team (David Alameda, Miguel Nuño, Juan Carlos Sánchez) got a nice shout-out on page 162 thanks to their coverage of the 2010 earthquake in Haiti. Well deserved!


Saturday, July 5, 2014

Two interactive visualizations by Spanish students



Every year, I give advice to some students from the Universitat Oberta de Catalunya (the one where I'm getting my PhD) on their capstone projects. UOC is a public institution which puts great emphasis on new technologies, including multimedia, coding, visualization, and infographics. Months ago I wrote about the Visualizing Buffy project, and today I'd like to mention David García's Everest: Adventure or Business, and Gemma Pallàs' The New Believers, which have just been turned in after months of hard work. They've used interactive charts and maps (d3.js-based) and 3D animation extensively. They aren't perfect yet (copy needs editing, and performance is a bit lacking in some places), but I'm quite satisfied with the results. If you need interns or entry-level people, you may want to consider them!

Thursday, July 3, 2014

An elegant visualization about the latest mass extinction

Anna Flagg's ‘A Disappearing Planet,’ the latest interactive visualization from ProPublica, was inspired by Elizabeth Kolbert's book The Sixth Extinction. Take time to explore it. Having seen this d3.js-based data project in the making during the past five weeks, I must confess that I really like how it turned out stylistically. The graphic is elegant and clean. And the little animated photos are quite nice, aren't they?


Wednesday, July 2, 2014

The challenges of classification in choropleth maps

Building classes for choropleth maps is always tricky business. By grouping values together as intervals, you always run the risk of hiding important nuances in the data. There are reliable guidelines you can follow, but the process always requires a good dose of common sense. This excellent article by John Nelson (h/t Rob Simmon and Jorge Camões) explains this challenge really well.

The map below, published today by The New York Times (see it online), is a good example. Notice that the last class corresponds to the values above 30%. The problem is that this class includes values as big as 89%, or even higher (I didn't check!). Perhaps it makes sense to create a fifth class for the counties in which Evangelicals and Mormons are a majority of the population (more than 50%)? Besides, I'm not sure that using equal intervals is the best choice here. But it may be just me. I haven't seen their dataset, after all.
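To see how much the choice of breaks changes what a reader sees, here is a minimal Python sketch that classifies the same values with equal intervals and with quantiles. The percentages are made up and deliberately skewed, and the function names are my own; this is not the Times' dataset or its method.

```python
import statistics

def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes equally wide classes spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]

def quantile_breaks(values, n_classes):
    """Upper bounds of classes that hold roughly the same number of observations."""
    cuts = statistics.quantiles(values, n=n_classes, method="inclusive")
    return cuts + [max(values)]

def class_counts(values, breaks):
    """Count how many values fall into each class (a value belongs to the
    first class whose upper bound is greater than or equal to it)."""
    counts = [0] * len(breaks)
    for v in values:
        for i, upper in enumerate(breaks):
            if v <= upper:
                counts[i] += 1
                break
    return counts

# Hypothetical county percentages, skewed toward low values with a long tail.
shares = [2, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 18, 22, 27, 31, 38, 52, 89]

for name, make_breaks in [("equal intervals", equal_interval_breaks),
                          ("quantiles", quantile_breaks)]:
    breaks = make_breaks(shares, 4)
    print(name, [round(b, 2) for b in breaks], class_counts(shares, breaks))
```

With skewed data like this, equal intervals pile most counties into the lowest class and leave the upper classes nearly empty, while quantiles spread the counties evenly at the price of very uneven class widths. Neither is right by default, which is why common sense is still needed.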


Monday, June 30, 2014

Stupid icons based on NSA programs with stupid names

The latest interactive graphic by ProPublica is a matrix that maps all NSA programs. The chart was designed by Jeff Larson and Julia Angwin, author of Dragnet Nation (I got a signed copy; thanks, Julia!). The first version of the project was a bunch of gridlines, dots, and little labels. It was good, but it also looked a bit flat. Assistant managing editor Eric Umansky suggested that we could spice it up a bit by adding funny illustrations, considering the names of many of the programs, such as ‘Nosy Smurf’ or ‘Egotistical Goat’. I made a few quick sketches (see below) and we got the go-ahead. The final drawings were done with Adobe Illustrator. This is an explanation of the project.

At first, I thought that these were too cartoony and childish, and not a good fit for ProPublica, a serious news organization fully devoted to investigative reporting, but Eric, Jeff, and Julia thought otherwise. I guess that these stupid icons are an appropriate depiction of the stupid names that the NSA had chosen. You could say that style follows purpose.

(A note about jokes: Hats and masks are inspired by Spy vs. Spy, and the Royal Concierge icon may puzzle those who have not watched a certain TV show yet.)





Sunday, June 29, 2014

On the origins of the scatter plot

Above, a scatter plot by Francis Galton

Chapter fifteen of How Not to Be Wrong, a book that I recommended a couple of posts ago, talks about John Herschel, Francis Galton, and the history of the scatter plot. I had read about this in Howard Wainer's Picturing the Uncertain World but I got curious and did a quick search in Google. I wanted to take a look at Galton's and Herschel's charts. And here's what I found: Michael Friendly's and Daniel Denis' ‘The early origins and development of the scatterplot’ (Journal of the History of the Behavioral Sciences, 2005.) Quoting:
“Among all the forms of statistical graphics, the humble scatterplot may be considered the most versatile, polymorphic, and generally useful invention in the entire history of statistical graphics.”
Friendly and Denis also cover connected scatter plots, the Phillips Curve —“one of the most famous curves in economic theory” (see below,)— the scatter plot matrix, etc. Don't miss the article.


Saturday, June 28, 2014

Teaching visualization podcast: The best parts

I guess that it's hardly a surprise that I found the latest datastori.es podcast fascinating. Enrico Bertini, Moritz Stefaner, Scott Murray, and Andy Kirk did a great job of describing many of the tensions, struggles, and trade-offs visualization educators face, and of offering useful suggestions. Quick notes:

Enrico: “When people think about professors they believe that once the semester ends you’re done with your job. Which is actually the opposite: Now you can finally do your job!”

Andy: “(One of the main challenges is) to find a way to bridge the gap between the top people (…) the ‘Illuminati’ of the field, (those who are) pushing the boundaries of what we should do, what we could do creatively and theoretically, and the everyday person who is working with data (…) How to make the translation to the lower end of the pyramid.”

Andy: “I bang the drum incessantly on ‘it depends’. You need to embrace all these principles, but I try to be not very dogmatic and avoid some sort of cookie-cutter approach.”

Enrico: “We all know that the principles that we teach are not necessarily black and white (…) But it’s a struggle because on the one hand, if you make everything relative you run the risk that the students will walk away with nothing. I wonder if it’s better to give some clear-cut rules and principles and then let them discover that there are cases in which this doesn’t work.”

Scott: “One of my main ongoing challenges is the tension between tools, process, principles, and history on one side, and then the technology on the other side, because the technology can just eat up so much time!”

Scott: “You cannot really separate the tools from the process.”

Enrico: “It doesn’t matter how much theory or how much principles you teach. You need to have your students practicing those principles. Otherwise, they won’t absorb them.”

Enrico: “The last time I taught my course I introduced d3.js. I didn’t let students use any other tools or frameworks and this worked just perfectly, much better than I expected. I think that one of the reasons is that students managed to help each other a lot, they mastered the language in a few weeks. I had an assistant teaching them a d3 seminar, and the feedback from the students was great.”

Enrico: “One thing that frustrates me is that many of my students come to my class with this mindset that data analysis is just aggregation, aggregating everything and coming up with four numbers. And it’s not. It’s about disaggregation, about showing as many details as you can without overwhelming people. Then it starts working really well.”

Scott: “I think that it’s a great idea to begin by seeing tons of examples. The first assignment in my course is a short one: Go out into the world and find a handful of infographics, statistical charts, whatever. Choose the ones that you believe are successful or unsuccessful and then write about them, critique them, and tell us about them. To me the first step is to build this library in your head of what the possibilities are.”

Friday, June 27, 2014

Summer reading: Recent books on Math and statistics



Mathematical thinking seems to be quite popular nowadays. After Nate Silver's The Signal and the Noise became a #1 bestseller, and books like Naked Statistics made it to the bestseller list, publishers have continued offering new titles on a regular basis. A few months ago, I recommended David J. Hand's The Improbability Principle and Kaiser Fung's Numbersense. Today I'd like to mention the very recent How Not to Be Wrong: The Power of Mathematical Thinking (which I'm currently reading; it's witty and deep), Everyday Calculus (which is waiting here by my side), and The Grapes of Math: How Life Reflects Numbers and Numbers Reflect Life (which I'll buy at Barnes&Noble over the weekend). Summer is the best time to enjoy a great book, my fellow geeks. Use your time wisely.

Disclaimer: All links above are affiliate links to Amazon.com. That means that I'm paid a small amount of money for the books you buy after clicking on them. I don't get any cash from Amazon, though, only gift cards that I use to buy more books. The average monthly payment I got last year was $75.

What happened to El País' infographics?

Note: This post was written after talking with ten sources who are familiar with El País graphics desk, including its director, Tomás Ondarra. At least two sources, asked independently of each other, corroborated each assertion. Most sources chose to remain anonymous to avoid retaliation.

ProPublica's News Apps director Scott Klein is getting ready to teach a class at The New School. Last week, he was asking everyone in the nerd cube for examples of good interactive visualizations from all over the world to discuss with his students. After a bunch of Latin American and European news organizations were suggested, he asked: “What about El País?”

“What about it, indeed?” I thought. After all, El País is the largest newspaper in Spain (1). Here's an explanation for Scott:

El País was still a name to consider in infographics in 2005, when it won the Peter Sullivan award in the Malofiej competition with a multimedia project. Since then, it has vanished from the map. It keeps producing graphics, but what appears in El País online is unexciting static projects adapted from the print paper.

How is that possible in a time when doing an interactive chart or map is almost trivial thanks to Plot.ly, Datavisu.al, Raw, BlockSpring, Datawrapper, Tableau Public, and so many other free tools? After all, plenty of tiny visualization departments and individuals who are not backed by large budgets or staffs are capable of publishing outstanding work thanks to them nowadays. The answer is a lesson on how rigid structures and dynamics inherited from the past can hinder change in news organizations.

El País' graphics desk is headed by Tomás Ondarra, an experienced infographics artist, painter, and book author. Nothing wrong with being an artist, of course. A lot of illustrators are also excellent graphics journalists who have kept up with the times. The challenge El País faces is different, and it doesn't seem to be essentially based on the background of its graphics director, on shrinking resources, or on the fact that this is a notoriously knotty organization to navigate, contrary to what Ondarra usually points out (“backstabbing is common currency here”).

Although it's true that after a painful series of layoffs El País graphics desk is much smaller than it used to be, its output today is identical to what it was before the crisis: Basic static tables, charts, and maps designed after getting requests from reporters and editors. El País graphics desk is a service department, not a fully proactive one. As a consequence, it's at the mercy of the whims of the rest of the newsroom, something that Ondarra acknowledges, albeit implicitly. Two of my sources explained: “If the graphics desk had been at the cutting edge a few years ago, it would have been much more protected from the crisis. Its value would have been visible.”

The challenge El País faces is simple and tricky at the same time: Its graphics desk is led by someone who is oblivious to Web visualization technology, motion graphics, or even new tools for print graphics, who is unaware of the fact that history is overtaking him, and who, therefore, can do little to remedy the situation (2). He's completely focused on ‘feeding the goat,’ as the title of a recent report about the news industry says, and claims to be incapable of steering his team toward a more productive and satisfying path. Not to mention his lack of professional decorum when he threatens and insults colleagues as a reply to fair remarks (3).

This is, in the end, a leadership problem.

Years ago, the small group that used to produce multimedia infographics for El País (and that was, in practice, independent from the graphics desk) left the organization. Since then, the scarce interactive visualizations (1, 2) at elpais.com have been created not by the graphics department, but by the technology and research team, which has its own designers. This state of affairs has led to a huge waste of talent (El País graphics desk has plenty of it) in what used to be, two decades ago, one of the leading names in news infographics. It also illustrates how bad newspapers can be at identifying ineffective management within their ranks.

(1) I'm talking about general news, not about sports newspapers.
(2) The scenario is made even more paradoxical by the fact that Ondarra's deputy, Rodrigo Silva, was trained as a computer scientist, so it's reasonable to assume that he's familiar with some of these technologies.
(3) Don't try to use Twitter's translate feature to read that. It's hard to understand even if you are a native Spanish speaker. In any case, you know how much I love bullies, don't you?

Tuesday, June 24, 2014

The stuff happiness is made of: Original drawings by NGM's Fernando Baptista

A couple of weeks ago, I visited Washington DC briefly, and saw NGM's Juan Velasco, Xaquín G.V., and Fernando Baptista. Fernando gave me a wonderful gift: Some original sketches he made for his now famous Sagrada Familia infographic, which I showcased in The Functional Art. At Xaquín's apartment I also saw an original draft of Fernando's lioness graphic. Watch this amazing video to learn more about that project.


Thursday, June 19, 2014

Ethical Infographics: In data visualization, journalism meets engineering

I have an article (PDF) in the Spring issue of The IRE Journal, the magazine of Investigative Reporters and Editors Inc. It's titled 'Ethical Infographics: In data visualization, journalism meets engineering,' so I guess that you can foresee where I'm going with it. Before you read it, you may want to learn about some of my assumptions. I've copied this from a book chapter that I recently wrote (it hasn't been published yet):

1. Morally good actions are those that increase the well-being of as many people as possible, either directly or indirectly (this approach to normative ethics is called utilitarianism.)

2. Accurate and useful information which is presented in compelling ways is likely to increase awareness of relevant matters. Good visualizations also enhance understanding and knowledge.

3. Good understanding of relevant matters can inform future decisions, so it is likely to increase the chances of people conducting fruitful, happy lives. In other words, understanding can have a positive influence on people's well-being.

4. Therefore, it is the obligation of the designer of visualizations to create graphics that (a) are intended to bring attention to relevant matters, (b) are based on a thorough analysis of the information, and (c) are built in ways that enable comprehension. To do this, designers ought to base their decisions on scientific evidence or, if this is not available, on judgments derived from their experience and personal observations.

Read the IRE article below or download it here. By the way, its last line is a joke.

Doing vector infographics for ProPublica

I enjoy data and visualization but deep in my heart I'm still an old-fashioned infographics person. There's little I enjoy more than reading and designing explanation graphics (side note: You should read this interview with Adolfo Arranz.) Therefore, when I was offered the opportunity to draw some simple vector illustrations for ProPublica, where I'm conducting a research project, I seized it. You can see the results here. Sisi Wei created the interactive map and the layout is by David Sleight and Gerald Arthur. It was fun, even if the topic is grim and, if you're a parent, worrying.

Tuesday, June 17, 2014

A book about the intersection between science, art, computation, and visualization

Just a quick note to bring attention to Arthur I. Miller's new book, Colliding Worlds: How Cutting-Edge Science Is Redefining Contemporary Art, which has just been released. I haven't read it all yet but, based on what I've seen so far, it's a nice overview of the relationship between art, science, and computation, beginning with Picasso, and ending with some of the craziest stuff you regularly see on websites like Creative Applications or Colossal.

The book is very well written and it includes at least three chapters that deal with data art and visualization, showcasing some of the usual suspects —Aaron Koblin, for instance. I got the book just because of these pages, but ended up being enthralled by the rest of the content. Don't miss it.

Thursday, June 12, 2014

More small multiple goodness

Another example of small multiple goodness, courtesy of ProPublica's Eric Sagara and Charles Ornstein. I've seen this in development for the past few days, and I could barely wait to write about it. It's simple and elegant. Don't forget to read the story, too.


A smart interactive chart —that lacks a y-axis!

The Pew Research Center has released a new report about political polarization in the U.S. The report includes an embeddable chart which combines animation and interaction quite effectively. NYT's The Upshot showcases it in its latest story. The challenge? It's hard to know what the chart is showing, as it isn't properly labeled, and it lacks a y-axis! I get that this is a distribution/density plot or histogram, but it's not clear at all. How am I supposed to understand the data that the curves' height is representing? Is this kind of minimalism becoming trendy? Or am I missing something here?

UPDATE: See these other charts, also by Pew. They do have y-axes.

UPDATE 2: The Pew folks have explained why they didn't include the y-axis. I get their point, but I'm still unconvinced. Both the area (areas, actually, as these are intervals) under the curve and the height of the segments matter. If you believe that most people may find histograms confusing, you can include a short explanation of how to read them. And if you fear that showing some very valuable context data —as they do in the little charts at the bottom of the post— "would overwhelm the visual impact of the changing distribution," why don't you make those numbers and tick marks visible just on demand? A little toggle (visible-invisible) button could work.
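To make the point about heights concrete, here is a minimal Python sketch of what a labeled y-axis would be encoding: the share of respondents that falls in each interval. The scores, the -10 to +10 scale, and the function name are assumptions made for illustration; this is not Pew's data or code.

```python
def histogram_shares(scores, lo, hi, n_bins):
    """Return (bin_start, bin_end, share_of_respondents) for equal-width bins."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in scores:
        # Values equal to the upper limit go into the last bin.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(scores)
    return [(lo + i * width, lo + (i + 1) * width, c / total)
            for i, c in enumerate(counts)]

# Hypothetical ideology scores, from -10 (one end of the scale) to +10 (the other).
scores = [-9, -7, -6, -6, -3, -2, -1, 0, 0, 1, 2, 2, 4, 6, 7, 9]

for start, end, share in histogram_shares(scores, -10, 10, 5):
    print(f"{start:+.0f} to {end:+.0f}: {share:.1%}")
```

Those percentages are exactly the numbers a reader cannot recover when the vertical axis is missing. Showing them only on demand, as suggested above, would keep the shape of the distribution clean.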

Wednesday, June 4, 2014

Soccer, Math, and small multiples


NYT's The Upshot has published an intriguing piece about the upcoming World Cup this morning. According to Kevin Quealy and Gregor Aisch, authors of the charts and the accompanying article, the selection method used by FIFA to build the World Cup groups is unfair, so they propose an alternative based on the work of Julien Guyon, a French mathematician. Guyon has explained his calculations here.

The charts at the bottom of the page are probability distributions based on thousands of simulations that follow either FIFA's method (light blue curves) or Guyon's (dark blue curves.) The Y-axis is probability (%) and the X-axis represents levels of difficulty. FIFA's method leads to a much larger variance than Guyon's: the light blue curves are flatter and wider than the dark ones. This means that a mediocre team can easily find itself in a tough group, and a strong team can end up surrounded by shaky rivals. You can clearly see this if you play with the draw simulator on top of the page. Click several times and you'll notice that very strong and very weak groups are much more likely to appear using FIFA's method.
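The logic behind curves like these can be sketched in a few lines of Python. This is a toy model, neither FIFA's procedure nor Guyon's proposal: the ratings are invented, a group's "difficulty" is simply the sum of its teams' ratings, and the two draw rules (fully random versus one team from each strength-ordered pot) are stand-ins chosen only to show how a more constrained draw narrows the distribution.

```python
import random
import statistics

N_GROUPS = 8
TEAMS_PER_GROUP = 4

def random_draw(ratings, rng):
    """Shuffle all teams and deal them into groups, with no seeding at all."""
    teams = ratings[:]
    rng.shuffle(teams)
    return [teams[i::N_GROUPS] for i in range(N_GROUPS)]

def pot_draw(ratings, rng):
    """Sort teams by rating into pots; every group gets one team from each pot."""
    ordered = sorted(ratings, reverse=True)
    groups = [[] for _ in range(N_GROUPS)]
    for p in range(TEAMS_PER_GROUP):
        pot = ordered[p * N_GROUPS:(p + 1) * N_GROUPS]
        rng.shuffle(pot)
        for group, team in zip(groups, pot):
            group.append(team)
    return groups

def difficulty_spread(draw, ratings, n_sims=2000, seed=1):
    """Simulate many draws and return the standard deviation of group totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        totals.extend(sum(group) for group in draw(ratings, rng))
    return statistics.pstdev(totals)

# Hypothetical ratings for 32 teams: 100, 98, ..., 38.
ratings = [100 - 2 * i for i in range(N_GROUPS * TEAMS_PER_GROUP)]

print("random draw spread:", round(difficulty_spread(random_draw, ratings), 1))
print("pot draw spread:   ", round(difficulty_spread(pot_draw, ratings), 1))
```

A larger spread is what a flatter, wider curve means in practice: under the less constrained rule, very strong and very weak groups show up more often.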

The Upshot deserves praise for several reasons: (a) the terrific integration of copy, simulator, and graphics; (b) the beautiful small multiple array of probability distribution charts (I still believe that this kind of graphic is too unusual in the news;) (c) the fact that The New York Times is not afraid of challenging its readers with such a geeky discussion. This could be a reminder for other news organizations: Readers aren't dumb.