Saturday, June 15, 2013

Simon Rogers' book: Facts are sacred (and stubborn) even today

"Facts are stubborn things," claimed John Adams while defending the soldiers at the Boston Massacre trial, "and whatever may be our wishes, our inclinations, or the dictates of our passion, they cannot alter the state of facts and evidence." Had he lived in our era —one of extreme relativism, spin, and denial— Adams would have been called a naïve idealist, but his words are as current, elegant, and true as they've always been, no matter how much you're inclined to misinterpret what Samuel Arbesman has beautifully explained.

I was reminded of Adams when I received a copy of Simon Rogers' Facts are Sacred, which I'm reading over the weekend. Rogers used to be the data editor at The Guardian; he has recently moved to the U.S. to work for Twitter. His book opens with this (also very bold) quotation: "Comment is free, but facts are sacred." It was written in 1921 by C.P. Scott, then editor and owner of the venerable British newspaper, and it has become one of the most cherished and mocked slogans in journalism.

Anyway, the book: If you are interested in how some modern data-driven journalism, visualization, and infographics teams work, Facts are Sacred seems to be a good place to start. It's not a textbook; you will not learn all-encompassing theories or how to use software tools here. Rather, it's a this-is-how-we-do-it-at-The-Guardian kind of book that can help you anticipate the challenges, tradeoffs, and limitations you may face if you decide to pursue a career in data reporting, editing, or graphics.

There's a lot to like in Facts are Sacred. To begin with, there is the detailed explanation of The Guardian's coverage of the 2011 riots in the U.K. This chapter will give you an idea of the huge effort that the paper made, gathering large amounts of data, interviewing nearly 300 rioters, and bringing in experts and academics. The fact that The Guardian folks didn't just report on these events, but also made all the data available to readers (something they do constantly, hallowed be their names), makes their work even more praiseworthy.

Second, many of its pages are very quotable:
  • "You can become a top coder if you want. But the bigger task is to think about the data like a journalist, rather than an analyst. What's interesting about these numbers? What's new? What would happen if you mashed it up with something else? Answering those questions is more important than anything else."
  • "Data journalism is not graphics and visualisations. It's about telling the story in the best way possible. Sometimes that will be a visualisation or a map. But sometimes it's a news story. Sometimes, just publishing the number is enough."
  • "If data journalism is about anything, it's the flexibility to search for new ways of storytelling. And more and more reporters are realizing that. Suddenly we have company – and competition. So being a data journalist is no longer unusual. It's just journalism."
Third, Rogers gives room to other voices. I particularly liked the article by Jonathan Gray, of the Open Knowledge Foundation:
"Data can be an immensely powerful asset, if used in the right way. But as users and advocates of this potent and intoxicating stuff we should strive to keep our expectations of it proportional to the opportunity it represents. We should strive to cultivate a critical literacy with respect to our subject matter. While we can't expect to acquire the acumen or fluency of an experienced statistician or veteran investigative reporter overnight, we can at least try to keep various data-driven myths from the door. To that end, here are a few reminders for lovers of data:
  • Data is not a force unto itself (...)
  • Data is not a perfect reflection of the world (...)
  • Data does not speak for itself (...)
  • Data is not power (...)
  • Interpreting data is not easy."
Do I have misgivings? I do. Many infographics showcased in the book are not as great as they should be. As The Guardian is such an influential institution, I believe that its journalists and designers have an obligation to be very careful with what they publish, as beginners and professionals in small organizations tend to imitate what the big names in the industry do. Take a look at this chart:


As I've tried to explain in the past (and this is not a matter of personal taste, mind you), bubbles may be effective when your goal is to show the big picture: general trends and patterns in the data. But that doesn't seem to be the purpose of this graphic. The goal here is to visualize the jaw-dropping differences between healthcare systems. Bubbles are inappropriate, then, precisely because they minimize those differences. They are misleading, and they undermine the fundamental point the graphic is trying to make. See per capita spending: Can you tell (without reading the figures; if you need to read the figures, better to design a table) that the U.S. bubble is double the size of the U.K.'s? You can't.
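
A rough way to see why bubbles flatten differences is to work out the geometry. The sketch below assumes the bubbles are scaled by area, which is the usual convention; how the chart in the book was actually scaled is not stated here, and the function names are made up for illustration. The only input is the two-to-one ratio mentioned above.

```python
import math


def bar_length_ratio(value_ratio: float) -> float:
    """Bars encode a value as length, so the visible ratio equals the value ratio."""
    return value_ratio


def bubble_diameter_ratio(value_ratio: float) -> float:
    """Assumption: bubbles encode a value as area.

    Area grows with the square of the diameter, so the diameter
    only grows with the square root of the value ratio.
    """
    return math.sqrt(value_ratio)


# One value is double the other, as in the per capita spending example.
ratio = 2.0
print(f"bar length ratio:      {bar_length_ratio(ratio):.2f}")       # 2.00
print(f"bubble diameter ratio: {bubble_diameter_ratio(ratio):.2f}")  # 1.41
```

Under that assumption, a value twice as large yields a bubble only about 1.4 times as wide, while a bar would be exactly twice as long. That is the sense in which bubbles minimize differences.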

There are many (too many) graphics in this book that put the cart before the horse: They are fun before they are useful and accurate. Structure, functionality, and visual appeal are all important components of any information graphic, but I'd argue that beauty cannot make up for choosing the wrong variables, for lack of proper structure, or for poor functionality. In some graphics included in Facts are Sacred, information is encoded in a map when it would have been more accessible if displayed in some sort of graph, or it's transformed into circles instead of into your old, perhaps-not-that-exciting (but ultimately bulletproof) bars or lines.

On the other hand, there are plenty of quite nice smaller pieces that make a very clear point. As I grow older, I tend to appreciate these little things much more than their poster-size counterparts:


By the way, I'm reading an actual copy of the book, not its e-book version. The e-book is sold through Amazon.com for a very good price, but it seems to come without the images, which is bizarre, as the text doesn't make a lot of sense by itself. So it'd be better to get the real thing from one of the vendors listed here.