Saturday, January 14, 2017

When doing data reporting, look at the raw numbers, not just at percentages —and write an accurate headline

A headline in The New York Times today reads “In the Shopping Cart of a Food Stamp Household: Lots of Soda.” Is it true?

The story itself provides hints that the headline is misleading, and likely to damage the image of the SNAP program and its beneficiaries. This is dangerous, considering that many readers look at clickbaity headlines, like the NYTimes one, but don't read stories. SNAP households aren't different from other households. Most Americans buy and drink way too much soda and, as a result, obesity and Type II diabetes have reached epidemic levels.

The story says that households that receive food stamps spend 9.3% of their grocery budget on soft drinks, while families in general spend 7.1%. This is one of those cases when reporting just percentages, and not taking into account other variables, such as total spending on groceries, sounds fishy.

Here's why: Never focus just on the data in front of your eyes, or on derived variables, like percentages or rates. Think about the raw numbers behind them. Say that you are comparing two families of four people each, a SNAP one and a non-SNAP one. They spend, respectively, $100 and $200 a week on groceries (my guess is that SNAP recipients are poorer than Americans in general).

The SNAP household would be spending $9.30 a week on soda, or about $2.30 per person; the non-SNAP one, $14.20, or about $3.60 per person! Who's buying “lots of soda” now?
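The arithmetic above can be checked with a few lines of Python. This is just a sketch: the $100 and $200 budgets are my guesses from the example, not figures from the report, and the function name soda_spending is made up for illustration.

```python
def soda_spending(grocery_budget, soda_share, household_size):
    """Turn a percentage of the grocery budget into raw dollars.

    Returns (dollars per household, dollars per person).
    """
    household = grocery_budget * soda_share
    return household, household / household_size


# Hypothetical weekly budgets from the example; shares as quoted in the story.
households = {
    "SNAP": soda_spending(100, 0.093, 4),
    "non-SNAP": soda_spending(200, 0.071, 4),
}

for label, (household, per_person) in households.items():
    print(f"{label}: ${household:.2f} per household, ${per_person:.2f} per person")
```

The household with the larger percentage ends up with the smaller dollar amount, because the percentages are taken over different totals.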

To add insult to injury, the story says about the report from the Department of Agriculture: “One limitation of the report was that it could not always distinguish when SNAP households used their benefits, other money or a combination of the two to pay for transactions.” Michele Simon, a lawyer, is quoted in the story: “This is the first time we’ve had confirmation that this massive taxpayer program is promoting all the wrong kinds of foods.” Well, that isn't true. If the story is a good reflection of the U.S.D.A. report, there isn't enough basis to claim that tons of SNAP money is being used to pay for sugary drinks.

This doesn't render the story completely wrong —I couldn't find the original source,— but it does challenge its headline, which singles out SNAP families. The reactionary insurgency taking over government in a few days will likely use this as ammunition to attack safety net programs, as they know that most people share headlines through Facebook and Twitter, but don't spend time reading long and nuanced stories.

(One last note: newspaper reporters often don't write the headlines for their own stories; editors do.)

UPDATE: This post on Facebook by University of Minnesota's Joe Soss provides much more information, and a link to the report itself. I'll take a look at it, as it looks like the story is much worse than it seemed to me at first.

Tuesday, January 10, 2017

In visualization, white space is your friend

One point I make in the introduction to graphic design classes I've taught in the past is that white space isn't empty space. Empty space is needlessly unused space; white space has meaning. There's even a good introduction to graphic design titled White Space is Not Your Enemy.

This morning's New York Times has a good example of that principle, courtesy of Margot Sanger-Katz and Quoctrung Bui. The top-right quadrant is occupied by a large majority of the American public —to the right and to the left,— and experts in gun violence. The bottom, a vast, blank ocean, probably belongs to Republican legislators, NRA lobbyists, and the minority President-elect.

(Click the image to expand.)


Thursday, January 5, 2017

Data and visualization conferences: Tapestry, NICAR, Malofiej, DH+DJ

Since I'm finishing that-project-that-should-not-be-named, I've been able to register for several conferences this year. The first two are Tapestry and IRE+NICAR. If you're going to one, you should go to the other, as they take place on consecutive days, and in cities that are very close to each other, St. Augustine and Jacksonville. Tapestry is on March 1, and NICAR is between March 2 and 5.

I've only attended these conferences once, and I really enjoyed them. Tapestry is a small (around 100 people) gathering of data visualization and infographics nerds. IRE+NICAR is a large conference of investigative reporters, data journalists, visualization designers, and programmers. You need to apply to attend Tapestry, but I heard that there's still room, so do it ASAP.

Next, I'm going to the Malofiej International Infographics Summit, in Pamplona, Spain. The 2017 edition, on March 26-31, is the 25th anniversary, so I'm not skipping this one. I began attending Malofiej in 2002, and I've only missed it a couple of times since then.

Finally, at the University of Miami we're doing the second Digital Humanities + Data Journalism Symposium on September 14-16. The first edition was a success, so we've decided that the 2017 edition is going to be bigger (first year was 120+ people; we want 200+ this time) and more affordable ($99 for 2.5 days of awesomeness.)

We have 8 confirmed speakers so far —see them below— and many more will be added to the list soon. Registration will open in early February, but save the dates, as this is going to happen for sure. You can take a look at the draft of the website here (this isn't the final URL.)

Thursday, December 29, 2016

A few graphs from Nerd Journalism

Last night I finished writing the draft of Nerd Journalism, my PhD dissertation. Now I need to spend the next 30 days or so editing the damn thing. I want to submit it at the end of January, and I'll try to defend it over the Summer.

I'll release the entire book as a free PDF at www.nerdjournalism.com, probably after July 2017. I'll also publish around 30 video interviews, along with their transcripts, and the data collected from the Malofiej International Infographics awards, which covers around 20 years of the competition. Besides those, my other source was observations conducted at ProPublica and at Univisión News online.

The goal of this project is to explore the changes news information graphics have experienced in the past two decades. One of them is the kind of graphics visual journalists favor. In the past, they were mostly pictorial/figurative explanations and descriptions, and their central elements used to be illustrations, photographs, locator maps, etc. In the present, abstract representations of data are much more common.

With the help of some students I put together a spreadsheet of nearly 2,000 winners of the Malofiej awards, quantifying their country of origin, the publication where they appeared, and also the elements they included: Illustrations, graphs, charts, maps of different kinds, etc. Then, we also identified which of those elements had a more dominant place in the composition.
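A tally like the one described above could be sketched as follows. Everything here is an assumption made for illustration: the rows are invented, not taken from the actual Malofiej spreadsheet, and the field names and category sets are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical groupings of dominant elements into two broad categories.
PICTORIAL = {"illustration", "photograph", "locator map"}
ABSTRACT = {"graph", "chart", "data map"}

# Made-up rows; the real spreadsheet has nearly 2,000 entries.
entries = [
    {"year": 2002, "country": "Spain", "dominant": "illustration"},
    {"year": 2002, "country": "United States", "dominant": "photograph"},
    {"year": 2002, "country": "United States", "dominant": "graph"},
    {"year": 2002, "country": "Brazil", "dominant": "locator map"},
    {"year": 2015, "country": "United States", "dominant": "chart"},
    {"year": 2015, "country": "Germany", "dominant": "data map"},
    {"year": 2015, "country": "Spain", "dominant": "illustration"},
    {"year": 2015, "country": "United States", "dominant": "graph"},
]


def abstract_share_by_year(rows):
    """Return {year: share of entries whose dominant element is abstract}."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["dominant"] in ABSTRACT:
            kind = "abstract"
        elif row["dominant"] in PICTORIAL:
            kind = "pictorial"
        else:
            kind = "other"
        counts[row["year"]][kind] += 1
    return {
        year: tally["abstract"] / sum(tally.values())
        for year, tally in sorted(counts.items())
    }


print(abstract_share_by_year(entries))
```

Keeping the raw counts in a Counter per year, rather than storing only the shares, makes it easy to show both totals and percentages later.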

Let me show you a few graphs, even if I still need to verify and copy-edit their content.

The first one shows the dominance of countries like the U.S. and Spain in the Malofiej awards, but also the increasing presence of Asian publications thanks, I believe, to the South China Morning Post, which in recent years has published a lot of excellent work. Click on the image to expand:



The graph below is about the most dominant element in each winning entry. You'll notice that I grouped graphic forms into two large categories: “pictorial” (figurative representations of physical entities, such as illustrated explanations and descriptions, photographs, locator maps, etc.) and “abstract” (graphs, charts, data maps, etc.). In recent years, abstract graphics have been the main element in nearly half of the projects that won awards at Malofiej:


On this one I split up the abstract-pictorial data by region. “Non-Latin America” means mainly the United States. The differences, in comparison to the competition average, are stark:


This one shows the same for the countries that have won the most awards: the United States, Spain, Argentina, Brazil, Germany, and the United Kingdom. The first graph for each country shows total counts, and the second percentages:


Since one of the variables in my spreadsheet is “publication,” it was possible to show these trends for specific organizations as well. See The New York Times, for instance:


More coming in the Summer of 2017...

Friday, December 16, 2016

Our brand new MOOC is here! Data Exploration and Storytelling

It's been a while since I last taught a Massive Open Online Course. My friend Heather Krause —a statistician and data journalist— and I have created a brand new one, offered through UT's Knight Center, and titled “Data Exploration and Storytelling: Finding Stories in Data with Exploratory Analysis and Visualization.”  It begins on January 16, 2017.

You can sign up for free through this link.

I guess that the long title and subtitle explain what the course is about: It doesn't just cover visualization, but also data exploration and understanding, and how to write stories about your findings. The materials include several chapters from The Truthful Art and The Functional Art, besides many new video lectures and tutorials about software tools.

I recorded a short intro video:

Wednesday, December 14, 2016

The Financial Times chooses "The Truthful Art" as one of the best books for data geeks

This is truly an honor: The Financial Times has just chosen The Truthful Art as one of the best books “for data geeks” in 2016. Here's their tweet, and here's the article, written by Alan Smith, their data visualization editor. I am very familiar with all the others, except Where the Animals Go, which I've just ordered. You should take a look at them, as well.


Saturday, December 3, 2016

Datatrump

I needed a 10-minute break from that-project-that-must-not-be-named, so I decided to use Robert Grant's DrawMyData to design a Trump chartoon. See it below, along with the correlation coefficient and other summary statistics. Feel free to use it. If you want the CSV with the data, you can download it here.

(A while ago I did a datasaurus; I got tired of showing Anscombe's quartet when discussing the importance of visualization for data exploration.)


Thursday, December 1, 2016

The rise of data visualization in news graphics

The next two months are going to be quite busy. I need to finish writing my PhD dissertation, titled Nerd Journalism. I'll release it as a free e-book through this website probably in the Summer of 2017.

One of the elements of my analysis of how journalistic visualization and infographics have changed in the past decade and a half is the kind of projects that win awards in the Malofiej Infographics competition, the most popular one among news graphics creators.

The graph below shows the percentage of the nearly 2,000 projects recognized by Malofiej juries that had a pictorial graphic (a visual explanation, a photograph, etc.) as central or main element, versus those that emphasize some sort of abstract representation (data graphs, charts, numerical tables, etc.) A couple of important notes: I'm still cleaning up the data, so take this with a grain of salt. Also, this is just a quick summary; other graphics that I'll design for the book will break this down further by type of graphic —graphs, data maps, explanatory illustrations, locator maps, etc.— by country of origin, by publication, etc.

Finally, the last two editions of Malofiej I got data from, the 22nd and the 23rd, are missing in this graph, but the trend continues: Abstract graphics —mostly data visualizations— keep growing, and constitute half of the projects that get awards:




Wednesday, November 30, 2016

New visualization: Rhythm of Food

The second project coming out of my collaboration with Google News Lab has just been published. It was designed by Moritz Stefaner, and it's titled Rhythm of Food. As in the case of the first visualization in this series, Worldpotus, I can't take credit for anything other than talking to Moritz and Google News Lab's Simon Rogers every week or two to offer some feedback here and there.

(Simon has written about this visualization.)

Rhythm of Food reveals how Google searches for food have varied since 2004 through a series of fun circular plots. Some of you may contend that more traditional graphs —time-series line graphs?— may have been more appropriate, but I'd disagree; just consider: a) the goal here isn't accuracy, but to reveal overall, general patterns, b) the circular plots fit well on mobile screens, c) they are visually alluring, d) they look like food on a dish —OK, perhaps not a strong reason,— e) you can see the data as line graphs by clicking on the + symbol of each chart. Here:




The point is that, as in the case of Worldpotus and other projects we'll release in the future —next one will likely be one by Jan Willem Tulp in January,— we're trying to let people see data in multiple ways: As eye-catching and sometimes unorthodox charts first, and as more bread-and-butter graphs or tables if they want further detail.

We're also trying to combine the narrative/explanatory with the exploratory. This project first describes the data, highlights some interesting cases —annotating peaks— and then it lets you explore at will.

Here are some early exploratory sketches; enjoy:




Fake data, fake causation, fake news

The headline I'm showing here is an example of how fake news websites bullshit people. This article from Glenn Beck's The Blaze does it by making up a causal link between two consecutive events, a variant of the error that the famous “correlation does not imply causation” mantra warns against. I talk about old tricks like this in The Truthful Art.

There is simply no evidence to support the claim that protests to raise the minimum wage led McDonald's to launch their new self-service machines. A column The Blaze links to, by Ed Rensi, proves nothing; it's just plain old confirmation bias. Automation would likely have happened regardless, as McDonald's itself acknowledged.

Allow me an aside: There's an ongoing discussion these days about the role that “fake news” played in the election. Focusing just on scrappy websites put together by a young fellow living in, say, Georgia (the country, not the state), or on how Russia may have helped spread lies, is wrong.

U.S. organizations like The Blaze, Infowars, and radio and TV shows like Rush Limbaugh's or Sean Hannity's are fake news as well, and they are far more influential. If you think that calling them “fake” is an exaggeration, you haven't listened to Rush Limbaugh enough. Here's a sample. For your entertainment, begin at 6:45, when he describes what “true Americans” are, in comparison to those “other” people.

Sunday, November 27, 2016

About scheduling and productivity

A student has just asked me how I organize my time. I get this question quite often, so let me share some tips, in case you're interested.

The key to being reasonably productive is to discover what kind of person you are. We are born with certain personality traits that we can't really modify much. Some people are able to do three or four different things a day. I tried. I failed. I am a one-main-task-a-day person.

This means that, ideally, I try to assign one core activity to each day of the week. If I'm writing, I mostly just write. If I'm doing university-related things —lecturing, grading, preparing for classes, meetings— I mostly do that.

My current week looks like this:
MONDAY: Preparing for classes or writing
TUESDAY: Teaching and meetings
WEDNESDAY: Writing
THURSDAY: Teaching and meetings
FRIDAY: Consulting/traveling or writing
SATURDAY: Freelancing or writing
SUNDAY: Family day
I usually begin work at 9 a.m. and stop either at 3 p.m. or at 5 p.m., depending on the day. Every hour or two (this is flexible, I must admit) I allow myself some goofing around on social media. This is when you may see me tweeting.

If there's an activity that needs to fit into one specific day (say a phone call with a colleague, a meeting, or filling out some paperwork on Wednesday, when I should be writing), I treat it as a pause, as if I were on social media.

If this secondary activity takes longer than 15 minutes, I make up the extra time by extending my work day. I do the same if I get carried away on Twitter, something that has happened too often recently.

I try to reserve at least two or three hours a day for reading. I read two print newspapers and then my RSS and Twitter feeds during breakfast. This usually takes one hour or a bit more. I read books in the late afternoon or evening.

Thursday, November 17, 2016

Search data can be really helpful, but always think carefully when you use it

(Full disclosure: I'm a consultant-art director for Google in this series of visualizations.)

The Washington Post's Christopher Ingraham says that the chart below is one of “the most depressing” he's seen this year. He's written an article about it.


I really appreciate Ingraham's work, but I think that, in this specific case, he's reading a bit too much into that Google Trends chart —or not showing a big enough picture. He says:
Google's data doesn't indicate peoples' sentiment toward the Klan when they search for it — whether they view it positively or negatively. It does, however, illustrate how the Klan is now seen as part of current events, rather than a relic of the past. [...] In 2006, for example, people who searched for the Ku Klux Klan were also searching primarily for topics related to history and racism, according to Google's data, suggesting attempts to situate the clan within the country's history. In the past year, however, people searching for the Klan were also looking for information on Trump, Hillary Clinton and African Americans in general, according to Google.
Well, yes, of course. People use Google to look for information. That could actually be the headline, but it wouldn't be that catchy, would it? See: “People are using Google to inform themselves about current events, like Stephen K. Bannon's appointment as White House strategist.” See what happens if we search for terms like “fascism”, “autocrat”, or “white supremacism” in Google Trends (related searches, which Ingraham mentions, and U.S.-only searches are very similar):




Sunday, November 13, 2016

More bad data: the number of criminal undocumented immigrants is 180k, not 3 million

We're just a few days past the election and the bad data tsunami I warned about is gaining momentum. In this interview with CBS after winning, Trump says that he's planning to deport up to 3 million undocumented immigrants “with criminal records”.

In case you feel tempted to normalize or spin this somehow, consider that the real number of undocumented immigrants with criminal records is less than 180,000, something that Trump himself knows very well. This may mean that the definition of “criminal” his administration will use right after inauguration will be dangerously wide and vague. It also means that, contrary to what some prominent journalists who are victims of baseless wishful thinking say, Trump remains committed to his most authoritarian ideas.

In a time of great need, donate and subscribe. And make it public

A few days back I announced a donation of $1,000 to ProPublica. As I explained there, I'd just read Peter Singer's wonderful Ethics in the Real World, where he suggests we must be more outspoken about the donations we make. Not to boast about them, but to appeal to those who can afford to do the same, but still give very little, or nothing. There's evidence from the psychology literature showing that public announcements of donations increase donations.

A list of recent donations I've made is at the bottom of this post. I encourage you to make yours public, too.

My list includes ongoing subscriptions to news publications that do investigative reporting or aggressive commentary, things that we'll desperately need during an administration that won't be constrained by any serious checks or balances; this isn't about partisanship, but about preparing ourselves to preserve democracy —and no, I don't think I'm being hyperbolic at all:

• One-time gift to ProPublica: $1,000
• One-time gift to the Against Malaria Foundation: $500
• Monthly gift to ACLU: $50
• Monthly gift to Planned Parenthood: $40
• (I'm still looking into other organizations to donate to, using Singer's website as guidance.)

• Magazines: The Weekly Standard, New Yorker, Mother Jones, The Atlantic, Rolling Stone
• Newspapers: The Miami Herald, The New York Times
• Online subscriptions: The Washington Post, The New Tropic, eldiario.es, The Wall Street Journal

Friday, November 11, 2016

Now more than ever: Call out data and visualization bullshit

After the results of the election, we're going to witness a shit storm of bad data and misleading graphics, coming from all directions. If you follow this blog, if you've read my books, you already know where I stand: Every single one of us has a responsibility to call out bullshit, and to fight against it with reason and evidence, regardless of what agenda that bullshit is trying to push. We mustn't sit on the fence and witness the deluge of misinformation in ironic silence. That's irresponsible.

This will require our keeping an eye on all sorts of media publications to the left and to the right, even —or particularly— those that we deeply dislike.

Here's an example: I visited the #buildTheWall hashtag on Twitter, and saw that people were spreading this story and video by Infowars. Read the story and watch the video to understand how crooks like Alex Jones are unashamedly using every trick in the data trickster textbook to manipulate their audience. First, confounding correlation and causation: the claim is that sanctuary cities mean more Democratic votes, when it's likely the other way around, as liberal cities are more protective of undocumented immigrants. Second, talking about a “sea of red” on a map, when Jones and his colleagues know perfectly well that geographical area isn't proportional to population. The blue areas have a much larger population than the red ones.
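To see why a map's colored area says little about how many people live there, here is a minimal sketch with two invented regions. The names and numbers are made up for illustration; they aren't real counties or election data.

```python
# Two hypothetical regions: one large and sparsely populated, one small and dense.
regions = [
    {"name": "rural region", "area_sq_mi": 9000, "population": 100_000},
    {"name": "urban region", "area_sq_mi": 1000, "population": 900_000},
]


def shares(rows, key):
    """Return each region's fraction of the total for the given field."""
    total = sum(row[key] for row in rows)
    return {row["name"]: row[key] / total for row in rows}


area_share = shares(regions, "area_sq_mi")
population_share = shares(regions, "population")

for name in area_share:
    print(
        f"{name}: {area_share[name]:.0%} of the map, "
        f"{population_share[name]:.0%} of the people"
    )
```

In this toy case the region that covers most of the map holds only a small fraction of the population, which is why a map shaded by area alone can mislead.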

You may think that this effort is pointless, that you'll just be preaching to the choir. I don't think so, for two reasons: Good information does change minds, as Tom Stafford explains in his For Argument's Sake; also, if you promote your critiques using the same Twitter hashtags and social media channels that crooks use to spread their bullshit, or if you post them on their websites as a comment, a part of their audience will see them. An overwhelming majority of them will dismiss you, and even attack you, but a tiny portion will have a small seed of doubt planted in their brains. A single seed does nothing, but thousands will. Some will bloom.

So get ready. Join the movement. Help build the wall, the right kind of wall: a wall made of truthful information. Let's keep bullshit at bay.

Saturday, November 5, 2016

I've just donated $1,000 to ProPublica. Why we all should do our share

A while back I announced my quixotic campaign to convince you all that paying for your favorite journalism is a civic responsibility (hashtag: #payForJournalism.) I've just made good on a promise, and donated $1,000 to ProPublica. Many news organizations do excellent reporting. Without it, much wrongdoing would go unnoticed. Just see ProPublica's Electionland, or what your local newspapers, radio, and TV stations regularly uncover.

Not supporting them with your money, at least with a simple subscription —if you know that you benefit from what they publish, and if you can afford it— is ethically dubious, to say the least. A weekly newspaper subscription costs less than a large latte at Starbucks.

I know, “news publications screw up all the time, reporters are far from perfect, and, hey, these companies decided to put their stuff on the Internet for free! I'm not doing anything bad by being a free rider.” Yada yada yada. Lame excuses. You know that these organizations aren't getting nearly enough money from ads. You read what they publish, you learn useful stuff from it and, as a result, your life gets better. Any other consideration is a caveat, a footnote, or an example of Trumpian rationalization.

You may think that making a donation public is gratuitous boasting. That isn't the case. According to Peter Singer's wonderful Ethics in the Real World, speaking openly about donations helps convince other people to donate. So, yes, if you are a pure free rider when it comes to benefiting from good journalism, my explicit goal with this post is not to boast, but to embarrass you.

Here's Singer (full disclosure: I also donate to charities, which is what Singer focuses on mostly in this chapter):



By the way, I recommend you read Rolling Blackouts, a graphic novel that illustrates how reporters truly think and work. Some images:




Wednesday, October 26, 2016

Y'all, youse, and you guys must get this book right away

Unless you live under a rock, you're likely aware that the dialect quiz that Josh Katz, Wilson Andrews, and Eric Buth built in 2013 is the most viewed page ever in the history of NYTimes.com. Katz has just published a book, Speaking American: How Y'all, Youse, and You Guys Talk, that highlights some of the most interesting, funny, and entertaining facts. I received my copy yesterday and I've been browsing it for several hours. It's making me really happy.

Besides tons of maps, the book includes advice on how to pretend you're from different cities and regions. I'm applying these teachings, beginning with Wisconsin. I'm now pronouncing the state name Wi-scon-sin, rather than Wis-con-sin...

Some of my favorite pages:





Tuesday, October 18, 2016

Doing visualizations with Google News Lab

Here's the exciting ongoing project I mentioned earlier today: For the past few months I've been working with Google News Lab's Simon Rogers and a bunch of extremely talented and celebrated designers (Giorgia Lupi, Moritz Stefaner, Jan Willem Tulp, and others that will be announced in the future) to produce a series of ambitious visualizations based on Google search data. The first project in that series, WorldPotus, has just been launched.

Simon has written a very good post describing this initiative, and Wired is covering WorldPotus specifically. I won't repeat what's said in those articles. I'd just like to add that I'm being credited as an “art director” but, as you may guess, people of Giorgia's, Moritz's and Jan Willem's caliber don't need much art direction. My role as a consultant is to offer some initial ideas, draw visual mockups —that often get discarded!— and then give constant feedback during the development. It's a lot of fun, and I'm learning a lot from it.

(I'm also doing a monthly series of “office hours” with Google.)


A profile about visualization and data journalism

In the past few months I've been consulting for Google and Microsoft on two big projects. I'll announce the Google one, which is really exciting, in the next post. For now, here's a profile written for Microsoft by Thomas Kohnstamm. Besides a discussion about the present and future of visualization and data in journalism, the story includes an announcement of a series of lectures that Microsoft will release soon, plus a short PowerBI tie-in (this is Microsoft, after all.) Side note: Thomas is the author of a hilarious memoir, Do Travel Writers Go to Hell? You should read it.

Here's the main photograph of the story. I look like an Ayn Rand hero, which is quite embarrassing, as I think that Rand was a dreadful thinker and an even worse writer:



And here's a photo of the shelves in my office at UM's School of Communication (the other half of my visualization-related books is in my home office):


Sunday, October 16, 2016

Pay for your news visualizations and infographics

A few days ago I launched a quixotic Twitter campaign (#payForJournalism) to convince my followers to subscribe to their favorite news organizations, whether print or online. I believe that paying for the journalism you consume regularly is a civic duty—if you can afford it, of course. If you think that good, honest reporting is essential for democracy, not endorsing it with your money makes you a free rider. It's ethically wrong.

I'm now a subscriber to The New York Times, the Miami Herald, the Washington Post, the Wall Street Journal, the New Tropic, the New Yorker, the Weekly Standard, the Atlantic, and eldiario.es. I'll likely make a donation to ProPublica at the end of this year.

No, I don't have time to read all that. That's not the point. I don't pay for the privilege of reading or seeing everything that journalists at those organizations write or design. I pay because I know that the work they do is fundamental for a healthy public conversation. Have you seen Spotlight? You should. A single story like that is worth the price of a year's subscription. Without journalists, much wrongdoing would go unnoticed, many stories would be untold, much science would remain unexplained. We can't afford that.

Paying for your favorite journalism also means supporting the people who produce the wonderful data journalism, visualizations, and infographics you enjoy on social media every day. The photograph on the right is from today's Miami Herald. Kara Dapena is the author of those graphics. I'm proud to support organizations that pay the salaries of people like her.