Wednesday, April 26, 2017

Datasketching culture

Our new Google News Lab visualization project is out. This time, it's actually two projects in one: Beautiful In English, by Nadieh Bremer, and Adventure is Out There!, by Shirley Wu. As usual, I provided some art direction.

You may be familiar with Nadieh's and Shirley's Datasketch.es. Each month they pick a topic (nature, music, etc.) and each of them creates an experimental and quirky artistic visualization separately. When I learned about it, I thought that it was a marvelous idea, so I asked them whether they'd be interested in doing something with Google search data. They accepted and they chose Culture as their theme.

Nadieh's visualization shows the most common words translated into English from other languages, and Shirley's focuses on popular travel destinations. You can read all details about both visualizations in the writeup they've put together. It includes drafts and early prototypes. Don't miss it.

If you wish to learn more about our ongoing series of visualizations —see them all so far here; more coming soon,— Fast Company's Co.Design has just published a nice article about it. It describes what we're trying to do quite well: As an art director, I don't really want to direct very strictly. Instead, I prefer to give the designers we work with some freedom. We are fond of graphics that are compelling, informative, and fun, but we also want designers to let their imaginations fly a bit, even if the end result is wackier than usual. We don't mind. Some projects will succeed at balancing creativity with understandability, others maybe not so much, but that's exactly the point. Some novel ideas will be discarded, but others may stick, and eventually expand our shared graphics vocabulary.

(Google's Simon Rogers —the brains behind this initiative— has also shared some thoughts about making visualization friendlier.)

Friday, April 14, 2017

Multiple graphics, multiple possible insights

One of my favorite mantras about visualization is that a single chart, graph, or map is unlikely to show everything you need to know about a story. It's through the combination of several adjacent graphics that compelling insights often arise.

NPR Visuals's latest project, “Maps Show A Dramatic Rise In Health Insurance Coverage Under ACA,” is a simple and lovely example of that principle: the choropleth map reveals geographic patterns; the histogram above it displays the decrease in the number of counties with high percentages of uninsured people.

The interactive visualization is followed by a story, some static graphs, and a small-multiple array of maps. They are also worth your attention.

Monday, March 20, 2017

Visualization book club

Tableau Public's Sophie Sparkes has just launched a visualization book club (the Twitter hashtag is #VizBookClub.) She's planning to read a graphics book every two months, and she invites everyone to join her in a conversation about its contents.

The first book she's chosen is The Truthful Art, which is a huge honor. Here are the topics she'd like to discuss:


Thanks for doing this, Sophie. Looking forward to reading people's thoughts.

Wednesday, March 15, 2017

An update on the Trumpery lecture series

After I announced my Trumpery lecture tour, I received requests from more than 30 cities. I've been organizing my schedule this week; below you can see a screenshot of the Excel spreadsheet I'm using. Some cities and dates are already confirmed.

As mentioned in the original post, the talks will be free and open to anyone. I'll provide more information about how to sign up as soon as I can.

A few other updates:

• I've decided to drop the word “visual” from the title. I'll certainly talk about bad graphics, but they won't be the only focus. The current working title is Trumpery: How to fight against fake data, fake facts, and fake visualizations —from the left and from the right. The subtitle may change a bit. I'm still debating it.

• There'll be a dedicated website for the tour, www.trumperytour.com.

• I'll try not to mention any politician by name in the talk, which will be, as the subtitle indicates, bipartisan. However, the left-right split in the examples won't be 50-50. Nowadays a disproportionate amount of data bullshit comes from the far right.

• Trumpery may be the title of a book, too.

More news coming soon.


Monday, March 13, 2017

The new Digital Humanities and Data Journalism Symposium

Last year I helped organize the first Digital Humanities + Data Journalism Symposium at the University of Miami. It was a success, bringing together two communities that have a lot to learn from each other, so we decided to do it again this year.

Our website is updated and registration is open. If you work in visualization, infographics, data journalism etc., I think that most of the presenters will be very familiar to you. We'll cover a variety of topics, from data quality to visualization, from fake news to the ethical challenges of Artificial Intelligence. There will also be plenty of time to mingle and make new friends.

The dates are September 14-16 (Thursday-Saturday until noon.) This year we're aiming to make it bigger than in 2016, and also more affordable. We've reduced the registration rate to $99. Also, you can submit a proposal for a lightning talk or to show your academic or professional work in a booth or desk.

Space is quite limited so, if the symposium sounds interesting to you, I'd register right away and book flights and a hotel. We don't have an official hotel, but there are many options at a reasonable distance from the Newman Alumni Center, where the conference will take place. Getting around by Lyft, taxi, or Metrorail is quite easy in the area.

Sponsors are not announced on our website yet, but so far we have the Knight Foundation, Google News Lab, and several departments at the University of Miami. If you're interested in sponsoring the conference, send me an email.


Friday, March 3, 2017

You aren't qualified to be a professional journalist

Just a quick thought: I'm at the 2017 CAR conference these days. I'll be talking about communicating uncertainty with Mark Hansen and Jen Christiansen. I've put together this folder with readings and our slides, in case you're curious.

Anyway, I've just attended a panel with Reveal's Jennifer LaFleur and NBC's Ronald Campbell on how to spread data literacy in news organizations. They gave some very good suggestions, but some things they said were quite worrying. For instance, when asked during the Q&A, they estimated that four out of five of the reporters and editors they regularly train aren't even able to calculate percentage change.

Let me be blunt here: If your level of numeracy is so abysmal, you aren't qualified to be a professional journalist. I know it may hurt to read this, but it's the truth. Nobody who lacks a working understanding of math, statistics, and scientific reasoning can properly inform the public. Not knowing such basic stuff is the equivalent of being unable to write coherent sentences.

We've all faced this problem (I forgot too much math in college myself!), and the solution isn't to deny that it's a problem, but to solve it quickly. Get to work. Right away. Stop with the I'm-not-good-at-math bullshit. This isn't magic, and it certainly isn't knowledge that should belong only to specialized teams in a newsroom.
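
For anyone who needs the refresher, percentage change is just the difference between the new value and the old one, divided by the old one. A minimal Python sketch (the function name and the figures are made up for illustration):

```python
def percentage_change(old, new):
    """Return the percentage change from `old` to `new`."""
    if old == 0:
        # Dividing by zero: the change has no meaningful percentage.
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / abs(old) * 100


# Hypothetical figures: a value that goes from 80 to 100, and back again.
print(f"{percentage_change(80, 100):.1f}%")  # 25.0%
print(f"{percentage_change(100, 80):.1f}%")  # -20.0%
```

Notice that going up from 80 to 100 and coming back down from 100 to 80 don't give the same percentage, because the base of the comparison changes.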

Here are some books to get you started, sorted from basic to more advanced; these, and many others, informed my own The Truthful Art:



Wednesday, February 22, 2017

Mapping Oscar movies

Our new project, Mapping America’s Taste in Oscar Films, is live. It was designed by Polygraph and it maps YouTube trailer views of Oscar-nominated movies during their opening week. This is part of a series of visualizations that includes other projects like World Potus, Rhythm of Food, Inaugurate, and The Year in Language. (If you're interested in how the data was put together, scroll down; there is a note at the bottom of the page.)

Here are the maps for three movies I enjoyed last year: Arrival, Hacksaw Ridge, and Hidden Figures. Notice the patterns:






Monday, February 20, 2017

Extrapolation is risky business

Here's an interesting case of potential trumpery™ that I may use in the lecture tour and book. This is a post-in-progress, so feel free to send suggestions.

Infowars, The Washington Times (WT) and other publications are reporting that “Nearly 2 million non-citizen Hispanics are illegally registered to vote.” Infowars adds that “a survey of Hispanics in the U.S. revealed as many as two million non-citizens are illegally registered to vote, reinforcing claims by President Donald Trump that millions of illegal votes were cast in the 2016 election.”

What's the evidence for such a claim of 2 million illegal voters? There is none. It's based on a reckless extrapolation from a study that was designed with a completely different purpose.

It all began with this 2013 survey of 800 Hispanic adults conducted by McLaughlin & Associates. The survey itself looks fine to me. I asked the author, John McLaughlin, and he provided detailed explanations of the methodology, how the sample was randomly chosen, etc. My problem, then, is not the survey, but the far-fetched extrapolations that Infowars and WT made, which have gone viral, unfortunately.

On page 68 of the summary of results you will see that among the Hispanics who aren't citizens, 13% said that they are registered to vote:




The stories at Infowars and WT quote James D. Agresti, who leads a think-tank called Just Facts:

[Agresti] applied the 13 percent figure to 2013 U.S. Census numbers for non-citizen Hispanic adults. In 2013, the Census reported that 11.8 million non-citizen Hispanic adults lived here, which would amount to 1.5 million illegally registered Latinos.
Accounting for the margin of error based on the sample size of non-citizens, Mr. Agresti calculated that the number of illegally registered Hispanics could range from 1.0 million to 2.1 million.
“Contrary to the claims of many media outlets and so-called fact-checkers, this nationally representative scientific poll confirms that a sizable number of non-citizens in the U.S. are registered to vote,” Mr. Agresti said.

I thought that Agresti's reasoning was a bit off, so I took a look at the data.

First, according to WT and Infowars, 56% of people in the survey (448 out of 800) were non-citizens. But that figure is incorrect. As the documentation of the survey itself explains, they didn't ask all the 800 people about their citizenship. They only asked those people who said that they were born outside of the United States.

Here is the actual breakdown of approximate percentages and corresponding number of people, based on the documentation of the survey (see page 4):


So it's 263 non-citizens, not 448. Of those, 13% said they are registered to vote anyway. That is around 34 people out of a sample of 800.

I sent an e-mail to Agresti pointing out that his initial calculations were based on an incorrect number of non-citizen Hispanics, 448 instead of 263. He replied very graciously, acknowledged the mistake, and proposed this correction with a larger margin of error:

For 2013, the year of the survey, the Census Bureau reports that 11,779,000 Hispanic non-citizens aged 18 and older resided in the United States. At a 13% registration rate, this is 1,531,270 Hispanic non-citizens registered to vote. Accounting for the sampling margin of error, there were about 264 non-citizens in this survey. In a population of 11.8 million, the margin of error for a sample of 264 is 6.0% with 95% confidence. Applied to the results of the survey, this is 824,530 to 2,238,010 Hispanic non-citizens registered to vote (with 95% confidence).

But this is still wrong. That 34 may look worrying (update, see the comments section: it's actually just 29 people), but it could be due to questions that were not well understood, even if they were clearly worded (they were; I checked); to respondents not being open about their immigration situation, voter registration status, etc.; or even to those people lying about any of those. These are crucial factors to ponder.

Moreover, the original survey by McLaughlin was designed with a specific purpose —asking Hispanics about politics— and it must be used just for that. If you want to analyze voter registration fraud, you ought to design a completely different study with a questionnaire crafted with that goal, and to help overcome the aforementioned challenges, including, for instance, control or repeated questions to dodge misunderstandings or lies. I'd add that the sample of such a survey should be just of non-citizen Hispanics —not of Hispanics in general— to be truly representative.

Also, confidence intervals and their margins of error aren't particularly precise, and ought to be used with great care, even when the sample is perfectly representative and you do no extrapolation from it to its population. Statistician Heather Krause has this excellent summary about their many limitations, and about why they are often much wider than they look. Andrew Gelman, also a statistician, has written extensively (also this, and this) about why using confidence intervals to make sweeping inferences is risky. This other article is also relevant.

Just for fun, I computed my own extrapolations using a different method: calculating the margin of error of the original percentage, 13%. There are online tools, but I prefer to do it the back-of-the-napkin way, with pencil, paper, and a basic calculator.

First, the formula to calculate a confidence interval of a sample proportion is:
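
(The formula appeared as an image in the original post. What follows is a reconstruction, assuming it is the standard normal-approximation interval for a proportion, which matches the z, p, and 1-p terms discussed below; n, the sample size, is my notation.)

```latex
p \pm z \sqrt{\frac{p\,(1 - p)}{n}}
```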



This looks much more complicated than it is. First, that z value in there is 1.96 when we want a confidence level of 95% —don't worry about where that comes from; if you want to learn more about it, read the middle chapters of The Truthful Art.

So, z = 1.96. Let's move on.

What about p? That is the proportion that those 34 non-citizen Hispanics who said they are registered to vote represent over the 263 non-U.S.-born non-citizens. So: 13%.

In statistics percentages are often represented as proportions of 1.0. Therefore, 13% becomes 0.13, and the 1-p in the formula becomes 0.87 (that's the remaining 87% of the 263.)

Now that we know that z = 1.96 and p = 0.13, and that the sample size is 263, let's input them in the formula. Here is the result:
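
(The result was also an image in the original post. This is a reconstruction of the arithmetic, assuming the standard normal-approximation margin of error for a proportion and a sample size of n = 263, the number of non-citizens in the survey.)

```latex
z \sqrt{\frac{p\,(1 - p)}{n}}
  = 1.96 \sqrt{\frac{0.13 \times 0.87}{263}}
  \approx 1.96 \times 0.0207
  \approx 0.04
```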



That 0.04 means +/-4 percentage points. That's the margin of error that surrounds the 13% figure. 

Therefore, I can claim that if I could run a survey like this —with the exact same sample size and the same design— 100 times, I believe that 95 of them would contain the percentage of non-citizen Hispanics in the population who would say that they are registered to vote, and that it'd be within the 9% to 17% (13% +/- 4) boundaries of the confidence interval. I cannot say the same about the remaining 5 surveys. In those, the results could be completely different.

However, this calculation would only work if the percentage is close to 50% (see the comments section), and if the sample is carefully and randomly chosen specifically in relation to the question at hand. If it isn't, as is the case here, uncertainty may increase astronomically.

This is, by the way, without taking into account other possible uncertainties, like the one surrounding the 11.8 million figure from the Census, which I didn't bother to check.
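
The extrapolation arithmetic is easy to reproduce. Here is a minimal Python sketch, under stated assumptions: the function names are mine, the margin of error uses the normal-approximation formula discussed above, and I'm assuming that Agresti's 6.0% comes from the worst-case version of that formula (p = 0.5) with n = 264, since that reproduces the range he sent me.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def extrapolate(p, moe, population):
    """Apply the low, central, and high proportions to a population count."""
    return tuple(round(share * population) for share in (p - moe, p, p + moe))


REGISTERED_SHARE = 0.13    # non-citizens who said they are registered to vote
POPULATION = 11_779_000    # Census figure quoted by Agresti

# My own margin: p = 0.13 and the 263 non-citizens in the sample.
own_moe = round(margin_of_error(REGISTERED_SHARE, 263), 2)  # 0.04
# Assumption: Agresti used the worst case, p = 0.5, with n = 264.
agresti_moe = round(margin_of_error(0.5, 264), 2)           # 0.06

for label, moe in (("13% +/- 4", own_moe), ("13% +/- 6", agresti_moe)):
    low, mid, high = extrapolate(REGISTERED_SHARE, moe, POPULATION)
    print(f"{label}: {low:,} to {high:,} (central figure: {mid:,})")
```

With these inputs, the 9%-to-17% interval translates to roughly 1.1 to 2.0 million people, and the 7%-to-19% one to roughly 0.8 to 2.2 million. The arithmetic is trivial; the trouble is everything that goes into it.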

Again, what I've done here is just an arithmetic game. I think that all these figures and computations are way too uncertain to say anything that isn't absurd. Based solely on the survey data, we cannot suggest that we have an illegal voter problem in the U.S. —or that we don't. The data from the survey is useless for this purpose, and it certainly doesn't support a headline saying that 2 million people are illegally registered to vote.* Besides the problems with casually extrapolating from a sample, the survey wasn't designed to analyze voter fraud anyway.

(My friends, statisticians Heather Krause, Diego Kuonen, Jerzy Wieczorek, and Mark Hansen, read this post and provided very valuable feedback. Thanks a lot!)

This funny XKCD cartoon is worth remembering, by the way:




*Disclaimer: I am not opposed to requiring a photo ID to vote in principle. There are arguments in favor and against it in the U.S.: We have solid evidence that photo ID laws are used to restrict the vote of minorities (this book is a great starting point); but I also understand the concerns of those who want to keep elections 100% clean. I'm Spanish, and all Spaniards have a DNI (National Identification Document,) which you must show to vote. I can't see why this cannot happen in the U.S., too. There are some big and hairy “buts” in this comparison, though: Spain's DNI is extremely easy and inexpensive to get. And we are registered to vote by default.



UPDATE (02/21/2017): The great Mark Hansen, from Columbia University, has just sent me an e-mail with these comments about the confidence intervals:

But what does this interval 13%+/-4 mean? Suppose 100 other organizations ran the same survey (with the exact same sample size and the same design) but drawing their own sample of the population. OK 100 is large, but there are lots of polling organizations out there taking the public's temperature on various topics. Suppose each of the 100 groups then computes an interval like I've done here. In some samples, they will again find 34 people claiming to have registered to vote. But some groups will have a number that's larger, and some will have a number that's smaller. It depends on the sample the group has drawn.

However, because they are all taking random samples, statisticians assure us that we should expect 95 of the 100 intervals they've constructed will contain the number you're interested in, the true percentage of non-citizen Hispanics in the population who would say that they are registered to vote. Now here's the trick. You don't know if the true percentage you're after is in any particular interval. Like your 13%+/-4. This interval could be one of the 95 that contains the true percentage, or, if you're unlucky, it is one of the 5 that doesn't. You don't know.

This is what is meant when statisticians use the term "confidence." It might not sound particularly confident, but the researchers who pioneered this idea were looking for "rules to govern our behavior... which insure that, in the long run of experience, we shall not be too often wrong." So the 95 out of 100 refers to repeated uses of the survey _procedure_ (conduct random sample, construct interval). Wording it differently, it tells us that our confidence intervals won't actually contain the number we are hoping to discover in 5 out of 100 surveys. Yes, 5 out of 100 organizations will get it wrong. That's 1 in 20. Of course that begs the question, who decided being wrong 1 in 20 times is OK? Save that for another post!

Confidence!”
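
Mark's point about repeated sampling can be checked with a quick simulation. This is a minimal Python sketch, assuming (hypothetically) that the true percentage in the population is exactly 13% and that every survey draws a simple random sample of 263 people; the function names are mine.

```python
import math
import random


def interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a sample proportion."""
    p = successes / n
    moe = z * math.sqrt(p * (1 - p) / n)
    return p - moe, p + moe


def coverage(true_p, n, surveys, seed=0):
    """Share of simulated surveys whose interval contains the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(surveys):
        # Each respondent says "registered" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval(successes, n)
        hits += low <= true_p <= high
    return hits / surveys


# 100 organizations running the same survey, as in Mark's example...
print(coverage(true_p=0.13, n=263, surveys=100))
# ...and many more repetitions, to see the long-run behavior.
print(coverage(true_p=0.13, n=263, surveys=10_000))
```

The long-run share should land near 95%, but any single interval either contains the true percentage or it doesn't, and in a real survey you can't tell which.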

Friday, February 3, 2017

Announcing Visual Trumpery: A lecture tour

Trumpery means worthless nonsense, something that is, simultaneously, deceitful and showy. When I learned about this splendid and timely word, I immediately thought that it could be the right title for a two-hour presentation —and even a book— that I've been entertaining for a while. It'd describe strategies to fight back against the deluge of bullshit coming from left and right (sleaziness is quite bipartisan,) and it'd be not just for data designers or journalists, but for school teachers and citizens in general.

This morning I thought that I could do a tour of free talks, titling them Visual Trumpery (I'm still undecided about the subtitle.) I'd like to begin in the second semester of 2017. Maybe with your help.

I've already been contacted by people who might bring this talk to New York, Atlanta, Portland, Barcelona, Paris, etc. If you're interested in hosting a Visual Trumpery event in your city, contact me. I'll give priority to U.S. cities, but I'm not ruling out other countries.

Here's what I propose:

1. I won't charge my regular daily fee. I'll waive salary. You'll only need to cover a roundtrip flight (coach,) hotel, if needed (I'm not picky,) taxi, and meals (I'm a very easy guest.) I'd also appreciate a glass of red wine or two, but that's not mandatory.

2. The talk will cover data, visualization, and infographics, but it won't be limited to them. I'd like the presentation to be a bit broader: Improving rational thinking among the public, fostering a better understanding of probability and uncertainty, etc.

3. The talk must not be limited to your organization, company, or educational institution. It must be open to the public, free, and promoted widely. Closer to the Summer, I'll try to come up with a flyer and a poster you can use to spread the word. I may even create a small dedicated website.

Let's see if we can make this happen. Looking forward to hearing from you.


Wednesday, January 25, 2017

Visualizing word popularity: Scrollytelling, line graphs, micromaps, etc.

Another day, another visualization coming from Google News Lab, now in collaboration with Polygraph: The Year In Language 2016. It combines scrollytelling with multiple animated and interactive charts to highlight words which grew in popularity during 2016. Think of terms like “bigly” and “gaslighting,” for instance.

(Full disclosure: I'm a consultant for Google News Lab's visualizations, as I've mentioned before.)

My favorite part of The Year In Language 2016 is the linked micromaps at the bottom of the page. Here are the ones for the aforementioned words:





Monday, January 23, 2017

Our new visualization: Presidential inaugural addresses

Google has just launched a new visualization in the series I'm helping with. Its title is Inaugurate, and the author is Jan Willem Tulp. It's already been featured by USA Today, and Simon Rogers, who is at the helm of this series —previous ones are WorldPotus and Rhythm of Food,— has written a good explainer about it. I worked as an advisor.

Inaugurate visualizes the first presidential inauguration speeches of the 12 most searched-for presidents. Each speech is a column, and each rectangle within one column represents the length of a sentence. It was surprising for me to see how short Abraham Lincoln's 1861 address was.

The circles on top of the rectangles mark mentions of the most common subjects —God, democracy, justice, economy, etc.,— and the gradient on top of each section represents the current search interest in Google: Red is high interest; blue is low. In the image below you can see that “liberty” appears in many speeches, and it's still a topic of interest in Google searches:


Exploring this visualization will reveal some insights. For instance, go to the “Emotions & Human Values” section. In the addresses by George W. Bush or Barack H. Obama, subjects are varied: compassion, courage, happiness, dignity, and morality. Trump is all patriotism and, above all, loyalty. In the “Society” section you'll also see that wealth appears in Trump's speech more times than in any other:


Saturday, January 14, 2017

When doing data reporting, look at the raw numbers, not just at percentages —and write an accurate headline

A headline in The New York Times today reads “In the Shopping Cart of a Food Stamp Household: Lots of Soda.” Is it true?

The story itself provides hints that the headline is misleading, and likely to damage the image of the SNAP program and its beneficiaries. This is dangerous, considering that many readers look at clickbaity headlines, like the NYTimes one, but don't read stories. SNAP households aren't different from other households. Most Americans buy and drink way too much soda and, as a result, obesity and Type II diabetes have reached epidemic levels.

The story says that households that receive food stamps spend 9.3% of their grocery budget on soft drinks, while families in general spend 7.1%. This is one of those cases when reporting just percentages, and not taking into account other variables, such as total spending on groceries, sounds fishy.

Here's why: Never focus just on the data in front of your eyes, or on derived variables, like percentages or rates. Think about the raw numbers behind them. Say that you are comparing two families of four people each, a SNAP one and a non-SNAP one. They spend, respectively, $100 and $200 a week on groceries (my guess is that SNAP recipients are poorer than Americans in general.)

The SNAP household would be spending $9.30 a week on soda, or $2.30 per person; the non-SNAP one, $14.20, or $3.60 per person! Who's buying “lots of soda” now?
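
The same arithmetic as a minimal Python sketch. The $100 and $200 weekly budgets and the four-person households are my hypothetical figures from above, not numbers from the report; only the 9.3% and 7.1% shares come from the story. The function name is mine.

```python
def soda_spending(weekly_groceries, soda_share, household_size):
    """Return weekly soda spending per household and per person, in dollars."""
    household = weekly_groceries * soda_share
    return household, household / household_size


# Hypothetical budgets; soda shares as reported in the story.
households = {
    "SNAP": soda_spending(100, 0.093, 4),
    "non-SNAP": soda_spending(200, 0.071, 4),
}

for label, (household, per_person) in households.items():
    print(f"{label}: ${household:.2f} per household, ${per_person:.2f} per person")
```

A higher percentage of a smaller budget can easily be fewer dollars, which is why the raw totals matter.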

To add insult to injury, the story says about the report from the Department of Agriculture: “One limitation of the report was that it could not always distinguish when SNAP households used their benefits, other money or a combination of the two to pay for transactions.” Michele Simon, a lawyer, is quoted in the story: “This is the first time we’ve had confirmation that this massive taxpayer program is promoting all the wrong kinds of foods.” Well, that isn't true. If the story is a good reflection of the U.S.D.A. report, there isn't enough basis to claim that tons of SNAP money is being used to pay for sugary drinks.

This doesn't render the story completely wrong —I couldn't find the original source,— but it does challenge its headline, which singles out SNAP families. The reactionary insurgency taking over government in a few days will likely use this as ammunition to attack safety net programs, as they know that most people share headlines through Facebook and Twitter, but don't spend time reading long and nuanced stories.

(One last note: newspaper reporters often don't write the headlines for their own stories; editors do.)

UPDATE: This post on Facebook by University of Minnesota's Joe Soss provides much more information, and a link to the report itself. I'll take a look at it, as it looks like the story is much worse than it seemed to me at first.

Tuesday, January 10, 2017

In visualization, white space is your friend

One point I make in the introduction to graphic design classes I've taught in the past is that white space isn't empty space. Empty space is needlessly unused space; white space has meaning. There's even a good introduction to graphic design titled White Space is Not Your Enemy.

This morning's New York Times has a good example of that principle, courtesy of Margot Sanger-Katz and Quoctrung Bui. The top-right quadrant is occupied by a large majority of the American public —to the right and to the left,— and experts in gun violence. The bottom, a vast, blank ocean, probably belongs to Republican legislators, NRA lobbyists, and the minority President-elect.

(Click the image to expand.)


Thursday, January 5, 2017

Data and visualization conferences: Tapestry, NICAR, Malofiej, DH+DJ

Now that I'm finishing that-project-that-should-not-be-named, I've been able to register for several conferences this year. The first two are Tapestry and IRE+NICAR. If you're going to one, you should go to the other, as they take place on consecutive days, and in cities that are very close to each other, St. Augustine and Jacksonville. Tapestry is on March 1, and NICAR is between March 2 and 5.

I've only attended these conferences once, and I really enjoyed them. Tapestry is a small (around 100 people) gathering of data visualization and infographics nerds. IRE+NICAR is a large conference of investigative reporters, data journalists, visualization designers, and programmers. You need to apply to attend Tapestry, but I heard that there's still room, so do it ASAP.

Next, I'm going to the Malofiej International Infographics Summit, in Pamplona, Spain. The 2017 edition, on March 26-31, is the 25th anniversary, so I'm not skipping this one. I began attending Malofiej in 2002, and I've only missed it a couple of times since then.

Finally, at the University of Miami we're doing the second Digital Humanities + Data Journalism Symposium on September 14-16. The first edition was a success, so we've decided that the 2017 edition is going to be bigger (first year was 120+ people; we want 200+ this time) and more affordable ($99 for 2.5 days of awesomeness.)

We have 8 confirmed speakers so far —see them below— and many more will be added to the list soon. Registration will open in early February, but save the dates, as this is going to happen for sure. You can take a look at the draft of the website here (this isn't the final URL.)

Thursday, December 29, 2016

A few graphs from Nerd Journalism

Last night I finished writing the draft of Nerd Journalism, my PhD dissertation. Now I need to spend the next 30 days or so editing the damn thing. I want to submit it at the end of January, and I'll try to defend it over the Summer.

I'll release the entire book as a free PDF at www.nerdjournalism.com, probably after July 2017. I'll also publish around 30 video interviews, along with their transcripts, and the data collected from the Malofiej International Infographics awards, which covers around 20 years of the competition. Besides those, my other source was observations conducted at ProPublica and at Univisión News online.

The goal of this project is to explore the changes news information graphics have experienced in the past two decades. One of them is the kind of graphics visual journalists favor. In the past, they were mostly pictorial/figurative explanations and descriptions, and their central elements used to be illustrations, photographs, locator maps, etc. In the present, abstract representations of data are much more common.

With the help of some students I put together a spreadsheet of nearly 2,000 winners of the Malofiej awards, quantifying their country of origin, the publication where they appeared, and also the elements they included: Illustrations, graphs, charts, maps of different kinds, etc. Then, we also identified which of those elements had a more dominant place in the composition.

Let me show you a few graphs, even if I still need to verify and copy-edit their content.

The first one shows the dominance of countries like the U.S. and Spain in the Malofiej awards, but also the increasing presence of Asian publications thanks, I believe, to the South China Morning Post, which in recent years has published a lot of excellent work. Click on the image to expand:



The graph below is about the most dominant element on each winning entry. You'll notice that I grouped graphic forms into two large categories, “pictorial” — figurative representations of physical entities, such as illustrated explanations and descriptions, photographs, locator maps, etc.,— and “abstract” —graphs, charts, data maps, etc. In recent years, abstract graphics are the main element in nearly half of the projects that won awards at Malofiej:


On this one I split up the abstract-pictorial data by region. “Non-Latin America” means mainly the United States. The differences, in comparison to the competition average, are stark:


This one shows the same for the countries that have won the most awards: the United States, Spain, Argentina, Brazil, Germany, and the United Kingdom. The first graph for each country shows total counts, and the second shows percentages:


Since one of the variables in my spreadsheet is “publication,” it was possible to show these trends for specific organizations, too. See The New York Times, for instance:


More coming in the Summer of 2017...

Friday, December 16, 2016

Our brand new MOOC is here! Data Exploration and Storytelling

It's been a while since I last taught a Massive Open Online Course. My friend Heather Krause —a statistician and data journalist— and I have created a brand new one, offered through UT's Knight Center, and titled “Data Exploration and Storytelling: Finding Stories in Data with Exploratory Analysis and Visualization.”  It begins on January 16, 2017.

You can sign up for free through this link.

I guess that the long title and subtitle explain what the course is about: It doesn't just cover visualization, but also data exploration and understanding, and how to write stories about your findings. The materials include several chapters from The Truthful Art and The Functional Art, besides many new video lectures and tutorials about software tools.

I recorded a short intro video:

Wednesday, December 14, 2016

The Financial Times chooses "The Truthful Art" as one of the best books for data geeks

This is truly an honor: The Financial Times has just chosen The Truthful Art as one of the best books “for data geeks” in 2016. Here's their tweet, and here's the article, written by Alan Smith, their data visualization editor. I am very familiar with all the others, except Where the Animals Go, which I've just ordered. You should take a look at them, as well.


Saturday, December 3, 2016

Datatrump

I needed a 10-minute break from that-project-that-must-not-be-named, so I decided to use Robert Grant's DrawMyData to design a Trump chartoon. See it below, along with the correlation coefficient and other summary statistics. Feel free to use it. If you want the CSV with the data, you can download it here.

(A while ago I did a datasaurus; I got tired of showing Anscombe's quartet when discussing the importance of visualization for data exploration.)
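For readers who haven't seen why Anscombe's quartet is the usual example here, the following Python sketch computes the Pearson correlation coefficient for each of its four datasets. The numbers are Anscombe's published values; the helper name `pearson_r` is just an illustrative choice. All four coefficients come out at about 0.82, even though the four scatterplots look nothing alike.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Anscombe's quartet: datasets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in results.items():
    print(f"Dataset {name}: r = {r:.3f}")
```

The same function could be run on the x and y columns of the downloadable CSV, assuming it holds one pair of values per row.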


Thursday, December 1, 2016

The rise of data visualization in news graphics

The next two months are going to be quite busy. I need to finish writing my PhD dissertation, titled Nerd Journalism. I'll release it as a free e-book through this website probably in the Summer of 2017.

One of the elements of my analysis of how journalistic visualization and infographics have changed in the past decade and a half is the kind of projects that win awards in the Malofiej Infographics competition, the most popular one among news graphics creators.

The graph below shows the percentage of the nearly 2,000 projects recognized by Malofiej juries that had a pictorial graphic (a visual explanation, a photograph, etc.) as the central or main element, versus those that emphasize some sort of abstract representation (data graphs, charts, numerical tables, etc.). A couple of important notes: I'm still cleaning up the data, so take this with a grain of salt. Also, this is just a quick summary; other graphics that I'll design for the book will break this down further by type of graphic (graphs, data maps, explanatory illustrations, locator maps, etc.), by country of origin, by publication, etc.

Finally, the last two editions of Malofiej I got data from, the 22nd and the 23rd, are missing in this graph, but the trend continues: Abstract graphics —mostly data visualizations— keep growing, and constitute half of the projects that get awards:




Wednesday, November 30, 2016

New visualization: Rhythm of Food

The second project coming out of my collaboration with Google News Lab has just been published. It was designed by Moritz Stefaner, and it's titled Rhythm of Food. As in the case of the first visualization in this series, Worldpotus, I can't take credit for anything other than talking to Moritz and Google News Lab's Simon Rogers every week or two to offer some feedback here and there.

(Simon has written about this visualization.)

Rhythm of Food reveals how Google searches for food have varied since 2004 through a series of fun circular plots. Some of you may contend that more traditional graphs (time-series line graphs?) would have been more appropriate, but I'd disagree. Just consider: a) the goal here isn't accuracy, but to reveal overall, general patterns; b) the circular plots fit well on mobile screens; c) they are visually alluring; d) they look like food on a dish (OK, perhaps not a strong reason); and e) you can see the data as line graphs by clicking on the + symbol of each chart. Here:




The point is that, as in the case of Worldpotus and other projects we'll release in the future (the next one will likely be by Jan Willem Tulp, in January), we're trying to let people see data in multiple ways: as eye-catching and sometimes unorthodox charts first, and as more bread-and-butter graphs or tables if they want further detail.

We're also trying to combine the narrative/explanatory with the exploratory. This project first describes the data, highlights some interesting cases —annotating peaks— and then it lets you explore at will.

Here are some early exploratory sketches; enjoy: