Thursday, July 6, 2017

Stack and unstack

My friend Geoff McGhee, who works for Stanford University's Bill Lane Center for the American West, has just published a nice series of interactive graphs and maps about California's move toward renewable energies. One of them caught my attention. Here's an animated GIF of it:

This is a nice example of a principle I explain in The Truthful Art: Stacked graphs show both the total and its components, but they emphasize the former, not the latter. When the total is more relevant than the parts, a stacked graph may be an appropriate choice.

However, what if seeing the variation of each portion with great accuracy is as important as the total? Then you need to let readers unstack the graph, or see each portion separately. Otherwise, estimating the variation of the parts not sitting on the horizontal baseline is hard.

Here's another example, a little classic by The New York Times.

Saturday, July 1, 2017

Brit Hume misreads a graph and unintentionally parrots a White House talking point

Fox News pundit Brit Hume complained about this graph from the Congressional Budget Office's report on the Senate Better Care Reconciliation Act (BCRA):

Here's Hume's tweet, replying to an article by Jonathan Cohn claiming that the BCRA would reduce Medicaid by 26% in 2026 and by 35% in 2036, which is exactly what the CBO suggests in its report:

The graph isn't misleading at all if you bother to read its title, X-axis labels, and source. What Hume is doing here is unintentionally parroting a White House talking point, calling the future shrinkage of Medicaid not a “cut”, but a “reduction in the rate of increase.”

The way we use words frames discussions, so I tweeted at Hume with the following analogy, which all pundits can grasp (I certainly can!): Imagine that I hire you for the next 20 years, and our contract limits your yearly salary “increase” to just 1% a year. Then, we let inflation play its magic. After two decades, when comparing your purchasing power in 2037 to the one in 2017, wouldn't you call that a “cut”? You'd certainly experience it as such, if inflation stays at its current level.

This isn't a great analogy —for one thing, I don't know if Medicaid increases take into account the rising costs of healthcare. However, I was just trying to illustrate the difference between absolute change and relative change, and the fact that “increases” are always relative to variables like needs. In the case of Medicaid it's not just inflation (if it isn't built in already) that may have pernicious effects in the future, but also predicted variations in population size, its composition, economic growth, number of Medicaid recipients (gross expense vs. per capita expense,) societal expectations, etc. I'm no expert, but that is what the CBO said.
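A rough way to see the point of the salary analogy is to compound both rates over the 20 years. Here is a minimal Python sketch. The 1% raise comes from the analogy; the 2% annual inflation rate and the starting index of 100 are assumptions chosen only for illustration, not figures from the CBO:

```python
def salary_after(years, nominal_raise, inflation, start=100.0):
    """Return (nominal, real) salary indexes after compounding for `years`."""
    nominal = start * (1 + nominal_raise) ** years
    # Deflate by cumulative inflation to express the salary
    # in starting-year purchasing power.
    real = nominal / (1 + inflation) ** years
    return nominal, real


# 1% yearly raise is from the analogy; 2% inflation is a hypothetical value.
nominal, real = salary_after(years=20, nominal_raise=0.01, inflation=0.02)
print(f"Nominal salary index after 20 years: {nominal:.1f}")
print(f"Real salary index after 20 years:    {real:.1f}")
```

Under those assumed rates the nominal figure ends above 100 while the real one ends below it, which is the absolute-versus-relative distinction the analogy is after.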

Hume then replied to my tweet:

His response unleashed a barrage of abuse from the troll horde that follows him, which also attacked Cohn. Some accused me of engaging in “liberal Math” and many xenophobic goons told me to go learn some English. Not a bad suggestion, I'll admit, but, as I'm busy, I decided not to waste time replying to each of them. I erased my tweet, and decided to write this post instead.

What the White House and its propaganda machine are doing —and I'm not claiming that Hume is part of it; he strikes me as a polite and professional fellow— is to twist the English language the aforementioned trolls claim to cherish. They do it to sugarcoat their message, as they know that cuts to the safety net —present and projected— are unpopular.

Contrary to what Hume claims, an increase that doesn't keep pace with projected inflation and other relevant factors, like the number of people predicted to need aid, is not an increase, unless you believe that the piece of paper below has value on its own, independent of the economy:

That isn't how money works. To quote Charles Wheelan's wonderful Naked Money, “it is a piece of paper with no intrinsic value.” Here's how I reason about this: Let's say that I give my kid $10 a week, which he uses exclusively to pay for his favorite hobby, ice skating. I drive him to skate with his friends every Friday evening.

Then, at the end of 2017 I decide to increase my kid's 2018 ice skating allotment to $11. Is that a real increase? It depends. In isolation, it sounds like it. My kid will be receiving 11 wrinkled sheets of paper instead of 10, after all.

But if the cost of an evening of ice skating changes to $12 and I can't drive him anymore, so he needs to pay $1 for the bus, my kid will lose purchasing power. And here comes the critical part, which is what companies and governments do all the time to control future expenses: If somehow I can predict —with the always unavoidable uncertainty attached to any long-term forecast— that ice skating will get that expensive next year, and that my kid will need to take the bus, I'll be doing something akin to cutting his allotment by adding just $1 to it.

(Imagine, then, that I also predict that my younger daughter will become a fan of ice skating soon, and will really need to practice it. However, I decide in advance not to give her an allotment, but to split the current one in half, and keep increasing it just $1 a year.)

That's why many are talking about future “cuts” to Medicaid, even people who, unlike me, are native English speakers. See this explainer by the Washington Post, this critique of a White House graph, and Cohn's own tweetstorm.

That said, if Hume's problem is with the word “cut” itself, I'll agree to stop using it. In return, I'd ask him and the White House to abandon the term “rate of increase” in this case, as it may be equally misleading. Let's call Medicaid's future reduction a “covfefe” instead. The underlying reality won't change, as numbers are stubborn creatures: Medicaid will be roughly 1/4 smaller in 2026 and more than 1/3 smaller in 2036 in comparison to the original baseline. Exactly what the chart represents.

You may argue that this'd be a good thing —I happen to be a moderate European-style fiscal conservative— or a bad one, but a covfefe is a covfefe.

UPDATE: My friend, economist Jon Schwabish, author of Better Presentations, has sent me the following suggestions for the example:
—You tell your son that you are going to give him an extra 2 dollars every month to keep up with rising costs of ice skating (skate rental, ice time, whatever). He will get $10 in January, $12 in February, $14 in March, and so on. Over the course of the year, he will have received $252 in total allowance.
—You then decide you are not going to give him as much and instead you increase his allowance by $1 per month: He'll get $10 in January, $11 in February, $12 in March, and so on. Over the course of the year under this scenario, he will have received $186 in total allowance.
—Under the new world in which he gets a $1 increase per month instead of $2, his allowance is still growing over the period (you could even make those increases rise with inflation, so there is no actual loss of purchasing power), but it's less than the original promise; in CBO-speak, the original is the baseline and the new world —where he gets the $1 increase— is the new policy.
—That's what's happening in the Medicaid debate. Under BCRA, Medicaid spending grows more slowly than the baseline. I can see Hume's point of not calling that a cut, because there is a connotation that the word “cut” implies a literal decline, which is not what happens in BCRA. That being said, there is a cut relative to the rate of Medicaid growth relative to the baseline, but not absolute dollars.
That is, I guess, if health care cost growth is indeed factored in already, and other aforementioned variables —possible population size, age composition, Medicaid recipients, etc.— are considered correctly. In any case, as I said, let's just find a new term we can all agree on, and call this a covfefe.
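Jon's two scenarios are easy to check by adding up the twelve monthly payments. A minimal Python sketch; the function and variable names are mine, and the dollar amounts are the ones in his example:

```python
def monthly_allowance(start, monthly_increase, months=12):
    """List the allowance paid each month, starting at `start` in January."""
    return [start + monthly_increase * m for m in range(months)]


baseline = monthly_allowance(10, 2)    # the original promise: +$2 a month
new_policy = monthly_allowance(10, 1)  # the revised plan: +$1 a month

print(sum(baseline))    # 252
print(sum(new_policy))  # 186

# The allowance still grows month over month under the new policy...
print(new_policy[-1] > new_policy[0])  # True
# ...but the yearly total falls short of the baseline.
print(sum(baseline) - sum(new_policy))  # 66
```

Both statements hold at once: the allowance rises every month, and the year's total is $66 lower than what was originally promised.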

Friday, June 23, 2017

New tutorials and resources section

I've completely revamped the Tutorials & Resources section of this website. Check it out. Here are the main changes:

1. I've added Maarten Lambrechts's wonderful Data Cleaning With Excel tutorial to the list, besides a couple of short videos about pivot tables.

2. I've recorded a brand new tutorial about INZight, an easy to use but powerful free tool for data exploration through visualization.

3. The order of the Illustrator tutorials has changed a bit.

Next time I have some time —crossing my fingers— I'll try to do a tutorial about time-series analysis with INZight, and also grab a few free tutorials about Tableau and PowerBI.

These are all videos that I use in my classes at the University of Miami, in the Journalism and the Interactive Media programs. The textbooks for my courses are The Functional Art and The Truthful Art, along with a long series of articles.

Here are some screenshots from the INZight tutorials:

Thursday, June 22, 2017

Student work: Visualizing forest loss in the Amazon

I want to bring your attention to Silent Forest, a large news documentary project that maps forest degradation and destruction in the Brazilian Amazon. The Guardian has a good story about it, and the About section itself can help you understand what it covers, so I won't bore you with details. I just wish to highlight the visualizations and illustrations in it, designed by my student Laura Kurtzberg.

Laura is a student in our MFA in Interactive Media. She came to the program with experience in coding and mapping, and her skills have only gotten better in this first year with us. Silent Forest can give you an idea of what she's capable of: From interactive data visualizations to pictorial illustrations, like the ones you'll see in the section about bird species. She can also model and animate in 3D, but she didn't show off here.

(Full disclosure: I'm listed as a “special advisor in data visualization” but my participation was limited to giving some feedback on the graphics.)

Sunday, June 11, 2017

Nerd Journalism

On Friday I defended my PhD dissertation in Barcelona. Its title is Nerd Journalism: How Data And Digital Technology Transformed News Graphics.

The latest draft still has plenty of errors and typos, and I need to incorporate the feedback from the committee, but if you want to take a look at it (being aware that the final version will likely be different), it's here. Later this year I'll also publish the data and those video interviews and transcripts authorized by the nearly 40 interviewees.

(Side note, as some friends are already making fun of this on social media: I still believe that the only true Dr. Alberto Cairo is the famous Italian medical doctor who for many years saved lives in Afghanistan.)

Saturday, May 13, 2017

Data videos

It may be because I'm getting older, but in the past few years I've grown a tiny bit less fond of intricate interactive graphics, and a tiny bit more fond of videos and animations that explain complex ideas and data in a linear fashion. This post collects some I've seen this week.

The first one is an entertaining overview of the Dunning-Kruger effect (less knowledge leads to more confidence in your own opinions) narrated by Stephen Fry. Fry mentions something that Nigel Holmes has said for many years: If you make people smile and feel good before presenting a message, the message becomes more persuasive. It turns out that fun —and not just pure clarity and efficiency— matters a lot:

Amanda Cox's OpenVis presentation about the visualization of uncertainty is very informative. She even discusses hurricane forecasts and the cone of uncertainty, referring, I believe, to the roundtable that Jen Christiansen, Mark Hansen, and I had a while ago.

All OpenVis talks can be found here. Have fun.

The Pew Research Center is launching a series about elementary statistical methods titled Methods 101. Here's the first video, explaining random sampling with some good analogies:

Finally, Sophie Sparkes chose The Truthful Art for Tableau Public's first Data Viz Book Club. Sophie and Andy Cotgreave have just posted a conversation about it. I'm happy to see that they mention that it surely is a visualization book, but that I was aiming at something a bit bigger: A book about reasoning.

Thursday, May 4, 2017

The importance of writing making-of articles in visualization

Our new visualization is out. This time we've collaborated with Maarten Lambrechts on a visualization about Eurovision, The Eurosearch Song Contest. Go take a look at it. It's fun.

Something we're asking everyone we collaborate with is to write a making-of article. We want this series of visualizations not only to be informative and experimental, but also to be used as educational tools by instructors and beginners, either to praise them or to critique them.

Maarten's article is excellent, as it explains the design process and his choices in detail. It also includes some hand sketches. I have a soft spot for those.

You can see a gallery of all our previous projects here, and this article explains our goals. There are many more visualizations coming soon.

Wednesday, April 26, 2017

Datasketching culture

Our new Google News Lab visualization project is out. This time, it's actually two projects in one: Beautiful In English, by Nadieh Bremer, and Adventure is Out There!, by Shirley Wu. As usual, I provided some art direction.

You may be familiar with Nadieh's and Shirley's ongoing collaboration. Each month they pick a topic (nature, music, etc.) and each of them creates an experimental and quirky artistic visualization separately. When I learned about it, I thought that it was a marvelous idea, so I asked them whether they'd be interested in doing something with Google search data. They accepted and they chose Culture as their theme.

Nadieh's visualization shows the most common words translated into English from other languages, and Shirley's focuses on popular travel destinations. You can read all details about both visualizations in the writeup they've put together. It includes drafts and early prototypes. Don't miss it.

If you wish to learn more about our ongoing series of visualizations (see them all so far here; more coming soon), Fast Company's Co.Design has just published a nice article about it. It describes what we're trying to do quite well: As an art director, I don't really want to “direct” very strictly. Instead, I prefer to give the designers we work with some freedom. We are fond of graphics that are compelling, informative, and fun, but we also want designers to let their imaginations fly a bit, even if the end result is wackier than usual. We don't mind. Some projects will succeed at balancing creativity with understandability, others maybe not so much, but that's exactly the point. Some novel ideas will be discarded, but others may stick, and eventually expand our shared graphics vocabulary.

(Google's Simon Rogers —the brains behind this initiative— has also shared some thoughts about making visualization friendlier.)

Friday, April 14, 2017

Multiple graphics, multiple possible insights

One of my favorite mantras about visualization is that a single chart, graph, or map is unlikely to show everything you need to know about a story. It's through the combination of several adjacent graphics that compelling insights often arise.

NPR Visuals's latest project, “Maps Show A Dramatic Rise In Health Insurance Coverage Under ACA,” is a simple and lovely example of that principle: the choropleth map reveals geographic patterns; the histogram above it displays the decrease in the number of counties with high percentages of uninsured people.

The interactive visualization is followed by a story, some static graphs, and a small-multiple array of maps. They are also worth your attention.

Monday, March 20, 2017

Visualization book club

Tableau Public's Sophie Sparkes has just launched a visualization book club (Twitter hashtag is #VizBookClub.) She's planning to read a graphics book every two months, and she invites everyone to join her in a conversation about its contents.

The first book she's chosen is The Truthful Art, which is a huge honor. Here are the topics she'd like to discuss:

Thanks for doing this, Sophie. Looking forward to reading people's thoughts.

Wednesday, March 15, 2017

An update on the Trumpery lecture series

After I announced my Visual Trumpery lecture tour, I received requests from more than 30 cities. I've been organizing my schedule this week; below you can see a screenshot of the Excel spreadsheet I'm using. I'll do my best to announce places and dates in early June.

As mentioned in the original post, the talks will be free and open to anyone. I'll provide more information about how to sign up as soon as I can.

A few other updates:

• The current working title is Visual Trumpery: How to fight against fake data, fake facts, and fake visualizations —from the left and from the right. The subtitle may change a bit. I'm still debating it.

• There'll be a dedicated website for the tour,

• I'll try not to mention any politician by name in the talk, which will be, as the subtitle indicates, bipartisan. However, the left-right split in the examples won't be 50-50. Nowadays a disproportionate amount of data bullshit comes from the far right.

• I may write a book about it, too.

More news coming soon.

Monday, March 13, 2017

The new Digital Humanities and Data Journalism Symposium

Last year I helped organize the first Digital Humanities + Data Journalism Symposium at the University of Miami. It was a success, bringing together two communities that have a lot to learn from each other, so we decided to do it again this year.

Our website is updated and registration is open. If you work in visualization, infographics, data journalism etc., I think that most of the presenters will be very familiar to you. We'll cover a variety of topics, from data quality to visualization, from fake news to the ethical challenges of Artificial Intelligence. There will also be plenty of time to mingle and make new friends.

The dates are September 14-16 (Thursday-Saturday until noon.) This year we're aiming to make it bigger than in 2016, and also more affordable. We've reduced the registration rate to $99. Also, you can submit a proposal for a lightning talk or to show your academic or professional work in a booth or desk.

Space is quite limited, so if the symposium sounds interesting to you, I'd suggest registering right away and booking your flights and hotel. We don't have an official hotel, but there are many options at a reasonable distance from the Newman Alumni Center, where the conference will take place. Public transportation (Lyft, taxi, or Metrorail) is quite good in the area.

Sponsors are not announced on our website yet, but so far we have the Knight Foundation, Google News Lab, and several departments at the University of Miami. If you're interested in sponsoring the conference, send me an email.

Friday, March 3, 2017

You aren't qualified to be a professional journalist

Just a quick thought: I'm at the 2017 CAR conference these days. I'll be talking about communicating uncertainty with Mark Hansen and Jen Christiansen. I've put together this folder with readings and our slides, in case you're curious.

Anyway, I've just attended a panel with Reveal's Jennifer LaFleur and NBC's Ronald Campbell on how to spread data literacy in news organizations. They gave some very good suggestions, but some things they said were quite worrying. For instance, when asked during the Q&A, they estimated that four out of five of the reporters and editors they regularly train aren't able to even calculate percentage change.

Let me be blunt here: If your level of numeracy is so abysmal, you aren't qualified to be a professional journalist. I know it may hurt to read this, but it's the truth. Nobody who lacks a working understanding of math, statistics, and scientific reasoning can properly inform the public. Not knowing such basic stuff is the equivalent of being unable to write coherent sentences.

We've all faced this problem (I forgot too much math in college myself!), and the solution isn't to deny that it's a problem, but to solve it quickly. Get to work. Right away. Stop with the I'm-not-good-at-Math bullshit. This isn't magic, and it certainly isn't knowledge that should belong to specialized teams in a newsroom.
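For reference, percentage change is just the difference between the new and the old value, divided by the old value and multiplied by 100. A minimal Python sketch with made-up numbers (they don't come from the panel):

```python
def percentage_change(old, new):
    """Percentage change from `old` to `new`, relative to `old`."""
    if old == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) * 100 / old


# Hypothetical figures, for illustration only.
increase = percentage_change(50, 60)
decrease = percentage_change(60, 50)
print(f"From 50 to 60: {increase:.1f}%")
print(f"From 60 to 50: {decrease:.1f}%")
```

Note that going up from 50 to 60 and coming back down from 60 to 50 are not the same percentage, because the base changes.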

Here are some books to get you started, sorted from basic to more advanced; these, and many others, informed my own The Truthful Art:

Wednesday, February 22, 2017

Mapping Oscar movies

Our new project, Mapping America’s Taste in Oscar Films, is live. It was designed by Polygraph and it maps YouTube trailer views of Oscar-nominated movies during their opening week. This is part of a series of visualizations that includes other projects like World Potus, Rhythm of Food, Inaugurate, and The Year in Language. (If you're interested in how the data was put together, scroll down; there is a note at the bottom of the page.)

Here are the maps for three movies I enjoyed last year: Arrival, Hacksaw Ridge, and Hidden Figures. Notice the patterns:

Monday, February 20, 2017

Extrapolation is risky business

Here's an interesting case of potential trumpery™ that I may use in the lecture tour and book. This is a post-in-progress, so feel free to send suggestions.

Infowars, The Washington Times (WT) and other publications are reporting that “Nearly 2 million non-citizen Hispanics are illegally registered to vote.” Infowars adds that “a survey of Hispanics in the U.S. revealed as many as two million non-citizens are illegally registered to vote, reinforcing claims by President Donald Trump that millions of illegal votes were cast in the 2016 election.”

What's the evidence for the claim of 2 million illegal voters? There is none. It's based on a reckless extrapolation from a study that was designed with a completely different purpose.

It all began with this 2013 survey of 800 Hispanic adults conducted by McLaughlin & Associates. The survey itself looks fine to me. I asked the author, John McLaughlin, and he provided detailed explanations of the methodology, how the sample was randomly chosen, etc. My problem, then, is not the survey, but the far-fetched extrapolations that Infowars and WT made, which have gone viral, unfortunately.

On page 68 of the summary of results you will see that among the Hispanics who aren't citizens, 13% said that they are registered to vote:

The stories at Infowars and WT quote James D. Agresti, who leads a think-tank called Just Facts:

[Agresti] applied the 13 percent figure to 2013 U.S. Census numbers for non-citizen Hispanic adults. In 2013, the Census reported that 11.8 million non-citizen Hispanic adults lived here, which would amount to 1.5 million illegally registered Latinos.
Accounting for the margin of error based on the sample size of non-citizens, Mr. Agresti calculated that the number of illegally registered Hispanics could range from 1.0 million to 2.1 million.
“Contrary to the claims of many media outlets and so-called fact-checkers, this nationally representative scientific poll confirms that a sizable number of non-citizens in the U.S. are registered to vote,” Mr. Agresti said.

I thought that Agresti's reasoning was a bit off, so I took a look at the data.

First, according to WT and Infowars, 56% of people in the survey (448 out of 800) were non-citizens. But that figure is incorrect. As the documentation of the survey itself explains, they didn't ask all the 800 people about their citizenship. They only asked those people who said that they were born outside of the United States.

Here is the actual breakdown of approximate percentages and corresponding number of people, based on the documentation of the survey (see page 4):

So it's 263 non-citizens, not 448. Of those, 13% said they are registered to vote anyway. That is around 34 people out of a sample of 800.

I sent an e-mail to Agresti pointing out that his initial calculations were based on an incorrect number of non-citizen Hispanics, 448 instead of 263. He replied very graciously, acknowledged the mistake, and proposed this correction with a larger margin of error:
For 2013, the year of the survey, the Census Bureau reports that 11,779,000 Hispanic non-citizens aged 18 and older resided in the United States. At a 13% registration rate, this is 1,531,270 Hispanic non-citizens registered to vote. Accounting for the sampling margin of error, there were about 264 non-citizens in this survey. In a population of 11.8 million, the margin of error for a sample of 264 is 6.0% with 95% confidence. Applied to the results of the survey, this is 824,530 to 2,238,010 Hispanic non-citizens registered to vote (with 95% confidence).
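The arithmetic behind Agresti's corrected figures can be retraced in a few lines. A minimal Python sketch; the variable names are mine, the inputs are the ones he quotes, and integer percentages are used to avoid rounding noise. It only reproduces the multiplication and says nothing about whether the extrapolation is sound:

```python
# Inputs taken from Agresti's correction.
population = 11_779_000  # Hispanic non-citizens aged 18+, per the Census figure he cites
rate_pct = 13            # share of non-citizens who said they are registered
margin_pct = 6           # his sampling margin of error, in percentage points

point_estimate = population * rate_pct // 100
low_estimate = population * (rate_pct - margin_pct) // 100
high_estimate = population * (rate_pct + margin_pct) // 100

print(f"{point_estimate:,}")  # 1,531,270
print(f"{low_estimate:,}")    # 824,530
print(f"{high_estimate:,}")   # 2,238,010
```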
But this corrected estimate is still wrong. That 34 may look worrying (update; see comments section: it's actually just 29 people), but it could be due to questions that might not have been well understood, even if they were clearly worded (they were; I checked), to respondents not being open to disclosing their immigration situation, voter registration status, etc., or even to those people lying about any of those. These are crucial factors to ponder.

Moreover, the original survey by McLaughlin was designed with a specific purpose —asking Hispanics about politics— and it must be used just for that. If you want to analyze voter registration fraud, you ought to design a completely different study with a questionnaire crafted with that goal, and to help overcome the aforementioned challenges, including, for instance, control or repeated questions to dodge misunderstandings or lies. I'd add that the sample of such a survey should be just of non-citizen Hispanics —not of Hispanics in general— to be truly representative.

Also, confidence intervals and their margins of error aren't particularly precise, and ought to be used with great care, even when the sample is perfectly representative and you do no extrapolation from it to its population. Statistician Heather Krause has this excellent summary about their many limitations, and about why they are often much wider than they look. Andrew Gelman, also a statistician, has written extensively (also this, and this) about why using confidence intervals to make sweeping inferences is risky. This other article is also relevant.

Just for fun, I computed my own extrapolations using a different method: calculating the margin of error of the original percentage, 13%. There are online tools, but I prefer to do it the back-of-the-napkin way, with pencil, paper, and a basic calculator.

First, the formula to calculate a confidence interval of a sample proportion is:
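In text form, the standard normal-approximation interval for a sample proportion is the following. It is consistent with the z, p, and 1-p terms discussed next; n is assumed to denote the sample size (here, the 263 non-citizens):

```latex
p \pm z \sqrt{\frac{p\,(1 - p)}{n}}
```

The part after the plus-or-minus sign is the margin of error.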

This looks much more complicated than it is. First, that z value in there is 1.96 when we want a confidence level of 95% —don't worry about where that comes from; if you want to learn more about it, read the middle chapters of The Truthful Art.

So, z = 1.96. Let's move on.

What about p? That is the proportion that the 34 non-citizen Hispanics who said they were registered to vote represent among the 263 non-U.S.-born non-citizens. So: 13%.

In statistics percentages are often represented as proportions of 1.0. Therefore, 13% becomes 0.13, and the 1-p in the formula becomes 0.87 (that's the remaining 87% of the 263.)

Now that we know that z = 1.96 and p = 0.13, and that the sample size is the 263 non-citizens, let's input them in the formula. Here is the result:

That 0.04 means +/-4 percentage points. That's the margin of error that surrounds the 13% figure. 

Therefore, I can claim that if I could run a survey like this (with the exact same sample size and the same design) 100 times, and computed an interval like this one each time, I'd expect about 95 of those intervals to contain the true percentage of non-citizen Hispanics in the population who would say that they are registered to vote. My own interval runs from 9% to 17% (13% +/- 4). I cannot say the same about the remaining 5 surveys. In those, the interval could miss the true percentage completely.
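The back-of-the-napkin calculation can also be done in a few lines of code. A minimal Python sketch, assuming the usual normal-approximation margin of error for a proportion; the function name is mine, and the inputs are the 13% and the 263 non-citizens from the survey:

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


p = 0.13  # proportion who said they are registered to vote
n = 263   # non-citizens in the sample

moe = margin_of_error(p, n)
low, high = p - moe, p + moe

print(f"Margin of error: {moe:.2f}")            # 0.04
print(f"Interval: {low:.0%} to {high:.0%}")     # 9% to 17%
```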

However, this calculation would only work if the percentage is close to 50% (see the comments section), and if the sample is carefully and randomly chosen specifically in relationship to the question at hand. If it isn't, as is the case here, uncertainty may increase astronomically.

This is, by the way, without taking into account other possible uncertainties, like the one surrounding the 11.8 million figure from the Census, which I didn't bother to check.

Again, what I've done here is just an arithmetic game. I think that all these figures and computations are way too uncertain to say anything that isn't absurd. Based solely on the survey data, we cannot suggest that we have an illegal voter problem in the U.S. —or that we don't. The data from the survey is useless for this purpose, and it certainly doesn't support a headline saying that 2 million people are illegally registered to vote.* Besides the problems with casually extrapolating from a sample, the survey wasn't designed to analyze voter fraud anyway.

(My friends, statisticians Heather Krause, Diego Kuonen, Jerzy Wieczorek, and Mark Hansen, read this post and provided very valuable feedback. Thanks a lot!)

This funny XKCD cartoon is worth remembering, by the way:

*Disclaimer: I am not opposed to requiring a photo ID to vote in principle. There are arguments for and against it in the U.S.: We have solid evidence that photo ID laws are used to restrict the vote of minorities (this book is a great starting point); but I also understand the concerns of those who want to keep elections 100% clean. I'm Spanish, and all Spaniards have a DNI (National Identification Document,) which you must show to vote. I can't see why this cannot happen in the U.S., too. There are some big and hairy “buts” in this comparison, though: Spain's DNI is extremely easy and inexpensive to get. And we are registered to vote by default.

UPDATE (02/21/2017): The great Mark Hansen, from Columbia University, has just sent me an e-mail with these comments about the confidence intervals:

But what does this interval 13%+/-4 mean? Suppose 100 other organizations ran the same survey —with the exact same sample size and the same design -- but drawing their own sample of the population. OK 100 is large, but there are lots of polling organizations out there taking the public's temperature on various topics. Suppose each of the 100 groups then computes an interval like I've done here. In some samples, they will again find 34 people claiming to have registered to vote. But some groups will have a number that's larger, and some will have a number that's smaller. It depends on the sample the group has drawn.

However, because they are all taking random samples, statisticians assure us that we should expect 95 of the 100 intervals they've constructed will contain the number you're interested in, the true percentage of non-citizen Hispanics in the population who would say that they are registered to vote. Now here's the trick. You don't know if the true percentage you're after is in any particular interval. Like your 13%+/-4. This interval could be one of the 95 that contains the true percentage, or, if you're unlucky, it is one of the 5 that doesn't. You don't know.

This is what is meant when statisticians use the term "confidence." It might not sound particularly confident, but the researchers who pioneered this idea were looking for "rules to govern our behavior... which insure that, in the long run of experience, we shall not be too often wrong." So the 95 out of 100 refers to repeated uses of the survey _procedure_ (conduct random sample, construct interval). Wording it differently, it tells us that our confidence intervals won't actually contain the number we are hoping to discover in 5 out of 100 surveys. Yes, 5 out of 100 organizations will get it wrong. That's 1 in 20. Of course that begs the question, who decided being wrong 1 in 20 times is OK? Save that for another post!
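Mark's thought experiment of many organizations repeating the same survey can be simulated. A minimal Python sketch, assuming a hypothetical "true" percentage of 13% (in real life we never know it), the same sample size of 263, and the normal-approximation interval used earlier; all names are mine:

```python
import math
import random


def interval(successes, n, z=1.96):
    """Normal-approximation 95% interval for a sample proportion."""
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe


def share_containing_truth(true_p, n, surveys, seed=0):
    """Run many simulated surveys; return the share whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(surveys):
        # Each respondent says "registered" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / surveys


# true_p is an assumption made only for the simulation.
share = share_containing_truth(true_p=0.13, n=263, surveys=2000)
print(f"{share:.1%} of the simulated intervals contain the true percentage")
```

The share should land near 95%, though not exactly on it, since the interval formula is itself an approximation. The simulation also shows Mark's main point: any single interval either contains the true value or it doesn't, and from inside one survey you can't tell which.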


Friday, February 3, 2017

Announcing Visual Trumpery: A lecture tour

Trumpery means worthless nonsense, something that is, simultaneously, deceitful and showy. When I learned about this splendid and timely word, I immediately thought that it could be the right title for a two-hour presentation (and even a book) that I've been considering for a while. It'd describe strategies to fight back against the deluge of bullshit coming from left and right (sleaziness is quite bipartisan), and it'd be not just for data designers or journalists, but for school teachers and citizens in general.

This morning I thought that I could do a tour of free talks, titling them Visual Trumpery (I'm still undecided about the subtitle). I'd like to begin in the second semester of 2017. Maybe with your help.

I've already been contacted by people who might bring this talk to New York, Atlanta, Portland, Barcelona, Paris, etc. If you're interested in hosting a Visual Trumpery event in your city, contact me. I'll give priority to U.S. cities, but I'm not ruling out other countries.

Here's what I propose:

1. I won't charge my regular daily fee. I'll waive my salary. You'll only need to cover a roundtrip flight (coach), hotel if needed (I'm not picky), taxi, and meals (I'm a very easy guest). I'd also appreciate a glass of red wine or two, but that's not mandatory.

2. The talk will cover data, visualization, and infographics, but it won't be limited to them. I'd like the presentation to be a bit broader: Improving rational thinking among the public, fostering a better understanding of probability and uncertainty, etc.

3. The talk must not be limited to your organization, company, or educational institution. It must be open to the public, free, and promoted widely. Closer to the summer, I'll try to come up with a flyer and a poster you can use to spread the word. I may even create a small dedicated website.

Let's see if we can make this happen. Looking forward to hearing from you.

Wednesday, January 25, 2017

Visualizing word popularity: Scrollytelling, line graphs, micromaps, etc.

Another day, another visualization coming from Google News Lab, now in collaboration with Polygraph: The Year In Language 2016. It combines scrollytelling with multiple animated and interactive charts to highlight words that grew in popularity during 2016. Think of terms like “bigly” and “gaslighting,” for instance.

(Full disclosure: I'm a consultant for Google News Lab's visualizations, as I've mentioned before.)

My favorite part of The Year In Language 2016 is the linked micromaps at the bottom of the page. Here are the ones for the aforementioned words:

Monday, January 23, 2017

Our new visualization: Presidential inaugural addresses

Google has just launched a new visualization in the series I'm helping with. Its title is Inaugurate, and the author is Jan Willem Tulp. It's already been featured by USA Today, and Simon Rogers, who is at the helm of this series (previous ones are WorldPotus and Rhythm of Food), has written a good explainer about it. I worked as an advisor.

Inaugurate visualizes the first presidential inauguration speeches of the 12 most searched-for presidents. Each speech is a column, and each rectangle within one column represents the length of a sentence. It was surprising for me to see how short Abraham Lincoln's 1861 address was.

The circles on top of the rectangles mark mentions of the most common subjects (God, democracy, justice, economy, etc.), and the gradient on top of each section represents the current search interest in Google: Red is high interest; blue is low. In the image below you can see that “liberty” appears in many speeches, and it's still a topic of interest in Google searches:

Exploring this visualization will reveal some insights. For instance, go to the “Emotions & Human Values” section. In the addresses by George W. Bush or Barack H. Obama, subjects are varied: compassion, courage, happiness, dignity, and morality. Trump is all patriotism and, above all, loyalty. In the “Society” section you'll also see that wealth appears in Trump's speech more times than in any other:

Saturday, January 14, 2017

When doing data reporting, look at the raw numbers, not just at percentages —and write an accurate headline

A headline in The New York Times today reads “In the Shopping Cart of a Food Stamp Household: Lots of Soda.” Is it true?

The story itself provides hints that the headline is misleading, and likely to damage the image of SNAP and its beneficiaries. This is dangerous, considering that many readers look at clickbaity headlines, like the NYTimes one, but don't read stories. SNAP households aren't different from other households. Most Americans buy and drink way too much soda and, as a result, obesity and Type II diabetes have reached epidemic levels.

The story says that households that receive food stamps spend 9.3% of their grocery budget on soft drinks, while families in general spend 7.1%. This is one of those cases when reporting just percentages, and not taking into account other variables, such as total spending on groceries, sounds fishy.

Here's why: Never focus just on the data in front of your eyes, or on derived variables, like percentages or rates. Think about the raw numbers behind them. Say that you are comparing two families of four people each, a SNAP one and a non-SNAP one. They spend, respectively, $100 and $200 a week on groceries (my guess is that SNAP recipients are poorer than Americans in general).

The SNAP household would be spending $9.30 a week on soda, or about $2.30 per person; the non-SNAP one, $14.20, or about $3.55 per person! Who's buying “lots of soda” now?
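
The arithmetic behind that comparison fits in a few lines of Python. Keep in mind that the $100 and $200 weekly grocery budgets and the household size of four are my hypothetical figures; only the 9.3% and 7.1% shares come from the story, and the function name is just for illustration.

```python
def soda_spending(weekly_groceries, soda_share, household_size):
    """Return (weekly soda dollars per household, weekly soda dollars per person)."""
    weekly_soda = weekly_groceries * soda_share
    return weekly_soda, weekly_soda / household_size


# Hypothetical budgets: $100 a week (SNAP) and $200 a week (non-SNAP),
# four people in each household. Shares are the ones quoted in the story.
snap = soda_spending(100, 0.093, 4)
non_snap = soda_spending(200, 0.071, 4)

for label, (household, person) in [("SNAP", snap), ("non-SNAP", non_snap)]:
    print(f"{label}: ${household:.2f} per household, ${person:.2f} per person")
```

The larger percentage belongs to the smaller budget, so the household with the higher share can still be the one buying less soda in dollars.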

To add insult to injury, the story says about the report from the Department of Agriculture: “One limitation of the report was that it could not always distinguish when SNAP households used their benefits, other money or a combination of the two to pay for transactions.” Michele Simon, a lawyer, is quoted in the story: “This is the first time we’ve had confirmation that this massive taxpayer program is promoting all the wrong kinds of foods.” Well, that isn't true. If the story is a good reflection of the U.S.D.A. report, there isn't enough basis to claim that tons of SNAP money is being used to pay for sugary drinks.

This doesn't render the story completely wrong (I couldn't find the original source), but it does challenge its headline, which singles out SNAP families. The reactionary insurgency taking over government in a few days will likely use this as ammunition to attack safety net programs, as they know that most people share headlines through Facebook and Twitter, but don't spend time reading long and nuanced stories.

(One last note: newspaper reporters often don't write the headlines for their own stories; editors do.)

UPDATE: This post on Facebook by University of Minnesota's Joe Soss provides much more information, and a link to the report itself. I'll take a look at it, as it looks like the story is much worse than it seemed to me at first.

Tuesday, January 10, 2017

In visualization, white space is your friend

One point I make in the introduction to graphic design classes I've taught in the past is that white space isn't empty space. Empty space is needlessly unused space; white space has meaning. There's even a good introduction to graphic design titled White Space is Not Your Enemy.

This morning's New York Times has a good example of that principle, courtesy of Margot Sanger-Katz and Quoctrung Bui. The top-right quadrant is occupied by a large majority of the American public (to the right and to the left) and experts in gun violence. The bottom, a vast, blank ocean, probably belongs to Republican legislators, NRA lobbyists, and the minority President-elect.
