Friday, December 19, 2014

Adventures in the margin of error

El País has published a series of charts showing the percentage of people in Catalonia who are in favor of or against independence from Spain.

The headline of the story (English) where the graphic is shown claims that more Catalans say ‘no’ to independence than ‘yes’.

Not quite. That headline is wrong.

Why? Well, if you read the English summary of the survey used by El País, you will notice that the sampling error is +/- 2.95 percentage points. The real values in the pie chart could therefore be almost 3 percentage points larger or smaller. In fact, right now there may be more people who want independence than people who reject that possibility! The ‘yes’ can be as low as 41.55% or as high as 47.45%.* If we take error into account, ‘yes’ and ‘no’ are practically tied.

(*I've been reminded by Ben Jones that a more detailed explanation may be necessary: In reality, the real values are more likely to be close to the ones shown in the chart than to be as high or as low as the boundaries of the confidence interval. However, remember that the difference here is just 0.8 percentage points. See this article. As is explained in the middle of it, the closer the two values are to a 50/50 split, which is the case here, the larger the error is.)
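To make the arithmetic concrete, here is a minimal Python sketch of the check described above. The 44.5% ‘yes’ share and the 2.95-point sampling error come from the post; the 45.3% ‘no’ share is my own inference from the 0.8-point gap, not a number quoted from the survey, and the function names are just illustrative.

```python
# Figures from the post. NO_SHARE is an assumption: it is inferred from the
# 0.8-point gap mentioned above, not quoted directly from the survey.
YES_SHARE = 44.5
NO_SHARE = YES_SHARE + 0.8
MARGIN = 2.95  # sampling error, in percentage points


def interval(share, margin):
    """Return the (low, high) range a share could fall in, given the margin."""
    return (round(share - margin, 2), round(share + margin, 2))


def overlaps(a, b):
    """Return True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


yes_range = interval(YES_SHARE, MARGIN)
no_range = interval(NO_SHARE, MARGIN)

print("yes:", yes_range)
print("no: ", no_range)
print("ranges overlap:", overlaps(yes_range, no_range))
```

When the two ranges overlap, as they do here, the survey alone cannot tell us which option is really ahead.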

What could El País have done with its headline? Perhaps it could have highlighted the comparison with previous surveys discussed in the story. The ‘yes’ option has experienced a noticeable drop, larger than the sampling error, from 49.4% to the current 44.5%. That's news. It could have been a good headline: The ‘yes’ was clearly leading a while ago, but has become much less popular.
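The same kind of check can be sketched for the change between surveys. The two ‘yes’ figures and the 2.95-point error are from the post. The stricter comparison is an assumption on my part: it supposes the earlier survey was independent and had the same sampling error, which the post does not state, in which case the error of the difference is roughly the square root of 2 times the single-survey error.

```python
import math

# Figures from the post.
PREVIOUS_YES = 49.4
CURRENT_YES = 44.5
MARGIN = 2.95  # sampling error of the current survey, in percentage points

drop = PREVIOUS_YES - CURRENT_YES

# Assumption (not stated in the post): both surveys are independent and share
# the same margin, so the margin of the difference is sqrt(2) times larger.
margin_of_change = math.sqrt(2) * MARGIN

exceeds_single_margin = drop > MARGIN
exceeds_combined_margin = drop > margin_of_change

print(f"drop: {drop:.1f} points")
print(f"margin of the change (assumed): {margin_of_change:.2f} points")
print("larger than one survey's margin:", exceeds_single_margin)
print("larger than the combined margin:", exceeds_combined_margin)
```

Under those assumptions the drop is larger than both thresholds, which is why it makes for a safer headline than the 0.8-point gap.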

Lesson: When using stats, always mention the sample size, the margin of error, and the level of confidence. Otherwise, your numbers will be meaningless.
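Why those three numbers belong together can be seen in the textbook formula for the margin of error of a proportion. This is only a sketch: it assumes simple random sampling, which real surveys rarely follow exactly, and the sample size of 1,000 is a made-up value for illustration, not the size of the survey El País used.

```python
import math
from statistics import NormalDist


def margin_of_error(share_pct, sample_size, confidence=0.95):
    """Approximate margin of error, in percentage points, for a share.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which assumes
    simple random sampling.
    """
    p = share_pct / 100
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 100 * z * math.sqrt(p * (1 - p) / sample_size)


HYPOTHETICAL_N = 1000  # illustrative only, not taken from the survey

for share in (10, 30, 50):
    moe = margin_of_error(share, HYPOTHETICAL_N)
    print(f"share {share}%: +/- {moe:.2f} points")
```

The margin grows as the share approaches 50%, and it shrinks with a larger sample or a lower level of confidence, so a percentage reported without those three numbers cannot be judged.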

UPDATE 1 (December 20) Ramón Martínez has shared a graphic he's just made:



UPDATE 2 (December 20) It seems that no editor at El País wants to use mathematics. They have the same headline on today's front page of the print edition.

UPDATE 3 (December 21) A reader has pointed out that in a story published the day after I wrote this post (which criticizes an earlier story), El País did mention the sampling error. See screenshot at the bottom. It says that it's “a relevant factor because of how close the figures are.” Indeed! This makes matters even worse. Many, many people only read headlines and ledes. That's why it is so important that they are accurate to the extreme.

UPDATE 4 (December 26) Why accurate headlines matter.