Monday, September 15, 2014

Facing the Dataclysm

At last, a “big data” book that is well written.

You could argue that other recent books about numbers, like Kaiser Fung's Numbersense and Jordan Ellenberg's How Not to Be Wrong, are far from dull, and you'd be right. But they aren't about “big data” per se; they are about classic numeracy. Christian Rudder's Dataclysm: Who We Are (When We Think No One's Looking), on the other hand, is about “big data”, whatever that term means, and also about how we can take advantage of it to know ourselves better. The book consists of a long series of stories with a common theme: good data and analysis are the best antidote to prejudices and biases. Hans Rosling, whose motto “help us cross the river of myths” should be adopted as a battle cry by information designers and journalists alike, would be proud.

Needless to say, this post is just an excuse to recommend the book, even if I have only read half of it, around 150 pages. I began last night; that should give you an idea of how much I like it already. I foresee that it'll be among my favorites this year. I'm sharing some photos (see below) that I took while highlighting and writing notes in the margins. Notice the beautiful charts. There are tons of them.

Related links: Rudder's blog and a review in The New Yorker.

UPDATE: Two mathematicians whom I follow closely have written about Dataclysm and reached opposite conclusions. Jordan Ellenberg likes it quite a lot, and Cathy O'Neil hates it.

Even if I'm inclined to side with Ellenberg in this case, O'Neil's in-depth critique is excellent. However, I think that Rudder anticipated some of the problems that she mentions. Just to give you an example, I read the entire book with this idea in the back of my mind: “This analysis applies only to users of dating websites.” I think that I did this unconsciously because Rudder himself suggests it at least twice (I'd need to double-check this to be sure). He certainly gets carried away in some chapters, though, and draws sweeping conclusions.

Although it's true, as O'Neil says, that some parts of the book (like the one about race) go too far, Dataclysm is far from being sloppy. For one thing, it has plenty of thoughtful passages about the problems that the ubiquity of data may pose. Judge for yourself by reading the last two or three pages shown at the bottom of this post. You may disagree with Rudder's enthusiasm about what the future holds, but a fool he is not.

Disclaimer: Links in this post are Amazon affiliate links. That means that I get a small amount of money for anything that you buy after clicking on them. I don't get any cash directly from Amazon, though; I get gift cards that I use to buy books. The average monthly payment I got last year was $75.