Big Data has its perils, to be sure. With huge data sets and fine-grained measurement,
statisticians and computer scientists note, there is increased risk of “false discoveries.”
The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor
Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”
Big Data also supplies more raw material for statistical shenanigans and biased fact-finding
excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s
find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University,
“one of the most pernicious uses of data.”
Data is tamed and understood using computer and mathematical models. These models,
like metaphors in literature, are explanatory simplifications. They are useful for
understanding, but they have their limits. Privacy advocates warn that a model might spot a
correlation in a person's online searches and draw a statistical inference that is unfair or
discriminatory, affecting the products, bank loans and health insurance that person is
offered.
Despite the caveats, there seems to be no turning back. Data is in the driver’s seat. It’s
there, it’s useful and it’s valuable, even hip.
Veteran data analysts tell of friends who were long bored by discussions of their work but
now are suddenly curious. “Moneyball” helped, they say, but things have gone way
beyond that. “The culture has changed,” says Andrew Gelman, a statistician and political
scientist at Columbia University. “There is this idea that numbers and statistics are
interesting and fun. It’s cool now.”
Steve Lohr is a technology reporter for The New York Times.