Wednesday, October 7, 2015

"The deception that lurks in our data-driven world"

From Fusion:
I start each day with a lie.

I get up, walk into the bathroom, and weigh myself. The data streams from the Chinese scale to an app on my phone and into an Apple data server, my permanent record in the cloud.

I started this ritual because I thought it would keep me honest. It would keep me from deluding myself into thinking my clothes didn’t fit because of an overzealous dryer rather than beer and cheese. The data would be real and fixed in a way that my subjective evaluations are not. The scale could not lie.

And of course, the number that shows on the scale isn’t, technically, a lie. It is my exact weight at that exact moment. If I were an ingredient in a cake recipe or cargo for a rocket ship, this would be the number you’d want to believe.

But one thing you learn weighing yourself a lot—or wrestling in high school—is that one’s weight, this number that determines whether you’re normal or obese, skinny or fat, is susceptible to manipulation. (This is the warning embedded in the pithy title of NYU professor Lisa Gitelman’s 2013 book: “Raw Data Is An Oxymoron.”)

If I want to weigh in light, I go running and sweat out some water before getting on the scale. If I’m worried that my fitness resolve is slipping and I need to scare myself back into healthy eating, I’ll weigh myself a little later—after some food and plenty of water—and watch my weight spike upwards.

Sure, the difference in all of these measurements is only plus or minus five pounds, but for someone with my own psychology—and maybe some of you—those differences are enough to make me this guy....

You might like to think that this is just one man’s data deception. That the data out in the rest of the world, like the stuff that gets published in science journals, is less susceptible to human manipulation.

But then you see studies like the one that recently came out in Science, America’s leading scientific journal, which subjected 100 supposedly high-quality psychology papers to a large-scale replication study. When new research groups replicated the experiments in the papers to see if they’d get the same results, they were only able to do so 36% of the time. Almost two-thirds of the papers’ effects couldn’t be replicated by other careful, professional researchers.

“This project provides accumulating evidence for many findings in psychological research and suggests that there is still more work to do to verify whether we know what we think we know,” concluded the authors of the Science paper.

In many fields of research right now, scientists collect data until they see a pattern that appears statistically significant, and then they use that tightly selected data to publish a paper. Critics have come to call this p-hacking, and the practice uses a quiver of little methodological tricks that can inflate the statistical significance of a finding. As enumerated by one research group, the tricks can include the following (a short simulation sketch after the list shows how much even one of them can distort a result):
  • “conducting analyses midway through experiments to decide whether to continue collecting data,”
  • “recording many response variables and deciding which to report postanalysis,”
  • “deciding whether to include or drop outliers postanalyses,”
  • “excluding, combining, or splitting treatment groups postanalysis,”
  • “including or excluding covariates postanalysis,”
  • “and stopping data exploration if an analysis yields a significant p-value.”
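The damage from even the first trick on that list is easy to demonstrate. Below is a minimal simulation sketch, not from the Fusion piece, of "conducting analyses midway through experiments to decide whether to continue collecting data": every simulated experiment compares two groups drawn from the same distribution, so an honest test should flag only about 5% of them as significant, yet peeking repeatedly and stopping as soon as the test looks significant pushes that figure far higher. The number of simulations, sample sizes, and peeking schedule are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Illustrative assumptions (not from the article): 2,000 simulated experiments,
# peeking after every 10 subjects per group, up to 100 subjects per group.
rng = np.random.default_rng(0)
n_simulations = 2000
max_n = 100
check_every = 10
alpha = 0.05

false_positives = 0
for _ in range(n_simulations):
    # Both groups come from the same distribution, so the true effect is zero
    # and any "significant" result is a false positive.
    a = rng.normal(size=max_n)
    b = rng.normal(size=max_n)
    for n in range(check_every, max_n + 1, check_every):
        _, p = stats.ttest_ind(a[:n], b[:n])
        if p < alpha:
            # Stop collecting data the moment the test looks significant --
            # the "analyses midway through experiments" trick from the list.
            false_positives += 1
            break

print(f"Nominal false-positive rate: {alpha:.0%}")
print(f"Rate with optional stopping: {false_positives / n_simulations:.1%}")
```

With these assumed settings the observed rate typically lands well above the nominal 5%, often three to four times higher, which is exactly the kind of quiet inflation the critics are describing.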
Add it all up, and you have a significant problem in the way our society produces knowledge....MUCH MORE