Science/Not Science: Crusading Against Multiple Regression Analysis
"A huge range of science projects are done with multiple regression analysis. The results are often somewhere between meaningless and quite damaging. ... "
From Edge:
I hope that in the future, if I’m successful in communicating
with people about this, there’ll be a kind of upfront warning in
New York Times articles: These data are based on multiple regression
analysis. This would be a sign that you probably shouldn’t read the
article because you’re quite likely to get non-information or
misinformation.
RICHARD NISBETT is a professor of psychology and co-director of the
Culture and Cognition Program at the University of Michigan. He is
the author of Mindware: Tools for Smart Thinking and The Geography of Thought.
THE CRUSADE AGAINST MULTIPLE REGRESSION ANALYSIS
The thing I’m most interested in right now has become a kind of
crusade against correlational statistical analysis—in particular, what’s
called multiple regression analysis. Say you want to find out whether
taking Vitamin E is associated with lower prostate cancer risk. You look
at the correlational evidence and indeed it turns out that men who take
Vitamin E have lower risk for prostate cancer. Then someone says,
"Well, let’s see if we do the actual experiment, what happens." And what
happens when you do the experiment is that Vitamin E contributes to the
likelihood of prostate cancer. How could the two results differ? Such discrepancies
happen a lot. The correlational—the observational—evidence tells you one
thing, the experimental evidence tells you something completely
different.
In the case of health data, the big problem is something that’s come
to be called the healthy user bias, because the guy who’s taking Vitamin
E is also doing everything else right. A doctor or an article has told
him to take Vitamin E, so he does that, but he’s also the guy who’s
watching his weight and his cholesterol, gets plenty of exercise, drinks
alcohol in moderation, doesn’t smoke, has a high level of education,
and a high income. All of these things are likely to make you live
longer, to make you less subject to morbidity and mortality risks of all
kinds. You pull one thing out of that cluster of correlates and it’s going to look
like Vitamin E is terrific because it’s dragging all these other good
things along with it.
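The healthy user bias can be illustrated with a small simulation. This is a minimal sketch, not an analysis of any real study: every number in it (the share of healthy people, the baseline risks, the size of the supplement's harm) is an invented assumption, and the function name simulate is hypothetical. It compares people who choose the supplement themselves with people who are assigned it by coin flip.

```python
import random


def simulate(n=200_000, seed=0):
    """Return (observational, experimental) risk differences: takers minus non-takers."""
    rng = random.Random(seed)
    true_effect = 0.02  # ASSUMPTION: the supplement adds 2 points of risk

    def outcome(healthy, takes):
        # ASSUMPTION: lifestyle, not the supplement, drives most of the risk.
        base_risk = 0.05 if healthy else 0.15
        return rng.random() < base_risk + (true_effect if takes else 0.0)

    # counts[group][takes] = [cases, people]
    counts = {
        "observed": {True: [0, 0], False: [0, 0]},
        "experiment": {True: [0, 0], False: [0, 0]},
    }
    for _ in range(n):
        healthy = rng.random() < 0.5
        # Self-selection: healthy people are far more likely to take it.
        chose = rng.random() < (0.8 if healthy else 0.2)
        # Experiment: a coin flip decides, regardless of lifestyle.
        assigned = rng.random() < 0.5
        for group, takes in (("observed", chose), ("experiment", assigned)):
            cell = counts[group][takes]
            cell[0] += int(outcome(healthy, takes))
            cell[1] += 1

    def risk_difference(table):
        risk = {t: cases / people for t, (cases, people) in table.items()}
        return risk[True] - risk[False]

    return risk_difference(counts["observed"]), risk_difference(counts["experiment"])


if __name__ == "__main__":
    observed, experiment = simulate()
    print(f"observational risk difference: {observed:+.3f}")
    print(f"experimental risk difference:  {experiment:+.3f}")
```

Under these assumptions the self-selected takers show lower risk than non-takers even though the supplement is harmful by construction, while the coin-flip comparison recovers the harm. The sign flip comes entirely from who chooses to take the supplement.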
This is not, by any means, limited to health issues. A while back, I read a government report in The New York Times on
the safety of automobiles. The measure that they used was the deaths
per million drivers of each of these autos. It turns out that, for
example, there are enormously more deaths per million drivers who drive
Ford F150 pickups than for people who drive Volvo station wagons. Most
people’s reaction, and certainly my initial reaction to it, was, "Well,
it sort of figures—everybody knows that Volvos are safe."
Let’s describe two people and you tell me who you think is more
likely to be driving the Volvo and who is more likely to be driving the
pickup: a suburban matron in the New York area and a
twenty-five-year-old cowboy in Oklahoma. It’s obvious that people are
not assigned their cars. We don’t say, "Billy, you’ll be driving a
powder blue Volvo station wagon." Because of this self-selection
problem, you simply can’t interpret data like that. You know virtually
nothing about the relative safety of cars based on that study.
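The self-selection problem in the car example can be worked through with simple arithmetic. The sketch below uses made-up figures throughout: two generic vehicles labeled "wagon" and "pickup", two hypothetical driver types, and a death rate that depends only on the driver, so the two vehicles are equally safe by assumption. None of these numbers come from the report discussed above.

```python
# ASSUMPTION: deaths per million drivers depend only on driver type,
# so the two vehicles are equally safe by construction.
DEATHS_PER_MILLION = {"cautious": 20, "risky": 200}

# ASSUMPTION: percentage of each vehicle's drivers who are of each type.
DRIVER_MIX_PERCENT = {
    "wagon": {"cautious": 90, "risky": 10},
    "pickup": {"cautious": 20, "risky": 80},
}


def crude_rate(vehicle):
    """Deaths per million drivers of a vehicle, ignoring who drives it."""
    mix = DRIVER_MIX_PERCENT[vehicle]
    return sum(pct * DEATHS_PER_MILLION[kind] for kind, pct in mix.items()) / 100


if __name__ == "__main__":
    for vehicle in DRIVER_MIX_PERCENT:
        print(vehicle, crude_rate(vehicle))  # wagon 38.0, pickup 164.0
```

The pickup's crude rate comes out more than four times the wagon's, yet within each driver type the rate is identical for both vehicles. The gap reflects who buys which vehicle, not how safe the vehicle is.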
I saw in The New York Times recently an article by a
respected writer reporting that people who have elaborate weddings tend
to have marriages that last longer. How would that be? Maybe it’s just
all the darned expense and bother—you don’t want to get divorced. It’s a
cognitive dissonance thing.
Let’s think about who makes elaborate plans for expensive weddings:
people who are better off financially, which is by itself a good
prognosis for marriage; people who are more educated, also a better
prognosis; and people who are older, since the later you get
married, the more likely the marriage is to last, and so on.
The truth is you’ve learned nothing. It’s like saying men who are a
somebody III or IV have longer-lasting marriages. Is it because of the
suffix there? No, it’s because those people are the types who have a
good prognosis for a lengthy marriage.
A huge range of science projects are done with multiple regression
analysis. The results are often somewhere between meaningless and quite
damaging.
I find that my fellow social psychologists, the very smartest ones,
will do these silly multiple regression studies, showing, for example,
that the more basketball team members touch each other the better the
record of wins.
I hope that in the future, if I’m successful in communicating with people about this, there’ll be a kind of upfront warning in New York Times articles: These data are based on multiple regression analysis.
This would be a sign that you probably shouldn’t read the article
because you’re quite likely to get non-information or misinformation.
Know that the technique is terribly flawed, and ask yourself whether
the study is subject to self-selection effects or confounded-variable
effects. You shouldn’t have to ask, because the journalist ought to
tell you what generated these data. If the study is subject to them, you
should probably ignore it. What I most want to do is blow the whistle
on this and stop scientists from doing this kind of thing. As I say,
many of the very best social psychologists don’t understand this point.
I want to write an article that describes the problem, much as I have
done here. I’m going to work with a statistician
who can do all the formal stuff, and hopefully we’ll be published in
some outlet that will reach scientists in all fields and also act as a
kind of "buyer beware" for the general reader, so they understand when a
technique is deeply flawed and can be alert to the possibility that the
study they're reading has the self-selection or confounded-variable
problems that are characteristic of multiple regression.
You should be extremely dubious about health statistics in general,
unless it’s explicitly stated that the study is experimental. The
consequences of this junk research are enormous. I’m trying to find ways
to get people to stop doing it and to make the general reader aware
that they have to ask themselves, "Do I think that this is a
correlational study or is it an actual experiment?"