From The Conversable Economist:
For the uninitiated, the idea of "statistical significance" may seem drier than desert sand. But it's how research in the social sciences and medicine decides what findings are worth paying attention to as plausibly true--or not. For that reason, it matters quite a bit. Here, I'll sketch a quick overview for beginners of what statistical significance means, and why there is controversy among statisticians and researchers over what research results should be regarded as meaningful or new.

To gain some intuition, consider an experiment to decide whether a coin is equally balanced, or whether it is weighted toward coming up "heads." You toss the coin once, and it comes up heads. Does this result prove, in a statistical sense, that the coin is unfair? Obviously not. Even a fair coin will come up heads half the time, after all.

You toss the coin again, and it comes up "heads" again. Do two heads in a row prove that the coin is unfair? Not really. After all, if you toss a fair coin twice in a row, there are four equally likely possibilities: HH, HT, TH, TT. Thus, two heads will happen one-fourth of the time with a fair coin, just by chance.
What about three heads in a row? Or four or five or six or more? You can never completely rule out the possibility that a string of heads, even a long string of heads, could happen entirely by chance. But as you get more and more heads in a row, a finding that is all heads, or mostly heads, becomes increasingly unlikely. At some point, it becomes very unlikely indeed.

Thus, a researcher must make a decision. At what point are the results sufficiently unlikely to have happened by chance that we can declare them meaningful? The conventional answer is that if the observed result had a 5% probability or less of happening by chance, then it is judged to be "statistically significant." Of course, real-world questions of whether a certain intervention in a school will raise test scores, or whether a certain drug will help treat a medical condition, are a lot more complicated to analyze than coin flips. Thus, practical researchers spend a lot of time trying to figure out whether a given result is "statistically significant" or not.
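To make the coin-toss arithmetic concrete, here is a minimal Python sketch (an illustration, not from the original post) that computes the chance of getting n heads in a row from a fair coin and flags the point at which that chance first drops below the conventional 5% cutoff.

    # A minimal sketch of the coin-toss arithmetic above: the chance that a fair
    # coin produces n heads in a row is 0.5**n, and the question is how large n
    # must be before that chance falls below the conventional 5% cutoff.

    def prob_all_heads(n: int) -> float:
        """Probability that a fair coin comes up heads n times in a row."""
        return 0.5 ** n

    for n in range(1, 8):
        p = prob_all_heads(n)
        verdict = "below 5% -- 'statistically significant'" if p <= 0.05 else "could easily be chance"
        print(f"{n} heads in a row: probability {p:.4f} ({verdict})")

With a fair coin, five heads in a row already has a probability of only 1/32, or about 3.1%, so by the conventional standard that run would be declared statistically significant.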
Several questions arise here.
1) Why 5%? Why not 10%? Or 1%? The short answer is "tradition." A couple of years ago, the American Statistical Association put together a panel to reconsider the 5% standard. Ronald L. Wasserstein and Nicole A. Lazar wrote a short article, "The ASA's Statement on p-Values: Context, Process, and Purpose," in The American Statistician (2016, 70:2, pp. 129-132). (A p-value is the probability of seeing a result at least as extreme as the one actually observed if chance alone were at work; it is the number that gets compared against the threshold for statistical significance.) They started with this anecdote:
"In February 2014, George Cobb, Professor Emeritus of Mathematics and Statistics at Mount Holyoke College, posed these questions to an ASA discussion forum:Q:Why do so many colleges and grad schools teach p = 0.05?
A: Because that’s still what the scientific community and journal editors use.
Q:Why do so many people still use p = 0.05?
A: Because that’s what they were taught in college or grad school.Cobb’s concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as p<0 .05:="" because="" blockquote="" do="" e="" it="" nbsp="" s="" teach.="" teach="" we="" what=""> But that said, there's nothing magic about the 5% threshold. It's fairly common for academic papers to report the results that are statistically signification using a threshold of 10%, or 1%. Confidence in a statistical result isn't a binary, yes-or-no situation, but rather a continuum....MORE 0>
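To illustrate the continuum point with a concrete (made-up) example, here is a short Python sketch: the same hypothetical coin-toss result, 60 heads in 100 tosses of a coin assumed fair, clears the 10% and 5% thresholds but not the 1% threshold.

    # A minimal sketch (with made-up numbers) of how one result fares against
    # different significance thresholds. For 60 heads in 100 tosses of a coin
    # assumed fair, the exact one-sided p-value is P(X >= 60) for X ~ Binomial(100, 0.5).

    from math import comb

    def binom_tail(k: int, n: int, p: float = 0.5) -> float:
        """P(X >= k) for X ~ Binomial(n, p): chance of k or more heads in n tosses."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    heads, tosses = 60, 100
    p_value = binom_tail(heads, tosses)
    print(f"p-value for {heads} heads in {tosses} tosses: {p_value:.4f}")
    for threshold in (0.10, 0.05, 0.01):
        verdict = "significant" if p_value <= threshold else "not significant"
        print(f"  at the {threshold:.0%} threshold: {verdict}")

The p-value here is about 0.028, so the verdict flips from "significant" to "not significant" depending purely on where the bright line is drawn, which is exactly the sense in which confidence is a continuum rather than a yes-or-no verdict.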