by Leland Teschler, Executive Editor, @DW_LeeTeschler
The journal Science had an interesting take on a recent statement by the American Statistical Association (ASA): Imagine the American Physical Society convening a panel of experts to issue a missive to the scientific community on the difference between weight and mass. And imagine that the impetus for such a message was a recognition that engineers and builders had been confusing these concepts for decades, making bridges, buildings, and other components of our physical infrastructure much weaker than previously suspected.
Science is referring to the “ASA statement on statistical significance and p-values.” The ASA claims misunderstandings about the meaning of the P value undermine the credibility of many scientific claims. It further implies that these misunderstandings could explain why so many scientific findings described in journals can’t be replicated by other researchers.
Readers may recall from their undergrad class in basic statistics that the P value is one of the main statistical measures used to decide whether scientific results are statistically significant or not. And worries about its misuse have been circulating for years among statisticians. One of the worriers is William M. Briggs, Adjunct Professor of Statistics at Cornell University. “The technical definition of a P value is so difficult to remember that people just don’t keep it in mind. Even the Wikipedia page on P value has a couple of small errors,” he says. “People treat a P value as a magical thing. If you get a P value less than a magic number then your hypothesis is true. People don’t actually say it is 100% true, but they behave as though it is.”
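What a P value does and doesn't measure is easy to see in a simulation. The sketch below is a minimal, illustrative plain-Python example (not from any of the researchers quoted here): it repeatedly tests a coin that is fair by construction, so the null hypothesis is true every time, yet roughly one experiment in twenty still clears the p < 0.05 bar purely by chance.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def p_value_fair_coin(heads, n):
    """Two-sided p-value for H0 'the coin is fair', via the normal approximation."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

# 10,000 experiments, each flipping a genuinely fair coin 100 times.
trials, n_flips = 10_000, 100
false_alarms = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if p_value_fair_coin(heads, n_flips) < 0.05:
        false_alarms += 1

rate = false_alarms / trials
print(rate)  # close to 0.05: the null is true, yet about 1 result in 20 is "significant"
```

A small P value, in other words, is something that routinely happens even when there is no effect at all; treating it as proof of a hypothesis is exactly the mistake Briggs describes.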
The trouble is that relying on a P value as an indication of statistical significance can give researchers a false sense of security about their results. Briggs says even peer-reviewed papers from cancer researchers have been known to draw questionable conclusions based on this single metric. “P values can be, and are, used to prove anything and everything. The sole limitation is the imagination of the researcher,” he says. “To the civilian, a small P value says that statistical significance has been found. This, in turn, says that the hypothesis is not just probably true, but true.”
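The claim that P values can be used to prove almost anything is straightforward to demonstrate. The hypothetical screening exercise below (again a plain-Python sketch, not anyone's actual study) tests 20 pure-noise "hypotheses" per run; with that many shots at the p < 0.05 threshold, most runs produce at least one "significant" finding out of nothing but randomness.

```python
import math
import random

random.seed(1)  # fixed seed so the illustration is reproducible

def p_value_fair_coin(heads, n):
    """Two-sided p-value for H0 'the coin is fair', via the normal approximation."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

# Each run screens 20 independent pure-noise "hypotheses" (fair coins) and
# asks whether ANY of them clears the conventional p < 0.05 bar.
runs, hypotheses, n_flips = 1_000, 20, 100
hits = 0
for _ in range(runs):
    p_values = [
        p_value_fair_coin(sum(random.random() < 0.5 for _ in range(n_flips)), n_flips)
        for _ in range(hypotheses)
    ]
    if min(p_values) < 0.05:
        hits += 1

rate = hits / runs
print(rate)  # well over half the runs yield at least one "significant" finding
```

A researcher who tests enough hypotheses and reports only the one that crossed the threshold will nearly always have something "significant" to publish.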
The ASA seems to come to the same conclusion but puts it more circumspectly: “The widespread use of ‘statistical significance’ (generally interpreted as ‘p ≤ 0.05’) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.”
More proof that the concept of statistical significance is widely misunderstood comes from Steve Ziliak, an economics professor at Roosevelt University who coauthored a book called The Cult of Statistical Significance. Ziliak says he combed through papers published in a number of prestigious economics, operations research, and medical journals. He found numerous instances of researchers who used statistical significance as if it were the same as correlation. “They confuse the probability measure with a measure of correlation, or effect size. But they are two very different things. It is almost embarrassing because it is such an elementary point.”
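The distinction Ziliak draws shows up clearly in numbers: with a large enough sample, an effect far too small to matter in practice can still produce a vanishingly small P value. The sketch below uses invented data (a hypothetical sample of 200,000 points with a true slope of just 0.02) and a Fisher-transform approximation for the correlation test.

```python
import math
import random

random.seed(2)  # fixed seed so the illustration is reproducible

# Hypothetical data: a huge sample in which x has an almost negligible
# influence on y (true slope 0.02, against noise of unit variance).
n = 200_000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.02 * xi + random.gauss(0.0, 1.0) for xi in x]

# Pearson correlation coefficient, computed from scratch.
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# Fisher transformation: approximate two-sided p-value for H0 'no correlation'.
z = math.atanh(r) * math.sqrt(n - 3)
p = math.erfc(abs(z) / math.sqrt(2))

print(f"r^2 = {r * r:.5f}, p = {p:.1e}")
# The p-value is tiny ("highly significant"), yet x explains well
# under 1% of the variance in y -- a negligible effect size.
```

Statistical significance here is real but meaningless: the probability measure says the correlation is unlikely to be zero, while the effect size says it is too small to care about. Conflating the two is exactly the elementary error Ziliak describes.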
The upshot: Many researchers don’t know how to interpret their own data. This situation would be funny if the subject weren’t so serious.