If you’ve been following this blog for a while, you’ll be no stranger to my views on what I believe is one of the most abused, and therefore now meaningless, words in scientific writing: ‘significance’ and her adjective sister, ‘significant’. I hold that both should be stricken entirely from the language of science writing.
Most science writing has become burdened with archaic language that perhaps once meant something, but the ubiquity of certain terms in most walks of life, and their consequent misapplication, means that many no longer carry a precise meaning. Because good scientific writing should strive for precision, transparency and simplicity, such now-useless terminology should be expunged from our vocabulary entirely.
‘Significance’ is just such a term.
Most interviews on radio or television, most lectures by politicians or business leaders, and nearly all presentations by academics at meetings of learned societies invoke ‘significant’ merely to add emphasis to the discourse. Usually it involves some sort of comparison – a ‘significant’ decline, a ‘significant’ change or a ‘significant’ number relative to some other number in the past or in some other place, and so on. Rarely is the word quantified: by how much did the trend decline, by how much did it change, and how many is that ‘number’? What is ‘significant’ to a mouse is rather unimportant to an elephant, so the word ends up as an entirely subjective qualifier employed to add some sort of ‘expert’ emphasis to the phenomenon under discussion. To most, ‘significant’ just sounds more authoritative, educated and erudite than ‘a lot’ or ‘big’. This is, of course, complete rubbish, because it is merely the practice of using big words to hide the fact that the speaker isn’t quite as clever as he thinks he is.
While I could occasionally forgive non-scientists for not quantifying their use of ‘significance’ because they haven’t necessarily been trained to do so, I utterly condemn scientists who use the word that way. We are specifically trained to quantify, so throwing ‘significant’ around without a very clear quantification (it changed by x amount, it declined by 50% in two years, etc.) runs counter to the very essence of our discipline. To make matters worse, you can often hear a vocal emphasis placed on the word as it is uttered, along with a patronising hand gesture, making that subjectivity even more obvious.
If you are a scientist reading this, then you are surely waiting for my rationale as to why we should also ignore the word’s statistical meaning. While I’ve explained this before, it bears repeating.
Unless you are a rather young scientist who has had the rare privilege of avoiding ‘classical’ statistics training, then you are most certainly aware that ‘significant’ is the term used to describe the probability that a particular phenomenon under investigation could have arisen by random chance.
So-called ‘significance’ tests are known more formally as Neyman-Pearson (null) hypothesis tests (NPHT), in which a single ‘null’ hypothesis is ‘rejected’ whenever the probability of observing a value of the chosen metric at least as extreme as the one observed (assuming the null hypothesis is true) falls below an arbitrary threshold.
This probability – the well-known ‘P value’ – is the probability that a result at least that extreme would arise if the null hypothesis were true; reject the null on that basis and you risk doing so in error (a ‘Type I error’). Most disciplines still cling doggedly to a threshold probability of 0.05 (a 1 in 20 chance) below which the null hypothesis can be rejected.
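To make the mechanics concrete, here is a minimal sketch in Python (the data, group labels and scipy-based t-test are purely illustrative choices of mine, not anything prescribed by the paradigm itself) of how a conventional test produces a P value that is then held up against the 0.05 threshold:

```python
# Minimal illustration of a conventional null-hypothesis significance test:
# compare two samples and report the probability (P) of a difference at least
# as extreme as the one observed arising if the null ('no difference') were true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# hypothetical measurements from two groups (e.g., body masses at two sites)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=11.0, scale=2.0, size=30)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")

# the conventional (and arbitrary) decision rule
alpha = 0.05
print("reject the null" if p_value < alpha else "fail to reject the null")
```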
If you’re an indoctrinated scientist, then this might sound perfectly reasonable, but I wager that it might strike non-scientists as being rather silly. Indeed, it is silly.
There is nothing important about 1 in 20. Nothing special happens when your P value slips below 0.05. In other words, it’s absolutely meaningless. Would you cross a busy road if the probability of dying before reaching the other side was 1 in 20? Unless you were suicidal, I doubt it. Would you cross if it was 1 in 21? 1 in 25? I think you see where I’m going with this.
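A quick simulation (again, entirely my own illustration) makes the arbitrariness plain: when the null hypothesis really is true, P values spread roughly evenly between 0 and 1, so the proportion of ‘significant’ results simply tracks whatever cut-off you happen to choose, be it 0.04, 0.05 or 0.06.

```python
# Illustration: under a true null hypothesis, P values are spread roughly
# uniformly between 0 and 1, so no particular cut-off marks a real boundary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials = 10_000

p_values = np.empty(n_trials)
for i in range(n_trials):
    # two samples drawn from the *same* distribution: the null is true
    a = rng.normal(0.0, 1.0, size=25)
    b = rng.normal(0.0, 1.0, size=25)
    p_values[i] = stats.ttest_ind(a, b).pvalue

for cutoff in (0.04, 0.05, 0.06):
    print(f"proportion of 'significant' results at P < {cutoff}: "
          f"{np.mean(p_values < cutoff):.3f}")
# each proportion simply mirrors the cut-off itself; none is special
```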
Whence came this entrenched, silly number? As I’ve mentioned before, it is a most unfortunate accident of efficiency that we use 0.05 today. It is in fact a holdover from the days when statistical tables had to be printed in the back of textbooks. There was traditionally insufficient space to print all manually calculated rejection probabilities for distribution-specific null-hypothesis tests, so the tables were often truncated at 0.05. It’s as simple and forehead-slapping as that.
The last remaining supporters of the Neyman-Pearson hypothesis-testing paradigm might claim there’s a time and a place for it, as long as we avoid arbitrary binary assessments of ‘yes’ and ‘no’, for probability is an infinitely sliced gradient between 0 and 1. However, the other problem with this paradigm is that Neyman-Pearson approaches cannot simultaneously consider other dimensions of the question, namely, evaluating the relative statistical support for alternative models. Neither can null-hypothesis tests evaluate Type II errors (the probability of failing to reject the null hypothesis when it is in fact false). The alternative – and thankfully growing – paradigm is the multiple working hypotheses approach.
Instead of considering a single (null) hypothesis and testing whether the data can falsify it in favour of some alternative (which is not directly tested), the multiple working hypotheses approach does not restrict the number of models considered. The approach can specifically accommodate the simultaneous comparison of hypotheses in systems where it is common to find multiple factors influencing the observations made.
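As a rough sketch of how this works in practice (one common implementation, though not the only one, is information-theoretic model comparison; the data, variable names and use of Akaike’s information criterion below are my own illustrative assumptions), each working hypothesis becomes a candidate model, and the models are ranked by their relative support rather than tested against a single null:

```python
# Multiple working hypotheses as candidate models, compared by relative
# support (here via AIC) instead of rejecting a single null hypothesis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200

# hypothetical data: abundance influenced by both rainfall and temperature
rainfall = rng.normal(50, 10, n)
temperature = rng.normal(20, 3, n)
abundance = 2.0 + 0.5 * rainfall + 1.5 * temperature + rng.normal(0, 5, n)
df = pd.DataFrame({"abundance": abundance,
                   "rainfall": rainfall,
                   "temperature": temperature})

# several working hypotheses, each expressed as a candidate model
candidates = {
    "intercept only": "abundance ~ 1",
    "rainfall": "abundance ~ rainfall",
    "temperature": "abundance ~ temperature",
    "rainfall + temperature": "abundance ~ rainfall + temperature",
}

fits = {name: smf.ols(formula, data=df).fit()
        for name, formula in candidates.items()}
best_aic = min(fit.aic for fit in fits.values())

# models with small delta-AIC retain comparable support; none is 'rejected'
for name, fit in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"{name:25s} AIC = {fit.aic:8.1f}  dAIC = {fit.aic - best_aic:6.1f}")
```

The point of ranking models this way is that several hypotheses can share support simultaneously; no single hypothesis has to be crowned or executed at an arbitrary threshold.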
I won’t go on too much more about multiple working hypotheses because there are some excellent resources to show you how it works (see here, here, here, here, here, and here for more information). Suffice it to say that there are now no more excuses – neither terminological nor statistical – for using ‘significant’ in your writing.
So let’s dig a deep grave and bury significance permanently. It does no one any good any more.
CJA Bradshaw