I went to a library talk by my friend Jonathan Skinner, reviewing Stephen Stigler’s book The Seven Pillars of Statistical Wisdom. Jonathan was a professional statistician. One thing I enjoyed was his quoting Christopher Hitchens: “What can be asserted without evidence can also be dismissed without evidence.”
Of course, we are told that heads and tails should come up 50-50. But I also remembered a guy who wrote in to Numismatic News doubting that theory and reporting his own test: he flipped a coin 600 times and got 496 heads! The probability of that result is not zero, but I actually calculated it. The chance of getting exactly 496 heads in 600 fair flips is one divided by 6.672 times 10 to the 61st power. For readers not mathematically inclined, that’s an exceedingly tiny probability. Ten to the 61st power means 1 followed by 61 zeroes.
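For the curious, here is a minimal Python sketch of that calculation, assuming a perfectly fair coin and independent flips. The probability of exactly k heads in n flips is the number of ways to choose which k flips land heads, divided by the 2^n equally likely sequences. The function names are my own, just for illustration. Python’s integers have unlimited size, so the huge numbers involved are computed exactly before the final division.

```python
from math import comb

def prob_exactly(k: int, n: int) -> float:
    """Probability of exactly k heads in n flips of a fair coin."""
    # comb(n, k) sequences have exactly k heads, out of 2**n equally likely sequences.
    return comb(n, k) / 2**n

def prob_at_least(k: int, n: int) -> float:
    """Probability of k or more heads in n flips of a fair coin."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

n, k = 600, 496
p_exact = prob_exactly(k, n)
p_tail = prob_at_least(k, n)
print(f"P(exactly {k} heads)  = 1 in {1 / p_exact:.4g}")
print(f"P({k} or more heads) = 1 in {1 / p_tail:.4g}")
```

The first line should land very close to the figure above. The second line covers the slightly more generous question of getting 496 heads *or more*, which is a bit likelier but still astronomically small.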
Skinner related a similar tale about Frank Weldon, who in 1894 really did put the theory to a rigorous test. He rolled a dozen dice 26,306 times and recorded the results. That huge effort would make him either a martyr to science or a fool (like the Numismatic News guy), because, after all, what is there to test? Is there any sane reason to doubt what such a simple probability calculation dictates?
Well, guess what. Weldon found the numbers five and six over-represented. With six faces on each die, any particular pair of numbers (such as five or six) should come up a combined one-third of the time, or 33.33%. But Weldon got 33.77%. You might think that’s a minor deviation, easily put down to random chance. But statisticians have mathematical tools to test exactly that, i.e., whether a result is “statistically significant.” And the odds against Weldon’s result arising by chance alone were calculated to be 64,499 to one.
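To give a flavor of how such a significance test works, here is a rough Python sketch of one simple version, a one-proportion z-test. It asks how many standard errors the observed share of fives and sixes sits away from the expected one-third, then converts that into a probability. Some assumptions, all mine: every die in every throw counts as an independent trial (12 × 26,306 = 315,672 dice), the observed share is the rounded 33.77%, and the normal approximation to the binomial holds, which is reasonable for a sample this large. This is not necessarily the test behind the 64,499-to-one figure, so it will not reproduce that number; it only shows the general idea.

```python
from math import erfc, sqrt

def proportion_z_test(observed: float, expected: float, n: int) -> tuple[float, float]:
    """One-proportion z-test using the normal approximation to the binomial.

    Returns (z-score, two-sided p-value).
    """
    # Standard error of a sample proportion if the true probability is `expected`.
    std_err = sqrt(expected * (1 - expected) / n)
    z = (observed - expected) / std_err
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

dice_rolled = 12 * 26_306  # a dozen dice per throw, 26,306 throws (assumed independent)
z, p = proportion_z_test(observed=0.3377, expected=1 / 3, n=dice_rolled)
print(f"z = {z:.2f}, two-sided p = {p:.2g} (about 1 in {1 / p:,.0f})")
```

A z-score beyond about 2 is conventionally treated as significant; a deviation of a fraction of a percentage point can clear that bar easily when hundreds of thousands of dice are involved.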
So another fool (er, researcher), Zacariah Labby, decided to repeat Weldon’s experiment, this time using machinery to roll the dice and a computer to tabulate the results. He got 33.43%, a smaller deviation, but still statistically significant.
How can this be explained? It had been suggested that the small concave “pips” marking the numbers on the die faces might affect the results: a face with more pips has more material scooped out, making it slightly lighter, so it may tend to end up on top. Labby also measured his dice with highly accurate equipment and found they were not perfect cubes.