Bernoulli’s Fallacy, Aubrey Clayton, Columbia University Press, 2021
I picked up the book because the last two years have seen a flood of misused statistics, especially in the medical fields. I can smell badly used statistics, but I am clunky at calculating anything past poker hands, multivariable correlation, and the unique way that quantum mechanics uses probability, see here. Plus, I was living it up in Harvard Square, which requires visiting several bookstores.
I highly recommend Bernoulli’s Fallacy. Not to teach you how to calculate statistics or probabilities, but to help get your mind around some fundamentals like
“Statistics” has a clear definition. Does “Probability”?
The difference between the deductive use of statistics and the inductive use of probability.
The “Replication Crisis”: how did we get into it, and what are some ways out of it?
Realizing that all along you were justified in feeling that disproving the Null Hypothesis was a stupid way to use statistics and an even stupider way to do experiments.
Null Hypothesis Significance Testing
Taking these in reverse, let’s start with the sphincter-puckering “Null Hypothesis Significance Testing” (NHST). Imagine I am testing a new drug and I give it to my test subjects. I assume that the drug has no benefit - this is the Null Hypothesis. I analyze my results, see some effect, and calculate the likelihood that, if my drug truly had no benefit, I would still see the data that I recorded. Yes, your mind is reeling for good reason. Why would I assume my drug has no benefit? I just spent $50 million developing it. Why can’t I ask how likely it is that my drug is helping people? Because the NHST cannot tell you that. Yup. Read that one again.
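To make the direction of that inference concrete, here is a minimal sketch (mine, not Clayton’s, with invented trial numbers) of what NHST actually computes: the probability of data at least this extreme assuming the null is true, which is not the probability that the drug works.

```python
from math import comb

# Hypothetical trial, numbers invented for illustration: 50 patients get
# the drug and 34 recover; the assumed "no benefit" recovery rate is 50%.
n, k, p_null = 50, 34, 0.5

# The p-value: the probability, ASSUMING the null is true, of seeing 34 or
# more recoveries out of 50. This is P(data this extreme | no benefit),
# not P(no benefit | data), and certainly not P(drug works | data).
p_value = sum(comb(n, i) * p_null**i * (1 - p_null)**(n - i)
              for i in range(k, n + 1))
print(f"P(>= {k} recoveries | drug has no benefit) = {p_value:.4f}")
```

A small p-value here only says the data would be surprising in a world where the drug does nothing; it never answers the $50 million question.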
To answer this panty twist, Aubrey Clayton does a very good job of walking us through the fundamentals of statistics, along with their historical development, using many examples of varying complexity. I cannot summarize his answer for you in a few sentences, but I can recall one startling example. In 1968 a survey was given to 57,000 Minnesota residents, asking them all kinds of questions: birth order, views on religion, money habits, exercise habits, views on politics, and so on - dozens of different attributes. Every attribute was “statistically significant” with every other. If you applied NHST to, say, your birth order versus the hours you exercise each week, the two had p < 0.05 - that is, if you assumed there was no connection between being the youngest child and your exercise habits, then the data said that “Null Hypothesis” was wrong. And this was the case for every attribute. Views about money correlated with what time you woke up in the morning. Opinions about homosexuals correlated with whether you finished all the food on your plate. Everything was statistically connected to everything else. If the only thing you cared about was the likelihood that the Null Hypothesis was false, then you could prove anything you wished from the Minnesota study.
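To see why a sample of 57,000 makes nearly everything “significant”, here is a small simulation (my own sketch, not the actual Minnesota survey data; the variables and effect size are invented): at that sample size even a trivially weak association clears the p < 0.05 bar.

```python
import numpy as np
from scipy.stats import pearsonr

# With ~57,000 respondents, the standard error of a correlation is about
# 1/sqrt(57000) ~ 0.004, so even a negligible real association looks
# "statistically significant".
rng = np.random.default_rng(0)
n = 57_000
x = rng.normal(size=n)                 # stand-in for, say, birth order
y = 0.03 * x + rng.normal(size=n)      # true correlation of only ~0.03
r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.2g}")     # r is tiny, yet p is far below 0.05
```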
It is useful to know that everything about us is connected to everything else about us. But no medical studies care about that.
Replication Crisis
The Replication Crisis is the fact that most published research studies cannot be reproduced. Of course, since about 1950 very few studies have been replicated, because there is no money to be made that way - not for the journals, not for the drug companies, not for the professors who must do exciting new research to justify their jobs. Yet, somehow, John Ioannidis and his team managed to secure $$$ and launch replication studies. The idea caught some momentum. More people tried. The upshot is that across most disciplines - but especially medicine, psychology, sociology, and economics - about half of the studies could not be reproduced. And of the ones that could be replicated, most showed only about 50% of the efficacy claimed in the original study. Look at your list of prescriptions; chances are 50/50 that any one of them does not help you. Cancer studies were the worst: only 11% of those could be reproduced. We can only imagine what this means for the Higgs Boson.
The way out of the Replication Crisis is to abandon most of the statistical methods currently in use, the prime offender being Null Hypothesis Significance Testing, but there is a list of other equally mind-numbing statistical tests which, like the NHST, tell you nothing about how likely your drug is to help people. I was surprised to read that several top journals in statistics, and several top journals in medicine, have been advocating this for years. Aubrey Clayton does a very good job of explaining how bringing back Bayesian statistics can solve the problem, but I don’t know if I will get to explaining that in this post. (Plug to buy Clayton’s book!)
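For a flavor of the Bayesian alternative Clayton advocates, here is a deliberately crude sketch (mine, with invented numbers and only two candidate hypotheses): instead of asking how surprising the data are under “no benefit”, it asks directly how probable “the drug works” is, given the data and a stated prior.

```python
from math import comb

def binom_lik(k, n, p):
    """Likelihood of k recoveries among n patients if the recovery rate is p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Same invented trial as before: 34 of 50 patients recover.
n, k = 50, 34

# Two crude hypotheses: the drug does nothing (rate 0.50) or it works as
# hoped (rate 0.65), with a skeptical 50/50 prior between them.
prior_null, prior_drug = 0.5, 0.5
lik_null = binom_lik(k, n, 0.50)
lik_drug = binom_lik(k, n, 0.65)

# Bayes' theorem: the posterior is proportional to prior times likelihood.
post_drug = (prior_drug * lik_drug) / (prior_null * lik_null + prior_drug * lik_drug)
print(f"P(drug works | data) = {post_drug:.2f}")   # the question we actually care about
```

A real analysis would use a continuous range of possible effect sizes and argue over the prior, but the direction of the inference is the point.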
The difference between the deductive use of statistics and the inductive use of probability
I am staring at my glass of Talisker, wondering how to summarize this one….
This took me a while to cotton on to. Almost all of modern statistics and probability is based upon sampling likelihoods. “If I have an urn filled with thousands of black and white balls, and I take out one ball at a time, and I do this for a very long time…” or, “If I have a standard card deck, and I deal out bridge hands, for a very long time…” In modern statistics we always start with a known reservoir. Starting from this known reservoir, and assuming that absolutely nothing else in the universe can affect my sampling, I ask: what are the odds of drawing this particular sample? This is deductive reasoning. Like logic. Or like starting from F=ma and deducing the mass from a measured force and acceleration.
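Here is the deductive direction in code (a sketch of my own, with an invented urn): the reservoir is known, and the only question is how likely a particular sample is.

```python
from math import comb

# The deductive direction: the urn is KNOWN, say 600 white and 400 black
# balls, and we ask how probable a particular sample is.
white, black, draws = 600, 400, 10
total = white + black

def prob_k_white(k):
    """Probability of exactly k white balls in `draws` draws without replacement."""
    return comb(white, k) * comb(black, draws - k) / comb(total, draws)

print(f"P(7 white in 10 draws | known urn) = {prob_k_white(7):.3f}")
```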
But in actual modern scientific experiments we do not have a known reservoir, and we are not deducing anything. In modern studies we gather data and wish to infer the likelihood that our hypothesis is correct. I have to admit, from my schooling, I used to think, “just assume your hypothesis is correct, then deduce what sampling frequencies you would see in your data.” But science does not work this way. Math does not work this way. Logic does not work this way. Every frickin’ statistical measure in the standard playbook is deductive, assuming you are dealing from a known deck. When you are testing a new drug you have no known deck. So what are you comparing to?
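And here is the inductive direction, which is what an experimenter actually faces (again my own sketch, with a deliberately simple flat prior over a grid of candidate urns): the urn is unknown, and we work backwards from the observed draws to a probability over what the urn contains.

```python
import numpy as np

# The inductive direction: the urn is UNKNOWN. We observed 7 white balls
# in 10 draws and want P(composition of the urn | data).
fractions = np.linspace(0.01, 0.99, 99)           # candidate fractions of white balls
prior = np.ones_like(fractions) / len(fractions)  # flat prior: no idea which urn we hold
likelihood = fractions**7 * (1 - fractions)**3    # P(7 white, 3 black | each fraction)
posterior = prior * likelihood
posterior /= posterior.sum()                      # normalize so it sums to one

# Now we can answer the question the experimenter actually asked:
print(f"P(urn is mostly white | data) = {posterior[fractions > 0.5].sum():.2f}")
```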
This gets us to the last point.
“Statistics” has a clear definition. Does “Probability”?
Clayton gives plenty of examples throughout the book of absurdities that result from using deductive sampling statistics in place of inductive probabilistic statements. In the card-deck example, think about running the reasoning in reverse: instead of starting from the known 52-card deck and computing the odds of a hand, you would have to start from the dealt hands and work backwards to induce (predict, guess) which cards are actually in the deck. Sampling statistics gives you no recipe for that direction; it only tells you how likely the hands are once the deck is already assumed. And if you insisted on pinning the deck down purely by waiting for rare hands like straight flushes to accumulate, you would be dealing for a very, very long time.
At some point Statistics got munged up with Probability. Statistics is what you expect to see on the poker table, given that you already know the 52 card deck. But what is Probability?
When you get into your car, you already know the statistics. But you are not deciding anything based upon that. You are deciding to drive based upon your personal perception of the Probability of dying by driving this time. If you are a guy deciding whether or not to walk up to a woman and say something (“what did you think of the movie?…”), you already know the statistics. But you do not care about the statistics; you are only weighing the Probability that she will not grimace and walk away. When I sit down at the poker table, I might have years of experience that with this dealer I never do well. I know the statistics of the 52-card deck, but I also know that something different happens with this dealer. Hence, I know something about the Probability of winning with her.
The difference between Bayesian probability and card-dealing statistics is something like all these examples. We do not live in a Universe of giant urns from which we draw white and black balls. We almost always know much more about the urn. When I get into the car, I know a lot about my level of tiredness or emotional excitement. When I walk up to the woman, I know whether I am attracted to her. If I develop a PCR test which tests “positive” for 100 people, and 98 of those people have no symptoms of respiratory illness, well, common sense would say there is something wrong with the PCR test.
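Putting rough numbers on that PCR example (all of them invented for illustration; the book does not give these figures): when the condition is rare among the people being tested, even a nominally accurate test produces mostly false positives, which is exactly the kind of base-rate reasoning sampling statistics alone never asks you to do.

```python
# Hypothetical test characteristics -- assumed, not measured.
sensitivity = 0.95   # P(positive | infected)
specificity = 0.99   # P(negative | not infected)
prevalence  = 0.002  # P(infected) among those being tested

# Bayes' theorem for the question the patient cares about.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_infected_given_positive = sensitivity * prevalence / p_positive
print(f"P(infected | positive test) = {p_infected_given_positive:.2f}")   # roughly 0.16
```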
Appendix - Eugenics
One of my favorite aspects of the book was the historical connection to eugenics. The people who pushed the idea that predictive Probability is the same as sampling Statistics were eugenicists. It was terrifying to read the writings of American statisticians who were laying out the math for why Jews were inferior, why Negroes were inferior. Terrifying to read that, with the help of objective statistics, we can see that genocide is a reasonable step to improve the races. A reasonable step. Wow. To use math to justify killing millions of children. Yet this is the historical fact. You can read their writings. Clayton gives enough quotes from the literature to send chills down your spine.
A point here is that predictive Probability is never the same as sampling statistics. Your boss is a man from Chennai, India. What is the probability that he is a crook? Are all men from Chennai crooks? What about your perceptions of this particular boss? You meet a black man on the street. What are the statistics for incarceration of black men? What is the probability that this particular man will hurt you? If you go by the city statistics on incarceration of black men, you will get one answer. If you go by your personal perception of this particular individual standing in front of you, you might reach a very different conclusion.
This, simply put, is the error that all of modern science labors under with its slavish adherence to sampling statistics in place of probabilistic inference.
A friend of mine and I used to debate this while playing Backgammon. He had a degree in statistics; me, I am a self-educated barbarian. He would argue that there is a one-in-six chance of rolling any number on the dice, and I would argue that that is only true over an infinite number of rolls. Within the limited scope of any game, there would be a wave of probability, which I would base on what he or I were actually rolling. There is no way to predict the probability prior to playing. All one can do is watch the tendency of the dice and see if a pattern emerges. Quite often there would be a pattern that contradicted the one-in-six, and by using that pattern I would beat him, much to his annoyance.
When it comes to math, even finding the mean of a data set distorts the data.
Eugenics and eugenicists. It is not surprising that eugenicists were the ones who pushed this method. Early 20th-century eugenicists were also behind building the UN, as well as many of the top universities; Cecil Rhodes, for example. What is now the Galton Institute was once the British Eugenics Society, and it was Julian Huxley, chair of the BES for three years, who started UNESCO. Julian Huxley was the brother of Aldous Huxley, the author of Brave New World. Modern-day socialism has its roots in early 20th-century eugenics. George Bernard Shaw was a eugenicist who advocated for death councils to whom one must justify one’s existence. Much of the corporate world back then was behind the eugenics movement, and I believe it still is.
Part of these eugenicists’ multigenerational agenda was to move society away from its past, to replace the religions of old with science, and in the past century they have done a great job of turning science into a faith-based, dogmatic religion, especially when it comes to Cosmology and Quantum Mechanics. Just believe the math, they exhort. In the 1930s the Rockefeller Foundation funded Technocracy Inc, which was to research how to develop a society based on science, replacing money with energy credits. Back then they lacked the technology to do so, but with the advent of AI and "smart" devices that technology is available. The Great WEF Reset is nothing new and has been planned for decades.
Self-serving control of the peer-review process, coupled with worse-than-useless statistics that can be manipulated any way one likes... not much hope in there for the scientific method. The problem I keep running into is that there's no clear way to talk about looking at things differently with someone who has a vested interest in a particular view. The utterly obvious may as well be invisible if it conflicts with a strongly held belief or gives credence to a deep-rooted fear... 9-11 and the CDC's Covid "proofs" come to mind...