For the past two years, I have been a TA for the awesome Public Health Biostatistics course taught by Drs. Margaret Taub and Leah Jager. The course is an introductory statistics course aimed at a non-mathematical audience. Frequentist confidence intervals (introduced simply as confidence intervals) are the first topic that students struggle with in the course, and I don’t blame them. For those who need a refresher on the definition of a (95%) frequentist confidence interval, here you go:
“If we have a 95% confidence interval for a parameter, this means that if we were to repeat the experiment an infinite number of times and calculate a confidence interval for each experiment, 95% of those confidence intervals would contain the true parameter value.”
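For concreteness, the specific formula doesn’t really matter for this discussion, but one standard example is the large-sample 95% interval for a population mean:

$$\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}},$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size. The “95%” is a property of this recipe applied over repeated experiments, not of any single interval it produces.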
That definition is quite a mouthful, and it creates a huge issue when trying to explain it to students: almost all of them want to interpret a 95% confidence interval as “there is a 95% probability that our confidence interval contains the true parameter value.” However, that’s the definition of a Bayesian credible interval, not of a frequentist confidence interval. Maybe the only bright side of not teaching Bayesian statistics in an introductory course is that students at least won’t have to keep two competing interval definitions straight.
To show students why this interpretation is not correct, I took out a coin and asked them what the probability of me flipping it and getting heads is. Everyone responded with the correct answer, 50%. I then flipped it, held it in my hands so that no one could see which side was facing up, and slightly reworded my question: “What is the probability that the coin I just flipped is heads?” Again, all the students responded with 50%. However, this is not true if we’re using the frequentist definition of probability rather than the Bayesian subjective definition. Once I have flipped the coin, the side facing up either is or is not heads; there’s no more randomness! Just because an event is unknown (whether the side facing up is heads) does not mean that it’s random.
To connect this with confidence intervals, I make the analogy that our experiment and its corresponding confidence interval are our coin flip. Rather than being interested in whether the coin is heads, the event we’re now interested in is whether our confidence interval contains the true parameter value. Unfortunately, whether the confidence interval contains the true parameter value is unknown: either it does, or it doesn’t. This is really unsatisfying to students (and to me, which is one of the reasons I’ve been drawn to Bayesian statistics). However, the nice thing about a 95% confidence interval is that we’ve stacked the deck in our favor: in the coin toss, there was only a 50% chance that we were going to get heads, whereas before the experiment there is a 95% chance our confidence interval will contain the true parameter value.
I’m not sure whether this is the clearest way to convey what we mean when we define a frequentist confidence interval, although I think it does an effective job of distinguishing between an event being random and an event being unknown. I also think this post highlights how truly complicated the idea of a frequentist interval is, and how inadequate the phrase “We are 95% confident that the mean is between a and b” is for conveying it. I would guess that many non-statisticians who deal with confidence intervals in their jobs have either forgotten or never learned the correct definition. In my opinion, saying “We are 95% confident…” to a non-statistician collaborator leads them toward thinking that there is a 95% probability our interval contains the true parameter value.
How can confidence intervals be taught better? I think doing simulations is helpful for students to see what we mean by repeating an experiment an infinite number of times (I’ve included a small sketch of one such simulation at the end of this post). Daniel Kunin and co have an awesome visualization at their Seeing Theory website that I would highly recommend checking out. Finally, I think we need to consider teaching Bayesian statistics more frequently (pun intended) in introductory statistics courses.
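To make the simulation suggestion concrete, here is a minimal sketch in Python with NumPy of the kind of simulation I have in mind; the true mean, standard deviation, sample size, and number of repetitions are all arbitrary choices for illustration.

```python
# A minimal sketch: simulate many "experiments," compute a 95% confidence
# interval for the mean in each one, and count how often the interval
# contains the true mean.
import numpy as np

rng = np.random.default_rng(42)

true_mean = 10        # the "true parameter value" -- known here only because we simulate
true_sd = 3
n = 50                # sample size in each experiment
n_experiments = 10_000

covered = 0
for _ in range(n_experiments):
    sample = rng.normal(true_mean, true_sd, size=n)
    xbar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
    covered += (lower <= true_mean <= upper)

print(f"Proportion of intervals containing the true mean: {covered / n_experiments:.3f}")
# Prints something close to 0.95: the "95%" describes the procedure over
# repeated experiments, not any single interval.
```

Each individual interval either contains the true mean or it doesn’t; the 0.95 only shows up across many repetitions, which is exactly the “either it does or it doesn’t” situation from the coin analogy.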