The BetaBinomial model is the “hello world” of Bayesian statistics. That is, it’s the first model you get to run, often before you even know what you are doing. There are many reasons for this:
 It only has one parameter, the underlying proportion of success, so it’s easy to visualize and reason about.
 It’s easy to come up with a scenario where it can be used, for example: “What is the proportion of patients that will be cured by this drug?”
 The model can be computed analytically (no need for any messy MCMC).
 It’s relatively easy to come up with an informative prior for the underlying proportion.
 Most importantly: It’s fun to see some results before diving into the theory! 😁
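As a rough sketch of what "computed analytically" means here: assuming a Beta(a, b) prior on the proportion, observing s successes and f failures gives a Beta(a + s, b + f) posterior, with no sampling required. The variable names below are made up for illustration, and the uniform Beta(1, 1) prior and the counts (two successes, four failures) are just example inputs.

```r
# Conjugate update for a proportion: Beta prior + success/failure counts -> Beta posterior.
prior_a <- 1  # Beta(1, 1) is the uniform prior (an assumption for this sketch)
prior_b <- 1
n_success <- 2
n_failure <- 4

# The posterior parameters are just the prior parameters plus the counts.
post_a <- prior_a + n_success
post_b <- prior_b + n_failure

# Posterior mean and a central 95% interval, straight from the Beta distribution.
post_mean <- post_a / (post_a + post_b)
post_ci <- qbeta(c(0.025, 0.975), post_a, post_b)

cat("Posterior: Beta(", post_a, ", ", post_b, ")\n", sep = "")
cat("Posterior mean:", round(post_mean, 3), "\n")
cat("95% interval:", round(post_ci, 3), "\n")
```

An informative prior would simply mean starting from other values of prior_a and prior_b than 1 and 1.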
That’s why I also introduced the BetaBinomial model as the first model in my DataCamp course Fundamentals of Bayesian Data Analysis in R, and quite a lot of people have asked me for the code I used to visualize it. Scroll to the bottom of this post if that’s what you want. Otherwise, here is how I visualized the BetaBinomial in my course given two successes and four failures:
The function that produces these plots is called prop_model (prop as in proportion) and takes a vector of TRUEs and FALSEs representing successes and failures. The visualization is created using the excellent ggridges package (previously called ggjoy). Here’s how you would use prop_model to produce the last plot in the animation above:
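A call along these lines would do it. This is a sketch under assumptions: the ordering of the TRUEs and FALSEs is made up (only the counts, two successes and four failures, come from the text), and the call is guarded so the snippet still runs in a session where prop_model has not been defined yet.

```r
# Two successes (TRUE) and four failures (FALSE); the ordering is an assumption.
data <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)

sum(data)   # number of successes
sum(!data)  # number of failures

# Only call prop_model if it has already been defined in this session.
if (exists("prop_model")) {
  prop_model(data)
}
```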
The result is, I think, a quite nice visualization of how the model’s knowledge about the parameter changes as data arrives. At n=0 the model doesn’t know anything: the default prior states that the proportion of success is equally likely to be anything from 0.0 to 1.0, so the result is a big, blue, uniform square. As more data arrives the probability distribution becomes more concentrated, ending with the final posterior distribution at n=6.
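The step-by-step updating can also be sketched numerically. The snippet below assumes a uniform Beta(1, 1) prior and a made-up ordering of the two successes and four failures. It tracks the Beta parameters after each observation, along with the posterior mean and standard deviation.

```r
data <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)  # assumed ordering

# Beta parameters after 0, 1, ..., 6 observations, starting from Beta(1, 1).
a <- 1 + c(0, cumsum(data))
b <- 1 + c(0, cumsum(!data))

# Standard deviation of a Beta(a, b) distribution.
post_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))

steps <- data.frame(
  n = 0:length(data),
  a = a,
  b = b,
  mean = round(a / (a + b), 3),
  sd = round(post_sd, 3)
)
print(steps)
```

The first row is the flat Beta(1, 1) prior and the last row is the Beta(3, 5) posterior at n=6, which has a clearly smaller standard deviation than the prior. The spread does not have to shrink at every single step, but it does overall as data accumulates.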
Some added features of prop_model are that it also plots larger data somewhat gracefully and that it returns a random sample from the posterior that can be further explored. For example:
So here we calculated that the underlying proportion of success is most likely 0.77 with a 95% CI of [0.68, 0.84] (which nicely includes the correct value of 0.75 that we used to simulate big_data).
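The same kind of summary can be sketched without prop_model by drawing directly from the analytical posterior. Everything here is an assumption made for illustration: the sample size of 100, the seed, the uniform Beta(1, 1) prior, and the number of draws. Since this is a fresh simulation, the numbers will not match the ones quoted above.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Simulate data with a true proportion of 0.75; the sample size of 100 is an assumption.
big_data <- sample(c(TRUE, FALSE), size = 100, replace = TRUE, prob = c(0.75, 0.25))

# Draw from the analytical posterior under a uniform Beta(1, 1) prior.
posterior <- rbeta(10000, 1 + sum(big_data), 1 + sum(!big_data))

# Median and central 95% interval of the posterior sample.
quantile(posterior, c(0.025, 0.5, 0.975))
```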
To be clear, prop_model is not intended as anything serious. It’s just meant as a nice way of exploring the BetaBinomial model when learning Bayesian statistics, maybe as part of a workshop exercise.
The prop_model function
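As a rough idea of what such a function involves, here is a minimal base-R sketch. It is not the function from the course: the name prop_model_sketch, its arguments, and the simple overlay plot (instead of a ggridges ridgeline plot) are all assumptions made for illustration.

```r
# Hypothetical stand-in, NOT the original prop_model. Base R only.
prop_model_sketch <- function(data = c(), prior = c(1, 1), n_draws = 10000,
                              show_plot = TRUE) {
  data <- as.logical(data)

  # Beta parameters before any data and after each observation.
  a <- prior[1] + c(0, cumsum(data))
  b <- prior[2] + c(0, cumsum(!data))

  if (show_plot) {
    # Stay strictly inside (0, 1) so the density is finite for any prior.
    grid <- seq(0.001, 0.999, length.out = 200)
    # One density curve per step: rows are grid points, columns are steps.
    dens <- vapply(seq_along(a),
                   function(i) dbeta(grid, a[i], b[i]),
                   numeric(length(grid)))
    matplot(grid, dens, type = "l", lty = 1, col = "grey60",
            xlab = "Underlying proportion of success", ylab = "Density")
    # Highlight the final posterior.
    lines(grid, dens[, ncol(dens)], lwd = 2, col = "blue")
  }

  # Return a random sample from the final posterior for further exploration.
  invisible(rbeta(n_draws, a[length(a)], b[length(b)]))
}
```

Calling prop_model_sketch(c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)) draws one grey curve per step with the final posterior in blue, and the returned sample can be passed to quantile() in the usual way.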