
Bayesian First Aid

2014-01-10

So I have a secret project. Come closer. I’m developing an R package that implements Bayesian alternatives to the most commonly used statistical tests. Yes, you heard me, soon your t.testing days might be over! The package aims to be as easy as possible to pick up and use, especially if you are already used to the classical *.test functions. The main gimmick is that the Bayesian alternatives will have the same calling semantics as the corresponding classical test functions, save for the addition of bayes. at the beginning of the function name. Going from a classical test to the Bayesian version will be as easy as going from t.test(x1, x2, paired=T) to bayes.t.test(x1, x2, paired=T).

The package does not aim at being some general framework for Bayesian inference or a comprehensive collection of Bayesian models. This package should be seen more as a quick fix; a first aid for people who want to try out the Bayesian alternative. That is why I call the package Bayesian First Aid.

[Image: the Bayesian First Aid logo]

Why Bayesian First Aid?

To paraphrase John D. Cook paraphrasing Anthony O’Hagan: the great benefit of Bayesian inference is that it is a principled and coherent framework for producing inferences with a (relatively) straightforward interpretation (it’s all probability). Furthermore, once you’ve got the Bayesian machinery going you have great flexibility when it comes to modeling your data.

This flexibility is a great thing and stands in contrast with how classical statistics is usually presented: as a collection of prepackaged routines that you can select from but not tinker with. Just pick up any psychology statistics textbook (like me, me or me) and you will find the same canonical list: binomial test, one sample t-test, two samples t-test, correlation test, chi-square test, etc. In Bayesian statistics you can quickly escape the cookbook solutions; once you get the hang of the basics you are free to tinker with distributional assumptions and functional relationships. In a way, classical statistics (as it is usually presented) is a bit like Playmobil: everything is ready out of the box, but if the box contained a house there is no way you’re going to turn that into a pirate ship. Bayesian statistics is more like Lego: once you learn how to stick the blocks together, the sky is the limit.

[Image: Playmobil vs. Lego]

However, cookbook solutions (recipes?) have some benefits. For one thing, they are often solutions to commonly encountered situations, otherwise there wouldn’t be a need for a prepackaged solution in the first place. They are also easy to get started with: just follow the recipe. In R the statistical cookbook solutions are really easy to use; thanks to all the handy *.test functions, many classical analyses are one-liners.
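
For example, a classical two-sample t-test is a single call (here with made-up data, just for illustration):

# Two made-up groups of measurements
x1 <- rnorm(30, mean = 100, sd = 15)
x2 <- rnorm(30, mean = 105, sd = 15)

# The whole classical analysis is a one-liner
t.test(x1, x2)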

A Bayesian analysis is more difficult to pull off in one line of R and often requires a decent amount of work to get going. The goal of Bayesian First Aid is to bring in some of the benefits of cookbook solutions and make it easy to start doing Bayesian data analysis. The target audience is people who are curious about Bayesian statistics, who perhaps have some experience with classical statistics, and who want to see what reasonable Bayesian alternatives to their favorite statistical tests would look like. For sure, it is good to have a firm understanding of the math and algorithms behind Bayesian inference, but sometimes I wish it were as easy to summon a posterior distribution as it is to conjure up a p-value. Easy summoning of posterior distributions would be useful, for example, if you want to try teaching Bayesian stats backwards.

Bayesian First Aid Design Rationale

Bayesian First Aid is foremost inspired by John Kruschke’s Bayesian Estimation Supersedes the t-test (BEST) and the related BEST R package. BEST can be used to analyze data you would classically run a t-test on, but it does not have the same distributional assumptions as the t-test and, what’s more, it isn’t even a test! Both these differences are a Good Thing™. Instead of assuming that the data is normally distributed, BEST assumes a t distribution which, like the normal distribution, is symmetric and heap shaped but, unlike the normal distribution, has heavy tails and is therefore relatively robust against outliers. Bayesian First Aid is going for the same approach, that is, similar distributional assumptions as the classical tests but using more robust/flexible distributions when appropriate.
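
To get a feel for how much heavier the t distribution’s tails are, compare the probability of observing a value more than three scale units from the center (a quick illustration, not part of the package):

# Probability of a value more than 3 scale units from the center
2 * pnorm(-3)       # normal distribution: ~0.003
2 * pt(-3, df = 3)  # t distribution with 3 degrees of freedom: ~0.058

An extreme data point is far less surprising under the t distribution, so a few outliers won’t drag the estimated center around nearly as much.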

BEST is not a statistical test in the strict sense either, as there is no testing against a point null going on. Instead of probing whether the mean difference between two groups is zero or not, BEST does parameter estimation, effectively probing how large the difference is. The question of how large or how much is often the more sensible one to ask; to learn that pill A is better than pill B actually tells you very little. What you want to know is how much better pill A is. Check out The Cult of Statistical Significance (both a book and a paper) by Ziliak and McCloskey to convince yourself that parameter estimation is what you really want to do most of the time rather than hypothesis testing. As in BEST, the Bayesian First Aid alternatives to the classical tests will be parameter estimation procedures and not point null tests.

The main gimmick of Bayesian First Aid is that the Bayesian version will have almost the same calling semantics as the corresponding classical test function. To invoke the Bayesian version just prepend bayes. (as in BAYesian EStimation) to the name of the function; for example, binom.test becomes bayes.binom.test, cor.test becomes bayes.cor.test, and so on. The Bayesian version will try to honor the arguments of the classical test function as far as possible, as in the sketch below. For example, binom.test(deaths, total_ninjas, p=0.75, conf.level=0.8) runs a binomial test with a null of 0.75 and an 80% confidence interval. The Bayesian version would be called as bayes.binom.test(deaths, total_ninjas, p=0.75, conf.level=0.8), where p=0.75 will be interpreted as a relative frequency of special interest and the resulting report will include the probability that the underlying relative frequency of ninja casualties falls below / exceeds 0.75, and conf.level=0.8 will be used to set the limits of the reported credible interval.
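
Side by side, the two calls would look like this (deaths and total_ninjas are just the placeholder variables from the example above):

# Classical: test against the point null p = 0.75
binom.test(deaths, total_ninjas, p = 0.75, conf.level = 0.8)

# Bayesian First Aid: same arguments, but p marks a relative frequency
# of special interest and conf.level sets the credible interval
bayes.binom.test(deaths, total_ninjas, p = 0.75, conf.level = 0.8)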

A defining feature of Bayesian data analysis is that you have to specify a prior distribution over the parameters in the model. The default priors in Bayesian First Aid will try to be reasonably objective (if you believe it is silly to call a Bayesian analysis “objective”, James Berger gives many good arguments for objective Bayes here). There will be no way of changing the default priors (gasp!). We only try to give first aid here, not the full medical treatment. Instead, it should be easy to start tinkering with the models underlying Bayesian First Aid. The generic function model.code takes a Bayesian First Aid object and prints out the underlying model, ready to be copy-n-pasted into an R script and tinkered with from there. All models will be implemented in the JAGS modeling language and called from R using the rjags package. In addition to model.code, all Bayesian First Aid objects will be plottable and summarizable, and a call to diagnostics will show some MCMC diagnostics (even if this shouldn’t be necessary to look at for the simple models). For an example of how this would work, see the sneak peek further down.
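
Put together, the intended workflow would look something like this (a sketch of the planned interface, not final code):

fit <- bayes.binom.test(x = 39, n = 64)
plot(fit)         # plot the posterior distributions
summary(fit)      # numerical summaries of the posteriors
diagnostics(fit)  # MCMC convergence diagnostics
model.code(fit)   # print the underlying JAGS model, ready to tinker with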

Base R includes many statistical tests, some well known, such as cor.test, and some more obscure, such as mantelhaen.test. To begin with, I plan to implement reasonable Bayesian alternatives to the canonical tests mentioned above: the binomial test, the one sample and two samples t-tests, the correlation test, the chi-square test, and so on.

The development of Bayesian First Aid can be tracked on GitHub, but currently the package is far from usable. So don’t fork me on GitHub just yet…

I’m grateful for any ideas and suggestions regarding this project. I’m currently struggling to find reasonable Bayesian estimation alternatives to some of the classical tests, so if you have any ideas about what to do with, for example, chisq.test, please let me know.

A Sneak Peek at Bayesian First Aid

Just a quick look at how Bayesian First Aid would work. The following simple problem is from a statistical methods course (since publishing this post, the page for that course has unfortunately been removed):

A college dormitory recently sponsored a taste comparison between two major soft drinks. Of the 64 students who participated, 39 selected brand A, and only 25 selected brand B. Do these data indicate a significant preference?

This problem is written in order to be analyzed with a binomial test, so let’s do that. Note that binom.test accepts a vector giving the number of successes and failures:

binom.test(c(39, 25))

	Exact binomial test

data:  c(39, 25)
number of successes = 39, number of trials = 64, p-value = 0.1034
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4793 0.7290
sample estimates:
probability of success 
                0.6094 

Bummer, seems like there is no statistically significant difference between the two brands. Now we’re going to run a Bayesian alternative simply by prepending bayes. to our function call:

library(BayesianFirstAid)

bayes.binom.test(c(39, 25))

	Bayesian first aid binomial test

data: c(39, 25)
number of successes = 39, number of trials = 64
Estimated relative frequency of success:
  0.606 
95% credible interval:
  0.492 0.727 
The relative frequency of success is more than 0.5 by a probability of 0.959 
and less than 0.5 by a probability of 0.041 

Great, we just estimated the relative frequency $\theta$ of choosing brand A assuming the following model:

$$ x \sim \mathrm{Binom}(\theta,n) $$

$$ \theta \sim \mathrm{Beta}(1,1) $$
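
As an aside, the Beta prior is conjugate to the binomial likelihood, so here the posterior is available in closed form as $\theta \sim \mathrm{Beta}(1 + x, 1 + n - x)$ and we can double-check the MCMC estimates ourselves:

x <- 39
n <- 64
qbeta(c(0.025, 0.975), 1 + x, 1 + n - x)  # 95% credible interval, roughly (0.49, 0.73)
1 - pbeta(0.5, 1 + x, 1 + n - x)          # P(theta > 0.5), roughly 0.96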

In this simple example the estimates from binom.test and bayes.binom.test are close to identical. Still, we get to know that the relative frequency of success is more than 0.5 by a probability of 0.959, which indicates that brand A might be more popular. At the end of the day we are not that interested in whether brand A is more popular than brand B but rather in how much more popular brand A is. This is easier to see if we look at the posterior distribution of $\theta$:

plot(bayes.binom.test(c(39, 25)))

[Plot: the posterior distribution of $\theta$]

So it seems that brand A might be more popular than brand B, but not by much, as the posterior has most of its mass at a relative frequency between 0.5 and 0.7. The college cafeteria should probably keep both brands in stock if possible. If they need to choose one brand, pick A.

Perhaps you want to tinker with the model above; maybe you have some prior knowledge that you want to incorporate. By using the model.code function we get a nice printout of the model that we can copy-n-paste into an R script.

model.code(bayes.binom.test(c(39, 25)))
### Model code for the Bayesian First Aid alternative to the binomial test ###
require(rjags)

# Setting up the data
x <- 39 
n <- 64 

# The model string written in the JAGS language
model_string <-"model {
  x ~ dbinom(theta, n)
  theta ~ dbeta(1, 1)
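  # x_pred is a posterior predictive draw of the number of successes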
  x_pred ~ dbinom(theta, n)
}"

# Running the model
model <- jags.model(textConnection(model_string), data = list(x = x, n = n), 
                    n.chains = 3, n.adapt=1000)
samples <- coda.samples(model, c("theta", "x_pred"), n.iter=5000)

# Inspecting the posterior
plot(samples)
summary(samples)  
Posted by Rasmus Bååth | 2014-01-10 | Tags: Statistics, R, Bayesian