# Bayesian First Aid: One Sample and Paired Samples t-test

Student’s t-test is a staple of statistical analysis. A quick search on Google Scholar for “t-test” results in 170,000 hits in 2013 alone. In comparison, “Bayesian” gives 130,000 hits while “box plot” results in only 12,500 hits. To be honest, if I had to choose I would most of the time prefer a notched boxplot to a t-test. The t-test comes in many flavors: one sample, two-sample, paired samples and Welch’s. We’ll start with the two most simple; here follows the Bayesian First Aid alternatives to the one sample t-test and the paired samples t-test. Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement and the description of the alternative to the binomial test. The development of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it!

## The Model

A straight forward alternative to the t-test would be to assume normality, add some non-informative priors to the mix and be done with it. However, one of great things with Bayesian data analysis is that it is easy to not assume normality. One alternative to the normal distribution, that still will fit normally distributed data well but that is more robust against outliers, is the t distribution. Hang on, you say, isn’t the t-test already using the t-distribution?. Right, the t-test uses the t-distribution as the distribution of the sample mean divided by the sample SD, here the trick is to assume it as the distribution of the data.

Instead of reinventing the wheel I’ll here piggyback on the work of John K. Kruschke who has developed a Bayesian estimation alternative to the t-test called Bayesian Estimation Supersedes the T-test, or BEST for short. The rationale and the assumptions behind BEST are well explained in a paper published 2013 in the Journal of Experimental Psychology (the paper is also a very pedagogical introduction to Bayesian estimation in general). That paper and more information regarding BEST is available on John Kruschkes web page. He has also made a nice video based on the paper (mostly focused on the two sample version though):

All information regarding BEST is given in the paper and the video, here is just a short rundown of the model for the one sample BEST: BEST assumes the data ($x$) is distributed as a t distribution which is more robust than a normal distribution due to its wider tails. Except for the mean ($\mu$) and the scale ($\sigma$) the t has one additional parameter, the degree-of-freedoms ($\nu$), where the lower $\nu$ is the wider the tails become. When $\nu$ gets larger the t distribution approaches the normal distribution. While it would be possible to fix $\nu$ to a single value BEST instead estimates $\nu$ allowing the t-distribution to become more or less normal depending on the data. Here is the full model for the one sample BEST: The prior on $\nu$ is an exponential distribution with mean 29 shifted 1 to the right keeping $\nu$ away from zero. From the JEP 2013 paper: “This prior was selected because it balances nearly normal distributions ($\nu$ > 30) with heavy tailed distributions ($\nu$ < 30)”. The priors on $\mu$ and $\sigma$ are decided by the hyper parameters $M_\mu$, $S_\mu$, $L_\sigma$ and $H_\sigma$. By taking a peek at the data these parameters are set so that the resulting priors are extremely wide. While having a look at the data pre-analysis is generally not considered kosher, in practice this gives the same results as putting $\mathrm{Uniform}(-\infty,\infty)$ distributions on $\mu$ and $\sigma$.

### The Model for Paired Samples

Here I use the simple solution. Instead of modeling the distribution of both groups and the paired differences the Bayesian First Aid alternative uses the same trick as the original paired samples t-test: Take the difference between each paired sample and model just the paired differences using the one sample procedure. Thus the alternative to the paired samples t-test is the same as the one sample alternative, the only difference is in how the data is prepared and the how the result is presented.

## The bayes.t.test Function

The t.test function is used to run all four versions of the t-test. Here I’ll just show the one sample and paired samples alternatives. The bayes.t.test runs the Bayesian First Aid alternative to the t-test and has a function signature that is compatible with t.test function. That is, if you just ran a t-test, say t.test(x, conf.int=0.8, mu=1), just prepend bayes. and it should work out of the box.

The example data I’ll use to show off bayes.t.test is from the 2002 Nature article The Value of Bees to the Coffee Harvest (doi:10.1038/417708a, pdf). In this article David W. Roubik argues that bees are important to the coffee harvest despite that the “self-pollinating African shrub Coffea arabica, a pillar of tropical agriculture, was considered to gain nothing from insect pollinators”. Supporting the argument is a data set of the mean average coffee yield (in kg / 10,000 m) for new world countries in 1961-1980, before the establishment of African honeybees, and in 1981-2001, when African honeybees had been more or less been naturalized. This data shows an increased yield after the introduction of bees and when analyzed using a paired t-test results in p = 0.04. This is compared with the increase in yield in old world countries, where the bees been busy buzzing all along, were a paired t-test gives p = 0.232 interpreted as “no change”. The full dataset is given in the table below and in this csv file (to match the analysis in the paper the csv does not include the Caribbean islands) There are a couple of reasons for why it is not proper to use a t-test to analyze this data set. A t-test does not consider the geographical location of the countries nor is it clear what “population” the sample of countries is drawn from. I also feel tempted to mutter the old cliché “correlation does not imply causation”, surely there must have been many things except for the introduction of bees that changed in Bolivia between 1961 and 2001. Being aware of these objections I’m going to use it to nevertheless show off the paired bayes.t.test.

Let’s first run the original analysis from the paper:

In the paper they used one tailed t-test hence the unhelpfully wide confidence interval. Now the Bayesian First Aid alternative:

So here we get estimates for the mean and SD with credible intervals on the side. We also get to know that the probability that the mean increase is more than zero is 99.4%. We could also take a look at the data from the old world:

So the trend here is also increasing yields, though the parameter estimate is much less precise and the 95% CI includes zero. Also notable is the much larger SD compare to the new world countries.

### Generic Functions

Every Bayesian First Aid test have corresponding plot, summary, diagnostics and model.code functions. Here follows examples of these using the new world data.

Using plot we get to look at the posterior distributions directly. We also get a posterior predictive check in the form of a histogram with a smattering of t-distributions drawn from the posterior. If there is a large discrepancy between the model fit and the data then we need to think twice before proceeding. As it is now I would say the post. pred. check looks OK. summary gives us a detailed summary of the posterior. Note that the number of decimals places in this summary is a bit excessive, due to the posterior being approximated using MCMC the numbers will jump around slightly between runs. If this worries you increase the number of MCMC iterations by using the argument n.iter when calling bayes.t.test.

diagnostics prints and plots MCMC diagnostics (similar to the example in bayes.binom.test). Finally model.code prints out R and JAGS code that replicates the analysis and that can be directly copy-n-pasted into an R script and modified from there. Here goes:

If we believe, for example, that robustness is not such a big issue and would like to assume that the data is normally distributed rather than t distributed we just have to make some small adjustments to this script. In the model code we change dt into dnorm and remove the nu parameter resulting in this model:

We also has to remove the monitoring of the nu parameter.

And that’s all.

The bayes.t.test does not include all the functionality of BEST. Check out the BEST package by John K. Kruschke and Mike Meredith for an R and JAGS implementation of BEST including power analysis and more. T-testable data was not that easy to get by and I browsed through a lot of papers before I found the examples with the bees. Wouldn’t it be nice if people actually published some data, well well… I did find two other fun examples of t-testable data if someone would be interested: 