# Bayesian First Aid: Two Sample t-test

As spring follows winter once more here down in southern Sweden, the two sample t-test follows the one sample t-test. This is a continuation of the Bayesian First Aid alternative to the one sample t-test where I’ll introduce the two sample alternative. It will be a quite short post as the two sample alternative is just more of the one sample alternative, more of using John K. Kruschke’s BEST model, and more of the coffee yield data from the 2002 Nature article The Value of Bees to the Coffee Harvest. Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement. The development of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it!

## The Model

The model is a straight forward extension of the one sample Bayesian Estimation Supersedes the T-test (BEST), instead of estimating the distribution of one group $x_i$ we now estimate the distributions of two, $x_i$ and $y_j$. As before we use the t-distribution as a robust alternative to the normal distribution and we assume non-informative priors. The degree-of-freedoms parameter $\nu$ governs how fat the tails of the t-distribution are and could be estimated separately for each group. Here we instead assume the two groups share a common $\nu$ resulting in a model with five parameters: $\mu_x, \mu_y, \sigma_x, \sigma_y$, and $\nu$. Here is the full model for the two sample BEST, shown using a Kruschke style diagram: (If you want to know more about Kruschke style diagrams and how you can easily make them check out DIY Kruschke Style Diagrams and How Do You Write Your Model Definitions?)

## The bayes.t.test Function for Paired Samples

The bayes.t.test function can be called exactly as the t.test function in R. If you just ran t.test(x, y, conf.level=0.9), a two sample t-test yielding a 90% CI, prepending bayes. runs the Bayesian First Aid alternative: bayes.t.test(x, y, conf.level=0.9). To showcase bayes.t.test I’ll use the same data as last time, the coffee yield data from the 2002 Nature article The Value of Bees to the Coffee Harvest (doi:10.1038/417708a, pdf). In that article David W. Roubik argues that, while the coffee bush is self pollinating, having African honeybees around still increases the coffee yield. This argument is supported by a data set with by-country coffee yields from the periods 1961 to 1980 and 1981 to 2001. Here is the data set shown using a Slopegraph: The thing is, in the 60s the African honeybee had not yet settled in the New World, something that happened in the period 1981 to 2001. Roubik then argues that the data shows that… Here “substantial increase in Latin America” refers to a statistically significant mean increase of 1470 kg/10,000² (paired samples t-test, p = 0.004) while “no such change in the Old World” refers to a non-significant mean increase of 777 kg/10,000² (p = 0.23). Roubik makes the classical mistake of assuming that the difference between “significant” and “not significant” is itself statistically significant (Gelman & Stern, 2006, explains why it’s not). What should be compared is not the p-values but the increase in coffee yield in the Old and New World. Let’s do just that using the Bayesian First Aid alternative to the two sample t-test.

First I’ll load in the data (available here) and calculate the difference in yield between the 1960 to 1981 period and the 1981 to 2000 period for each country:

The boxplot below shows the distribution of the yield differences. We’ll now compare the increase in yields using the Bayesian First Aid alternative to the two sample t-test.

The estimated difference of the means show that New World type countries increase their coffee yield by 649 kg/10,000² more than Old World type countries. The 95% credible interval is however fairly wide spanning all the way from -1593 to 2997, thus there is no strong evidence that the introduction of African honeybees in the New World resulted in a larger yield increase than in the Old World. The 2002 Nature article contains other arguments for the importance of bees to the coffee harvest, however, the argument based on the difference between New World and Old World coffee yields does not hold in the light of this analysis. (Not to mention other problems with the original analysis such as not considering that something other than the introduction of African bees might have changed in the New World from 1960 to 2001.)

### Generic Functions

Every Bayesian First Aid test have corresponding plot, summary, diagnostics and model.code functions. Here follows examples of these using the same coffee yield data as above. Using plot we get to look at the posterior distributions directly. We also get a posterior predictive check in the form of a histogram with a smattering of t-distributions drawn from the posterior. summary gives us a detailed summary of the posterior of the parameters and derived quantities. For this model there will be a lot of output! One thing to note is that while there is little evidence of a difference in means between New and Old World type countries, there is some evidence for a difference in SD (check out the posterior of sigma_diff)

diagnostics will give us an equally long list of MCMC convergence diagnostics and traceplots. Finally model.code prints out R and JAGS code that replicates the analysis and that can be directly copy-n-pasted into an R script and modified from there. It is also pretty long but here goes:

## Bayesian Estimation Supersedes the T-test (BEST) Online

In addition to using the two sample BEST as the Bayesian First Aid alternative to the t-test I’ve also implemented a javascript version that allows you to estimate this model directly in your browser using MCMC. Head over to https://sumsar.net/best_online/ and try it out with the same coffee yield data as in the examples above (just copy-n-paste from below), it should (hopefully) result in similar estimates.

The javascript implementation uses a difference MCMC algorithm than the R & JAGS version where the burn in period is also used for tuning the sampler. In the case with the coffee data I would recommend increasing the burn in period to 40000 (or decrease it to 1 to see how badly behaving MCMC sampling looks). 