I recently wrapped up a version of my R function for easy Bayesian bootstrappin’ into the package bayesboot. This package implements a function, also named bayesboot, which performs the Bayesian bootstrap introduced by Rubin in 1981. The Bayesian bootstrap can be seen as a smoother version of the classical non-parametric bootstrap, but I prefer seeing the classical bootstrap as an approximation to the Bayesian bootstrap :)
The implementation in bayesboot can handle both summary statistics that work on a weighted version of the data (such as weighted.mean) and summary statistics that work on a resampled data set (like median). As bayesboot just got accepted on CRAN you can install it in the usual way:
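That is, straight from CRAN:

```r
# Install the released version from CRAN
install.packages("bayesboot")
```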
You’ll find the source code for bayesboot on GitHub.
If you want to know more about the model behind the Bayesian bootstrap you can check out my previous blog post on the subject and, of course, the original paper by Rubin (1981).
A simple example of bayesboot in action
As in a previous post on the Bayesian bootstrap, here is again a Bayesian bootstrap analysis of the mean height of American presidents using the heights of the last ten presidents:
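The data could be entered along these lines. Note that the numbers below are illustrative placeholders in cm, not necessarily the exact figures used in the original post (which were taken from a published list of presidential heights):

```r
# Heights (in cm) of the last ten US presidents, most recent first.
# Illustrative values; swap in the actual figures as appropriate.
height <- c(185, 182, 188, 188, 185, 177, 183, 182, 193, 183)
```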
The bayesboot function needs, at least, a vector of data and a function implementing a summary statistic. Here we have the data height and we’re going with the sample mean as our summary statistic:
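A call along these lines, assuming the height vector defined above:

```r
library(bayesboot)

# Run the Bayesian bootstrap: draw posterior weights over the data
# many times and apply the summary statistic (here, the mean) each time.
b1 <- bayesboot(height, mean)
```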
The resulting posterior distribution over probable mean heights can now be plotted and summarized:
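Continuing the sketch above with the b1 object from before, the summary prints the posterior mean, standard deviation and a highest-density interval for each column, and the plot draws a histogram of the posterior:

```r
# Numerical summary of the posterior distribution
summary(b1)

# Histogram of the posterior, annotated with a highest-density interval
plot(b1)
```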
A shoutout to Mike Meredith and John Kruschke who implemented the great BEST and HDInterval packages, which summary and plot utilize. Note here that the point mean in the summary and plot above refers to the mean of the posterior distribution and not the sample mean of any presidents.
While it is possible to use a summary statistic that works on a resample of the original data, it is more efficient to use a summary statistic that works on a reweighting of the original dataset. So instead of using mean as above, it would be better to use weighted.mean, like this:
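Assuming the same height vector, and using bayesboot’s use.weights argument so the posterior weights are handed directly to the statistic:

```r
# use.weights = TRUE passes the Dirichlet-distributed posterior weights
# to the statistic instead of resampling the data.
b2 <- bayesboot(height, weighted.mean, use.weights = TRUE)
```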
The result will be almost the same as before, but the above will be somewhat faster to compute.
A call to bayesboot will always result in a data.frame with one column per dimension of the summary statistic. If the summary statistic does not return a named vector, the columns will be called V1, V2, etc. The result of a bayesboot call can be further inspected and post-processed. For example:
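For instance, one could pull out the posterior samples and compute a tail probability or a credible interval. The b2 object and the 183 cm threshold below are just for illustration:

```r
# With an unnamed statistic the posterior samples end up in column V1
post_mean <- b2$V1

# Posterior probability that the mean height exceeds 183 cm
mean(post_mean > 183)

# A 95% credible interval via quantiles
quantile(post_mean, c(0.025, 0.975))
```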
Comparing two groups
If we want to compare the means of two groups, we will have to call bayesboot twice, once with each dataset, and then use the resulting samples to calculate the posterior difference. For example, let’s say we also have the heights of the opponents that lost to the presidents in height the first time those presidents were elected. Now we are interested in comparing the mean height of American presidents with the mean height of presidential candidates that lost.
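A sketch of how that comparison could go, with heights_opponents being a hypothetical vector of the losing candidates’ heights (the values below are made up for illustration):

```r
# Hypothetical heights (cm) of the losing opponents; illustrative values
heights_opponents <- c(180, 173, 188, 185, 175, 182, 177, 173, 188, 178)

# One Bayesian bootstrap per group
b_pres <- bayesboot(height, weighted.mean, use.weights = TRUE)
b_opp  <- bayesboot(heights_opponents, weighted.mean, use.weights = TRUE)

# Posterior over the difference in mean height (presidents - opponents)
b_diff <- b_pres$V1 - b_opp$V1

# Posterior probability that presidents are taller on average
mean(b_diff > 0)
hist(b_diff)
```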
So there is some evidence that winning presidents are a couple of cm taller than losing opponents. (Though, I must add that it is quite unclear what the purpose really is of analyzing the heights of presidents and opponents…)
More information
The README and documentation of bayesboot contain more examples. If you find any bugs or have suggestions for improvements, consider submitting an issue on GitHub.
References
Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130–134.