Romantic kissing is a cultural universal, right? Nope! At least not if you are to believe Jankowiak et al. (2015) who surveyed a large number of cultures and found that “sexual-romantic kissing” occurred in far from all of them. For some reason the paper didn’t include a world map with these kissers and non-kissers plotted out. So, with the help of my colleague Andrey Anikin I’ve now made such a map using R and the excellent leaflet package. Click on the image below to check it out:

Jankowiak, W.R., Volsche, S.L. & Garcia, J.R., 2015. Is the Romantic-Sexual Kiss a Near Human Universal? *American Anthropologist*, 117(3), 535-539. doi: 10.1111/aman.12286 pdf

The BASIC programming language was at one point the most widespread programming language. Many home computers in the 80s came with BASIC (like the Commodore 64 and the Apple II), and in the 90s both DOS and Windows 95 included a copy of the QBasic IDE. QBasic was also the first programming language I encountered (I used it to write a couple of really horrible text adventures). Now I haven’t programmed in BASIC for almost 20 years and I thought I would revisit this (from my current perspective) really weird language. As I spend a lot of time doing Bayesian data analysis, I thought it would be interesting to see what a Bayesian analysis would look like if I only used the tool that I had 20 years ago, that is, BASIC.

This post walks through the implementation of the Metropolis-Hastings algorithm, a standard Markov chain Monte Carlo (MCMC) method that can be used to fit Bayesian models, in BASIC. I then use that to fit a Laplace distribution to the most adorable dataset that I could find: The number of wolf pups per den from a sample of 16 wolf dens. Finally I summarize and plot the result, still using BASIC. So, the target audience of this post is the intersection of people that have programmed in BASIC *and* are into Bayesian computation. I’m sure you are out there. Let’s go!

There are many, many different versions of BASIC, but I’m going for the one that I grew up with: Microsoft QBasic 1.1. Now, QBasic is a relatively new BASIC dialect that has many advanced features such as user defined types and (gasp!) *functions*. But I didn’t use any fancy functions back in the 90s, and so I’m going to write old school BASIC using line numbers and `GOTO`, which means that the code should be relatively easy to port to, say, Commodore 64 BASIC.

Getting QBasic is easy as it seems to be freeware, and can be downloaded here. Unless you are still using DOS, the next step would be to install the DOSBox emulator. Once you’ve started up `QBASIC.EXE` you are greeted with a friendly, bright blue IDE which you can try out by entering this customary *hello world* script that will clear the screen and print “HELLO WORLD”.

Note that in QBasic the line numbers are not strictly necessary, but in older BASICs (like the one for Commodore 64) they were necessary as the program was executed in the order given by the line numbers.

We are going to implement the Metropolis-Hastings algorithm, a classic Markov chain Monte Carlo (MCMC) algorithm which is one of the first you’ll encounter if you study computational methods for fitting Bayesian models. This post won’t explain the actual algorithm, but the Wikipedia article is an ok introduction.

The Bayesian model we are going to implement is a simple univariate Laplace distribution (just to once in a while give the Normal distribution a day off). The Laplace is similar to the Normal distribution in that it is continuous and symmetric, but it is more peaked and has wider tails. It has two parameters: A location $\mu$, which then also defines its mean and median, and a scale $b$ which defines the width of the distribution.

*Image source, Credits: IkamusumeFan*

Just as the sample mean is the maximum likelihood estimator for the mean of a Normal distribution, the sample median is the maximum likelihood estimator for the location parameter $\mu$ of a Laplace distribution. That’s why I, somewhat sloppily, think of the Normal distribution as *the* “mean” distribution and of the Laplace distribution as *the* “median” distribution. To turn this into a fully Bayesian model we need prior distributions over the two parameters. Here I’m just going to be sloppy and use a $\text{Uniform}(-\infty, \infty)$ over $\mu$ and $\log(b)$, that is, $\text{P}(\mu),\text{P}(\log(b)) \propto 1$. The full model is then

$$
\begin{aligned}
x &\sim \text{Laplace}(\mu, b) \\
\mu &\sim \text{Uniform}(-\infty, \infty) \\
\log(b) &\sim \text{Uniform}(-\infty, \infty)
\end{aligned}
$$
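Written out, and assuming the standard parameterization of the Laplace density, this model gives the following log posterior for data $x_1, \ldots, x_n$ (the index $i$ and the sample size $n$ are notation introduced here, not in the original post). Because both priors are flat, the log posterior over $(\mu, \log(b))$ equals the log likelihood up to an additive constant:

$$
\begin{aligned}
\text{P}(x_i \mid \mu, b) &= \frac{1}{2b} \exp\left(-\frac{|x_i - \mu|}{b}\right) \\
\log \text{P}(\mu, \log(b) \mid x) &= -n \log(2b) - \frac{1}{b} \sum_{i=1}^{n} |x_i - \mu| + \text{const}
\end{aligned}
$$

This is the quantity a Metropolis-Hastings sampler has to evaluate for every proposal.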

The data we are going to use is probably the cutest dataset I’ve worked with so far. It consists of counts of the number of wolf pups in a sample of 16 wolf dens (source):

*Image Source, Credits: spacebirdy / CC-BY-SA-3.0*
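With the data on the table, the earlier claim that the sample median is the maximum likelihood estimate of $\mu$ can be checked numerically. The following is only a minimal sketch and not part of the original analysis: `laplace_log_dens` and `log_lik` are hypothetical helper names, the scale is arbitrarily fixed at $b = 1$ (the maximizing location does not depend on it), and the data vector is the same one used in the R reference script in this post.

```r
# Wolf pup counts (the same 16 values as in the R reference script)
x <- c(5, 8, 7, 5, 3, 4, 3, 9, 5, 8, 5, 6, 5, 6, 4, 7)

# Log density of the Laplace distribution, written out by hand
# (hypothetical helper, so that no extra package is needed)
laplace_log_dens <- function(x, mu, b) {
  -log(2 * b) - abs(x - mu) / b
}

# Log likelihood as a function of the location, with the scale fixed at b = 1
log_lik <- function(mu) {
  sum(laplace_log_dens(x, mu, b = 1))
}

# Maximize the log likelihood over mu numerically
mle_mu <- optimize(log_lik, interval = range(x), maximum = TRUE)$maximum

# Compare the numerical maximum with the sample median and the sample mean
print(c(mle = mle_mu, median = median(x), mean = mean(x)))
```

The numerical maximum should land on the sample median (5) and not on the sample mean (5.625).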

Before delving into BASIC, here is a reference implementation in R of what we hope to achieve using BASIC:

# The wolf pups dataset
x <- c(5, 8, 7, 5, 3, 4, 3, 9, 5, 8, 5, 6, 5, 6, 4, 7)
# The log posterior density of the Laplace distribution model, when assuming
# uniform/flat priors. The Laplace distribution is not part of base R but is
# available in the VGAM package.
model <- function(pars) {
sum(VGAM::dlaplace(x, pars[1], exp(pars[2]), log = TRUE))
}
# The Metropolis-Hastings algorithm using a Uniform(-0.5, 0.5) proposal distribution
metrop <- function(n_samples, model, inits) {
samples <- matrix(NA, nrow = n_samples, ncol = length(inits))
samples[1,] <- inits
for(i in 2:n_samples) {
curr_log_dens <- model(samples[i - 1, ])
proposal <- samples[i - 1, ] + runif(length(inits), -0.5, 0.5)
proposal_log_dens <- model(proposal)
if(runif(1) < exp(proposal_log_dens - curr_log_dens)) {
samples[i, ] <- proposal
} else {
samples[i, ] <- samples[i - 1, ]
}
}
samples
}
samples <- metrop(n_samples = 1000, model, inits = c(0,0))
# Plotting a traceplot
plot(samples[,1], type = "l", ylab = expression(Location ~ mu), col = "blue")
# Calculating median posterior and 95% CI discarding the first 250 draws as "burnin".
quantile(samples[250:1000,1], c(0.025, 0.5, 0.975))

## 2.5% 50% 97.5%
## 4.489 5.184 6.144

(For the love of Markov, do not actually use this R script as a reference, it is just there to show what the BASIC code will implement. If you want to do MCMC using Metropolis-Hastings in R, check out the `MCMCmetrop1R` function in the MCMCpack package, or this Metropolis-Hastings script.)

R is sometimes called “a quirky language”, but the script above is a wonder of clarity and brevity compared to how the BASIC code is going to look…

Let’s start by clearing the screen (`CLS`) and defining the variables we need. Here `DIM` defines arrays and matrices.

Already in this short piece of code there is a lot of fun quirkiness going on:

- As we are trying to write old school BASIC there are line numbers everywhere. This is not something that gets added automatically, they have to be written by *me*, and sometimes I forget a row. Hence a good strategy is to increment line numbers by 10 so that you can add missing lines in between afterwards (like `75` above). As there are no functions in old school BASIC the line numbers will be used to jump around in the program using statements like `GOSUB` and `GOTO`.
- ALL STATEMENTS ARE IN UPPERCASE and if you write a statement in lowercase it WILL GET CORRECTED TO UPPERCASE.
- Why the exclamation mark in `SAMPLES!`? In QBasic all variables are integers by default and to change the type you *append* a symbol to the variable name. So `THIS$` is a string while `THAT!` is a floating point number. Since our parameters are continuous most of our variables will end with a `!`.
- The `DATA` command is a relatively nice way to put data into the program, which can later be retrieved sequentially with the `READ` statement. If you have more than one dataset? Tough luck…

Let’s continue with defining the model. It’s going to be written as a *subroutine*, which is an isolated piece of code that ends with a `RETURN` statement. It can then be called with a `GOSUB <line>` statement which jumps to `<line>`, executes the code up to the `RETURN` statement and then jumps back to the line after the `GOSUB`. It is a little bit like a poor man’s function as you can’t pass in any arguments or return any values. But that’s not a problem *since all variable names are global anyway*: your subroutine can still access and set any variables as long as you are careful not to use the same variable name for two different things. The subroutine below assumes that the vector `PARAMS!` is set and will overwrite the `LOGDENS!` variable. The subroutine also uses `READ X!` to read in a value from the `DATA` and `RESTORE` to reset the `READ` to the first data point.

Here is now the Metropolis-Hastings algorithm. Nothing new here, except that we now use `GOSUB 520` to jump into the model subroutine at line `520` and that we use the `RND` function to generate uniformly random numbers between 0.0 and 1.0.

Now we could piece it all together by putting the following code directly after the initial declarations:

This will fill up `SAMPLES!` with `1000` MCMC draws from the posterior distribution of the location ($\mu$) and the log scale ($\log(b)$). But that won’t make us any happier, we still would need to summarize and plot the posterior. The code below takes the `SAMPLES!` matrix and calculates the mean and SD of the posterior, and a 95% credible interval. (I usually prefer the median posterior and a quantile interval, but that would require much more coding, as there is no built in method for sorting arrays in BASIC.)
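For comparison, here is a rough R sketch of that kind of sorting-free summary. It accumulates a sum and a sum of squares in a single loop, the way one would in BASIC, and then uses a normal approximation (mean ± 1.96 SD) for the 95% interval. That the BASIC program builds its interval exactly this way is an assumption, and `draws` is a made-up stand-in for one column of `SAMPLES!`, not output from the post.

```r
set.seed(123)
# Hypothetical stand-in for one column of MCMC draws
draws <- rnorm(1000, mean = 5.2, sd = 0.4)

# Accumulate the sum and the sum of squares in one pass, BASIC style
total <- 0
total_sq <- 0
for (i in seq_along(draws)) {
  total <- total + draws[i]
  total_sq <- total_sq + draws[i]^2
}

n <- length(draws)
post_mean <- total / n
# Sample standard deviation computed from the two accumulators
post_sd <- sqrt((total_sq - n * post_mean^2) / (n - 1))

# Normal approximation to a 95% interval, which requires no sorting
ci <- c(lower = post_mean - 1.96 * post_sd, upper = post_mean + 1.96 * post_sd)
print(c(mean = post_mean, sd = post_sd, ci))
```

The normal approximation is only reasonable when the posterior is roughly symmetric and bell shaped; a quantile interval makes no such assumption but needs sorted draws.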

Some more BASIC strangeness here is that `PRINT "HELLO", "YOU!"` would print `HELLO YOU!` while `PRINT "HELLO"; "YOU!"` would print `HELLOYOU!`. That is, whether the separator is `,` or `;` decides whether there will be a space between printed items.

Finally we would like to draw a traceplot to make sure that the Markov chain looks ok. While there is no built in function for sorting, there is great graphics support! The bulk of the code below is mostly just rescaling the sample coordinates so that they fit the screen resolution (320 x 200). Notice also the fun syntax for drawing lines: `LINE (<x1>, <y1>)-(<x2>, <y2>)`. That many of the BASIC statements feature “custom” syntax might be a symptom of the fact that `LINE`, `PRINT`, `IF`, etc. are *statements* and not functions.
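The rescaling step can be sketched in R as follows. This is an illustration of the idea only: `rescale` is a hypothetical helper, `draws` is a made-up chain, and the only things taken from the post are the 320 x 200 resolution and the fact that the samples must be mapped onto pixel coordinates. Screen coordinates have y growing downwards, so the y axis is flipped.

```r
# Map values linearly from [from_min, from_max] to [to_min, to_max]
rescale <- function(v, from_min, from_max, to_min, to_max) {
  to_min + (v - from_min) / (from_max - from_min) * (to_max - to_min)
}

set.seed(42)
# Hypothetical stand-in for a chain of MCMC draws
draws <- cumsum(runif(100, -0.5, 0.5))

screen_w <- 320
screen_h <- 200

# The sample index runs along the x axis, from pixel 0 to pixel 319
px <- rescale(seq_along(draws), 1, length(draws), 0, screen_w - 1)
# Screen y grows downwards, so the largest draw is mapped to y = 0
py <- rescale(draws, min(draws), max(draws), screen_h - 1, 0)

# Each consecutive pair of points would become one LINE statement
print(head(round(cbind(px, py)), 3))
```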

And finally we might want to write the samples to disk:

Now we are finally ready to piece everything together, sampling and plotting!

The last `GOTO` just jumps to the last line of the program which makes the program exit. If we set DOSBox to emulate the speed of a 33 MHz 386 PC (according to this page), it takes about a minute to generate the 1000 samples and summarize the result. Here is the output:

As drawing the lines of the traceplot is really slow we get the animation above “for free”. :)

So what about the wolf pups? Well, this exercise wasn’t really about data analysis, but looking at the posterior summary above it seems like a good guess is that the median number of wolf pups in a den is around 4.5 to 6.0. I should also point out that both the BASIC code and the “reference” implementation result in similar posteriors:

Here is the full program in text format (MCMCDEMO.BAS), or, if you prefer, as a screenshot:

All in all, it seems like I could have done pretty advanced Bayesian statistics using the computational tools I had access to 20 years ago! (Of course, one could easily have implemented a similar program in C or FORTRAN much much earlier…) If this post has inspired you, and you wish to use BASIC as your main tool for scientific computing (please don’t!), you’ll find a host of QBasic tutorials here. You can also dive directly into the wonderful world of BASIC using this excellent javascript emulation of Apple II BASIC.

- Programming with line numbers, `GOTO` and `GOSUB` just feels *very* old school.
- It is nice to not have to type a lot of brackets and parentheses all the time. (But then, on the other hand, there are a lot of verbose `END IF`s and `NEXT I`s instead.)
- Running the program above on a 386 PC was pretty quick, but it would be cool to know if it would have been possible to implement and run Metropolis-Hastings on a much slower computer, say a Commodore 64…
- It is really difficult to program when all variables live in the same global namespace! When programming this short program I still produced many bugs where I used the same variable name twice.
- BASIC isn’t *one* language but a set of similar languages, and while I’ve tried to write the code above using fairly old school BASIC I don’t think it would run on any other BASIC than QBasic without modification. For example, Commodore 64 BASIC only allows you to have one command following an `IF` statement, and there was no `ELSE`! So if you wanted to execute many commands as the result of an `IF` statement you would have to fake it with `GOTO` statements instead.
- While the QBasic language might seem a bit quirky if you are used to modern languages, the QBasic IDE is actually surprisingly friendly! Just like many modern IDEs (like, for example, RStudio) it has:
  - Automatic syntax checking, you get direct feedback when you’ve written something wrong.
  - A built in help system that works great!
  - A debugger!
  - An interactive prompt.
  - Automatic code formatting: `print 5*5+ 7` → `PRINT 5 * 5 + 7`

This is a screencast of my UseR! 2015 presentation: Tiny Data, Approximate Bayesian Computation and the Socks of Karl Broman. Based on the original blog post it is a *quick’n’dirty* introduction to approximate Bayesian computation (and is also, in a sense, an introduction to Bayesian statistics in general). Here it is, if you have 15 minutes to spare:

The video is short and makes a lot of simplifications/omissions, some of which are:

- There is not just one, but many different algorithms for doing approximate Bayesian computation, where the algorithm outlined in the video is called ABC rejection sampling. What makes these **approximate** Bayesian computational methods (and not just *Bayesian computational methods*) is that they require what I have called a generative model and an acceptance criterion. What I call **standard** Bayesian computation in the video (but which is normally just called *Bayesian computation*) instead requires a function that calculates *the likelihood* given the data and some fixed parameter values.
- I mention in the video that approximate Bayesian computation is the slowest way you can fit a statistical model, and for many common statistical models this is the case. However for some models it might be very expensive to evaluate the likelihood, and in that case approximate Bayesian computation can actually be faster. As usual, it all depends on the context…
- I mention “drawing random parameter values from the prior”, or something similar, in the video. Invoking “randomness” always makes me a bit uneasy, and I just want to mention that the purpose of “drawing random parameters” is just to get a vector/list of parameter values that is a good enough representation of the prior probability distribution. It just happens to be the case that random number generators (like `rnbinom` and `rbeta`) are a convenient way of creating such representative distributions.
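To make the three ingredients of ABC rejection sampling concrete (draws from the prior, a generative model, and an acceptance criterion), here is a minimal R sketch. It is not the model from the video: the data (7 successes in 20 trials), the flat prior, and all variable names are made up for illustration.

```r
set.seed(2015)
# Hypothetical data: 7 successes observed in 20 trials
observed_successes <- 7
n_trials <- 20
n_draws <- 100000

# 1. Draw parameter values from the prior, here a Uniform(0, 1), that is, Beta(1, 1)
prior_draws <- rbeta(n_draws, 1, 1)

# 2. The generative model: simulate one dataset for each parameter draw
simulated <- rbinom(n_draws, size = n_trials, prob = prior_draws)

# 3. The acceptance criterion: keep the draws whose simulated data
#    match the observed data exactly
posterior_draws <- prior_draws[simulated == observed_successes]

print(length(posterior_draws))
print(quantile(posterior_draws, c(0.025, 0.5, 0.975)))
```

For this simple model the exact posterior is known to be a Beta(8, 14) distribution, so the accepted draws can be compared against it. Note that only a small fraction of the prior draws survive the acceptance step, which is why the method tends to be slow.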

For a Slower’n’Cleaner introduction to approximate Bayesian computation I would actually recommend the Wikipedia page, which is pretty good!

A while back I wrote about how the classical non-parametric bootstrap can be seen as a special case of the Bayesian bootstrap. Well, one difference between the two methods is that, while it is straightforward to roll a classical bootstrap in R, there is no easy way to do a Bayesian bootstrap. This post, in an attempt to change that, introduces a `bayes_boot` function that should make it pretty easy to do the Bayesian bootstrap for any statistic in R. If you just want a function you can copy-n-paste into R go to *The bayes_boot function* below. Otherwise here is a quick example of how to use the function, followed by some details on the implementation.

So say you scraped the heights of all the U.S. Presidents off Wikipedia (american_presidents.csv) and you want to run a Bayesian bootstrap analysis on the mean height of U.S. Presidents (don’t ask me *why* you would want to do this). Then, using the `bayes_boot` function found below, you can run the following:

presidents <- read.csv("american_presidents.csv")
bb_mean <- bayes_boot(presidents$height_cm, mean, n1 = 1000)

Here is how to get a 95% credible interval:

quantile(bb_mean, c(0.025, 0.975))

## 2.5% 97.5%
## 177.8 181.8

And, of course, we can also plot this:

(Here, and below, I will save you from the slightly messy plotting code, but if you really want to see it you can check out the full script here.)

Now, say we want to run a linear regression on presidential heights over time, and we want to use the Bayesian bootstrap to gauge the uncertainty in the regression coefficients. Then we will have to do a little more work, as the second argument to `bayes_boot` should be a function that takes the data as the first argument and that returns a vector of parameters/coefficients:

bb_linreg <- bayes_boot(presidents, function(data) {
lm(height_cm ~ order, data)$coef
}, n1 = 1000)

Ok, so it is not really over *time*, as we use the `order` of the president as the predictor variable, but close enough. Again, we can get a 95% credible interval of the slope:

quantile(bb_linreg$order, c(0.025, 0.975))

## 2.5% 97.5%
## 0.03979 0.34973

And here is a plot showing the mean posterior regression line with a smattering of lines drawn from the posterior to visualize the uncertainty:

Given the model and the data, the average height of American presidents increases by around 0.2 cm for each president elected to office. So, either we have that around the 130th president the average height of presidents will be around 2 meters (≈ 6’7’’), or perhaps a linear regression isn’t really a reasonable model here… Anyhow, it was easy to do the Bayesian bootstrap! :)

It is possible to characterize the statistical *model* underlying the Bayesian bootstrap in a couple of different ways, but all can be implemented by the same *computational* procedure:

To generate a Bayesian bootstrap sample of size `n1`, repeat the following `n1` times:

- Draw weights from a uniform Dirichlet distribution with the same dimension as the number of data points.
- Calculate the statistic, using the Dirichlet draw to weight the data, and record it.

One way to characterize drawing from an *n*-dimensional uniform Dirichlet distribution is as drawing a vector of length *n* where the values are positive, sum to 1.0, and where any combination of values is equally likely. Another way to characterize a uniform Dirichlet distribution is as a uniform distribution over the *unit simplex*, where a unit simplex is a generalization of a triangle to higher dimensions, with sides that are 1.0 long (hence the *unit*). The figure below pictures the one, two, three and four-dimensional unit simplex:

*Image source: Introduction to Discrete Differential Geometry by Peter Schröder*

Drawing from an *n*-dimensional uniform Dirichlet distribution can be done by drawing $\text{Gamma(1,1)}$ distributed numbers and normalizing these to sum to 1.0 (source). As a $\text{Gamma(1,1)}$ distribution is the same as an $\text{Exponential}(1)$ distribution, the following two lines of R code implement drawing `n1` draws from an `n` dimensional uniform Dirichlet distribution:

dirichlet_sample <- matrix( rexp(n * n1, 1) , ncol = n, byrow = TRUE)
dirichlet_sample <- dirichlet_sample / rowSums(dirichlet_sample)

With `n <- 4` and `n1 <- 3` you could, for example, get:

## [,1] [,2] [,3] [,4]
## [1,] 0.61602 0.06459 0.2297 0.08973
## [2,] 0.05384 0.12774 0.4685 0.34997
## [3,] 0.17419 0.42458 0.1649 0.23638

Here is where, if you were doing a classical non-parametric bootstrap, you would use your resampled data to calculate a statistic (say a mean). Instead, we will want to calculate our statistic of choice using the Dirichlet draw to *weight* the data. This is completely straightforward if the statistic can be calculated using weighted data, which is the case for `weighted.mean(x, w)` and `lm(..., weights)`. For the many statistics that do not accept weights, such as `median` and `cor`, we will have to perform a second sampling step where we (1) sample from the data according to the probabilities defined by the Dirichlet weights, and (2) use this resampled data to calculate the statistic. It is important to notice that we here want to draw as large a sample as possible from the data, and not a sample of the same size as the original data. The point is that the proportion of times a datapoint occurs in this resampled dataset should be roughly proportional to that datapoint’s weight.

*Note that doing this second resampling step won’t work if the statistic changes with the sample size! An example of such a statistic would be the sample standard deviation (`sd`); the population standard deviation would be fine, however.*
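When a statistic can be written directly in terms of weights, the resampling step (and with it the sample size issue) can be sidestepped altogether. As a minimal sketch, here is a Bayesian bootstrap of a population standard deviation using the Dirichlet weights directly. The helper `weighted_pop_sd` and the small data vector are hypothetical and only there for illustration.

```r
set.seed(1)
# Hypothetical data
x <- c(5, 8, 7, 5, 3, 4, 3, 9)
n1 <- 2000

# Population standard deviation of weighted data.
# The weights are assumed to be non-negative and to sum to 1.
weighted_pop_sd <- function(x, w) {
  m <- sum(w * x)
  sqrt(sum(w * (x - m)^2))
}

# One uniform Dirichlet draw per row, built from normalized Exponential(1) draws
weights <- matrix(rexp(length(x) * n1, 1), ncol = length(x), byrow = TRUE)
weights <- weights / rowSums(weights)

# One value of the statistic per Dirichlet draw
bb_sd <- apply(weights, 1, function(w) weighted_pop_sd(x, w))
print(quantile(bb_sd, c(0.025, 0.975)))
```

With equal weights `weighted_pop_sd` reduces to the ordinary population standard deviation, that is, the version that divides by *n* rather than *n* − 1.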

Below is a small example script that takes the `presidents` dataset and does a Bayesian Bootstrap analysis of the median height. Here `n1` is the number of bootstrap draws and `n2` is the size of the resampled data used to calculate the `median` for each Dirichlet draw.

n1 <- 3000
n2 <- 1000
n_data <- nrow(presidents)
# Generating a n1 by n_data matrix where each row is an n_data dimensional
# Dirichlet draw.
weights <- matrix( rexp(n_data * n1, 1) , ncol = n_data, byrow = TRUE)
weights <- weights / rowSums(weights)
bb_median <- rep(NA, n1)
for(i in 1:n1) {
data_sample <- sample(presidents$height_cm, size = n2, replace = TRUE, prob = weights[i,])
bb_median[i] <- median(data_sample)
}
# Now bb_median represents the posterior median height, and we can do all
# the usual stuff, like calculating a 95% credible interval.
quantile(bb_median, c(0.025, 0.975))

## 2.5% 97.5%
## 178 183

If we were interested in the mean instead, we could skip resampling the data and use the weights directly, like this:

bb_mean <- rep(NA, n1)
for(i in 1:n1) {
bb_mean[i] <- weighted.mean(presidents$height_cm, w = weights[i,])
}
quantile(bb_mean, c(0.025, 0.975))

## 2.5% 97.5%
## 177.8 181.9

If possible, you will probably want to use the weight method; it will be *much* faster as you skip the costly resampling step. What size of the bootstrap samples (`n1`) and size of the resampled data (`n2`) to use? The boring answers are: “As many as you can afford” and “Depends on the situation”, but you’ll probably want at least 1000 of each.

The `bayes_boot` function

Here follows a handy function for running a Bayesian bootstrap that you can copy-n-paste directly into your R-script. It should accept any type of data that comes as a vector, matrix or data.frame and allows you to use both statistics that can deal with weighted data (like `weighted.mean`) and statistics that don’t (like `median`). See above and below for examples of how to use it.

*Caveat: While I have tested this function for bugs, do keep an eye open and tell me if you find any. Again, note that doing the second resampling step (`use_weights = FALSE`) won’t work if the statistic changes with the sample size!*

# Performs a Bayesian bootstrap and returns a sample of size n1 representing the
# posterior distribution of the statistic. Returns a vector if the statistic is
# one-dimensional (like for mean(...)) or a data.frame if the statistic is
# multi-dimensional (like for the coefs. of lm).
# Parameters
# data The data as either a vector, matrix or data.frame.
# statistic A function that accepts data as its first argument and possibly
# the weights as its second, if use_weights is TRUE.
# Should return a numeric vector.
# n1 The size of the bootstrap sample.
# n2 The sample size used to calculate the statistic each bootstrap draw.
# use_weights Whether the statistic function accepts a weight argument or
# should be calculated using resampled data.
# weight_arg If the statistic function includes a named argument for the
# weights this could be specified here.
# ... Further arguments passed on to the statistic function.
bayes_boot <- function(data, statistic, n1 = 1000, n2 = 1000 , use_weights = FALSE, weight_arg = NULL, ...) {
# Draw from a uniform Dirichlet dist. with alpha set to rep(1, n_dim).
# Using the facts that you can transform gamma distributed draws into
# Dirichlet draws and that rgamma(n, 1) <=> rexp(n, 1)
dirichlet_weights <- matrix( rexp(NROW(data) * n1, 1) , ncol = NROW(data), byrow = TRUE)
dirichlet_weights <- dirichlet_weights / rowSums(dirichlet_weights)
if(use_weights) {
stat_call <- quote(statistic(data, w, ...))
names(stat_call)[3] <- weight_arg
boot_sample <- apply(dirichlet_weights, 1, function(w) {
eval(stat_call)
})
} else {
if(is.null(dim(data)) || length(dim(data)) < 2) { # data is a list type of object
boot_sample <- apply(dirichlet_weights, 1, function(w) {
data_sample <- sample(data, size = n2, replace = TRUE, prob = w)
statistic(data_sample, ...)
})
} else { # data is a table type of object
boot_sample <- apply(dirichlet_weights, 1, function(w) {
index_sample <- sample(nrow(data), size = n2, replace = TRUE, prob = w)
statistic(data[index_sample, ,drop = FALSE], ...)
})
}
}
if(is.null(dim(boot_sample)) || length(dim(boot_sample)) < 2) {
# If the bootstrap sample is just a simple vector return it.
boot_sample
} else {
# Otherwise it is a matrix. Since apply returns one row per statistic
# let's transpose it and return it as a data frame.
as.data.frame(t(boot_sample))
}
}

*Update:* Ilanfri has now ported the bayes_boot function to Python.

`bayes_boot`

Let’s start by drawing some fake data from an exponential distribution with mean 1.0 and compare using the following methods to infer the mean:

- The classical non-parametric bootstrap using `boot` from the `boot` package.
- Using `bayes_boot` with “two level sampling”, that is, sampling both weights and then resampling the data according to those weights.
- Using `bayes_boot` with weights (`use_weights = TRUE`).
- Assuming an exponential distribution (the “correct” distribution since we know where the data came from), with a flat prior over the mean.

First generating some data:

set.seed(1337)
exp_data <- rexp(8, rate = 1)
exp_data

```
## [1] 0.15 0.13 2.26 0.92 0.17 1.55 0.13 0.02
```

Then running the four different methods:

library(boot)
b_classic <- boot(exp_data, function(x, i) { mean(x[i])}, R = 10000)
bb_sample <- bayes_boot(exp_data, mean, n1 = 10000, n2 = 1000)
bb_weight <- bayes_boot(exp_data, weighted.mean, n1 = 10000, use_weights = TRUE, weight_arg = "w")
# Just a hack to sample from the posterior distribution when
# assuming an exponential distribution with a Uniform(0, 10) prior
prior <- seq(0.001, 10, 0.001)
post_prob <- sapply(prior, function(mean) { prod(dexp(exp_data, 1/mean)) })
post_samp <- sample(prior, size = 10000, replace = TRUE, prob = post_prob)

Here are the resulting posterior/sampling distributions:

This was mostly to show off the syntax of `bayes_boot`, but some things to point out in the histograms above are that:

- Using the Bayesian bootstrap with two level sampling or weights results in very similar posterior distributions, which should be the case when the size of the resampled data is large (here set to `n2 = 1000`).
- The classical non-parametric bootstrap is pretty similar to the Bayesian bootstrap (as we would expect).
- The bootstrap distributions are somewhat similar to the posterior mean assuming an exponential distribution, but completely miss out on the uncertainty in the right tail. This is due to the “somewhat peculiar model assumptions” of the bootstrap as critiqued by Rubin (1981).

Finally, a slightly more complicated example, where we do a Bayesian bootstrap analysis of LOESS regression applied to the `cars` dataset on the speed of cars and the resulting distance it takes to stop. The `loess` function returns, among other things, a vector of `fitted` *y* values, one value for each *x* value in the data. These *y* values define the smoothed LOESS line and are what you would usually plot after having fitted a LOESS. Now we want to use the Bayesian bootstrap to gauge the uncertainty in the LOESS line. As the `loess` function accepts weighted data, we’ll simply create a function that takes the data with weights and returns the `fitted` *y* values. We’ll then plug that function into `bayes_boot`:

boot_fn <- function(cars, weights) {
loess(dist ~ speed, cars, weights = weights)$fitted
}
bb_loess <- bayes_boot(cars, boot_fn, n1 = 1000, use_weights = TRUE, weight_arg = "weights")

To plot this takes a couple of lines more:

# Plotting the data
plot(cars$speed, cars$dist, pch = 20, col = "tomato4", xlab = "Car speed in mph",
ylab = "Stopping distance in ft", main = "Speed and Stopping distances of Cars")
# Plotting a scatter of Bootstrapped LOESS lines to represent the uncertainty.
for(i in sample(nrow(bb_loess), 20)) {
lines(cars$speed, bb_loess[i,], col = "gray")
}
# Finally plotting the posterior mean LOESS line
lines(cars$speed, colMeans(bb_loess, na.rm = TRUE), type ="l",
col = "tomato", lwd = 4)

Fun fact: The `cars` dataset is from the 20s! Which explains why the fastest car travels at 25 mph. It would be interesting to see a comparison with stopping distances for modern cars!

Rubin, D. B. (1981). The Bayesian Bootstrap. *The annals of statistics*, 9(1), 130-134. pdf