On the 21st of February, 2015, my wife had not had her period for 33 days, and as we were trying to conceive, this was good news! An average period is around a month, and if you are a couple trying to go triple, then a missing period is a good sign something is going on. But at 33 days, this was not yet a missing period, just a late one, so *how* good news was it? Pretty good, *really* good, or just *meh*?

To get at this I developed a simple Bayesian model that, given the number of days since your last period and your history of period onsets, calculates the probability that you are going to be pregnant this period cycle. In this post I will describe what data I used, the priors I used, the model assumptions, and how to fit it in R using importance sampling. And finally I show you why the result of the model really didn’t matter in the end. Also I’ll give you a handy script if you want to calculate this for yourself. :)

During the last part of 2014 my wife kept a journal of her period onsets, which was lucky for me, as otherwise I would have ended up with tiny data again. In total we had the dates of eight period onsets, but the data I used was not the onsets themselves but the number of days between them:

```
period_onset <- as.Date(c("2014-07-02", "2014-08-02", "2014-08-29", "2014-09-25",
                          "2014-10-24", "2014-11-20", "2014-12-22", "2015-01-19"))
days_between_periods <- as.numeric(diff(period_onset))
```

So the onsets occur pretty regularly, hovering around a cycle of 28 days. The last onset was on the 19th of January, so on the 21st of February there had been 33 days since the last onset.

I was constructing a model covering period cycles, pregnancies and infertility, and as such it was obviously going to make *huge* simplifications. Some general assumptions I made were:

- The woman and the man have no prior reasons for being infertile as a couple.
- The woman has regular periods.
- The couple trying to conceive are *actively* trying to conceive, say, two to three times a week as recommended by Wilcox et al. (2000).
- If there is a pregnancy, there are no more periods.

Now to the specific assumptions I made:

- The number of days between periods (`days_between_periods`) is assumed to be normally distributed with unknown mean (`mean_period`) and standard deviation (`sd_period`).
- The probability of getting pregnant during a cycle is assumed to be `0.19` (more about where this number comes from below) *if* you are fertile as a couple (`is_fertile`). Unfortunately not all couples are fertile, and if you are not then the probability of getting pregnant is 0. If fertility is coded as 0-1 then this can be compactly written as `0.19 * is_fertile`.
- The probability of failing to conceive for a certain number of periods (`n_non_pregnant_periods`) is then `(1 - 0.19 * is_fertile)^n_non_pregnant_periods` (a quick numeric check follows this list).
- Finally, if you are not going to be pregnant this cycle, the number of days from your last to your next period (`next_period`) is going to be more than the current number of days since the last period (`days_since_last_period`). That is, the probability of `next_period < days_since_last_period` is zero. This sounds strange because it is so obvious, but we’re going to need it in the model.
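
As a quick numeric check of that failure-to-conceive formula (my own illustrative calculation, not from the original post): with the assumed 19% per-cycle probability, the chance of a fertile couple not conceiving over the seven between-period gaps in our data is about 23%.

```
# Probability of a fertile couple (is_fertile = 1) not conceiving
# in 7 consecutive cycles, with a 0.19 per-cycle probability.
(1 - 0.19 * 1)^7
## approx. 0.23
```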

That was basically it! But in order to fit this I was going to need a *likelihood function*, a function that, given fixed parameters and some data, calculates the probability of the data given those parameters or, more commonly, something proportional to a probability, that is, a *likelihood*. And as this *likelihood* can be extremely tiny I needed to calculate it on the log scale to avoid numerical problems. When crafting a log likelihood function in R, the general pattern is this:

- The function will take the data and the parameters as arguments.
- You initialize the likelihood to 1.0, corresponding to 0.0 on the log scale (`log_like <- 0.0`).
- Using the probability density functions in R (such as `dnorm`, `dbinom` and `dpois`) you calculate the likelihoods of the different parts of the model. You then multiply these likelihoods together. On the log scale this corresponds to adding the log likelihoods to `log_like`.
- To make the `d*` functions return log likelihoods, just add the argument `log = TRUE`. Also remember that a likelihood of 0.0 corresponds to a log likelihood of `-Inf`.

So, a log likelihood function corresponding to the model above would then be:

```
calc_log_like <- function(days_since_last_period, days_between_periods,
                          mean_period, sd_period, next_period,
                          is_fertile, is_pregnant) {
  n_non_pregnant_periods <- length(days_between_periods)
  log_like <- 0
  if(n_non_pregnant_periods > 0) {
    log_like <- log_like + sum( dnorm(days_between_periods, mean_period, sd_period, log = TRUE) )
  }
  log_like <- log_like + log( (1 - 0.19 * is_fertile)^n_non_pregnant_periods )
  if(!is_pregnant && next_period < days_since_last_period) {
    log_like <- -Inf
  }
  log_like
}
```
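
As a quick sanity check, here is an illustrative call (my own example; the parameter values are made up, not estimates):

```
# Illustrative call: a fertile, not-yet-pregnant couple with an assumed
# 28-day cycle (SD 2 days) whose next period would start on day 35.
calc_log_like(days_since_last_period = 33,
              days_between_periods = days_between_periods,
              mean_period = 28, sd_period = 2, next_period = 35,
              is_fertile = 1, is_pregnant = 0)
```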

Here the data is the scalar `days_since_last_period` and the vector `days_between_periods`, and the rest of the arguments are the parameters to be estimated. Using this function I could now get the log likelihood for any data + parameter combination. However, I still only had half a model, I also needed priors!

To complete this model I needed priors on all the parameters. That is, I had to specify what information the model has before seeing the data. Specifically, I needed priors on `mean_period`, `sd_period`, `is_fertile`, and `is_pregnant` (while `next_period` is also a parameter, I didn’t need to give it an explicit prior as its distribution is completely specified by `mean_period` and `sd_period`). I also needed to find a value for the probability of becoming pregnant in a cycle (which I set to `0.19` above). Did I use vague, “objective” priors here? No, I went looking in the fertility literature for something more informative!

For the distribution of the `days_between_periods` the parameters were `mean_period` and `sd_period`. Here I used estimates from the article *The normal variabilities of the menstrual cycle* (Cole et al., 2009), which measured the regularity of periods in 184 women aged 18-36 years. The grand mean number of days between periods was here 27.7 days, with the SD of the per-participant means being 2.4. The group SD of the number of days between periods was 1.6. Given these estimates I then decided to put a Normal(27.7, 2.4) distribution over `mean_period` and a Half-Normal distribution with mean 1.6 over `sd_period`, corresponding to a Half-Normal with an SD of 2.05. Here they are:
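
The prior plots are shown as images in the original post; a minimal sketch for drawing something similar (my own code, not the original plotting code) is:

```
# Sketch: visualize the two priors described above.
mean_period_prior <- rnorm(100000, 27.7, 2.4)
sd_period_prior   <- abs(rnorm(100000, 0, 2.05))  # Half-Normal with mean ~1.6
par(mfrow = c(1, 2))
hist(mean_period_prior, main = "Prior on mean_period", xlab = "Days")
hist(sd_period_prior,   main = "Prior on sd_period",   xlab = "Days")
```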

For the parameters `is_fertile` and `is_pregnant` I based the priors on *frequencies*. The proportion of couples that are fertile is tricky to define, as there are different definitions of *infertility*. Van Geloven et al. (2013) made a small literature review and found that between 2% and 5% of all couples could be considered infertile. As I’ve seen numbers as high as 10%, I decided to go with the higher end of this range and put a prior probability of 100% - 5% = 95% that a couple is fertile.

`is_pregnant` is a binary parameter standing for whether the couple is going to get (or already is) pregnant in the current cycle. The prior I used here was the probability of getting pregnant in a cycle. This probability is obviously 0.0 if the couple is infertile, but how large a proportion of active, fertile couples get pregnant in a period cycle? Unfortunately I didn’t find a source that explicitly stated this, but I found something close. On page 53 in *Increased Infertility With Age in Men and Women*, Dunson et al. (2004) give the proportion of couples trying to conceive who did not get pregnant within 12 cycles, stratified by the age of the woman:

```
prop_not_preg_12_cycles <- c("19-26 years" = 0.08,
                             "27-34 years" = 0.13,
                             "35-39 years" = 0.18)
```

Using some back-of-the-R-script calculations I calculated the probability of conceiving in a cycle: As these proportions presumably include infertile couples, I started by subtracting 0.05, the proportion of couples that I assumed are infertile. My wife was in the *27-34 years* bracket, so the probability of us not conceiving within 12 cycles, given that we are fertile, was then 0.13 - 0.05. If *p* is the probability of not getting pregnant during one cycle, then $p^{12} = 0.13 - 0.05$ is the probability of not getting pregnant during twelve cycles and, as *p* is positive, we have that $p = (0.13 - 0.05)^{1/12}$. The probability of getting pregnant in one cycle is then *1 - p*, and the probabilities for the three age groups are:

```
1 - (prop_not_preg_12_cycles - 0.05)^(1/12)
```

```
## 19-26 years 27-34 years 35-39 years
##        0.25        0.19        0.16
```

So that’s where the 19% probability of conceiving came from in the log likelihood function above, and 19% is what I used as a prior for `is_pregnant`. Now I had priors for all parameters and I could construct a function that returned samples from the prior:

```
sample_from_prior <- function(n) {
  prior <- data.frame(mean_period = rnorm(n, 27.7, 2.4),
                      sd_period = abs(rnorm(n, 0, 2.05)),
                      is_fertile = rbinom(n, 1, 0.95))
  prior$is_pregnant <- rbinom(n, 1, 0.19 * prior$is_fertile)
  prior$next_period <- rnorm(n, prior$mean_period, prior$sd_period)
  prior$next_period[prior$is_pregnant == 1] <- NA
  prior
}
```

It takes one argument (`n`) and returns a `data.frame` with `n` rows, each row being a sample from the prior. Let’s try it out:

```
sample_from_prior(n = 4)
```

```
##   mean_period sd_period is_fertile is_pregnant next_period
## 1          29      1.24          1           0          30
## 2          29      3.73          1           0          28
## 3          27      1.29          1           1          NA
## 4          27      0.57          0           0          27
```

Notice that `is_pregnant` can only be `1` if `is_fertile` is `1`, and that there is no `next_period` if the couple `is_pregnant`.

I had now collected the triforce of Bayesian statistics: the prior, the likelihood and the data. There are many algorithms I could have used to fit this model, but here a particularly convenient method was to use *importance sampling*. I’ve written about importance sampling before, but let’s recap: Importance sampling is a Monte Carlo method that is *very easy* to set up and that can work well if (1) the parameter space is small and (2) the priors are not too dissimilar from the posterior. As my parameter space *was* small and because I used pretty informative priors, I thought importance sampling would suffice here. The three basic steps in importance sampling are:

- Generate a large sample from the prior. (This I could do using `sample_from_prior`.)
- Assign a weight to each draw from the prior that is proportional to the likelihood of the data given those parameters. (This I could do using `calc_log_like`.)
- Normalize the weights to sum to one so that they now form a probability distribution over the prior sample. Finally, resample the prior sample according to this probability distribution. (This I could do using the R function `sample`.)

(Note that there are some variations to this procedure, but when used to fit a Bayesian model this is a common version of importance sampling.)

The result of using importance sampling is a new sample which, if the importance sampling worked OK, can be treated as a sample from the posterior. That is, it represents what the model knows after having seen the data. Since I already had defined `sample_from_prior` and `calc_log_like`, defining a function in R doing importance sampling was trivial:

```
sample_from_posterior <- function(days_since_last_period, days_between_periods, n_samples) {
  prior <- sample_from_prior(n_samples)
  log_like <- sapply(1:n_samples, function(i) {
    calc_log_like(days_since_last_period, days_between_periods,
                  prior$mean_period[i], prior$sd_period[i], prior$next_period[i],
                  prior$is_fertile[i], prior$is_pregnant[i])
  })
  posterior <- prior[ sample(n_samples, replace = TRUE, prob = exp(log_like)), ]
  posterior
}
```
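
One practical caveat, which is my addition and not part of the original function: with more data the log likelihoods can become so small that `exp(log_like)` underflows to zero for every draw. Subtracting the maximum log likelihood before exponentiating gives the same normalized resampling weights while avoiding the underflow:

```
# Inside sample_from_posterior, a numerically safer resampling step:
weights <- exp(log_like - max(log_like))
posterior <- prior[sample(n_samples, replace = TRUE, prob = weights), ]
```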

So, on the 21st of February, 2015, my wife had not had her period for 33 days. Was this good news? Let’s run the model and find out!

```
post <- sample_from_posterior(33, days_between_periods, n_samples = 100000)
```

`post` is now a long data frame where the distribution of the parameter values represents the posterior information regarding those parameters.

```
head(post)
```

```
##       mean_period sd_period is_fertile is_pregnant next_period
## 33231          28       2.8          0           0          37
## 22386          27       2.4          1           1          NA
## 47489          27       2.1          1           1          NA
## 68312          28       2.3          1           1          NA
## 37341          29       2.9          1           1          NA
## 57957          30       2.6          1           0          36
```

Let’s start by looking at the mean and standard deviation of the number of days between each period:
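
The plots are shown as images in the original post; a quick numerical stand-in (my own sketch) is to look at posterior quantiles of the two parameters:

```
# Sketch: posterior summaries of the cycle-length parameters.
quantile(post$mean_period, c(0.025, 0.5, 0.975))
quantile(post$sd_period,  c(0.025, 0.5, 0.975))
```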

As expected the posteriors are narrower than the priors and, looking at the posteriors, it’s probable that the mean period cycle is around 29 days with an SD of 2-3 days. Now to the important questions: What’s the probability that we are a fertile couple, and what’s the probability that we were pregnant on the 21st of February? To calculate this we can just take `post$is_fertile` and `post$is_pregnant` and calculate the proportion of `1`s in these vectors. A quick way of doing this is just to take the `mean`:

```
mean(post$is_fertile)
```

```
## [1] 0.97
```

```
mean(post$is_pregnant)
```

```
## [1] 0.84
```

So it *was* pretty good news: It’s very probable that we are a fertile couple and the probability that we were pregnant was 84%! Using this model I could also see how the probability of us being pregnant would change if the period onset would stay away a couple of days more:

```
post <- sample_from_posterior(34, days_between_periods, n_samples = 100000)
mean(post$is_pregnant)
```

```
## [1] 0.92
```

```
post <- sample_from_posterior(35, days_between_periods, n_samples = 100000)
mean(post$is_pregnant)
```

```
## [1] 0.96
```

Yeah, while we are at it, why not see how the probability of us being fertile and pregnant changed during the months we were trying to conceive:
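
The resulting plot is an image in the original post (the actual plotting code is linked at the end); a rough sketch of how such a curve could be computed, assuming we simply rerun the model for each calendar day using only the onsets seen so far, is:

```
# Sketch: recompute P(pregnant this cycle) day by day, using only the
# period onsets observed up to each date. Slow but straightforward.
dates <- seq(as.Date("2014-08-15"), as.Date("2015-02-21"), by = "day")
prob_pregnant <- sapply(seq_along(dates), function(i) {
  onsets <- period_onset[period_onset <= dates[i]]
  post <- sample_from_posterior(as.numeric(dates[i] - max(onsets)),
                                as.numeric(diff(onsets)), n_samples = 10000)
  mean(post$is_pregnant)
})
plot(dates, prob_pregnant, type = "l", xlab = "Date",
     ylab = "Probability of pregnancy this cycle")
```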

So, this makes sense. As the time since the last period gets longer, the probability that we are going to be pregnant the current cycle increases, but as soon as there is a period onset that probability falls back down to baseline. We see the same pattern for the probability of being fertile, but for every period cycle we didn’t get pregnant, the probability of us being fertile gets slightly lower. Both graphs are a bit jagged, but this is just due to the variability of the importance sampling algorithm. Also note that, while the graphs above are pretty, there is no real *use* in looking at the probability over time; the only thing that’s informative and matters is the *current* probability.

- It’s of course possible to get much better priors than my back-of-the-envelope calculations here. There are also many more predictors that one could include, like the age of the man, health factors, and so on.
- The probability of getting pregnant each month could/should be made uncertain rather than giving it a fixed value, as I did. But I thought that would be one parameter too many given the little data I had.
- Nothing is *really* normally distributed, and neither is the length between periods. Here I think that assumption works well enough, but there are much more complex models of period cycle length, for example the one by Bortot et al. (2010).
- My model is so simple I suspect that you could almost fit it analytically. But I’m lazy and my computer is not, so importance sampling it was!

But all of this doesn’t really matter now. And the probabilities I calculated do not matter either. Before-the-fact probabilities reflect uncertainty regarding an outcome or a statement, but after-the-fact there is no uncertainty left. What uncertainty the probabilities represented is gone. I’m certain that my wife and I *were* pregnant on the 21st of February, and I know this because one week ago, on the 29th of October, we received this little guy:

If probabilities matter to *you* you can find a script implementing this model here which you can run on your own data, or a friend’s. In this post I left out the code creating the plots, but all of that can be found here. And, as this post was really just an excuse to post baby photos, here are some more of me and my son checking out the statistical literature:

Doing Bayesian Data Analysis makes him a bit sleepy, but that’s OK, he’ll come around! Looking at Fisher’s Statistical Methods for Research Workers on the other hand makes him *furious*…

Bortot, P., Masarotto, G., & Scarpa, B. (2010). Sequential predictions of menstrual cycle lengths. *Biostatistics*, 11(4), 741-755. doi: 10.1093/biostatistics/kxq020

Cole, L. A., Ladner, D. G., & Byrn, F. W. (2009). The normal variabilities of the menstrual cycle. *Fertility and sterility*, 91(2), 522-527. doi: 10.1016/j.fertnstert.2007.11.073

Dunson, D. B., Baird, D. D., & Colombo, B. (2004). Increased infertility with age in men and women. *Obstetrics & Gynecology*, 103(1), 51-56. doi: 10.1097/01.AOG.0000100153.24061.45

Van Geloven, N., Van der Veen, F., Bossuyt, P. M. M., Hompes, P. G., Zwinderman, A. H., & Mol, B. W. (2013). Can we distinguish between infertility and subfertility when predicting natural conception in couples with an unfulfilled child wish?. *Human Reproduction*, 28(3), 658-665. doi: 10.1093/humrep/des428

Wilcox, A. J., Dunson, D., & Baird, D. D. (2000). The timing of the “fertile window” in the menstrual cycle: day specific estimates from a prospective study. *BMJ*, 321(7271), 1259-1262. doi: 10.1136/bmj.321.7271.1259


Romantic kissing is a cultural universal, right? Nope! At least not if you are to believe Jankowiak et al. (2015) who surveyed a large number of cultures and found that “sexual-romantic kissing” occurred in far from all of them. For some reason the paper didn’t include a world map with these kissers and non-kissers plotted out. So, with the help of my colleague Andrey Anikin I’ve now made such a map using R and the excellent leaflet package. Click on the image below to check it out:
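
For the curious, a minimal sketch of how such a map can be put together with leaflet (the data frame below is made-up placeholder data; the real culture locations and kissing codings come from Jankowiak et al., 2015):

```
library(leaflet)
# Placeholder data, just to illustrate the structure of the map.
cultures <- data.frame(culture = c("Culture A", "Culture B"),
                       lat = c(59.3, -1.3), lng = c(18.1, 36.8),
                       kissing = c(TRUE, FALSE))
leaflet(cultures) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat, label = ~culture,
                   color = ~ifelse(kissing, "red", "blue"))
```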

Jankowiak, W. R., Volsche, S. L., & Garcia, J. R. (2015). Is the Romantic-Sexual Kiss a Near Human Universal? *American Anthropologist*, 117(3), 535-539. doi: 10.1111/aman.12286

The BASIC programming language was at one point the most widely spread programming language. Many home computers in the 80s came with BASIC (like the Commodore 64 and the Apple II), and in the 90s both DOS and Windows 95 included a copy of the QBasic IDE. QBasic was also the first programming language I encountered (I used it to write a couple of really horrible text adventures). Now I haven’t programmed in BASIC for almost 20 years and I thought I would revisit this (from my current perspective) really weird language. As I spend a lot of time doing Bayesian data analysis, I thought it would be interesting to see what a Bayesian analysis would look like if I only used the tool that I had 20 years ago, that is, BASIC.

This post walks through the implementation of the Metropolis-Hastings algorithm, a standard Markov chain Monte Carlo (MCMC) method that can be used to fit Bayesian models, in BASIC. I then use that to fit a Laplace distribution to the most adorable dataset I could find: the number of wolf pups per den from a sample of 16 wolf dens. Finally I summarize and plot the result, still using BASIC. So, the target audience of this post is the intersection of people that have programmed in BASIC *and* are into Bayesian computation. I’m sure you are out there. Let’s go!

There are many, many different versions of BASIC, but I’m going for the one that I grew up with: Microsoft QBasic 1.1. Now, QBasic is a relatively new BASIC dialect that has many advanced features such as user defined types and (gasp!) *functions*. But I didn’t use any fancy functions back in the 90s, and so I’m going to write old school BASIC using line numbers and `GOTO`, which means that the code should be relatively easy to port to, say, Commodore 64 BASIC.

Getting QBasic is easy as it seems to be freeware, and can be downloaded here. Unless you are still using DOS, the next step would be to install the DOSBox emulator. Once you’ve started up `QBASIC.EXE` you are greeted with a friendly, bright blue IDE, which you can try out by entering this customary *hello world* script that will clear the screen and print “HELLO WORLD”.

Note that in QBasic the line numbers are not strictly necessary, but in older BASICs (like the one for Commodore 64) they were necessary as the program was executed in the order given by the line numbers.

We are going to implement the Metropolis-Hastings algorithm, a classic Markov chain Monte Carlo (MCMC) algorithm which is one of the first you’ll encounter if you study computational methods for fitting Bayesian models. This post won’t explain the actual algorithm, but the Wikipedia article is an ok introduction.

The Bayesian model we are going to implement is a simple univariate Laplace distribution (just to once in a while give the Normal distribution a day off). The Laplace is similar to the Normal distribution in that it is continuous and symmetric, but it is more peaked and has wider tails. It has two parameters: a location $\mu$, which also defines its mean and median, and a scale $b$ which defines the width of the distribution.

*Image source, Credits: IkamusumeFan*

Like the sample mean is the maximum likelihood estimator for the mean of a Normal distribution, the sample median is the maximum likelihood estimator for the location parameter $\mu$ of a Laplace distribution. That’s why I, somewhat sloppily, think of the Normal distribution as *the* “mean” distribution and of the Laplace distribution as *the* “median” distribution. To turn this into a fully Bayesian model we need prior distributions over the two parameters. Here I’m just going to be sloppy and use a $\text{Uniform}(-\infty, \infty)$ over $\mu$ and $\log(b)$, that is, $\text{P}(\mu),\text{P}(\log(b)) \propto 1$. The full model is then

$$ x \sim \text{Laplace}(\mu, b) \\ \mu \sim \text{Uniform}(-\infty, \infty) \\ \log(b) \sim \text{Uniform}(-\infty, \infty) $$
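
Written out, the Laplace density is $f(x \mid \mu, b) = \frac{1}{2b} \exp(-|x - \mu|/b)$, so its log density is easy to code directly should the VGAM package used below not be available (my own one-liner, not from the original post):

```
# Log density of the Laplace distribution with location mu and scale b.
dlaplace_log <- function(x, mu, b) {
  -log(2 * b) - abs(x - mu) / b
}
```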

The data we are going to use is probably the cutest dataset I’ve worked with so far. It consists of counts of the number of wolf pups in a sample of 16 wolf dens (source):

*Image Source, Credits: spacebirdy / CC-BY-SA-3.0*

Before delving into BASIC, here is a reference implementation in R of what we hope to achieve using BASIC:

```
# The wolf pups dataset
x <- c(5, 8, 7, 5, 3, 4, 3, 9, 5, 8, 5, 6, 5, 6, 4, 7)

# The log posterior density of the Laplace distribution model, when assuming
# uniform/flat priors. The Laplace distribution is not part of base R but is
# available in the VGAM package.
model <- function(pars) {
  sum(VGAM::dlaplace(x, pars[1], exp(pars[2]), log = TRUE))
}

# The Metropolis-Hastings algorithm using a Uniform(-0.5, 0.5) proposal distribution
metrop <- function(n_samples, model, inits) {
  samples <- matrix(NA, nrow = n_samples, ncol = length(inits))
  samples[1,] <- inits
  for(i in 2:n_samples) {
    curr_log_dens <- model(samples[i - 1, ])
    proposal <- samples[i - 1, ] + runif(length(inits), -0.5, 0.5)
    proposal_log_dens <- model(proposal)
    if(runif(1) < exp(proposal_log_dens - curr_log_dens)) {
      samples[i, ] <- proposal
    } else {
      samples[i, ] <- samples[i - 1, ]
    }
  }
  samples
}

samples <- metrop(n_samples = 1000, model, inits = c(0, 0))

# Plotting a traceplot
plot(samples[,1], type = "l", ylab = expression(Location ~ mu), col = "blue")

# Calculating median posterior and 95% CI discarding the first 250 draws as "burnin".
quantile(samples[250:1000, 1], c(0.025, 0.5, 0.975))
```

```
##  2.5%   50% 97.5%
## 4.489 5.184 6.144
```

(For the love of Markov, do not actually use this R script as a reference; it is just there to show what the BASIC code will implement. If you want to do MCMC using Metropolis-Hastings in R, check out `MCMCmetrop1R` in the MCMCpack package, or this Metropolis-Hastings script.)

R is sometimes called “a quirky language”, but the script above is a wonder of clarity and brevity compared to how the BASIC code is going to look…

Let’s start by clearing the screen (`CLS`) and defining the variables we need. Here `DIM` defines arrays and matrices.

Already in this short piece of code there is a lot of fun quirkiness going on:

- As we are trying to write old school BASIC there are line numbers everywhere. This is not something that gets added automatically; they have to be written by *me*, and sometimes I forget a row. Hence a good strategy is to increment line numbers by 10 so that you can add missing lines in between afterwards (like `75` above). As there are no functions in old school BASIC, the line numbers will be used to jump around in the program using statements like `GOSUB` and `GOTO`.
- ALL STATEMENTS ARE IN UPPERCASE and if you write a statement in lowercase it WILL GET CORRECTED TO UPPERCASE.
- Why the exclamation mark in `SAMPLES!`? In QBasic all variables are integers by default and to change the type you *append* a symbol to the variable name. So `THIS$` is a string while `THAT!` is a floating point number. Since our parameters are continuous, most of our variables will end with a `!`.
- The `DATA` command is a relatively nice way to put data into the program, which can later be retrieved sequentially with the `READ` function. If you have more than one dataset? Tough luck…

Let’s continue with defining the model. It’s going to be written as a *subroutine*, which is an isolated piece of code that ends with a `RETURN` statement. It can then be called with a `GOSUB <line>` statement which jumps to `<line>`, executes the code up to the `RETURN` statement and then jumps back to the line after the `GOSUB`. It is a little bit like a poor man’s function as you can’t pass in any arguments or return any values. But that’s not a problem *since all variable names are global anyway*; your subroutine can still access and set any variables as long as you are careful not to use the same variable name for two different things. The subroutine below assumes that the vector `PARAMS!` is set and will overwrite the `LOGDENS!` variable. The subroutine also uses `READ X!` to read in a value from the `DATA` and `RESTORE` to reset the `READ` to the first data point.

Here is now the Metropolis-Hastings algorithm. Nothing new here, except that we now use `GOSUB 520` to jump into the model subroutine at line `520` and that we use the `RND` statement to generate uniformly random numbers between 0.0 and 1.0.

Now we could piece it all together by putting the following code directly after the initial declarations:

This will fill up `SAMPLES!` with `1000` MCMC draws from the posterior distribution of the location ($\mu$) and the log scale ($\log(b)$). But that won’t make us any happier, we still would need to summarize and plot the posterior. The code below takes the `SAMPLES!` matrix and calculates the mean and SD of the posterior, and a 95% credible interval (I usually prefer the median posterior and a quantile interval, but that would require much more coding, as there is no built-in method for sorting arrays in BASIC.)
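
That BASIC listing is shown as a screenshot in the original post. As a point of reference, an equivalent summary in R would look something like the sketch below (my own code; I am assuming the interval is the usual mean ± 1.96 SD normal approximation, since quantiles would need sorting):

```
# Equivalent summary of the location parameter in R, dropping the first
# 250 draws as burn-in, with a normal-approximation 95% interval.
location <- samples[250:1000, 1]
c(mean = mean(location), sd = sd(location),
  lower_95 = mean(location) - 1.96 * sd(location),
  upper_95 = mean(location) + 1.96 * sd(location))
```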

Some more BASIC strangeness here is that `PRINT "HELLO", "YOU!"` would print `HELLO YOU!` while `PRINT "HELLO"; "YOU!"` would print `HELLOYOU!`. That is, whether the separator is `,` or `;` decides whether there will be a space between printed items.

Finally we would like to draw a traceplot to make sure that the Markov chain looks OK. While there is no built-in function for sorting, there is great graphics support! The bulk of the code below is mostly just rescaling the sample coordinates so that they fit the screen resolution (320 x 200). Notice also the fun syntax for drawing lines: `LINE (<x1>, <y1>)-(<x2>, <y2>)`. That many of the BASIC statements feature “custom” syntax might be a symptom of the fact that `LINE`, `PRINT`, `IF`, etc. are *statements* and not functions.

And finally we might want to write the samples to disk:

Now we are finally ready to piece everything together, sampling and plotting!

The last `GOTO` just jumps to the last line of the program, which makes the program exit. If we set DOSBox to emulate the speed of a 33 MHz 386 PC (according to this page), it takes about a minute to generate the 1000 samples and summarize the result. Here is the output:

As drawing the lines of the traceplot is really slow we get the animation above “for free”. :)

So what about the wolf pups? Well, this exercise wasn’t really about data analysis, but looking at the posterior summary above it seems that a good guess for the median number of wolf pups in a den is around 4.5 to 6.0. I should also point out that both the BASIC code and the “reference” implementation result in similar posteriors:

Here is the full program in text format (MCMCDEMO.BAS), or, if you prefer, as a screenshot:

All in all, it seems like I could have done pretty advanced Bayesian statistics using the computational tools I had access to 20 years ago! (Of course, one could easily have implemented a similar program in C or FORTRAN much much earlier…) If this post has inspired you, and you wish to use BASIC as your main tool for scientific computing (please don’t!), you’ll find a host of QBasic tutorials here. You can also dive directly into the wonderful world of BASIC using this excellent javascript emulation of Apple II BASIC.

- Programming with line numbers, `GOTO` and `GOSUB` just feels *very* old school.
- It is nice to not have to type a lot of brackets and parentheses all the time. (But then, on the other hand, there are a lot of verbose `END IF`s and `NEXT I`s instead.)
- Running the program above on a 386 PC was pretty quick, but it would be cool to know if it would have been possible to implement and run Metropolis-Hastings on a much slower computer, say a Commodore 64…
- It is really difficult to program when all variables live in the same global namespace! When programming this short program I still produced many bugs where I used the same variable name twice.
- BASIC isn’t *one* language but a set of similar languages, and while I’ve tried to write the code above using fairly old school BASIC, I don’t think it would run on any other BASIC than QBasic without modification. For example, Commodore 64 BASIC only allows one command following an `IF` statement, and there was no `ELSE`! So if you wanted to execute many commands as the result of an `IF` statement you would have to fake it with `GOTO` statements instead.
- While the QBasic language might seem a bit quirky if you are used to modern languages, the QBasic IDE is actually surprisingly friendly! Just like many modern IDEs (like, for example, RStudio) it has:
  - Automatic syntax checking; you get direct feedback when you’ve written something wrong.
  - A built-in help system that works great!
  - A debugger!
  - An interactive prompt.
  - Automatic code formatting: `print 5*5+ 7` → `PRINT 5 * 5 + 7`

This is a screencast of my UseR! 2015 presentation: Tiny Data, Approximate Bayesian Computation and the Socks of Karl Broman. Based on the original blog post it is a *quick’n’dirty* introduction to approximate Bayesian computation (and is also, in a sense, an introduction to Bayesian statistics in general). Here it is, if you have 15 minutes to spare:

The video is short and makes a lot of simplifications/omissions, some which are:

- There is not just one, but many, many different algorithms for doing approximate Bayesian computation, where the algorithm outlined in the video is called ABC rejection sampling. What makes these **approximate** Bayesian computational methods (and not just *Bayesian computational methods*) is that they require, what I have called, a generative model and an acceptance criterion. What I call **standard** Bayesian computation in the video (but which is normally just called *Bayesian computation*) instead requires a function that calculates *the likelihood* given the data and some fixed parameter values.
- I mention in the video that approximate Bayesian computation is the slowest way you can fit a statistical model, and for many common statistical models this is the case. However, for some models it might be very expensive to evaluate the likelihood, and in that case approximate Bayesian computation can actually be faster. As usual, it all depends on the context…
- I mention “drawing random parameter values from the prior”, or something similar, in the video. Invoking “randomness” always makes me a bit uneasy, and I just want to mention that the purpose of “drawing random parameters” is just to get a vector/list of parameter values that is a good enough representation of the prior probability distribution. It just happens to be the case that random number generators (like `rnbinom` and `rbeta`) are a convenient way of creating such representative distributions.

For a Slower’n’Cleaner introduction to approximate Bayesian computation I would actually recommend the Wikipedia page, which is pretty good!
