Anscombe’s quartet is a collection of four datasets that look radically different yet result in the same regression line when using ordinary least square regression. The graph below shows Anscombe’s quartet with imposed regression lines (taken from the Wikipedia article).
While least square regression is a good choice for dataset 1 (upper left plot) it fails to capture the shape of the other three datasets. In a recent post John Kruschke shows how to implement a Bayesian model in JAGS that captures the shape of both data set 1 and 3 (lower left plot). Here I will expand that model to capture the shape of all four data sets. If that sounds interesting start out by reading Kruschke’s post and I will continue where that post ended…
Ok, using a wide tailed t distribution it was possible to down weight the influence of the outlier in dataset 3 while still capturing the shape of dataset 1. It still fails to capture datasets 2 and 4 however. Looking at dataset 2 (upper right plot) it is clear that we would like to model this as a quadratic curve and what we would like to do is to allow the model to include a quadratic term when the data supports it (as dataset 2 does) and refrain from including a quadratic term when the data supports a linear trend (as in datasets 1 and 3). A solution is to include a quadratic term (b2) with a spike and slab prior distribution which is a mixture between two distributions one thin (spiky) distribution and one wide (slab) distribution. By centering the spike over zero we introduce a bit of automatic model selection into our model, that is, if there is evidence for a quadratic trend in the data then the slab will allow this trend to be included in the model, however, if there is little evidence for a quadratic trend then the spike will make the estimate of b2 practically equivalent to zero. In JAGS such a prior can be implemented as follows:
1 2 3 4
The argument to the dbern function indicates our prior belief that the data includes a quadratic term, in this case we think the odds are 50/50. The resulting prior looks like this: