# Jeffreys’ Substitution Posterior for the Median: A Nice Trick to Non-parametrically Estimate the Median

While reading up on quantile regression I found a really nice hack described in Bayesian Quantile Regression Methods (Lancaster & Jae Jun, 2010). It is called Jeffreys’ substitution posterior for the median, first described by Harold Jeffreys in his Theory of Probability, and is a non-parametric method for approximating the posterior of the median. What makes it cool is that it is really easy to understand and pretty simple to compute, while making no assumptions about the underlying distribution of the data. The method does not strictly produce a posterior distribution, but has been shown to produce a conservative approximation to a valid posterior (Lavine, 1995). In this post I will try to explain Jeffreys’ substitution posterior, give R-code that implements it and finally compare it with a classical non-parametric test, the Wilcoxon signed-rank test. But first a picture of Sir Harold Jeffreys:

Say we have a vector $x$ and we are interested in the median of the underlying distribution. The property of the median is that it sits smack in the middle of the distribution. We would expect that, on average, half of the datapoints of a vector generated from a distribution would fall to the left of the median (and so the other half would fall to the right). Now, given the vector $x$ of length $n$ and a candidate for the median $m$, let $n_l$ be the number of values that fall to the left of $m$ and $n_r$ be the number of values that fall to the right. If $m$ was the actual median the probability of getting the $n_l$ and $n_r$ that we actually got could be calculated as

Or using notation:

Here is an example:

Jeffreys proposed to use the probability ${{n}\choose{n_l}} \big / 2^n$ as a substitution for the likelihood, $p(x \mid m)$, and then calculate $p(m \mid x) \propto p(x \mid m) \cdot p(m)$.

By doing this we get something posterior-like we can do the usual stuff with, calculate credible intervals, calculate the probability that the median is smaller than this and that, etc. Before doing this we will just rewrite the substitution likelihood $p(x \mid m)$ to make it easier to work with. We can rewrite the $n_l$-combinations statement as

resulting in

If we change the candidate for the median $m$ the only numbers that are going to change are $n_l!$ and $n_r!$, $n!$ and $2^n$ stays the same. So, if we are interested in an expression that is proportional to the likelihood we could just use

If we assume a uniform prior over $m$ we get that

Great! Now we have a method to go from a vector of data, $x$, to a posterior probability for the median, $p(m \mid x)$.

## Doing it in R

Using Jeffreys’ substitution posterior in R is pretty simple, but we have to be cautious not to run into numerical trouble. Directly calculating $1 / (n_l!n_r!)$ will only work for small datasets, already factorial(180) results in values so large that you’ll get an error in R. To overcome this problem we’ll do all the calculations on the log-scale as far as possible, using lfactorial instead. We’ll also use the fact that all candidate $m$ between two datapoints have the same likelihood, and so it suffices to calculate the likelihood once for each interval between two datapoints. First, here is the code for plotting the posterior:

Nice, we got a plot of the posterior which looks sort of like a skyscraper, but this is alright, the posterior is supposed to be a step function. Looking at the plot we can see that, most likely, the median is between -0.5 and 1.0 (which seems alright as we actually know that the true median is 0.0). However, the representation of the posterior that we currently have in R is not very useful if we would want to calculate credible intervals or the probability of the median being higher than, for example 0.0. A more useful format would be to have a vector of random samples drawn from the posterior. This is also pretty easy to do in R, but first we need a helper function:

Then we calculate the proportional likelihood of the median being in each interval, transforms this into a probability, and finally use these probabilities to sample from the posterior. I’m going to be slightly sloppy and disregard the posterior that lies outside the range of the data, if we have more than a dozen datapoints this sloppiness is negligible.

Now that we have a couple of samples in s we can do all the usual stuff such as calculating a 95% credible interval:

or plotting the posterior as a histogram with a superimposed 95% CI:

## A Handy Function to Copy-n-Paste

Here is a handy function you should be able to copy-n-paste directly into R. It packages all the calculations done above into one function that plots the posterior, calculates a 95% CI and returns a vector of samples that can be further inspected. I haven’t tested this function extensively so use it at your own risk and please tell me if you find any bugs.

## jeffreys_median vs wilcox.test

Finally, I thought I would compare Jeffreys’ substitution posterior with one of the most common classical non-parametric tests used, Wilcoxon signed-rank test. From the documentation for wilcox.test:

So wilcox.test gives us a pretty small p-value, which could be taken to support that the distribution of y - x is not symmetric about 0.0, but not much more. Let’s rerun the analysis but using the R function we defined earlier.

Looking at the posterior we see that the most likely median difference is around -0.5, but the difference is also likely to be as negative as -1 and as positive as 0.1. Using the samples s we can calculate the probability that the median difference is less than 0:

So, given the data and Jeffreys’ substitution posterior there is more than 95% probability that the median difference is less than 0. Compared with Wilcoxon signed-rank test I think Jeffreys’ substitution posterior is a good alternative when you are interested in the median but otherwise want to make as few assumptions as possible. However, I don’t want to generally recommend Jeffreys’ substitution posterior, as it doesn’t make strong assumptions almost any other model based on reasonable assumptions will be more powerful and will make better use of the data. Still, I think Jeffreys’ substitution posterior is a cool method with a clean, intuitive definition, and I just wonder: Why isn’t it better known?

## References

Jeffreys, H. (1961). Theory of Probability. Clarendon Press: Oxford. Amazon link

Lancaster, T., & Jae Jun, S. (2010). Bayesian quantile regression methods. Journal of Applied Econometrics, 25(2), 287-307. pdf

Lavine, M. (1995). On an approximate likelihood for quantiles. Biometrika, 82(1), 220-222. link (unfortunately behind paywall)