Bayesian First Aid: Pearson Correlation Test

Correlation does not imply causation, right but, as Edward Tufte writes, “it sure is a hint.” The Pearson product-moment correlation coefficient is perhaps one of the most common ways of looking for such hints and this post describes the Bayesian First Aid alternative to the classical Pearson correlation test. Except for being based on Bayesian estimation (a good thing in my book) this alternative is more robust to outliers and comes with a pretty nice default plot. :)

Bayesian Fist Aid with Anscombe’s quartet

Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement. The development of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it!

The Model

The classical test of Pearson product-moment correlation coefficient between two paired variables $x_i$ and $y_i$ assumes bivariate normality. It assumes that the relation is linear and that both $x_i$ and $y_i$ are normally distributed. I’ve already written about a Bayesian alternative to the correlation test here and about how that model can be made more robust here. The Bayesian First Aid alternative is basically the robustified version with slightly different priors.

Instead of a bivariate normal distribution we’ll assume a bivariate t distribution. This is the same trick as in the Bayesian First Aid alternative to the t-test, compared to the normal distribution the wider tails of the t will downweight the influence of stray data points. The model has six parameters: the means of the two marginal distributions ($\mu_x,\mu_y$), the SDs ($\sigma_x,\sigma_y$), the degree-of-freedoms parameter that influences the heaviness of the tails ($\nu$), and finally the correlation ($\rho$). The SDs and $\rho$ are then used to define the covariance matrix of the t distribution as:

$$ \Sigma = \begin{bmatrix} \sigma_x^2 & \rho\sigma_y\sigma_x \\ \rho\sigma_y\sigma_x & \sigma_y^2 \end{bmatrix} $$

The priors on $\mu_x,\mu_y, \sigma_x,\sigma_y$ and $\nu$ are also the same as in the alternative to the two sample t-test, that is, by peeking at the data we set the hyperparameters $M_\mu, S_\mu, L_\sigma$ and $H_\sigma$ resulting in a, for all practical purposes, flat prior. When modeling correlations it is common to directly put a prior distribution on the covariance matrix (the Inverse-Wishart distribution for example). Here we instead do as described by Barnard, McCulloch & Meng (2000) and put separate priors on $\sigma_x,\sigma_y$ and $\rho$, where $\rho$ is given a uniform prior. The advantage with separate priors is that it gives you greater flexibility and makes it easy to add prior information into the mix.

So, here is the final model for the Bayesian First Aid alternative to the correlation test:

Bayesian First Aid Correlation Model

The `bayes.cor.test` Function

The bayes.cor.test function can be called exactly as the cor.test function in R. If you just ran cor.test(x, y), prepending bayes. (like bayes.cor.test(x, y)) runs the Bayesian First Aid alternative and prints out a summary of the model result. By saving the output, for example, like fit <- bayes.cor.test(x, y) you can inspect it further using plot(fit), summary(fit) and diagnostics(fit).

To showcase bayes.cor.test I will use data from the article 2D:4D ratios predict hand grip strength (but not hand grip endurance) in men (but not in women) by Hone & McCullough (2012). The ratio here referred to is that between one’s index finger (2D) and ring finger (4D) which is nicely visualized in the Wikipedia entry for digit ratio:

2d4d example

So why is the 2D:4D ratio an interesting measure? It is believed that the 2D:4D ratio is affected by the prenatal exposure to androgens (hormones that control the development of male characteristics) with a lower 2D:4D ratio being indicative of higher androgen exposure. Stated in a sloppy way, the working hypothesis is that the 2D:4D ratio is a proxy variable for prenatal androgen exposure and could therefore be related to a host of other traits related to “manliness” such as aggression, prostate cancer risk, sperm count, etc. (see Wikipedia for a longer list). Hone & McCullough was interested in the relation between the 2D:4D ratio and a typical “manly” attribute, muscle strength. They therefore measured the grip strength (in kg) and the 2D:4D ratio in 222 psychology students (100 male, 122 female) and found that:

2D:4D ratios significantly predicted [grip strength] in men, but not in women. […] We conclude furthermore that the association of prenatal exposure to testosterone (as indexed by 2D:4D ratios) and strength pertains only to men, and not to women."

Two points regarding their analysis, before moving on:

They support the conclusion above by comparing the regression p-values between the male group (p < 0.001) and the female group (p = 0.09) and notice that while the regression coefficient is less than 0.05 for the male group it is more than 0.05 for female group. It is not clear to me how p = 0.09 is strong evidence that prenatal exposure to testosterone does not influence strength in women at all.
Hone & McCullough tests the null hypothesis that there is no correlation whatsoever between 2D:4D ratio and strength. This question could be answered without any data, of course there is some correlation between 2D:4D ratio and strength (no matter how tiny). In complex systems, such as humans, it would be extremely unlikely that any given trait (such as hair color, shoe size, movie taste, running speed, no of tweets, etc.) does not correlate at all with any other trait. More interesting is to what degree 2D:4D ratio and strength correlates (which Hone & McCullough also estimated but did not base the final conclusion on).

We’ll leave Hone & McCullough’s analysis behind us (their analysis included many more variables, for one thing) and will just look at the strength and digit ratio data. The data was scraped from two scatter plots using the great and free online tool WebPlotDigitizer (so it might differ slightly from the original data, though I was pretty thorough) and can be downloaded here: 2d4d_hone_2012.csv. Here is how it looks:

d <- read.csv("2d4d_hone_2012.csv")
qplot(ratio_2d4d, grip_kg, data = d, shape = I(1)) + facet_grid(sex ~ ., scales = "free")

plot of chunk unnamed-chunk-1

Let’s first estimate the correlation for the male group using the classical cor.test:

cor.test( ~ ratio_2d4d + grip_kg, data = d[d$sex == "male", ])

## 
## 	Pearson's product-moment correlation
## 
## data:  ratio_2d4d and grip_kg
## t = -3.612, df = 95, p-value = 0.0004879
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5115 -0.1591
## sample estimates:
##     cor 
## -0.3475

And now using the Bayesian First Aid version:

fit_male <- bayes.cor.test( ~ ratio_2d4d + grip_kg, data = d[ d$sex == "male",])
fit_male

## 
## 	Bayesian First Aid Pearson's Correlation Coefficient Test
## 
## data: ratio_2d4d and grip_kg (n = 97)
## Estimated correlation:
##   -0.34 
## 95% credible interval:
##   -0.52 -0.15 
## The correlation is more than 0 by a probability of 0.001 
## and less than 0 by a probability of 0.999

Both estimates are very similar and both indicate a slight negative correlation. Let’s look at the Bayesian First Aid plot:

plot(fit_male)

plot of chunk unnamed-chunk-5

This plot shows lots of useful stuff! At the top we have the posterior distribution for the correlation $\rho$ with a 95% highest density interval. At the bottom we see the original data with superimposed posterior predictive distributions (that is, the distribution we would expect a new data point to have). This is useful when assessing how well the model fits the data. The two ellipses show the 50% (darker blue) and 95% (lighter blue) highest density regions. The red histograms show the marginal distributions of the data with a smatter of marginal densities drawn from the posterior. Looking at this plot we see that the model fits quite well, however, we could be concerned with the right skewness of the ratio_2d4d marginal which is not captured by the model.

Now let’s look at the corresponding plot for the female data:

fit_female <- bayes.cor.test( ~ ratio_2d4d + grip_kg, data = d[ d$sex == "female",])
plot(fit_female)

plot of chunk unnamed-chunk-7

Indeed, 2D:4D ratio and strength seems to be slightly less correlated than for the male group, but to claim there is evidence for no correlation seems a bit unfounded. We could, however, take a look at the posterior difference in correlation between the male and the female group. To do this we first extract the MCMC samples from the Bayesian First Aid fit object using the as.data.frame function.

female_mcmc <- as.data.frame(fit_female)
male_mcmc <- as.data.frame(fit_male)
head(female_mcmc)

##         rho    mu1   mu2  sigma1 sigma2     nu
## 1  0.024912 0.9758 22.84 0.03196  6.209  33.14
## 2 -0.002173 0.9718 22.24 0.03336  6.082  57.15
## 3  0.013787 0.9773 21.39 0.03577  5.712  66.23
## 4  0.017139 0.9751 22.10 0.03779  5.687 129.93
## 5  0.014113 0.9779 21.75 0.03482  6.245 149.52
## 6  0.025849 0.9766 21.13 0.03525  6.320 114.16

We’ll then plot the difference in $\rho$ and calculate the probability that $\rho$ is more negative for men:

hist(male_mcmc$rho - female_mcmc$rho, 30, xlim = c(-1, 1), main = "", yaxt = "n")

plot of chunk unnamed-chunk-9

mean(male_mcmc$rho - female_mcmc$rho < 0)

## [1] 0.9321

Even though we can’t claim that 2D:4D ratio and strength is not correlated in women at all, there is some evidence that 2D:4D ratio and strength is more negatively correlated in men than in women. Given that I have a relatively high (less “manly”) 2D:4D ratio of 1.0 does it mean I have an excuse for my subpar arm strength? Nope, I should just go to the gym more often…

`bayes.cor.test` Model Code

A great thing with Bayesian data analysis is that it is simple to step away from the well trodden model and explore alternatives. The Bayesian First Aid function that makes this easy is model.code, if you just ran fit <- bayes.cor.test(exposure_to_puppies, rated_happiness) a call to model.code(fit) prints out corresponding R and JAGS code that can be directly copy-n-pasted into an R script and modified from there.

model.code(fit)

## Model code for the Bayesian First Aid alternative to Pearson's correlation test. ##
require(rjags)

# Setting up the data
x <- exposure_to_puppies 
y <- rated_happieness 
xy <- cbind(x, y)

# The model string written in the JAGS language
model_string <- "model {
  for(i in 1:n) {
    xy[i,1:2] ~ dmt(mu[], prec[ , ], nu) 
  }

  # JAGS parameterizes the multivariate t using precision (inverse of variance) 
  # rather than variance, therefore here inverting the covariance matrix.
  prec[1:2,1:2] <- inverse(cov[,])

  # Constructing the covariance matrix
  cov[1,1] <- sigma[1] * sigma[1]
  cov[1,2] <- sigma[1] * sigma[2] * rho
  cov[2,1] <- sigma[1] * sigma[2] * rho
  cov[2,2] <- sigma[2] * sigma[2]

  ## Priors  
  rho ~ dunif(-1, 1)
  sigma[1] ~ dunif(sigmaLow, sigmaHigh) 
  sigma[2] ~ dunif(sigmaLow, sigmaHigh)
  mu[1] ~ dnorm(mean_mu, precision_mu)
  mu[2] ~ dnorm(mean_mu, precision_mu)
  nu <- nuMinusOne+1
  nuMinusOne ~ dexp(1/29)
}"

# Initializing the data list and setting parameters for the priors
# that in practice will result in flat priors on mu and sigma.
data_list = list(
  xy = cbind(x, y), 
  n = length(x),
  mean_mu = mean(c(x, y), trim=0.2) ,
  precision_mu = 1 / (max(mad(x), mad(y))^2 * 1000000),
  sigmaLow = max(mad(x), mad(y)) / 1000 ,
  sigmaHigh = min(mad(x), mad(y)) * 1000)

# Initializing parameters to sensible starting values helps the convergence
# of the MCMC sampling. Here using robust estimates of the mean (trimmed)
# and standard deviation (MAD).
inits_list = list(mu=c(mean(x, trim=0.2), mean(y, trim=0.2)), rho=cor(x, y, method="spearman"), 
                  sigma = c(mad(x), mad(y)), nuMinusOne = 5)

# The parameters to monitor.
params <- c("rho", "mu", "sigma", "nu")
  
# Running the model
model <- jags.model(textConnection(model_string), data = data_list,
                    inits = inits_list, n.chains = 3, n.adapt=1000)
update(model, 500) # Burning some samples to the MCMC gods....
samples <- coda.samples(model, params, n.iter=5000)

# Inspecting the posterior
plot(samples)
summary(samples)

One thing that we might want to change is the prior on rho which is currently $\text{Uniform}(-1,1)$. As described by Barnard, McCulloch & Meng (2000), the beta distribution stretched to the interval [-1,1] is a flexible alternative that facilitates the construction of a more informative prior. To use this prior we’ll have to replace

rho ~ dunif(-1, 1)

rho_half_with ~ dbeta(1, 1)
# shifting and streching rho_half_with from [0,1] to [-1,1]
rho ~ 2 * rho_half_with - 1

and replace the initialization of rho with the initialization of rho_half_width:

inits_list = list(mu = c(mean(x, trim=0.2), mean(y, trim=0.2)), 
                  rho_half_width = (cor(x, y, method="spearman") + 1) / 2, 
                  sigma = c(mad(x), mad(y)), nuMinusOne = 5)

Now, by changing the parameters of dbeta we can specify a range of different priors:

plot of chunk unnamed-chunk-15

Here the top left prior is the uniform, representing little prior information on the correlation between measures of puppy exposure and happiness. The top right prior is skeptical of there being a large correlation but is agnostic with respect to the direction. The lower left prior is just slightly in favor of a positive correlation, with the lower right being quite optimistic putting more than 99% of the prior probability over rho > 0.

Posted by Rasmus Bååth | 2014-03-17 | Tags: Statistics, R, Bayesian

Publishable Stuff

Rasmus Bååth's Blog

Bayesian First Aid: Pearson Correlation Test

2014-03-17

The Model

The `bayes.cor.test` Function

`bayes.cor.test` Model Code

Bayesian First Aid: Pearson Correlation Test

2014-03-17

The Model

The bayes.cor.test Function

bayes.cor.test Model Code

The `bayes.cor.test` Function

`bayes.cor.test` Model Code