This year’s UseR! conference was held at the University of California, Los Angeles. Despite the great weather and a nearby beach, most of the conference was spent in front of projector screens in 18°C (64°F) rooms because there were so many interesting presentations and tutorials going on. I was lucky enough to present my R package Bayesian First Aid and the slides can be found here:
There was so much great stuff going on at UseR! and here follows a random sample:
John Chambers on Interfaces, Efficiency and Big Data. One of the creators of S (the predecessor of R) talked about the history of R and exciting new developments such as Rcpp11. He was also kind enough to sign my copy of S: An Interactive Environment for Data Analysis and Graphics, the original S book from 1984 :)
Yihui Xie the Knitr Ninja. Yihui held the most amazing presentation about how to be a Knitr ninja using only an R script and sound effects. The “anime sword” sound effect used by Yihui is now available in the development version of beepr and can be played by running beep("sword").
Romain François held both a tutorial and a presentation on the Rcpp11 package, a most convenient way of connecting R and C++.
Dirk Eddelbuettel held a keynote on the topic of R, C++ and Rcpp, another convenient way of connecting R and C++. Do we see a theme here? He also talked about Docker, which I had never heard of before: it provides a kind of lightweight virtual machine that can easily be built and distributed (this is my interpretation, which might be a bit off).
RStudio was otherwise running the show with great presentations: Winston Chang on ggvis, Joe Cheng on Shiny, J.J. Allaire and Kevin Ushey on Packrat - A Dependency Management System for R, Jeff Allen on The Next Generation of R Markdown and, of course, Hadley Wickham on dplyr: a grammar of data manipulation.
Dieter De Mesmaeker presented a poster on Rdocumentation.org, a really nice web interface to the documentation of R.
All in all, a great conference! I’m already looking forward to next year’s UseR! conference, which will be held at Aalborg University, not too far from where I live (at least compared to LA).
Even though I said it would never happen, my silly package with the sole purpose of playing notification sounds is now on CRAN. Big thanks to the CRAN maintainers for their patience! For instant gratification, run the following in R to install beepr and make R produce a notification sound:
install.packages("beepr")
library(beepr)
beep()
This package was previously called pingr and included a ping() function. By request from the CRAN maintainers it has been renamed so as not to be confused with the Unix tool ping. Consequently it is now called beepr and includes a beep() function instead. Other things that have changed since the original announcement: it is now possible to play a custom wav file by running beep("path/to/my_sound.wav"), and a facsimile of the Facebook notification sound has been added, which can be played by running beep("facebook") (thanks to Romain François for the suggestion!).
For fun I made a little animation of the actual “ping” sound that plays when you run beep(), using the audio package and the animation package. Sure, the function is now called beep, but I still like the original sound :)
Here is the code:
library(audio)
library(animation)
# You would have to change this path to point to a valid wav-file
w <- load.wave("inst/sounds/microwave_ping_mono.wav")
w <- w[1000:7000] # Trim both the start and the end of the ping sound
plot_frame <- function(sample_i) {
  old_par <- par(mar = rep(0.1, 4))
  plot(w[seq(1, sample_i)], type = "l", xaxt = "n", yaxt = "n",
       ylim = c(-0.3, 0.3), col = "darkblue")
  text(x = 3400, y = 0.2, labels = "beepr (former pingr)", cex = 1.5)
  text(x = 3900, y = -0.2, labels = "- now on CRAN!", cex = 1.5)
  par(old_par)
}
saveGIF(interval = 0.1, ani.width = 200, ani.height = 100, expr = {
  # The animation
  for(sample_i in seq(1, length(w), length.out = 40)) {
    plot_frame(sample_i)
  }
  # Just repeating the last image a couple of times...
  for(i in 1:15) {
    plot_frame(length(w))
  }
})
Does pill A or pill B save the most lives? Which web design results in the most clicks? Which in vitro fertilization technique results in the largest number of happy babies? A lot of questions out there involve estimating the proportion or relative frequency of success of two or more groups (where success could be a saved life, a click on a link, or a happy baby), and there exists a little-known R function that does just that: prop.test. Here I’ll present the Bayesian First Aid version of this procedure. A word of caution: the example data I’ll use is mostly from the journal Human Reproduction and as such it might be slightly NSFW :)
Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement. The development of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it!
This is a straightforward extension of the Bayesian First Aid alternative to the binomial test, which can be used to estimate the underlying relative frequency of success given a number of trials and, out of them, a number of successes. The model for bayes.prop.test is just more of the same thing; we estimate the relative frequencies of success for two or more groups instead. Below is the full model, where $\theta_i$ is the relative frequency of success estimated given $x_i$ successes out of $n_i$ trials:
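Written out, and matching the JAGS code that model.code prints further down in this post, the model is simply an independent binomial likelihood with a flat beta prior for each group:

```latex
x_i \sim \mathrm{Binomial}(\theta_i, n_i), \qquad
\theta_i \sim \mathrm{Beta}(1, 1), \qquad i = 1, \ldots, \text{number of groups}
```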
The bayes.prop.test Function
The bayes.prop.test function accepts the same arguments as the original prop.test function: you can give it two vectors, one with counts of successes and one with counts of trials, or you can supply the same data as a matrix with two columns. If you just ran prop.test(successes, trials), prepending bayes. (as in bayes.prop.test(successes, trials)) runs the Bayesian First Aid alternative and prints out a summary of the model result. By saving the output, for example like fit <- bayes.prop.test(successes, trials), you can inspect it further using plot(fit), summary(fit) and diagnostics(fit).
To demonstrate the use of bayes.prop.test I will use data from the Kinsey Institute for Research in Sex, Gender and Reproduction as described in the article Genital asymmetry in men by Bogaert (1997). The data consists of survey answers from 6544 “postpubertal males with no convictions for felonies or misdemeanours” where the respondents, among other things, were asked two questions:
I don’t know about you, but the first question I had was: are right-handed people more likely to have it on the right, or perhaps the opposite? Here is the raw data given by Bogaert:
Seems like we have 4794 complete cases. Just looking at those with a right or leftward disposition leaves us with 1624 cases: 275 right-leaners out of 1454 right-handers and 43 right-leaners out of 170 non-right-handers. Bogaert uses a chi-square test to analyze this data, but since we are interested in comparing proportions we’ll start with prop.test instead:
# Below, the data from the right handers are on the right, logical right? :)
n_right_leaners <- c(43, 275)
n_respondents <- c(170, 1454)
prop.test(n_right_leaners, n_respondents)
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: n_right_leaners out of n_respondents
## X-squared = 3.541, df = 1, p-value = 0.05989
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.007852 0.135468
## sample estimates:
## prop 1 prop 2
## 0.2529 0.1891
Not too bad, we get both a confidence interval on the difference between the groups and maximum likelihood estimates. Let’s compare this with the Bayesian First Aid version:
bayes.prop.test(n_right_leaners, n_respondents)
##
## Bayesian First Aid propotion test
##
## data: n_right_leaners out of n_respondents
## number of successes: 43, 275
## number of trials: 170, 1454
## Estimated relative frequency of success [95% credible interval]:
## Group 1: 0.26 [0.19, 0.32]
## Group 2: 0.19 [0.17, 0.21]
## Estimated group difference (Group 1 - Group 2):
## 0.07 [-0.0025, 0.13]
## The relative frequency of success is larger for Group 1 by a probability
## of 0.977 and larger for Group 2 by a probability of 0.023 .
Pretty similar estimates (and they should be) but now we also get estimates [and credible intervals] for both groups and the group difference. Looking at the estimates it seems like both left and right-handers tend to lean to the left much more often than to the right. We also get to know that the probability that left-handers lean more often to the right compared to right-handers is 97.7%. Interesting! Let’s look at this relation further by plotting the posterior distribution:
fit <- bayes.prop.test(n_right_leaners, n_respondents)
plot(fit)
So, it is most likely that left-handers lean more to the right by around 6-7 percentage points (and Bogaert discusses some reasons why this might be the case). Still the posterior of the group difference is pretty wide (and the credible interval kisses the 0.0) and even though the analysis is leaning (he he) towards there being a difference it would be nice to have a few more data points to get a tighter estimate.
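As a side note, the 97.7% figure in the printout above can also be replicated without rerunning JAGS. Because the model uses independent Beta(1, 1) priors with binomial likelihoods, the posteriors are conjugate, so plain Monte Carlo draws from the corresponding beta distributions in base R suffice (this is a sketch of the idea, not part of the package):

```r
# With a Beta(1, 1) prior, the posterior of each theta is
# Beta(1 + successes, 1 + failures), so we can draw from the
# posteriors directly with rbeta and compare the groups.
set.seed(123)
theta1 <- rbeta(100000, 1 + 43,  1 + 170  - 43)   # non-right-handers
theta2 <- rbeta(100000, 1 + 275, 1 + 1454 - 275)  # right-handers
mean(theta1 > theta2)  # close to the 0.977 reported above
```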
bayes.prop.test as a Replacement for chisq.test
I don’t like Pearson’s chi-squared test; it is used as a catch-all analysis for any table of counts and what you get back is utterly uninformative: a p-value relating to the null hypothesis that the row variable is completely independent of the column variable (which is known to be false a priori most of the time anyway, see here and here for some discussion). If you have counts of successes for many groups and are interested in actually estimating group differences, bayes.prop.test can also be used as a replacement for chisq.test. Let’s look at an example, again with data from the journal Human Reproduction.
When doing in vitro fertilization the egg is fertilized by sperm outside the body and later, if successfully fertilized, reinserted into the uterus. The egg and the sperm are usually left together to co-incubate for more than an hour before the egg is separated from the sperm and left to incubate by itself. Bungum et al. (2006) were interested in comparing whether an ultra-short co-incubation period of 30 s would work as well as a more conventional co-incubation period of 90 min. Bungum et al. used a 30 s co-incubation period on 389 eggs and a 90 min period on another batch of 388 eggs, and looked at a number of measures such as the number of fertilized eggs and the number of resulting embryos graded as high quality. Their analysis of the result is summarized in the table below:
Unfortunately they use chi-square tests to analyze these counts, and they don’t even report the full p-values; for all but one of the measures all we get to know is NS. Just looking at the raw data, there seems to be little difference between the proportions of fertilized eggs in the two groups, but there seems to be a difference in embryo quality, with more embryos in the 90 min group being of high quality (defined as grade 0 and 1). But the chi-square analysis says NS, which is interpreted in the results section as: “the two groups were comparable”. Let’s analyze the data with bayes.prop.test:
no_good_grade <- c(134, 152)
no_embryos <- c(228, 225)
fit <- bayes.prop.test(no_good_grade, no_embryos)
plot(fit)
Looking at the posterior, there actually seems to be some evidence that the 30 s procedure results in fewer embryos of good quality, as most of the posterior probability is centered around a difference of 6-12 percentage points. Sure, the credible interval kisses zero, but the evidence for the small difference hinted at in the original article is definitely not strong.
Using the concept of a region of practical equivalence (ROPE) we can calculate the probability that the difference between the two procedures is small. First we have to decide what would count as a small enough difference to be negligible. I have no strong intuition about what would be a small difference in this particular case, so I’m arbitrarily going to go with 5 percentage points, yielding a ROPE of [-5, 5] percentage points (for more about ROPEs see Kruschke, 2011). To calculate the probability that the difference between the two groups is within the ROPE I’ll extract the MCMC samples generated when the model was fit using as.data.frame, and then I’ll use them to calculate the probability “by hand”:
s <- as.data.frame(fit)
mean(abs((s$theta1 - s$theta2)) < 0.05)
## [1] 0.201
The probability that the relative frequency of high quality embryos is practically equivalent between the two procedures is only 20%, thus the probability that there is a substantial difference is 80%. There is definitely weak evidence for “no difference” here, but we would need more data to be able to state the magnitude of the difference with reasonable certainty.
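Again, since the model is conjugate, the same ROPE probability can be approximated without the saved MCMC samples, by drawing directly from the beta posteriors implied by the counts above (a sketch for illustration):

```r
# Posteriors are Beta(1 + successes, 1 + failures) for each group.
set.seed(123)
theta1 <- rbeta(100000, 1 + 134, 1 + 228 - 134)  # 30 s co-incubation
theta2 <- rbeta(100000, 1 + 152, 1 + 225 - 152)  # 90 min co-incubation
mean(abs(theta1 - theta2) < 0.05)  # around 0.20, as above
```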
Caveat: I know very little about in vitro fertilization and this is definitely not a critique of the study in any way. I don’t know what would be considered a region of practical equivalence in this case and I don’t know if embryo quality is considered an important outcome. However, I still believe that the analysis would have been more informative if they would have used something better than chi-square tests and p-values!
Like prop.test, bayes.prop.test can be used to compare more than two groups. Here is an example with a dataset from the prop.test help file on the number of smokers in four groups of patients with lung cancer.
smokers <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )
fit <- bayes.prop.test(smokers, patients)
fit
##
## Bayesian First Aid propotion test
##
## data: smokers out of patients
## number of successes: 83, 90, 129, 70
## number of trials: 86, 93, 136, 82
## Estimated relative frequency of success [95% credible interval]:
## Group 1: 0.96 [0.91, 0.99]
## Group 2: 0.96 [0.92, 0.99]
## Group 3: 0.94 [0.90, 0.98]
## Group 4: 0.85 [0.77, 0.92]
## Estimated pairwise group differences (row - column) with 95 % cred. intervals:
##                    Group
##      2                3                4
## 1    0                0.01             0.11
##      [-0.063, 0.058]  [-0.05, 0.068]   [0.023, 0.2]
## 2                     0.02             0.11
##                       [-0.042, 0.071]  [0.023, 0.2]
## 3                                      0.1
##                                        [0.013, 0.19]
plot(fit)
As is shown both in the printout and in the plot, group 4 seems to differ slightly from the rest. While bayes.prop.test can be used to compare more than four groups, both the printouts and the plots start to get a bit overwhelming when there are too many groups. A remedy for this is to use the model.code function, which prints out JAGS and R code that replicates the model you have fitted with bayes.prop.test; you can then customize this code further to make the plots and comparisons you are interested in.
model.code(fit)
### Model code for the Bayesian First Aid ###
### alternative to the test of proportions ###
require(rjags)
# Setting up the data
x <- c(83, 90, 129, 70)
n <- c(86, 93, 136, 82)
# The model string written in the JAGS language
model_string <- "model {
  for(i in 1:length(x)) {
    x[i] ~ dbinom(theta[i], n[i])
    theta[i] ~ dbeta(1, 1)
    x_pred[i] ~ dbinom(theta[i], n[i])
  }
}"
# Running the model
model <- jags.model(textConnection(model_string), data = list(x = x, n = n),
                    n.chains = 3, n.adapt = 1000)
samples <- coda.samples(model, c("theta", "x_pred"), n.iter=5000)
# Inspecting the posterior
plot(samples)
summary(samples)
# You can extract the mcmc samples as a matrix and compare the thetas
# of the groups. For example, the following shows the median and 95%
# credible interval for the difference between Group 1 and Group 2.
samp_mat <- as.matrix(samples)
quantile(samp_mat[, "theta[1]"] - samp_mat[, "theta[2]"], c(0.025, 0.5, 0.975))
Another reason to modify the code printed out by model.code is to change the assumptions of the model. The current model does not assume any dependency between the groups; if this is an unreasonable assumption you might want to modify the model code to include such a dependency. A nice example of how to extend the model to assume a hierarchical dependency between the relative frequencies of success of each group can be found on the LingPipe blog. Hierarchical binomial models are also discussed in chapter 9 of Kruschke’s Doing Bayesian Data Analysis and in section 5.3 of Gelman et al.’s Bayesian Data Analysis.
Bogaert, A. F. (1997). Genital asymmetry in men. Human reproduction, 12(1), 68-72. doi: 10.1093/humrep/12.1.68 . pdf
Bungum, M., Bungum, L., & Humaidan, P. (2006). A prospective study, using sibling oocytes, examining the effect of 30 seconds versus 90 minutes gamete co-incubation in IVF. Human Reproduction, 21(2), 518-523. doi: 10.1093/humrep/dei350
Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6(3), 299-312. doi: 10.1177/1745691611406925 . pdf
As I’m more or less an autodidact when it comes to statistics, I have a weak spot for books that try to introduce statistics in an accessible and pedagogical way. I have therefore collected what I believe are all the books that introduce statistics using comics (at least those written in English). What follows are highly subjective reviews of those four books. If you know of any other comic book on statistics, please do tell me!
I’ll start with a tl;dr version of the reviews, but first here are the four books:
Written in 1993 by Larry Gonick and Woollcott Smith this is still the book, out of the four reviewed, that feels most up to date. It covers a wide range of topics starting with summary statistics and basic probability and working its way through probability distributions, experimental design, confidence intervals and linear regression. It even touches on more advanced subjects such as resampling methods. The book is not written as a standard comic book with panels and a story line, rather it is a well written, easy going text that is accompanied by the fun and lively sketches by Larry Gonick.
The book doesn’t skimp on the mathematical notation, which might put some off, but the presentation is still more accessible than most of the introductory stats books I’ve come across. If you are interested in the statistical programming language R, a thing to note is that many of the graphs in the book are made with S, the precursor to R, like the scatter plot below:
Like all the books reviewed here the Cartoon Guide to Statistics is focused on frequentist statistics, but at least the Bayesian perspective gets a mention:
This is a great book which is witty and pleasant to read (+ it is pretty cheap). Highly recommended!
Amazon link: The Cartoon Guide to Statistics
This book, written by Shin Takahashi and illustrated by Iroha Inoue, is one in a long series of Manga Guides translated and published by No Starch Press. Like the Manga Guide to Databases or the Manga Guide to Calculus, the Manga Guide to Statistics takes a subject with a reputation for being difficult and technical and adds a cliché story full of huge eyes, naive schoolgirls and geeky geeks without any social ability.
As opposed to the Cartoon Guide to Statistics, the Manga Guide reads more like a standard comic book with panels and a story line. The story centers around the schoolgirl Rui, who wants to learn statistics to impress the handsome Mr. Igarashi. To her rescue comes Mr. Yamamoto, a stats nerd with thick glasses. The story and the artwork are archetypal manga (including very stereotypical gender roles), but if you can live with that it is a pretty fun story.
Quite a lot of space in the book is devoted to the storyline rather than to teaching statistics, and to compensate for this there are text passages interspersed between the manga passages. Still, the book doesn’t cover that much ground; except for basic graphs and summary statistics it focuses on classical hypothesis testing.
The book also describes the basic probability distributions, but mostly from a hypothesis testing perspective, which feels a bit odd. I could imagine that without any background in statistics a reader of the Manga Guide would find it quite hard to understand the rationale for actually performing hypothesis tests. Maybe a good companion would be the soon to be released Manga Guide to Regression Analysis…
Amazon link: The Manga Guide to Statistics
A book by Grady Klein and Alan Dabney with a similar aim and a deceptively similar name to The Cartoon Guide to Statistics. I, however, found the Cartoon Introduction lacking in almost all aspects compared to the Cartoon Guide. The artwork didn’t click with me; somehow it looks like the characters lack outlines, and the illustrations distract from the text rather than enhance it. The most irritating… aspect of the… book is that… most sentences are spread out… over many panels.
This “dilution” of the text makes it hard to follow, and for a book of ~200 pages it covers relatively few concepts. I’ve also got a gripe with what the book covers, as it is extremely focused on normal sampling distributions. Sure, that is an important concept in classical statistics, but it takes up a really large part of the book while other important concepts, such as probability, get little attention. This being a new book (2013), it is strange that it does not mention Bayesian statistics at all; it is actually the most frequentist book of all the books reviewed here. However, looking at how confidence intervals are characterized, it seems like the book is actually describing a Bayesian credible interval…
Amazon link: The Cartoon Introduction to Statistics
See also this review of the Cartoon Introduction to Statistics by Christian P. Robert.
Wow! This is an odd book that might be one of the worst introductions to statistics you can get as a beginner. The artwork by the surrealist painter Borin van Loon is truly surreal. Perhaps it is hip and cool (and perhaps I’m not cool enough), but it definitely does not add clarity to the subject the book aims to introduce. The worst part of the book is the text, which introduces statistical concepts and historical facts in a completely haphazard order. The book is very focused on early statistical scientists such as Galton, Pearson, Gosset and Fisher, and here is one of the many surreal versions of Pearson’s head:
Notice the speech balloon, coming out of Pearson’s mouth, which does not make sense. What does it mean to measure a group, and why would the coefficient of variation help when measuring groups with “a great deal of variation”? In general there are a lot of statements in this book that do not make sense or that are plain wrong. A random sample:
“Vital statistics is concerned with averages whereas mathematical statistics deals with variation.” When characterizing the two “types” of statistics.
“There are two types of statistical distributions: probability distributions, which describe the possible events in a sample and the frequency with which each occur; and frequency distributions.”
“The Poisson distribution […] is a discrete probability distribution used to describe the occurrence of unlikely events in a large number of independent repeated trials.” Why unlikely? Why a large number of trials?
“The standard deviation shows the deviation from the mean and the frequency of this deviation.” The frequency of a deviation?
“The variance is also a measure of variation, but [as opposed to the standard deviation,] it is used for random variables and indicates the extent to which its values are spread around the expected values.” The many expected values? So the variance is only for random variables…
The list could go on, so why not let it?
“[The Bayesian approach] is a means of calculating from the number of times that an event has not occurred to determine the probability that it will occur in future trials.” I can’t even begin to understand what this means…
”[Relative frequency] is a more scientific and objective approach than the other types of probability, and is used for finding out about the world and assessing actual existing objects. One can flip a coin 100 times and record the number of heads and tails and the ratio of the number of heads to the total number of flips.” Assessing actual existing objects? More scientific? What does this have to do with flipping a coin 100 times?
“[The chi-square distribution and the chi-square goodness of fit test’s] overriding significance was that statisticians could now use statistical methods that did not depend on the normal distribution to interpret their findings.”
“The expected values represent the average amount one ‘expects’ as the outcome of the random trial when identical odds are repeated many times.”
Or what about this panel on “higher mathematics”:
If it is not clear by now, this is not a book I recommend unless you really are looking for a surreal introduction to statistics…
Amazon link: Introducing Statistics, a graphic guide
All images and quotes included in this review are copyrighted by their respective copyright holders; however, I believe that the inclusion of these quotes and images in this review constitutes fair use.