Publishable Stuff

Rasmus Bååth's Blog


The source of the cake dataset

2024-01-28

In statistics, there are a number of classic datasets that pop up in examples, tutorials, etc. There’s the infamous iris dataset (just type iris in your nearest R prompt), the Palmer penguins (the modern iris replacement), the Titanic dataset(s) (I hope you’re not a guy in 3rd class!), etc. While looking for a dataset to illustrate a simple hierarchical model, I stumbled upon another one: the cake dataset in the lme4 package, which is described as containing “data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures [as] presented in Cook (1938) [1]”. For me, this raised a lot of questions: Why measure the breakage angle of chocolate cakes? Why was this data collected? And what were the recipes?

I assumed the answers to my questions would be found in Cook (1938) [1] but, after a fair bit of flustered searching, I realized that this scholarly work, despite its obvious relevance to society, was nowhere to be found online. However, I managed to find out that a hard copy existed at Iowa State University, accessible only to faculty staff.

The tl;dr: After receiving help from several kind people at Iowa State University, I received a scanned version of Frances E. Cook’s Master’s thesis, the source of the cake dataset. Here it is:

Cook, Frances E. (1938). Chocolate cake: I. Optimum baking temperature. (Master’s thesis, Iowa State College).

It contains it all: the background, the details, and the cake recipes! Here are some more details on the cake dataset, how I got help finding its source, and, finally, the cake recipes.

The cake dataset

The cake dataset can be found in the lme4 package with the following description:

Data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures. This is a split-plot design with the recipes being whole-units and the different temperatures being applied to sub-units (within replicates). The experimental notes suggest that the replicate numbering represents temporal ordering.

So for each of the $3 \times 6 = 18$ recipe and temperature combinations, Cook made 15 (!) replicates, resulting in a total of $3 \times 6 \times 15 = 270$ cakes/datapoints. Here’s the first couple of rows:

replicate  recipe  angle  temperature
1          A       42     175
1          A       46     185
1          A       47     195
1          A       39     205
1          A       53     215
1          A       42     225
1          B       39     175
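The layout of the dataset can be sketched in a few lines of Python. This is only an illustration, not how the dataset was actually put together: the recipe labels A to C and the temperature levels 175 to 225 in steps of 10 are the ones that appear in this post, and the row order (replicate, then recipe, then temperature) is an assumption based on the first rows shown above.

```python
from itertools import product

# Levels as they appear in this post. The row ordering is an assumption
# based on the first few rows of the dataset shown above.
replicates = range(1, 16)            # 15 replicates
recipes = ["A", "B", "C"]            # 3 recipes
temperatures = range(175, 226, 10)   # 6 baking temperatures: 175, 185, ..., 225

# One row per cake: every recipe at every temperature in every replicate.
design = list(product(replicates, recipes, temperatures))

# In the split-plot layout, a whole unit is one recipe within one replicate,
# and its six sub-units are the cakes baked at the different temperatures.
whole_units = {(replicate, recipe) for replicate, recipe, _ in design}

print(len(design))       # number of cakes, 3 * 6 * 15
print(len(whole_units))  # number of whole units, 3 * 15
print(design[:7])        # same replicate/recipe/temperature pattern as the rows above
```

Counting the whole units separately is a reminder of why this dataset suits a hierarchical model: the six cakes within a whole unit share a recipe and a replicate, so they are not independent observations.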

If you want the full dataset without getting lme4 here’s the cake dataset as a CSV file. Plotting this dataset we can quickly conclude that the cake breakage angle increases as a function of baking temperature:
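If you would rather see numbers than a plot, here is a minimal sketch that computes the mean breakage angle per baking temperature using only Python's standard library. It runs on the seven rows shown above; the file name `cake.csv`, and the idea that the CSV header matches the column names shown above, are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# The seven rows shown above, written as CSV. For the full dataset, replace
# this with open("cake.csv", newline=""), where "cake.csv" is a hypothetical
# local file name assumed to use the same column names.
sample = io.StringIO(
    "replicate,recipe,angle,temperature\n"
    "1,A,42,175\n"
    "1,A,46,185\n"
    "1,A,47,195\n"
    "1,A,39,205\n"
    "1,A,53,215\n"
    "1,A,42,225\n"
    "1,B,39,175\n"
)

def mean_angle_by_temperature(lines):
    """Return {temperature: mean breakage angle} for an iterable of CSV lines."""
    angles = defaultdict(list)
    for row in csv.DictReader(lines):
        angles[int(row["temperature"])].append(float(row["angle"]))
    return {temp: mean(values) for temp, values in sorted(angles.items())}

means = mean_angle_by_temperature(sample)
for temp, angle in means.items():
    print(temp, angle)
```

Seven rows say nothing about a trend, of course; the pattern described here only shows up with all 270 cakes.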

While the cake dataset is found in lme4, the original source is Cochran and Cox’s book Experimental designs [2]. But what’s the original original source? And why measure the cake breakage angle?

The hunt for the source of the cake dataset

From the lme4 documentation I knew that the cake dataset came from the study by Cook (1938) [1], but no amount of Googling, Binging, or Google Scholaring turned up any trace of a digital copy. I did find that physical copies existed at Iowa State University and at Cornell, which presented a problem for me, being physically in Sweden. There was an option to request that the copy be digitized, but that option was available to Iowa State faculty only.

Twitter to the rescue, I thought, and fired away a tweet that got a tumbleweed response. But, final proof for me that Twitter is dying, the same request on Mastodon (come join me!) was an astounding success!

I got many helpful responses, with several pointing me directly at Iowa State staff that might help me out. Like this one from Karl Broman:

A quick e-mail later and I got this very encouraging e-mail from Dan Nettleton at the Department of Statistics, Iowa State:

He recruited the help of Philip M. Dixon, Department of Statistics, and Megan O’Donnell, Research Data Services Lead, and after a couple of days more I got this from Megan:

She (the busy Research Data Services Lead with a looming deadline) is apologizing to me (the random Swede with an eccentric cake thesis digitization request) that it took a few days to get me everything I asked for!? Still, the feeling of shame for having wasted Megan’s time was overshadowed by joy. Attached to the e-mail was, of course, also the full Master’s thesis of Frances E. Cook from 1938: Chocolate cake: I. Optimum baking temperature.

Highlights from Chocolate cake: I. Optimum baking temperature

Reading the thesis, it’s immediately clear that the breakage angle of cakes wasn’t the main focus. Instead, Cook was after some “accurate scientific information” on the optimum baking temperature for chocolate cake.

To figure out what was the best chocolate cake, she needed a battery of measures of cake goodness, such as cake tenderness, as measured objectively by its breaking angle. There were also several subjective measures, as found in the “Score Card for Cake” on page 50.

But how was the breaking angle of the cakes measured? In the thesis, we learn that “The tenderness of the cake was tested with the breaking angle apparatus as described by Myers (1936) [3]”, but there are no images that show us how it functioned. While I can’t find an online trace of Myers (1936) [3], I do believe I’ve found a description of this very apparatus in Lowe and Nelson (1939) [4]!

From an outsider’s perspective, not being active in the field of culinary research myself, the thesis of Cook comes off as being fantastically serious about cake. I especially adore that it includes photographs of all the cakes:

But, to be fair, in the photos above, you can clearly see how the baking temperature influences the volume of the cake.

The cake recipes

Like in a food blog that has been SEOed to death, here, finally, at the very end, are the cake recipes. I might not be the most experienced cake maker, but this is by far the most complicated chocolate cake recipe I’ve ever seen.

Now, for the baking time and temperature above you get a matrix of options. The answer for which option to pick can be found a bit further down in table XV, which displays the total scores for each option.

The winner, when considering the dimensions texture, tenderness, velvetiness and eating quality, was Recipe C with a baking temperature of 225 °C (437 °F) for 24 minutes. I’m no cake scientist, but if a linear model is to be believed when extrapolating outside of the range of the dataset (always a good idea) this cake would be delicious when baked in a pizza oven!


  1. Cook, Frances E. (1938). Chocolate cake: I. Optimum baking temperature. (Master’s thesis, Iowa State College).

  2. Cochran, W. G., and Cox, G. M. (1957). Experimental designs, 2nd Ed. New York, John Wiley & Sons.

  3. Myers, Elizabeth. (1936). Plain Cake X. Effect of two temperatures of ingredients at time of combining on fat distribution as determined by microscopical examination. (Unpublished thesis, Iowa State College).

  4. Lowe, Belle, and Nelson, P. Mabel. (1939). The physical and chemical characteristics of lards and other fats in relation to their culinary value. II. Use in plain cake. Iowa Agricultural Research Bulletin 255.

Posted by Rasmus Bååth | 2024-01-28 | Tags: R, Statistics