<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>R on Publishable Stuff</title>
    <link>https://sumsar.net/tags/r/</link>
    <description>Recent content in R on Publishable Stuff</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 04 Nov 2024 00:00:00 +0100</lastBuildDate><atom:link href="https://sumsar.net/tags/r/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A template for creating card sorting games in R</title>
      <link>https://sumsar.net/blog/card-sorting-games-template/</link>
      <pubDate>Mon, 04 Nov 2024 00:00:00 +0100</pubDate>
      
      <guid>https://sumsar.net/blog/card-sorting-games-template/</guid>
      <description>&lt;p&gt;Last week I made the small card sorting game called 
  &lt;a href=&#34;https://sumsar.net/blog/climate-impact-sorting-challenge/&#34;&gt;The Climate Impact Sorting Challenge&lt;/a&gt; where the challenge is to sort cards with different foods in the order of their climate impact. But then the thought hit me: Any time you find yourself with a dataset with labels (say, types of foods) mapped to numbers (say, climate impact in CO2e) you could turn that into a card sorting game! So, I created a template to facilitate this, and in this post, I’ll show you how to make card sorting games like these using R (or really any data-savvy language):&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/card-sorting-games-template/sorting-games.gif&#34;    width = &#34;477&#34; /&gt;&lt;/p&gt;
&lt;p&gt;But first, here are the games I’ve made so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s the &lt;a href=&#34;https://sumsar.net/blog/card-sorting-games-template/calories-sorting-challenge.html&#34;&gt;Calories Sorting Challenge&lt;/a&gt;! Place the cards in order of increasing calorie content. How many can you get right before you make a mistake? (
  &lt;a href=&#34;https://github.com/rasmusab/sorting-challenges-template/tree/main/sorting-challenges/calories-sorting-challenge&#34;&gt;source code&lt;/a&gt; )&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s the &lt;a href=&#34;https://sumsar.net/blog/card-sorting-games-template/countries-population-sorting-challenge.html&#34;&gt;Countries Population Sorting Challenge&lt;/a&gt;! Place the cards in order of increasing population. How many can you get right before you make a mistake? (
  &lt;a href=&#34;https://github.com/rasmusab/sorting-challenges-template/tree/main/sorting-challenges/countries-population-sorting-challenge&#34;&gt;source code&lt;/a&gt; )&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s the &lt;a href=&#34;https://sumsar.net/blog/card-sorting-games-template/presidents-sorting-challenge.html&#34;&gt;U.S. Presidents Sorting Challenge&lt;/a&gt;! Place the cards in order of when each U.S. president first took office. How many can you get right before you make a mistake? (
  &lt;a href=&#34;https://github.com/rasmusab/sorting-challenges-template/tree/main/sorting-challenges/presidents-sorting-challenge&#34;&gt;source code&lt;/a&gt; )&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s the &lt;a href=&#34;https://sumsar.net/blog/card-sorting-games-template/climate-impact-sorting-challenge.html&#34;&gt;Climate Impact Sorting Challenge&lt;/a&gt;! Place the cards in order of increasing climate impact. How many can you get right before you make a mistake? (
  &lt;a href=&#34;https://github.com/rasmusab/sorting-challenges-template/tree/main/sorting-challenges/climate-impact-sorting-challenge&#34;&gt;source code&lt;/a&gt; )&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-to-make-a-card-sorting-game-using-the-template&#34;&gt;How to make a card sorting game using the template?&lt;/h2&gt;
&lt;p&gt;In the 
  &lt;a href=&#34;https://github.com/rasmusab/sorting-challenges-template&#34;&gt;sorting-challenges-template GitHub repo&lt;/a&gt; you&amp;rsquo;ll find a file called 
  &lt;a href=&#34;https://github.com/rasmusab/sorting-challenges-template/blob/main/template-sorting-challenge.html&#34;&gt;template-sorting-challenge.html&lt;/a&gt; which is the whole game packaged into one stand-alone HTML file, except that it contains a number of placeholder variables that need to be filled in. Let&amp;rsquo;s define those in R:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(jsonlite)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(glue)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Define the placeholder variables to be inserted into the template&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# You can see a variable is a placeholder, as it uses a weird UpperCamelCase name.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;GameTitle &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Calorie Content Sorting Challenge&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;GameDescription &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Place the foods in order of increasing calorie content per 100g.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;AuthorName &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Rasmus Bååth&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Instructions &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  Drag and drop the food items to arrange them in order of increasing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  calorie content per 100g. See how many you can get right before making a mistake!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;LeftGuidanceText &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;← Less calories&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;RightGuidanceText &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;More calories →&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;InfoText &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;&amp;lt;p&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  &amp;lt;b&amp;gt;About this game&amp;lt;/b&amp;gt;&amp;lt;br&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  This game helps you learn about the calorie content of different foods.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;lt;/p&amp;gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here we&amp;rsquo;re creating a small example game that&amp;rsquo;s about sorting foods according to their calorie content.
There&amp;rsquo;s one placeholder variable left, and that&amp;rsquo;s the data for the cards. Here we&amp;rsquo;ll fill that in with some made-up example calorie values:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Create the candidate cards, here with made-up food items, &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# but generally here&amp;#39;s where you would read in some data and munge it into &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# the target format with the following columns:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# description: The name of the card&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# value: The numeric value of the card&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# display: The string that will be revealed on the card&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;candidate_cards &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;tribble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;~&lt;/span&gt;description,       &lt;span style=&#34;color:#666&#34;&gt;~&lt;/span&gt;value, &lt;span style=&#34;color:#666&#34;&gt;~&lt;/span&gt;display,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Apple&amp;#34;&lt;/span&gt;,            &lt;span style=&#34;color:#40a070&#34;&gt;52&lt;/span&gt;,     &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;52 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Banana&amp;#34;&lt;/span&gt;,           &lt;span style=&#34;color:#40a070&#34;&gt;96&lt;/span&gt;,     &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;96 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Broccoli&amp;#34;&lt;/span&gt;,         &lt;span style=&#34;color:#40a070&#34;&gt;34&lt;/span&gt;,     &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;34 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Cheddar Cheese&amp;#34;&lt;/span&gt;,   &lt;span style=&#34;color:#40a070&#34;&gt;403&lt;/span&gt;,    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;403 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Chicken Breast&amp;#34;&lt;/span&gt;,   &lt;span style=&#34;color:#40a070&#34;&gt;165&lt;/span&gt;,    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;165 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;White Rice&amp;#34;&lt;/span&gt;,       &lt;span style=&#34;color:#40a070&#34;&gt;130&lt;/span&gt;,    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;130 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Avocado&amp;#34;&lt;/span&gt;,          &lt;span style=&#34;color:#40a070&#34;&gt;160&lt;/span&gt;,    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;160 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Salmon&amp;#34;&lt;/span&gt;,           &lt;span style=&#34;color:#40a070&#34;&gt;208&lt;/span&gt;,    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;208 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Almonds&amp;#34;&lt;/span&gt;,          &lt;span style=&#34;color:#40a070&#34;&gt;579&lt;/span&gt;,    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;579 kcal per 100g&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Dark Chocolate&amp;#34;&lt;/span&gt;,   &lt;span style=&#34;color:#40a070&#34;&gt;546&lt;/span&gt;,    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;546 kcal per 100g&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Convert the candidate_cards dataframe to JSON format for the template&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;CandidateCardsArray &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;toJSON&lt;/span&gt;(candidate_cards, auto_unbox &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we replace the placeholders in the template and write the result to a file&amp;hellip;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sorting_challenge_template &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;read_file&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;template-sorting-challenge.html&amp;#34;&lt;/span&gt; ) 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;example_sorting_game &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;glue&lt;/span&gt;(sorting_challenge_template, .open &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;{{&amp;#34;&lt;/span&gt;, .close &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;}}&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;write_file&lt;/span&gt;(example_sorting_game, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;example-sorting-game.html&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and presto, &lt;a href=&#34;https://sumsar.net/blog/card-sorting-games-template/example-sorting-game.html&#34;&gt;a game&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://sumsar.net/blog/card-sorting-games-template/example-sorting-game.html&#34;&gt;

&lt;img src=&#34;https://sumsar.net/blog/card-sorting-games-template/example-sorting-game.png&#34;    width = &#34;484&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
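&lt;p&gt;To see what the &lt;code&gt;glue&lt;/code&gt; call does, here&amp;rsquo;s a minimal sketch that uses a made-up one-line template instead of the real HTML file. The &lt;code&gt;mini_template&lt;/code&gt; string is purely hypothetical; the placeholder names are the same as above:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;library(glue)

# A hypothetical mini template standing in for template-sorting-challenge.html
mini_template &amp;lt;- &amp;#34;&amp;lt;style&amp;gt;h1 { color: black; }&amp;lt;/style&amp;gt;&amp;lt;h1&amp;gt;{{GameTitle}}&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;{{GameDescription}}&amp;lt;/p&amp;gt;&amp;#34;

# The placeholder variables are looked up by name in the calling environment
GameTitle &amp;lt;- &amp;#34;Calorie Content Sorting Challenge&amp;#34;
GameDescription &amp;lt;- &amp;#34;Place the foods in order of increasing calorie content per 100g.&amp;#34;

# Only {{...}} is treated as a placeholder, the single braces in the CSS are kept as-is
mini_game &amp;lt;- glue(mini_template, .open = &amp;#34;{{&amp;#34;, .close = &amp;#34;}}&amp;#34;)
print(mini_game)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Setting the delimiters to double braces is presumably what makes this approach workable for a template full of CSS and JavaScript, where single braces are everywhere.&lt;/p&gt;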
&lt;p&gt;Here&amp;rsquo;s 
  &lt;a href=&#34;https://github.com/rasmusab/sorting-challenges-template/blob/main/create-example-sorting-game.R&#34;&gt;the full R script for creating this example game&lt;/a&gt;. Do let me know if you make something fun with it!&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>The Climate Impact Sorting Challenge</title>
      <link>https://sumsar.net/blog/climate-impact-sorting-challenge/</link>
      <pubDate>Sat, 19 Oct 2024 00:00:00 +0200</pubDate>
      
      <guid>https://sumsar.net/blog/climate-impact-sorting-challenge/</guid>
      <description>&lt;p&gt;Try out 
  &lt;a href=&#34;https://sumsar.net/climate-impact-sorting-challenge/&#34;&gt;The Climate Impact Sorting Challenge&lt;/a&gt;!
A quick game I just made that teaches you about the climate impact of different kinds of food.&lt;/p&gt;
&lt;p&gt;
  &lt;a href=&#34;https://sumsar.net/climate-impact-sorting-challenge/&#34;&gt;

&lt;img src=&#34;https://sumsar.net/blog/climate-impact-sorting-challenge/climate-impact-sorting-challenge.webp&#34;    width = &#34;600&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This game can be played in two different ways. You can play it by yourself and try to beat your own high score (mine is 9).
If you&amp;rsquo;re in a group, you can play &amp;ldquo;last man standing&amp;rdquo; style, where you take turns placing the cards. When someone misplaces a card, that person is out! (Strategy tip: If it&amp;rsquo;s beef, it&amp;rsquo;s bad).&lt;/p&gt;
&lt;h2 id=&#34;qa&#34;&gt;Q&amp;amp;A&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Where is this data from?&lt;/strong&gt;
The emission factors data is from 
  &lt;a href=&#34;https://denstoreklimadatabase.dk/en/background&#34;&gt;The Big Climate Database v1.2&lt;/a&gt;, specifically the Danish emission factors. Since this database is of Danish origin, the Danish emission factors were the most complete, but they will be roughly applicable to other European countries as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How was this made?&lt;/strong&gt;
This was put together mostly in a single day, mostly by me shouting at the computer until it did what I wanted (a.k.a. AI-driven development). I started out knowing I wanted to make a single HTML page app built on the 
  &lt;a href=&#34;https://github.com/SortableJS/Sortable&#34;&gt;SortableJS&lt;/a&gt; JavaScript library, which makes it easy to add drag-and-droppable lists to a webpage. Then, with some judicious prompting of the ChatGPT o1-preview model, I got a working game in ~1 hour. Here&amp;rsquo;s &lt;a href=&#34;https://sumsar.net/blog/climate-impact-sorting-challenge/initial-prompt.html&#34;&gt;the full transcript of this initial prompting history&lt;/a&gt;. I then whipped up a small R script to parse and insert the emission factors from The Big Climate Database into the game, and with some final tweaks and fixes, that was basically it. The full code is available 
  &lt;a href=&#34;https://github.com/rasmusab/climate-impact-sorting-challenge&#34;&gt;here on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Any caveats?&lt;/strong&gt;
Yes, lots. Notably, these emission factors are &lt;em&gt;averages&lt;/em&gt;, and the specific climate impact of any type of food can vary significantly depending on how the food is produced. Another thing to consider is that the emission factors are per kg of food. This does not take into account that different foods have different nutritional values. For example, a kg of butter has a much higher climate impact than a kg of lettuce, but a kg of butter also has a much higher energy content than a kg of lettuce.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>A Bayesian Plackett-Luce model in Stan applied to pinball championship data</title>
      <link>https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/</link>
      <pubDate>Sun, 22 Sep 2024 00:00:00 +0200</pubDate>
      
      <guid>https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/</guid>
      <description>&lt;p&gt;Sometimes it feels a bit silly when a simple statistical model has a
fancy-sounding name. But it also feels good to drop the following in
casual conversation: “Ah, then I recommend a Plackett-Luce model, a
straightforward generalization of the Bradley–Terry model, you know”,
when a friend wonders how they could model their, say, pinball
championship dataset. Incidentally, in this post we’re going to model
the result of the IFPA 18 World Pinball Championship using a
Plackett-Luce model, implemented in Stan as a generalization of the
Bradley–Terry model, you know.&lt;/p&gt;
&lt;p&gt;I know neither who Bradley, Terry, Plackett, nor Luce were, but I know
when their models could be useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A Bradley-Terry model can be used to model data where you have
&lt;em&gt;pairwise comparisons&lt;/em&gt; between different items, and you are interested
in the underlying “fitness” of the items. A concrete example is sports
where each match is a “pairwise comparison” between two players or
teams, and you assume each player or team has an underlying skill or
ability.&lt;/li&gt;
&lt;li&gt;A Plackett-Luce model can be useful when you have several &lt;em&gt;rankings&lt;/em&gt;
of items and you’re, again, interested in the “fitness” of each
item. This model could be used to assess the quality of different
products when each participant has ranked the items from best to
worst. Or, in a sports setting, it can be used to model the underlying
skills of each player when the outcome isn’t wins or losses, but
rankings. Just like you have in pinball championships.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, both models can model the skills of a number of players/teams, the
only difference being that a Bradley-Terry model works with wins/losses and a
Plackett-Luce model works with rankings (1st/2nd/3rd/etc).&lt;/p&gt;
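&lt;p&gt;As a rough sketch of how the two models relate, here are the two probabilities in plain R. This assumes the common parameterization where each player has a skill on the log scale, and the skill values are made up for illustration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Made-up skills on the log scale (not estimates from any data)
skill &amp;lt;- c(A = 1.0, B = 0.5, C = 0.0)

# Bradley-Terry: the probability that player i beats player j
p_beats &amp;lt;- function(skill_i, skill_j) {
  exp(skill_i) / (exp(skill_i) + exp(skill_j))
}

# Plackett-Luce: the probability of a full ranking, given the skills
# ordered from 1st to last place. At each step, the next-ranked player
# is picked among the players that remain.
p_ranking &amp;lt;- function(ranked_skills) {
  w &amp;lt;- exp(ranked_skills)
  prod(w / rev(cumsum(rev(w))))
}

p_beats(skill[&amp;#34;A&amp;#34;], skill[&amp;#34;B&amp;#34;])
p_ranking(skill[c(&amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;C&amp;#34;)])

# With only two players, the ranking probability is the Bradley-Terry probability
all.equal(p_ranking(skill[c(&amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;)]), unname(p_beats(skill[&amp;#34;A&amp;#34;], skill[&amp;#34;B&amp;#34;])))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;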
&lt;p&gt;Now we’re going to grab some data with rankings from the IFPA 18 World
Pinball Championship, implement a Bayesian Plackett-Luce model in Stan,
and then take it for a spin.&lt;/p&gt;
&lt;h2 id=&#34;ifpa-18-world-pinball-championship-dataset&#34;&gt;IFPA 18 World Pinball Championship dataset&lt;/h2&gt;
&lt;p&gt;Despite the name, 
  &lt;a href=&#34;https://www.ifpapinball.com/ifpa18/&#34;&gt;the IFPA &lt;strong&gt;18&lt;/strong&gt; World Pinball
Championship&lt;/a&gt; took place in
20&lt;strong&gt;23&lt;/strong&gt;. The IFPA Championship is generally considered the most
prestigious pinball competition, but the main reason we’re going to
analyze it here is that the results of all 480 pinball matches
that went down between the 80 competing players are available in 
  &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1S0RnMooGOMkrUCdltDQGxe0I7Vc34p-1T2OChRCB2Hc/&#34;&gt;a
single
spreadsheet&lt;/a&gt;!
It’s not a very tidy dataset, however, and it will need some tidying up.
I won’t bore you with the details, unless you really want to know:&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;How the sausage gets tidied up&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(readxl)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# This data is NOT in a tidy format, and so the code to tidy it up&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# will also be fairly messy...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# The result of IFPA 18 World Pinball Championship was downloaded &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# from here: https://www.ifpapinball.com/ifpa18/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ifpa_xlsx_path &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;IFPA 18 World Pinball Championship live results.xlsx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Reading and tidying the sheet with the pinball machine names used in the tournament&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;machines_wide &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;read_xlsx&lt;/span&gt;(ifpa_xlsx_path, sheet &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Machines&amp;#34;&lt;/span&gt;, range &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;B2:F22&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;machines_long &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; machines_wide &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(old &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; OLD, mid &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; MID, new &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; NEW) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;pivot_longer&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    cols &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;everything&lt;/span&gt;(), names_to &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;game_category&amp;#34;&lt;/span&gt;, values_to &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;game_name&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# These two machines were missing from the original data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;add_row&lt;/span&gt;(game_category &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;old&amp;#34;&lt;/span&gt;, game_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Dodge City&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;add_row&lt;/span&gt;(game_category &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;old&amp;#34;&lt;/span&gt;, game_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;8 Ball&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(game_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Reading in the match data which is spread over multiple Session sheets, &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# each containing several sub-tables with the results of the games.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ifpa_xlsx_info &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;expand_grid&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    session &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;paste&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Session&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;8&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      group &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;20&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      group_pos &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;seq&lt;/span&gt;(&lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;96&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;rowwise&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(session_df &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;read_xlsx&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      ifpa_xlsx_path, sheet &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; session, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      range &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;paste0&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;, group_pos, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;:K&amp;#34;&lt;/span&gt;, group_pos &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;4&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      col_names &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ungroup&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# A function to pivot the sub-tables into a tidy data frame&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pivot_session_df &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;function&lt;/span&gt;(s) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  player_names &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; s&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;...1[2&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  game1 &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; s&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;...3[2&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  score1 &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; s&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;...5[2&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  game2 &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; s&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;...6[2&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  score2 &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; s&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;...8[2&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  game3 &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; s&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;...9[2&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  score3 &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; s&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;...11[2&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    player_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;rep&lt;/span&gt;(player_names, &lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    round &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;rep&lt;/span&gt;(&lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;, each &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;4&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    game_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(game1, game2, game3),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    score &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(score1, score2, score3)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Stitching the data together, and turning it into long format.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Also, adding unique numerical identifiers for everything &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# as it&amp;#39;s going to make things easier when writing the Stan model.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;match_results &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ifpa_xlsx_info &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(session_df &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;map&lt;/span&gt;(session_df, pivot_session_df)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;unnest&lt;/span&gt;(session_df) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;left_join&lt;/span&gt;( machines_long, by &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;game_name&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    session &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;str_remove&lt;/span&gt;(session, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Session &amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    player_id &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;as.integer&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;factor&lt;/span&gt;(player_name)),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    game_id &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;as.integer&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;factor&lt;/span&gt;(game_name)),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    game_category_id &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;recode&lt;/span&gt;(game_category, `old` &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;, `mid` &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;2&lt;/span&gt;, `new` &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    rank &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;recode&lt;/span&gt;(score, `7` &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;, `5` &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;2&lt;/span&gt;, `3` &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;, `1` &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;4&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(session, group, round) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(round_id &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;cur_group_id&lt;/span&gt;()) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    session, group, round, round_id, player_name, player_id, score, rank,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    game_name, game_id, game_category, game_category_id
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;write_csv&lt;/span&gt;(match_results, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;ifpa-18-world-pinball-championship-match-results.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;The final &lt;a href=&#34;https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/ifpa-18-world-pinball-championship-match-results.csv&#34;&gt;tidy IFPA 18 Pinball Championship
dataset&lt;/a&gt; includes
the results from each of the eight qualifying sessions before the final
tournament:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;match_results &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;read_csv&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;ifpa-18-world-pinball-championship-match-results.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;match_results
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 1,920 × 12
   session group round round_id player_name    player_id score  rank game_name  
     &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;      
 1       1     1     1        1 Michael Trepp         57     1     4 Little Joe 
 2       1     1     1        1 Bob Matthews          12     7     1 Little Joe 
 3       1     1     1        1 Mark Pearson          54     3     3 Little Joe 
 4       1     1     1        1 Escher Lefkoff        28     5     2 Little Joe 
 5       1     1     2        2 Michael Trepp         57     5     2 Jokerz     
 6       1     1     2        2 Bob Matthews          12     1     4 Jokerz     
 7       1     1     2        2 Mark Pearson          54     7     1 Jokerz     
 8       1     1     2        2 Escher Lefkoff        28     3     3 Jokerz     
 9       1     1     3        3 Michael Trepp         57     3     3 Indianapol…
10       1     1     3        3 Bob Matthews          12     1     4 Indianapol…
# ℹ 1,910 more rows
# ℹ 3 more variables: game_id &amp;lt;dbl&amp;gt;, game_category &amp;lt;chr&amp;gt;,
#   game_category_id &amp;lt;dbl&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In each round during these sessions, four players compete on the same
pinball machine. The player who comes 1st gets 7 points, the 2nd gets 5
points, the 3rd gets 3 points, and the 4th gets 1 point. For example, looking at the first couple of
rows above, we can see that when competing on the Little Joe (1972)
machine, Bob Matthews won and Michael Trepp ended up last. During the
pre-tournament sessions, players rotate to face off against most other
players and compete on many different pinball machines. At the end of
the sessions, each player’s score is tallied up, and the top 32 players
proceed to the final tournament (not included in this dataset).&lt;/p&gt;
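&lt;p&gt;To make the scoring concrete, here is a small sketch of that tallying step. The &lt;code&gt;toy_results&lt;/code&gt; data frame is made up for illustration from the first two rounds printed above; it is not the full dataset.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Toy data (an assumption for illustration): the first two rounds from the printout above
toy_results &amp;lt;- data.frame(
  player_name = rep(c(&amp;#34;Michael Trepp&amp;#34;, &amp;#34;Bob Matthews&amp;#34;, &amp;#34;Mark Pearson&amp;#34;, &amp;#34;Escher Lefkoff&amp;#34;), 2),
  round_id = rep(1:2, each = 4),
  score = c(1, 7, 3, 5, 5, 1, 7, 3)
)

# Sum the points per player and sort from highest to lowest
total_score &amp;lt;- sort(
  tapply(toy_results$score, toy_results$player_name, sum),
  decreasing = TRUE
)
total_score

# The rank is just a relabelling of the score: 7, 5, 3, 1 map to 1, 2, 3, 4
toy_results$rank &amp;lt;- match(toy_results$score, c(7, 5, 3, 1))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;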
&lt;p&gt;We’re now ready to model this data using a Plackett-Luce model!&lt;/p&gt;
&lt;h2 id=&#34;the-plackett-luce-model&#34;&gt;The Plackett-Luce model&lt;/h2&gt;
&lt;p&gt;Despite the somewhat involved names, both the Bradley–Terry model and
the Plackett-Luce model are fairly straightforward. In the simplest
case, without covariates, both models assume that each player (or
team/item/product) has an underlying skill. In themselves, these skill
parameters don’t have any meaning. They only become meaningful when used
to come up with the probability of each player winning. Let’s say &lt;em&gt;n&lt;/em&gt;
players compete, each with their own skill parameter
&lt;em&gt;skill&lt;/em&gt;&lt;sub&gt;1, 2, &amp;hellip;, &lt;em&gt;n&lt;/em&gt;&lt;/sub&gt;. Then the probability of each
player winning is calculated as&lt;/p&gt;
&lt;p&gt;$$p_1 = \frac{\exp({skill}_1)}{\sum(\exp({skill}_{1, 2, \ldots, n}))}, \\[0.7em]
p_2 = \frac{\exp({skill}_2)}{\sum(\exp({skill}_{1, 2, \ldots, n}))}, \\[0.7em]
\ldots \\[0.5em]
p_n = \frac{\exp({skill}_n)}{\sum(\exp({skill}_{1, 2, \ldots, n}))}$$&lt;/p&gt;
&lt;p&gt;The exp() makes sure that the result is always positive, even for
negative skills, and by dividing by
∑(exp(&lt;em&gt;skill&lt;/em&gt;&lt;sub&gt;1, 2, &amp;hellip;, &lt;em&gt;n&lt;/em&gt;&lt;/sub&gt;)) we make the transformed
skills sum to one like good probabilities should (this transformation is
known as 
  &lt;a href=&#34;https://mc-stan.org/docs/functions-reference/matrix_operations.html#softmax-1&#34;&gt;the softmax
function&lt;/a&gt;).
For example, if we have four players with skills
&lt;code&gt;c(0.8, 1.0, -1.0, 0.0)&lt;/code&gt; the probability distribution &lt;code&gt;p&lt;/code&gt; over each
player winning would be:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;skills &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#40a070&#34;&gt;0.8&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;-1.0&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;0.0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;p &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills) &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;barplot&lt;/span&gt;(p, col &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;salmon&amp;#34;&lt;/span&gt;, ylab &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Probability of winning&amp;#34;&lt;/span&gt;, names.arg &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;paste&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Player&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;4&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;\nskill:&amp;#34;&lt;/span&gt;, skills, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;\nexp(skill):&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#06287e&#34;&gt;round&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills), &lt;span style=&#34;color:#40a070&#34;&gt;2&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;\np: &amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#06287e&#34;&gt;round&lt;/span&gt;(p, &lt;span style=&#34;color:#40a070&#34;&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/index_files/figure-commonmark/unnamed-chunk-3-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
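&lt;p&gt;One way to see that the skill values have no meaning in themselves is to shift all of them by the same constant: the winning probabilities stay the same. A minimal sketch, where &lt;code&gt;softmax()&lt;/code&gt; is a small helper defined here for illustration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Helper defined for this example: turns skills into winning probabilities
softmax &amp;lt;- function(x) exp(x) / sum(exp(x))

skills &amp;lt;- c(0.8, 1.0, -1.0, 0.0)
p &amp;lt;- softmax(skills)
p_shifted &amp;lt;- softmax(skills + 10) # Everyone gets 10 more skill

# Only the differences between skills matter, so the two agree
all.equal(p, p_shifted)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because of this, the skills are only identified relative to each other, and a model that estimates them needs to pin down their location somehow.&lt;/p&gt;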
&lt;p&gt;The sole difference between the Bradley–Terry model and the
Plackett-Luce model is in what data they model: Bradley–Terry models the
winner of a competition between two players, while Plackett-Luce models
the rankings in a competition with several players. Plackett-Luce does this by
assuming that the performance of each player isn’t influenced by the
other players, which allows for modeling a ranking as a series of
competitions among the players not yet placed. Perhaps it’s easiest to explain by showing the “generative
model” in R, for a competition with our four players from above:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;players &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Our four competing players&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Sampling the winner of the bunch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;p &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills[players]) &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills[players]))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;first &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sample&lt;/span&gt;(players, prob &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; p, size &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Excluding the player who came first, who will win second place?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;players &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; players[players &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; first] 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;p &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills[players]) &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills[players]))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;second &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sample&lt;/span&gt;(players, prob &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; p, size &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Excluding the players who came first and second, who will win third place?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;players &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; players[players &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; second]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;p &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills[players]) &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills[players]))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;third &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sample&lt;/span&gt;(players, prob &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; p, size &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# The player who&amp;#39;s left gets fourth place&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;fourth &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; players[players &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; third]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ranking &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(first, second, third, fourth)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ranking
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;[1] 4 1 2 3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That’s basically the simple version of the Plackett-Luce model. In our
case we, of course, know the rankings of all the competitions in the
IFPA 18 Pinball Championship so we don’t want to simulate the data using
fixed skill parameters. Instead, we’re now ready to do the Bayesian
trick where we assume the player skills are unknown parameters and use
the data to figure them out.&lt;/p&gt;
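&lt;p&gt;Running the generative model in reverse gives the probability of an observed ranking for some fixed skills, which is the quantity the Bayesian model will build on. A rough sketch, where &lt;code&gt;ranking_prob()&lt;/code&gt; is a hypothetical helper written for this illustration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;skills &amp;lt;- c(0.8, 1.0, -1.0, 0.0)

# ranking: player indices ordered from first place to last place
ranking_prob &amp;lt;- function(ranking, skills) {
  prob &amp;lt;- 1
  remaining &amp;lt;- ranking
  # The last player left has no one to compete with, so we stop one step early
  for (i in seq_len(length(ranking) - 1)) {
    s &amp;lt;- skills[remaining]
    # Probability that the next-placed player wins among those remaining
    prob &amp;lt;- prob * exp(s[1]) / sum(exp(s))
    remaining &amp;lt;- remaining[-1]
  }
  prob
}

# The probability of the ranking simulated above
ranking_prob(c(4, 1, 2, 3), skills)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With equal skills every one of the 24 possible rankings of four players is equally likely, so &lt;code&gt;ranking_prob(c(4, 1, 2, 3), rep(0, 4))&lt;/code&gt; comes out as 1/24.&lt;/p&gt;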
&lt;h2 id=&#34;a-plackett-luce-model-in-stan&#34;&gt;A Plackett-Luce model in Stan&lt;/h2&gt;
&lt;p&gt;While we could work with the tidied &lt;code&gt;match_results&lt;/code&gt; in Stan, the model
becomes easier to implement if we extract only the parts of the data
that we need. Here, that’s the number of players (&lt;code&gt;n_players&lt;/code&gt;), the
number of rounds (&lt;code&gt;n_rounds&lt;/code&gt;), and a matrix with the 1st, 2nd, 3rd, and
4th &lt;code&gt;player_id&lt;/code&gt; for each round (&lt;code&gt;player_ranks&lt;/code&gt;).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;stan_data &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  n_players &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;max&lt;/span&gt;(match_results&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;player_id),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  n_rounds &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;max&lt;/span&gt;(match_results&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;round_id),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  player_ranks &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; match_results &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(round_id, rank, player_id) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;pivot_wider&lt;/span&gt;(names_from &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; rank, values_from &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; player_id) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(round_id) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(`1`, `2`, `3`, `4`) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;as.matrix&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;head&lt;/span&gt;(stan_data&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;player_ranks)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;      1  2  3  4
[1,] 12 28 54 57
[2,] 54 57 28 12
[3,] 28 54 57 12
[4,] 64 37  2 62
[5,] 64 62 37  2
[6,] 62  2 37 64
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, looking at the first couple of rows in
&lt;code&gt;stan_data$player_ranks&lt;/code&gt; above, we can (again) see that in round one, on
the first row, &lt;code&gt;player_id: 12&lt;/code&gt; (that is, Bob Matthews) won and
&lt;code&gt;player_id: 57&lt;/code&gt; (Michael Trepp) came last.&lt;/p&gt;
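&lt;p&gt;If the layout of this matrix feels opaque, it can help to build a tiny version by hand: row = round, column = rank, cell = player. The sketch below uses base R matrix indexing on a made-up &lt;code&gt;toy&lt;/code&gt; data frame holding the first two rounds from above. It is an alternative illustration, not the code used to build &lt;code&gt;stan_data&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Toy long-format data (an assumption for illustration): two rounds, four players
toy &amp;lt;- data.frame(
  round_id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  rank      = c(4, 1, 3, 2, 2, 4, 1, 3),
  player_id = c(57, 12, 54, 28, 57, 12, 54, 28)
)

# One row per round, one column per rank
player_ranks &amp;lt;- matrix(NA, nrow = max(toy$round_id), ncol = 4)

# A two-column index matrix picks out one (row, column) cell per player
player_ranks[cbind(toy$round_id, toy$rank)] &amp;lt;- toy$player_id
player_ranks
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The two rows produced here match the first two rows of &lt;code&gt;stan_data$player_ranks&lt;/code&gt; shown above.&lt;/p&gt;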
&lt;p&gt;Also, while we could implement the likelihood part of this model
directly using &lt;code&gt;sum()&lt;/code&gt;s and &lt;code&gt;exp()&lt;/code&gt;s, a shortcut is to use the
&lt;code&gt;categorical_logit()&lt;/code&gt; distribution. The parameter to this distribution
is a vector which will be

  &lt;a href=&#34;https://mc-stan.org/docs/functions-reference/matrix_operations.html#softmax-1&#34;&gt;softmax&lt;/a&gt;
transformed (just what we want in the Plackett-Luce model) into the
probability that each of the categories represented by this vector will
be selected. For example, this sampling statement defines the likelihood
that the first player, with skill &lt;code&gt;0.8&lt;/code&gt;, would win:
&lt;code&gt;1 ~ categorical_logit([0.8, 1.0, -1.0, 0.0])&lt;/code&gt;.&lt;/p&gt;
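&lt;p&gt;For intuition, here is a rough R analogue of what such a statement adds to the log density. The function name mirrors Stan’s &lt;code&gt;categorical_logit_lpmf&lt;/code&gt;, but this R version is only an illustrative re-implementation based on the softmax definition above.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Illustrative R re-implementation (not Stan itself):
# the log probability that category y is selected, given the vector beta
categorical_logit_lpmf &amp;lt;- function(y, beta) {
  beta[y] - log(sum(exp(beta)))
}

skills &amp;lt;- c(0.8, 1.0, -1.0, 0.0)
log_lik &amp;lt;- categorical_logit_lpmf(1, skills)

# Back on the probability scale this is the first element of softmax(skills)
exp(log_lik)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;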
&lt;p&gt;Given that we’ve got our data nicely packaged up in &lt;code&gt;stan_data&lt;/code&gt; and that
we can use the &lt;code&gt;categorical_logit()&lt;/code&gt; distribution, the Stan model
definition becomes fairly straightforward:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;plackett_luce_model_code &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;data {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  int n_players; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  int n_rounds;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  int player_ranks[n_rounds, 4];
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;parameters {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  vector[n_players] skills;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  real&amp;lt;lower=0&amp;gt; skills_sigma;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;model {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  // A vector to hold the skills for each round
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  // Just to limit the amount of indexing we&amp;#39;ll need to do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  vector[4] round_skills; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  // It&amp;#39;s important that the distribution over skills is anchored at a 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  // fixed value, here 0.0. Otherwise, as skills are relative to each other,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  // they wouldn&amp;#39;t be identifiable.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  skills ~ normal(0, skills_sigma);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  skills_sigma ~ cauchy(0, 1);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  for (round_i in 1:n_rounds) {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    round_skills = skills[player_ranks[round_i]];
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    // The likelihood of the winner winning out of the 4 players
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    1 ~ categorical_logit(round_skills[1:4]);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    // The likelihood of the 2nd place winning out of the 3 remaining players
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    1 ~ categorical_logit(round_skills[2:4]);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    // The likelihood of the 3rd place winning out of the 2 remaining players
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    1 ~ categorical_logit(round_skills[3:4]);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;    // And the remaining 4th place is guaranteed to &amp;#39;win&amp;#39; against themselves...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the above model, I’ve placed a hierarchical distribution on the
&lt;code&gt;skills&lt;/code&gt; parameters. This works well for this dataset, where there are a
lot of players and plenty of data. With less data, one might want to
just put a fixed prior here (say, &lt;code&gt;skills ~ normal(0, 1.0)&lt;/code&gt;). The model
above is somewhat inflexible in that it only works for the case where
there are exactly four players in each round, but I hope you can see
that it’s straightforward to tweak it to allow for other numbers of
players as well.&lt;/p&gt;
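&lt;p&gt;As a rough illustration of that generalization, the sketch below computes the same Plackett-Luce log-likelihood in plain R for a round with any number of players. It is not the Stan model itself: the function name &lt;code&gt;plackett_luce_loglik&lt;/code&gt; and its argument are assumptions made up for this example, and the loop simply mirrors the repeated &lt;code&gt;categorical_logit&lt;/code&gt; statements above.&lt;/p&gt;

```r
# Log-likelihood of one observed finishing order under the Plackett-Luce model.
# ranked_skills: skills of the players, ordered from 1st place to last place.
# (plackett_luce_loglik is a hypothetical helper, not from the original post.)
plackett_luce_loglik = function(ranked_skills) {
  n = length(ranked_skills)
  loglik = 0
  # The last-placed player is certain to "win" against themselves, so stop at n - 1
  for (k in seq_len(n - 1)) {
    remaining = ranked_skills[k:n]
    # Log softmax probability that the k-th placed player beats everyone remaining
    loglik = loglik + remaining[1] - log(sum(exp(remaining)))
  }
  loglik
}

# Works for any number of players per round
plackett_luce_loglik(c(0.8, 1.0, -1.0, 0.0))
plackett_luce_loglik(c(0.5, -0.2, 0.1))
```

&lt;p&gt;In Stan, the analogous change would presumably be to pass the number of players per round as data and to loop over the places instead of writing out one statement per place.&lt;/p&gt;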
&lt;p&gt;Now we can finally go ahead and fit this model!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(rstan)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;model_fit &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;stan&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  model_code &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; plackett_luce_model_code, data &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; stan_data, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  chains &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;4&lt;/span&gt;, iter &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;20000&lt;/span&gt;, cores &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Skimping a bit on checking model convergence, we can, at least, see that
both the trace plots and the number of effective samples look
reasonable.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;traceplot&lt;/span&gt;(model_fit, pars &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills_sigma&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills[1]&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills[2]&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills[3]&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/index_files/figure-commonmark/unnamed-chunk-8-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;model_fit &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;extract&lt;/span&gt;(pars &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills_sigma&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills[1]&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills[2]&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills[3]&amp;#34;&lt;/span&gt;), permuted &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;monitor&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;Inference for the input samples (4 chains: each with iter = 10000; warmup = 5000):

               Q5  Q50 Q95 Mean  SD  Rhat Bulk_ESS Tail_ESS
skills_sigma  0.2  0.3 0.4  0.3 0.1     1     2252     2456
skills[1]    -0.4  0.0 0.3  0.0 0.2     1    37109    14113
skills[2]    -0.5 -0.1 0.2 -0.1 0.2     1    29867    13139
skills[3]    -0.8 -0.4 0.0 -0.4 0.2     1     8267    13065

For each parameter, Bulk_ESS and Tail_ESS are crude measures of 
effective sample size for bulk and tail quantities respectively (an ESS &amp;gt; 100 
per chain is considered good), and Rhat is the potential scale reduction 
factor on rank normalized split chains (at convergence, Rhat &amp;lt;= 1.05).
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That was a simple Plackett-Luce model in Stan. But finally, let’s have a
look at the fitted Plackett-Luce model of the IFPA 18 World Pinball
Championship.&lt;/p&gt;
&lt;h2 id=&#34;the-fitted-ifpa-18-world-pinball-championship-model&#34;&gt;The fitted IFPA 18 World Pinball Championship model&lt;/h2&gt;
&lt;p&gt;First, let’s extract the player skill parameters. For simplicity I’ll
use the median point estimates and 95% probability intervals here,
rather than working with the full distributions.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;skills_summary &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;summary&lt;/span&gt;(model_fit, pars &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;skills&amp;#34;&lt;/span&gt;)&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;summary
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;player_skill &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; match_results &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarise&lt;/span&gt;(score &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(score), .by &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(player_id, player_name)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(player_id) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    median_skill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; skills_summary[, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;50%&amp;#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    lower_skill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; skills_summary[, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;2.5%&amp;#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    upper_skill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; skills_summary[, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;97.5%&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;desc&lt;/span&gt;(median_skill))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;player_skill
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 80 × 6
   player_id player_name              score median_skill lower_skill upper_skill
       &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;                    &amp;lt;dbl&amp;gt;        &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;
 1        39 Johannes Ostermeier        138        0.669      0.134        1.28 
 2        28 Escher Lefkoff             130        0.570      0.0811       1.15 
 3        54 Mark Pearson               114        0.362     -0.0724       0.864
 4        47 Keith Elwin                116        0.338     -0.0973       0.851
 5        21 Daniele Celestino Accia…   114        0.327     -0.111        0.835
 6        78 Viggo Löwgren              118        0.307     -0.119        0.804
 7        80 Zach Sharpe                116        0.294     -0.137        0.786
 8        36 Jason Zahler               106        0.240     -0.183        0.722
 9        48 Keri Wing                  108        0.230     -0.184        0.692
10        45 Josh Sharpe                110        0.226     -0.191        0.701
# ℹ 70 more rows
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not surprisingly, we find that Johannes Ostermeier, who ended up winning
the whole tournament, also got the highest skill estimate. We can now
also plot all the skill estimates:&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;player_skill &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(player_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;fct_reorder&lt;/span&gt;(player_name, median_skill)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; median_skill, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; player_name)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_errorbarh&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(xmin &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lower_skill, xmax &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; upper_skill), height &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;0&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Player skill estimates for IFPA 18 World Pinball Championship&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    subtitle &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Posterior medians with 95% probability intervals&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Skill&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_minimal&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme&lt;/span&gt;(axis.text.y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;element_text&lt;/span&gt;(size &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;6&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/index_files/figure-commonmark/unnamed-chunk-11-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;It’s somewhat hard to interpret these skill estimates on their own, but
there are two things to note here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Despite the large number of rounds played (480), the uncertainty in
the skill parameters is large.&lt;/li&gt;
&lt;li&gt;Overall, players have very similar skill. This is not so surprising,
as these are all top pinball players. If I had competed here, my skill
estimate would be far &lt;em&gt;far&lt;/em&gt; to the left.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To verify that these skill parameters make some sense, we could also
plot them against each player’s final score from the qualifying
sessions:&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;player_skill &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; median_skill, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; score)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Player skill vs final score for IFPA 18 World Pinball Championship&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Median player skill&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Final Score&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/index_files/figure-commonmark/unnamed-chunk-12-1.png&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The final score and the estimated skill are so strongly correlated that
one might wonder if there was any point at all in going through the
trouble of fitting a Plackett-Luce model. However, we can do much more
interesting stuff with the skill estimates than we can with just the
scores. For example, we can calculate the probability of players winning
in different matchups. Say the top player played against the three
players with the lowest skill:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;highest_and_lowest_players &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;union&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;slice_max&lt;/span&gt;(player_skill, median_skill, n &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;slice_min&lt;/span&gt;(player_skill, median_skill, n &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;highest_and_lowest_players
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 4 × 6
  player_id player_name         score median_skill lower_skill upper_skill
      &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;               &amp;lt;dbl&amp;gt;        &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;
1        39 Johannes Ostermeier   138        0.669       0.134      1.28  
2        77 Vid Kuklec             58       -0.574      -1.12      -0.107 
3        24 Didier Dujardin        56       -0.565      -1.11      -0.109 
4         7 Artur Natorski         62       -0.466      -0.979     -0.0334
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;skills &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; highest_and_lowest_players&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;median_skill
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;p &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills) &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;exp&lt;/span&gt;(skills))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;barplot&lt;/span&gt;(p, col &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;aquamarine&amp;#34;&lt;/span&gt;, ylab &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Probability of winning&amp;#34;&lt;/span&gt;, names.arg &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;paste&lt;/span&gt;(highest_and_lowest_players&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;player_name, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;\np: &amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#06287e&#34;&gt;round&lt;/span&gt;(p, &lt;span style=&#34;color:#40a070&#34;&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/index_files/figure-commonmark/unnamed-chunk-14-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here, according to this model, Johannes would have a 53% probability of
winning when facing Vid, Didier, and Artur. Again, this shows that in a
single game of pinball, even when the top player meets the lowest
scoring players in the World Pinball Championship, it’s still far from
guaranteed that the top player would win. But at the IFPA 18 World
Pinball Championship Johannes did go all the way and won 
  &lt;a href=&#34;https://youtu.be/A4M5hcAPCaI?si=-pkt3qG_YXTnNVWG&amp;amp;t=8456&#34;&gt;the final game
as shown in this live
stream&lt;/a&gt;.&lt;/p&gt;
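&lt;p&gt;The same skill estimates can also be used to simulate whole finishing orders, not just the winner. The sketch below is an assumption-laden illustration: it hard-codes the four median skills from the table above (so it ignores the posterior uncertainty), and the names &lt;code&gt;simulate_order&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; are made up for this example.&lt;/p&gt;

```r
set.seed(42)
players = c("Johannes", "Vid", "Didier", "Artur")
median_skills = c(0.669, -0.574, -0.565, -0.466)

# Sampling without replacement, with weights exp(skill), draws one player at a
# time from those remaining, which matches the Plackett-Luce ranking process.
simulate_order = function() sample(players, prob = exp(median_skills))

# A 4 x 10000 character matrix: row = finishing place, column = simulated round
orders = replicate(10000, simulate_order())

# Estimated probability of each player finishing in 1st place
first_place = table(factor(orders[1, ], levels = players)) / ncol(orders)
round(first_place, 2)

# Estimated probability that the top player finishes last
mean(orders[4, ] == "Johannes")
```

&lt;p&gt;The simulated first-place share for Johannes should land close to the 53% computed analytically above, which is a handy sanity check before looking at other places.&lt;/p&gt;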
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://sumsar.net/blog/bayesian-plackett-luce-model-pinball-competition/index.qmd&#34;&gt;All code for this post can be found in this Quarto markdown file&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Bob Carpenter has published 
  &lt;a href=&#34;https://github.com/bob-carpenter/case-studies/blob/master/sushi-rating/sushi-rating.pdf&#34;&gt;a case study analyzing sushi rating data with an alternative (and faster) Plackett-Luce model in Stan&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>CopenhagenR, the 2024 spring season</title>
      <link>https://sumsar.net/blog/copenhagenr-2024-spring-season/</link>
      <pubDate>Sun, 04 Aug 2024 00:00:00 +0200</pubDate>
      
      <guid>https://sumsar.net/blog/copenhagenr-2024-spring-season/</guid>
      <description>&lt;p&gt;This is just a post to brag that 
  &lt;a href=&#34;https://www.meetup.com/CopenhagenR-useR-Group/&#34;&gt;the CopenhagenR useR group&lt;/a&gt; is alive and kicking, again.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/copenhagenr-2024-spring-season/images/copenhagen-r-stats.gif&#34;    width = &#34;738&#34; /&gt;&lt;/p&gt;
&lt;p&gt;After COVID-19, the group (like so many other meetups) was on hiatus for a couple of years and without an organizer. In 2023, I thought I would try starting it again and, while it took a little while, I&amp;rsquo;m happy that I got together five great meetups for the spring 2024 season! Here&amp;rsquo;s a little bit about what went down.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;CopenhagenR gratefully acknowledges 
  &lt;a href=&#34;https://www.r-consortium.org/&#34;&gt;the R Consortium&lt;/a&gt; as a sponsor. Also, a great thanks to 
  &lt;a href=&#34;https://www.prosa.dk/arrangementer&#34;&gt;Prosa&lt;/a&gt;, who generously provided a location for most meetups this spring 2024 season&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id=&#34;-state-of-the-r-the-new-stuff---_niels-ole-dam_httpswwwmeetupcomcopenhagenr-user-groupevents297284545&#34;&gt;• 
  &lt;a href=&#34;https://www.meetup.com/copenhagenr-user-group/events/297284545/&#34;&gt;State of the R, the New Stuff - &lt;em&gt;Niels Ole Dam&lt;/em&gt;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;First out was 
  &lt;a href=&#34;https://thingsinflow.dk/&#34;&gt;Niels Ole Dam&lt;/a&gt; who, very appropriately, set out to cover what has happened in the R world over the last couple of years. His highlights included 
  &lt;a href=&#34;https://quarto.org/&#34;&gt;Quarto&lt;/a&gt;, 
  &lt;a href=&#34;https://docs.r-wasm.org/&#34;&gt;webR&lt;/a&gt;, the 
  &lt;a href=&#34;https://gt.rstudio.com/&#34;&gt;gt package&lt;/a&gt;, and many other things. Check out his very sleek slides 
  &lt;a href=&#34;https://thingsinflow.dk/2023/12/06/state-of-the-r-the-new-stuff/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/copenhagenr-2024-spring-season/images/state-of-the-r-photo.jpeg&#34;    width = &#34;700&#34; /&gt;&lt;/p&gt;
&lt;h3 id=&#34;-reproducible-workflows-with-r-and-sequencing-data---_adrian-geissler_httpswwwmeetupcomcopenhagenr-user-groupevents298403304&#34;&gt;• 
  &lt;a href=&#34;https://www.meetup.com/copenhagenr-user-group/events/298403304/&#34;&gt;Reproducible Workflows with R and sequencing data - &lt;em&gt;Adrian Geissler&lt;/em&gt;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Next out was 
  &lt;a href=&#34;https://genomic.social/@asgeissler/with_replies&#34;&gt;Adrian Geissler&lt;/a&gt;, postdoctoral researcher at the University of Copenhagen. His presentation focused on how to handle data management, reproducible workflows, and large-scale computing in the life sciences using R for computation and 
  &lt;a href=&#34;https://snakemake.github.io/&#34;&gt;snakemake&lt;/a&gt; for orchestration. Here are 
  &lt;a href=&#34;https://github.com/asgeissler/2024-CopenhagenR-Seminar/blob/main/slides.pdf&#34;&gt;the presentation slides&lt;/a&gt; and 
  &lt;a href=&#34;https://youtu.be/7m52jndBHRY&#34;&gt;a screencast of the presentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
  &lt;a href=&#34;https://youtu.be/7m52jndBHRY&#34;&gt;

&lt;img src=&#34;https://sumsar.net/blog/copenhagenr-2024-spring-season/images/reproducible-workflows-with-r-screenshot.jpeg&#34;    width = &#34;700&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;-simplify-making-shiny-apps-with-teal----_dawid-kałędkowski_--br-r-and-software-freedom---_ramarro-marrone_httpswwwmeetupcomcopenhagenr-user-groupevents299464314&#34;&gt;• 
  &lt;a href=&#34;https://www.meetup.com/copenhagenr-user-group/events/299464314&#34;&gt;Simplify making shiny apps with teal -  &lt;em&gt;Dawid Kałędkowski&lt;/em&gt; // &lt;br&gt; R and Software Freedom - &lt;em&gt;Ramarro Marrone&lt;/em&gt;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In this double-header evening, 
  &lt;a href=&#34;https://github.com/gogonzo&#34;&gt;Dawid Kałędkowski&lt;/a&gt; kicked things off by showcasing his package 
  &lt;a href=&#34;https://insightsengineering.github.io/teal/latest-tag/articles/getting-started-with-teal.html&#34;&gt;Teal&lt;/a&gt; — a shiny-based interactive exploration framework for quickly creating reproducible dashboards. Next up, Ramarro Marrone tackled the hot topic of software freedom, and went through which parts of the R world were freer and which could be considered non-free. This talk resulted in a heated debate that continued over the following coffee/cake/beer.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/copenhagenr-2024-spring-season/images/teal-and-software-freedom-photo.jpeg&#34;    width = &#34;650&#34; /&gt;&lt;/p&gt;
&lt;h3 id=&#34;-visualizing-440-bicycle-rides-using-r---_gregers-kjerulf-dubrow_httpswwwmeetupcomcopenhagenr-user-groupevents300016965&#34;&gt;• 
  &lt;a href=&#34;https://www.meetup.com/copenhagenr-user-group/events/300016965&#34;&gt;Visualizing 440 bicycle rides using R - &lt;em&gt;Gregers Kjerulf Dubrow&lt;/em&gt;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;
  &lt;a href=&#34;https://www.gregdubrow.io/about&#34;&gt;Gregers Kjerulf Dubrow&lt;/a&gt; presented a very personal analysis of all bike trips he made in Copenhagen in 2023, as tracked by the Strava fitness app. The first part of the talk detailed how to import and clean Strava data in R, the second part took a deep dive into the time-location bike data of the more than 440 bicycle rides (including a data anomaly that, in the end, turned out to have had a very real and dangerous cause). The full bike analysis is available on 
  &lt;a href=&#34;https://www.gregdubrow.io/posts/my-year-of-riding-danishly/&#34;&gt;Gregers&amp;rsquo; blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/copenhagenr-2024-spring-season/images/visualizing-440-bicycle-rides-photo.jpeg&#34;    width = &#34;700&#34; /&gt;&lt;/p&gt;
&lt;h3 id=&#34;-animating-a-melody-as-a-mathematical-object---_charles-t-gray_httpswwwmeetupcomcopenhagenr-user-groupevents301262743&#34;&gt;• 
  &lt;a href=&#34;https://www.meetup.com/copenhagenr-user-group/events/301262743&#34;&gt;Animating a melody as a mathematical object - &lt;em&gt;Charles T. Gray&lt;/em&gt;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Rounding off the season, 
  &lt;a href=&#34;https://www.linkedin.com/in/charles-t-gray/&#34;&gt;Charles T. Gray&lt;/a&gt;, former professional musician turned data scientist, gave us a peek into the musical world through the lens of graphs, nodes, and edges. Her presentation explored animating a MIDI file as a graph using R packages like 
  &lt;a href=&#34;https://urswilke.github.io/pyramidi/&#34;&gt;pyramidi&lt;/a&gt;, 
  &lt;a href=&#34;https://ggraph.data-imaginist.com/&#34;&gt;ggraph&lt;/a&gt;, and 
  &lt;a href=&#34;https://gganimate.com/&#34;&gt;gganimate&lt;/a&gt;. Check out a blog post version of her talk on 
  &lt;a href=&#34;https://softloud.github.io/measured/content/digmus/digmus.html&#34;&gt;her blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/copenhagenr-2024-spring-season/images/animating-a-melody-photo.jpeg&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Public Pinball Machines per Capita: A new global indicator</title>
      <link>https://sumsar.net/blog/pinball-machines-per-capita/</link>
      <pubDate>Fri, 07 Jun 2024 00:00:00 +0200</pubDate>
      
      <guid>https://sumsar.net/blog/pinball-machines-per-capita/</guid>
      <description>&lt;p&gt;There are tons of well-known global indicators. We’ve all heard of gross
domestic product, life expectancy, rate of literacy, etc. But, ever
since I discovered 
  &lt;a href=&#34;https://pinballmap.com/&#34;&gt;pinballmap.com&lt;/a&gt;, possibly
the world’s most comprehensive database of public pinball locations,
I’ve been thinking about a potential new global indicator: Public
Pinball Machines per Capita. Thanks to Pinball Map’s 
  &lt;a href=&#34;https://pinballmap.com/api/v1/docs/1.0.html&#34;&gt;well-documented
public API&lt;/a&gt;, this indicator
is now a reality!&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/pinball-machines-per-capita/images/top-10-pinball-machines-per-capita.png&#34;    width = &#34;800&#34; /&gt; Here’s how this was
put together (and just scroll to the bottom for a CSV file with this
indicator for all countries).&lt;/p&gt;
&lt;h2 id=&#34;pulling-public-pinball-locations-from-pinball-map&#34;&gt;Pulling public pinball locations from Pinball Map&lt;/h2&gt;
&lt;p&gt;Pinball Map is, from what I can discern, the most popular app for
finding out where there are arcades and bars with pinball machines. It’s
open for anyone to register new pinball locations, but not only that,

  &lt;a href=&#34;https://github.com/pinballmap/pbm/&#34;&gt;the app itself is open source&lt;/a&gt;, and
the data it collects is available through a public API under a
permissive licence! Using this API, we will pull essential data for our
Public Pinball Machines per Capita indicator: all registered pinball
locations and their respective machine counts.&lt;/p&gt;
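&lt;p&gt;Since a full pull like this sends one request per region, it may be worth being gentle with the API. Below is a minimal sketch of a more cautious request helper, assuming the httr2 functions &lt;code&gt;req_user_agent()&lt;/code&gt;, &lt;code&gt;req_throttle()&lt;/code&gt;, and &lt;code&gt;req_retry()&lt;/code&gt;. The helper names, the user agent string, and the rate of one request per second are made up for illustration; they are not limits documented by Pinball Map, and this is not the code that produced the results in this post.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Sketch of a throttled request helper&lt;/summary&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;library(httr2)

# Hypothetical helper: builds (but does not send) a polite request
build_polite_req &amp;lt;- \(url) {
  request(url) |&amp;gt;
    req_user_agent(&amp;#34;pinball-per-capita-example&amp;#34;) |&amp;gt; # made-up identifier
    req_throttle(rate = 1) |&amp;gt; # roughly one request per second (an assumption)
    req_retry(max_tries = 3) # retry transient failures a few times
}

# Hypothetical helper: sends the request and parses the JSON body
get_req_json_politely &amp;lt;- \(url) {
  build_polite_req(url) |&amp;gt;
    req_perform() |&amp;gt;
    resp_body_json(simplifyVector = TRUE)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;p&gt;Splitting the building of the request from performing it means the settings can be inspected without touching the network.&lt;/p&gt;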
&lt;details&gt;
&lt;summary&gt;Loading packages&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(httr2) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# To interact with the Pinball Map API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(jsonlite) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# To parse the JSON responses&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(tidyverse) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# To munch, crunch, and plot the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(ggrepel) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# For less crowded labels on plots&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(WDI) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# To pull in other country-level data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(maps) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# For plotting maps&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;Code for pulling pinball stats from the Pinball Map
API&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# We&amp;#39;re going to pull a lot of data here, possibly abusing the Pinball Map&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# API a bit. But I&amp;#39;m an active Patreon sponsor, so hopefully that&amp;#39;s OK...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Did I mention that they are on Patreon? https://www.patreon.com/pinballmap&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Pulls and parses JSON from the given URL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;get_req_json &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;\&lt;/span&gt;(url) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;request&lt;/span&gt;(url)  &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;req_perform&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;resp_body_json&lt;/span&gt;(simplifyVector &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Pulling all regions defined by the Pinball Map API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;regions &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;get_req_json&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;https://pinballmap.com/api/v1/regions.json&amp;#34;&lt;/span&gt;)&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;regions
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Now looping over the region names and for each name pull down all locations&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;region_locations &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;lapply&lt;/span&gt;(regions&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;name, &lt;span style=&#34;color:#06287e&#34;&gt;\&lt;/span&gt;(name) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  url &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;paste0&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;https://pinballmap.com/api/v1/region/&amp;#34;&lt;/span&gt;, name, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;/locations.json&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;get_req_json&lt;/span&gt;(url)&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;locations
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Pull down all &amp;#34;regionless&amp;#34; locations. Actually, most locations are regionless.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;regionless_locations &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;get_req_json&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;https://pinballmap.com/api/v1/locations.json?regionless_only=true&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;locations
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Finally, combine it all...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;locations &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;bind_rows&lt;/span&gt;(region_locations, regionless_locations) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(name, country, city, lat, lon, num_machines) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(lat &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;as.numeric&lt;/span&gt;(lat), lon &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;as.numeric&lt;/span&gt;(lon)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# ... and order locations from north to south&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;desc&lt;/span&gt;(lat))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;sample_n&lt;/span&gt;(locations, size &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;pre&gt;&lt;code&gt;                        name country          city  lat    lon num_machines
1 Arena Lanes Bowling Center      US      Oak Lawn 41.7  -87.7            3
2              Pete&#39;s Treats      US Union Springs 42.9  -76.7            1
3         The Summit Windsor      US      Loveland 40.4 -105.0            6
4             Skylark Lounge      US        Denver 39.7 -105.0            2
5         The Escape Gamebar      US       Atlanta 33.9  -84.3            5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The above shows a sample of five out of the 10,330 locations where you
can play pinball, as of June 2024. As we have the longitude and latitude
we can also figure out that the northernmost place to play pinball is in
Rovaniemi, Finland, and the southernmost place is in Woolston, New
Zealand.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;locations&lt;span style=&#34;color:#06287e&#34;&gt;[c&lt;/span&gt;(&lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#06287e&#34;&gt;nrow&lt;/span&gt;(locations)),]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;pre&gt;&lt;code&gt;                       name country      city   lat   lon num_machines
1               Kauppayhtiö      FI Rovaniemi  66.5  25.7            2
10330 Fish &amp;amp; Chips On Ferry      NZ  Woolston -43.5 172.7            1
&lt;/code&gt;&lt;/pre&gt;
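&lt;p&gt;The lookup above works because &lt;code&gt;locations&lt;/code&gt; was already sorted from north to south. As a minimal sketch of an alternative that does not depend on row order, the extremes can be picked out with dplyr&amp;rsquo;s &lt;code&gt;slice_max()&lt;/code&gt; and &lt;code&gt;slice_min()&lt;/code&gt;. The small &lt;code&gt;toy_locations&lt;/code&gt; table is made up for illustration from three rows shown in this post.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Sketch of an order-independent lookup&lt;/summary&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;library(dplyr)

# Toy data: three locations taken from the output shown in this post
toy_locations &amp;lt;- tibble(
  city = c(&amp;#34;Denver&amp;#34;, &amp;#34;Woolston&amp;#34;, &amp;#34;Rovaniemi&amp;#34;),
  country = c(&amp;#34;US&amp;#34;, &amp;#34;NZ&amp;#34;, &amp;#34;FI&amp;#34;),
  lat = c(39.7, -43.5, 66.5)
)

# Pick the extremes by latitude, regardless of how the rows are ordered
northernmost &amp;lt;- slice_max(toy_locations, lat, n = 1)
southernmost &amp;lt;- slice_min(toy_locations, lat, n = 1)
bind_rows(northernmost, southernmost)
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;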
&lt;p&gt;Or, why not just plot all pinball locations on a world map?&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;extreme_locations &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; locations &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(lat &lt;span style=&#34;color:#666&#34;&gt;%in%&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;range&lt;/span&gt;(lat)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(display_label &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;paste&lt;/span&gt;(city, country, sep &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;, &amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_polygon&lt;/span&gt;(data &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;map_data&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;world&amp;#34;&lt;/span&gt;), &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; long, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lat, group &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; group), fill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lightblue&amp;#34;&lt;/span&gt;, color &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lightblue3&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;(data &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; locations, &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lon, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lat), color &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;magenta4&amp;#34;&lt;/span&gt;, size &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;, alpha &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;0.50&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;(data &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; extreme_locations, &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lon, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lat), color &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;red2&amp;#34;&lt;/span&gt;, size &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_text&lt;/span&gt;(data &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; extreme_locations, &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lon, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lat, label &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; display_label), nudge_x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;-25&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_void&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggtitle&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Pinball Locations Worldwide (according to pinballmap.com)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/pinball-machines-per-capita/index_files/figure-commonmark/unnamed-chunk-5-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Finally, we can now sum up how many public pinball machines there are in
each country, where the USA, unsurprisingly, takes the lead.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pinball_stats &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; locations &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarise&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    n_locations &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;n&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    n_machines &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(num_machines)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;desc&lt;/span&gt;(n_machines))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pinball_stats
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 65 × 3
   country n_locations n_machines
   &amp;lt;chr&amp;gt;         &amp;lt;int&amp;gt;      &amp;lt;int&amp;gt;
 1 US             7831      32287
 2 CA              511       1765
 3 AU              427       1247
 4 DE              129       1099
 5 FR              247        707
 6 SE               79        692
 7 GB              160        500
 8 FI               98        496
 9 NL               69        461
10 JP               86        351
# ℹ 55 more rows
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;calculating-public-pinball-machines-per-capita&#34;&gt;Calculating Public Pinball Machines &lt;em&gt;per Capita&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;Knowing how many public pinball machines there are in each country isn’t
enough; we also need to consider the size of the population. Thanks to

  &lt;a href=&#34;https://CRAN.R-project.org/package=WDI&#34;&gt;the &lt;code&gt;WDI&lt;/code&gt; package&lt;/a&gt; it’s easy to
pull this, and any other indicators you fancy, from 
  &lt;a href=&#34;https://data.worldbank.org/&#34;&gt;the World Bank Open
Data&lt;/a&gt; and to calculate the number of Public
Pinball Machines per Capita (here per million people).&lt;/p&gt;
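&lt;p&gt;As a quick check of the arithmetic, here is a small sketch using the US figures reported in the tables of this post. Machines per million people is the machine count divided by the population, times one million. The variable names are only for illustration.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Sketch of the per-million calculation&lt;/summary&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# US figures as reported in the tables of this post
n_machines &amp;lt;- 32287
population &amp;lt;- 333287557

# Public pinball machines per million people
us_per_million &amp;lt;- n_machines / population * 1000000
round(us_per_million, 3)
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;p&gt;This comes out at roughly 96.9 machines per million people.&lt;/p&gt;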
&lt;details&gt;
&lt;summary&gt;Code for pulling World Development Indicators&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;country_stats_by_year &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;WDI&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  indicator &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;NY.GDP.PCAP.CD&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;SP.POP.TOTL&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;SP.DYN.LE00.IN&amp;#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;SP.DYN.TFRT.IN&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;IT.NET.USER.ZS&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;AG.LND.FRST.ZS&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ), 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  extra &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;TRUE&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  latest &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;country_stats &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; country_stats_by_year &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(country, year) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Keep the latest indicator for each country&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;across&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;everything&lt;/span&gt;(), &lt;span style=&#34;color:#06287e&#34;&gt;\&lt;/span&gt;(x) &lt;span style=&#34;color:#06287e&#34;&gt;last&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;na.omit&lt;/span&gt;(x)))) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    country_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; country, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    country_code &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; iso2c,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    gdp_per_capita &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; NY.GDP.PCAP.CD,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    population &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; SP.POP.TOTL,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    life_expectancy &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; SP.DYN.LE00.IN, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    births_per_woman &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; SP.DYN.TFRT.IN,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    internet_usage_perc &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; IT.NET.USER.ZS,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    forest_coverage_perc &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; AG.LND.FRST.ZS
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;Code for calculating Public Pinball Machines per
Capita&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pinball_country_stats &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; country_stats &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Let&amp;#39;s keep only larger countries&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(population &lt;span style=&#34;color:#666&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;500000&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;inner_join&lt;/span&gt;(pinball_stats, by &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;join_by&lt;/span&gt;(country_code &lt;span style=&#34;color:#666&#34;&gt;==&lt;/span&gt; country)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    n_locations_per_million_capita &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;round&lt;/span&gt;(n_locations &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; population &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1000000&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    n_machines_per_million_capita &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;round&lt;/span&gt;(n_machines &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; population &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1000000&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;arrange&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;desc&lt;/span&gt;(n_machines_per_million_capita))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(pinball_country_stats, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  country_name, population, n_machines, n_machines_per_million_capita
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 58 × 4
   country_name  population n_machines n_machines_per_million_capita
   &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;      &amp;lt;int&amp;gt;                         &amp;lt;dbl&amp;gt;
 1 United States  333287557      32287                          96.9
 2 Finland          5556106        496                          89.3
 3 Sweden          10486941        692                          66.0
 4 Denmark          5903037        323                          54.7
 5 Norway           5457127        266                          48.7
 6 Australia       26005540       1247                          48.0
 7 Canada          38929902       1765                          45.3
 8 New Zealand      5124100        171                          33.4
 9 Switzerland      8775760        267                          30.4
10 Netherlands     17700982        461                          26.0
# ℹ 48 more rows
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, there’s our new global indicator! Looks like the USA is still in
the lead, but now the Nordic countries have bubbled up as some of the
countries with the highest pinball density.&lt;/p&gt;
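&lt;p&gt;As a quick sanity check, the per-million figure can be recomputed by hand from a single row of the table. A minimal sketch using the Finland row (the variable names are made up for this example):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Values copied from the Finland row of the table above
finland_machines &amp;lt;- 496
finland_population &amp;lt;- 5556106

# Machines per million inhabitants, rounded to one decimal as printed in the table
finland_per_million &amp;lt;- round(finland_machines / finland_population * 1000000, 1)
finland_per_million
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should agree with the 89.3 shown for Finland in the table.&lt;/p&gt;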
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pinball_country_stats &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;head&lt;/span&gt;(&lt;span style=&#34;color:#40a070&#34;&gt;10&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    country_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; forcats&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;fct_reorder&lt;/span&gt;(country_name, n_machines_per_million_capita),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    n_machines_per_million_capita &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;round&lt;/span&gt;(n_machines_per_million_capita, &lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; n_machines_per_million_capita, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; country_name)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;geom_col&lt;/span&gt;(fill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lightgreen&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;geom_text&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(label &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; n_machines_per_million_capita), hjust &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;1.2&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Number of machines per million capita&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Country&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Top 10 countries by number of public pinball machines per million capita&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/pinball-machines-per-capita/index_files/figure-commonmark/unnamed-chunk-9-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;h2 id=&#34;public-pinball-machines-per-capita-vs-other-indicators&#34;&gt;Public Pinball Machines per Capita vs other indicators&lt;/h2&gt;
&lt;p&gt;Let’s have a look at how Public Pinball Machines per Capita compares to
some other indicators. How about Life Expectancy?&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(pinball_country_stats, &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; life_expectancy, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; n_machines_per_million_capita)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_label_repel&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(label &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; country_name), fill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lightblue&amp;#34;&lt;/span&gt;,  max.overlaps &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;Inf&lt;/span&gt;, box.padding  &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;-0.2&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_x_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;(), limits &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#40a070&#34;&gt;67&lt;/span&gt;, &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;NA&lt;/span&gt;)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;()) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Life expectancy at birth (years)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Number of machines per million capita&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Number of Public Pinball Machines per Capita vs life expectancy&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/pinball-machines-per-capita/index_files/figure-commonmark/unnamed-chunk-10-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So maybe playing pinball actually makes you live longer! What’s that
thing they say about correlation, now again… Or what about the fertility
rate (the average number of births per woman)?&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(pinball_country_stats, &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; births_per_woman, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; n_machines_per_million_capita)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_label_repel&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(label &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; country_name), fill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lightcoral&amp;#34;&lt;/span&gt;,  max.overlaps &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;Inf&lt;/span&gt;, box.padding &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;-0.2&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_x_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;()) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;()) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Fertility rate (no. births per woman)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Number of machines per million capita&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Number of Public Pinball Machines per Capita vs fertility rate&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/pinball-machines-per-capita/index_files/figure-commonmark/unnamed-chunk-11-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Nope, no clear relationship there. Actually, out of all the indicators I
looked through, the one with the highest correlation to Public Pinball
Machines per Capita was…&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(pinball_country_stats, &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; gdp_per_capita, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; n_machines_per_million_capita)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_label_repel&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(label &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; country_name), fill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lightgreen&amp;#34;&lt;/span&gt;, max.overlaps &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;Inf&lt;/span&gt;, box.padding  &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;-0.2&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_smooth&lt;/span&gt;(method &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lm&amp;#34;&lt;/span&gt;, se &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;, color &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;#d03030aa&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_x_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;()) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;()) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;GDP per capita (in USD)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Number of machines per million capita&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Number of Public Pinball Machines per Capita vs GDP per Capita&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/pinball-machines-per-capita/index_files/figure-commonmark/unnamed-chunk-12-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;… GDP per Capita. This shouldn’t surprise anyone who’s ever looked into
buying a pinball machine and walked away &lt;em&gt;in shock&lt;/em&gt; having learned that
a new machine would set you back $8000, at least. Still, the correlation
between these two indicators is strikingly high:&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;cor&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(pinball_country_stats&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;n_machines_per_million_capita),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(pinball_country_stats&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;gdp_per_capita)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;pre&gt;&lt;code&gt;[1] 0.815
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With such a strong correlation with GDP per Capita, it can be
interesting to look at the residuals of the linear regression line
above. That is, what’s left after the influence of GDP per Capita has
been “accounted” for (and I can’t stress the quotes enough here, as
we’re not really accounting for anything).&lt;/p&gt;
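&lt;p&gt;One way to read such residuals: since the regression is fit with both variables on a log scale, exponentiating a residual gives the ratio between the observed value and the value the line predicts. A minimal sketch with made-up toy numbers (not the dataset used here; &lt;code&gt;toy&lt;/code&gt;, &lt;code&gt;fit&lt;/code&gt; and &lt;code&gt;ratio&lt;/code&gt; are names invented for this example):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Toy data, invented purely for illustration
toy &amp;lt;- data.frame(
  gdp_per_capita = c(1000, 10000, 100000),
  machines_per_million = c(1, 20, 100)
)

# Same kind of log-log regression as in the plot above
fit &amp;lt;- lm(log(machines_per_million) ~ log(gdp_per_capita), data = toy)

# exp() of a log-scale residual is observed divided by fitted
ratio &amp;lt;- as.numeric(exp(residuals(fit)))
fitted_machines &amp;lt;- as.numeric(exp(fitted(fit)))

# The two ways of computing the ratio should agree
stopifnot(isTRUE(all.equal(ratio, toy$machines_per_million / fitted_machines)))
ratio
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A ratio above 1 means more machines than the GDP-based line would suggest, and a ratio below 1 means fewer.&lt;/p&gt;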
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;lm_model &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;lm&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(n_machines_per_million_capita) &lt;span style=&#34;color:#666&#34;&gt;~&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(gdp_per_capita), data &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; pinball_country_stats)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pinball_country_stats&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;residual &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;residuals&lt;/span&gt;(lm_model)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(pinball_country_stats, &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; gdp_per_capita, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; residual)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_label_repel&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(label &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; country_name), fill &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lightgreen&amp;#34;&lt;/span&gt;,  max.overlaps &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;Inf&lt;/span&gt;, box.padding &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;-0.2&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_smooth&lt;/span&gt;(method &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;lm&amp;#34;&lt;/span&gt;, se &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;, color &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;#d03030aa&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_x_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;()) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;GDP per capita (in USD)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Residual&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Residual after accounting for GDP per capita&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/pinball-machines-per-capita/index_files/figure-commonmark/unnamed-chunk-14-1.png&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here Hungary and Croatia show up as relative pinball fanatics,
considering their GDP per Capita, while Singapore and Luxembourg
couldn’t care less for the silver ball. If you want to take a look
yourself, here’s a CSV file with the full Public Pinball Machines per
Capita dataset:&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pinball_country_stats &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(country_name, country_code, population, n_locations, n_machines,  n_machines_per_million_capita, gdp_per_capita) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;write_csv&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;public_pinball_machines_per_capita_2024.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;&lt;a href=&#34;https://sumsar.net/blog/pinball-machines-per-capita/public_pinball_machines_per_capita_2024.csv&#34;&gt;public_pinball_machines_per_capita_2024.csv&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Caveats: This indicator is mostly a joke, depends 100% on the
completeness of Pinball Map, and countries without a single registered
pinball machine are excluded.&lt;/em&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Modeling my pinball scores</title>
      <link>https://sumsar.net/blog/modeling-my-pinball-scores/</link>
      <pubDate>Sun, 24 Mar 2024 00:00:00 +0100</pubDate>
      
      <guid>https://sumsar.net/blog/modeling-my-pinball-scores/</guid>
      <description>&lt;p&gt;Upon discovering that the tiny town I live in has a pinball arcade with
over 40 tables (!), I got a bout of pinball fever. I fancy myself a
fairly accomplished video game player, but was disappointed to discover
that my ability to keep Mario alive didn’t translate to preventing the
pinball from draining. Assuming I just needed a bit of practice, I
downloaded 
  &lt;a href=&#34;https://vpuniverse.com/files/file/18293-fish-tales-vpw/&#34;&gt;a virtual version of Fish
Tales&lt;/a&gt; — a fun,
fishing-based table from 1992 — and began practicing. Here’s the data
and quick analysis of how I improved over 100 games of Fish Tales.&lt;/p&gt;
&lt;p&gt;(By the way, if you didn’t know, the hobbyist pinball emulation scene is
&lt;em&gt;amazing&lt;/em&gt;. Almost every real pinball table from the last 70 years has
been painstakingly 3D-modeled by &lt;em&gt;someone&lt;/em&gt; and is 
  &lt;a href=&#34;https://vpuniverse.com/&#34;&gt;available completely
for free&lt;/a&gt;, but completely not legally…)&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/images/fish_tales_animation.webp&#34;    width = &#34;640&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In total, I played 100 games over the course of 10 sessions. The game
ran perfectly on my 2022 MacBook Pro at 120 FPS, with no noticeable
input latency. I made sure to learn all the rules of Fish Tales (even
though Fish Tales is considered a simple game, the ruleset is
non-obvious and opaque), and I played in a distraction-free environment
(that is, when the kids weren’t around). And yet, my scores improved
like this:&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(ggside)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(atsar)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(rstan)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(ggpubr)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;scores &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;read_csv&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;fish_tale_scores.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;scores &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; game, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; score)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;comma) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_minimal&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Pinball scores by game&amp;#34;&lt;/span&gt;, x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Game&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Score&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/index_files/figure-commonmark/unnamed-chunk-1-1.png&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here the scores (&lt;a href=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/fish_tale_scores.csv&#34;&gt;raw data available here&lt;/a&gt;) are
shown on a log scale, as the score in Fish Tale, like in many other
pinball games, sometimes snowballs and sometimes never goes anywhere.&lt;/p&gt;
&lt;p&gt;Looking at my score trajectory, it might be easy to dismiss it as not much
of a trajectory at all. However, maybe we’re just lacking the right
model here. But how to model data like this? Maybe a simple linear
trend, while hard to justify from a theoretical perspective, could work
here?&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;scores &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; game, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; score)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_smooth&lt;/span&gt;(method &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#39;lm&amp;#39;&lt;/span&gt;, formula &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#39;y ~ x&amp;#39;&lt;/span&gt;, se&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;comma) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_minimal&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Pinball scores by game + Linear model&amp;#34;&lt;/span&gt;, x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Game&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Score&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/index_files/figure-commonmark/unnamed-chunk-2-1.png&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
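&lt;p&gt;For the curious, here is a minimal sketch of what that line roughly
corresponds to outside of ggplot2. It assumes the same &lt;code&gt;scores&lt;/code&gt; data
frame with &lt;code&gt;game&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; columns as in the plot code,
and it relies on ggplot2 applying scale transformations before fitting the
smoother, so the model is fit to &lt;code&gt;log10(score)&lt;/code&gt;. The name
&lt;code&gt;linear_fit&lt;/code&gt; is just made up for this illustration.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Linear trend on the log10 scale, mirroring
# geom_smooth(method = &#39;lm&#39;) combined with scale_y_log10()
linear_fit &lt;- lm(log10(score) ~ game, data = scores)

# Intercept and slope on the log10 scale
coef(linear_fit)

# The slope expressed as a multiplicative change in score per game played
10^coef(linear_fit)[[&#34;game&#34;]]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A factor close to 1 would mean that the scores are barely changing from
one game to the next.&lt;/p&gt;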
&lt;p&gt;On the other hand, as we’re looking to model an improvement in
proficiency, a sigmoid ( ∫ ) model might be more appropriate. That is,
I’m starting from a baseline, then I’m seeing an accelerated rate of
improvement, which eventually tapers off as I reach my “performance
plateau”.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;scores &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; game, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; score)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_smooth&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    method &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#39;nls&amp;#39;&lt;/span&gt;, formula &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#39;y ~ SSlogis(x, Asym, xmid, scal)&amp;#39;&lt;/span&gt;, se &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;comma) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_minimal&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Pinball scores by game + Sigmoid model&amp;#34;&lt;/span&gt;, x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Game&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Score&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/index_files/figure-commonmark/unnamed-chunk-3-1.png&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Hmm, that plateau came quickly… Nevertheless, both these models are
missing what every good statistical model needs: a good measure of
uncertainty and a computationally expensive model fitting procedure.
Maybe a Bayesian state-space time series model is what we need here!&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Model code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(atsar)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ss_model &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;fit_stan&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  scores&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;log2_score, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  model_name &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;ss_rw&amp;#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  est_drift&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  mcmc_list&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;list&lt;/span&gt;(n_mcmc&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;10000&lt;/span&gt;, n_burn&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;2000&lt;/span&gt;, n_thin&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;1&lt;/span&gt;, n_chain&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#40a070&#34;&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;bayes_pred &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;2&lt;/span&gt;^rstan&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;summary&lt;/span&gt;(ss_model)&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;summary &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;as_tibble&lt;/span&gt;(rownames &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;param&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;str_starts&lt;/span&gt;(param, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;pred&amp;#34;&lt;/span&gt;)) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(game &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;as.numeric&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;str_extract&lt;/span&gt;(param, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;\\d+&amp;#34;&lt;/span&gt;))) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;select&lt;/span&gt;(game, lower &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; `2.5%`,  pred &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; `50%`, upper &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; `97.5%`)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;scores &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; game, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; score)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_smooth&lt;/span&gt;(se&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;FALSE&lt;/span&gt;, method &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#39;loess&amp;#39;&lt;/span&gt;, formula &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#39;y ~ x&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_ribbon&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    data &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; bayes_pred,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; game, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; pred, ymin &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; lower, ymax &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; upper), 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    alpha &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;0.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;comma) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_minimal&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Pinball scores by game + Bayesian state space model&amp;#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Game&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Score&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/index_files/figure-commonmark/unnamed-chunk-5-1.png&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Or we could just go full 101 psychology statistics and force this data
to submit to a t-test, using violence if necessary.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;scores &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;mutate&lt;/span&gt;(Group &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;ifelse&lt;/span&gt;(game &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;50&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Games 1 to 50&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Games 51 to 100&amp;#34;&lt;/span&gt;) ) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggboxplot&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Group&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;score&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    color &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Group&amp;#34;&lt;/span&gt;, palette &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;jco&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    add &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;jitter&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    title&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Pinball scores by game, psychology stats-style&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_log10&lt;/span&gt;(labels &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;comma) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;stat_compare_means&lt;/span&gt;(method &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;t.test&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/index_files/figure-commonmark/unnamed-chunk-6-1.png&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Despite there being at least seven things abhorrently wrong with this
approach, we still get a p-value that no amount of p-hackery can fix.&lt;/p&gt;
&lt;p&gt;I asked 
  &lt;a href=&#34;https://fosstodon.org/@rabaath/112118266985015611&#34;&gt;on Mastodon&lt;/a&gt;
what a good model could be for this type of data and 
  &lt;a href=&#34;https://datasci.social/@mszll&#34;&gt;Michael
Szell&lt;/a&gt; promptly responded:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/images/michael-szell-toot.png&#34;    width = &#34;490&#34; /&gt; Lognormal, absolutely, that makes a
lot of sense. But the stationary assumption, that hurts… and, yet, it’s
hard to argue against:&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Plot code&lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;breaks &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#40a070&#34;&gt;3000000&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;10000000&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;30000000&lt;/span&gt;, &lt;span style=&#34;color:#40a070&#34;&gt;100000000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;scores &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;ggplot&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;aes&lt;/span&gt;(x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; game, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(score))) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_point&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;scale_y_continuous&lt;/span&gt;(breaks &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(breaks), labels&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;scales&lt;span style=&#34;color:#666&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;label_comma&lt;/span&gt;()(breaks)) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme_minimal&lt;/span&gt;() &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;labs&lt;/span&gt;(title &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Pinball scores by game + Independent (😭) Normal distribution (log scale)&amp;#34;&lt;/span&gt;, x &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Game&amp;#34;&lt;/span&gt;, y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;Score&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_ysidehistogram&lt;/span&gt;(bins &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;12&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;geom_ysidefunction&lt;/span&gt;(fun &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;\&lt;/span&gt;(x) &lt;span style=&#34;color:#40a070&#34;&gt;45&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;dnorm&lt;/span&gt;(x, &lt;span style=&#34;color:#06287e&#34;&gt;mean&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(scores&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;score)), &lt;span style=&#34;color:#06287e&#34;&gt;sd&lt;/span&gt;(&lt;span style=&#34;color:#06287e&#34;&gt;log&lt;/span&gt;(scores&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;score))), colour &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;red&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;theme&lt;/span&gt;(ggside.panel.scale.y &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;0.3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/modeling-my-pinball-scores/index_files/figure-commonmark/unnamed-chunk-7-1.png&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
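&lt;p&gt;Spelled out, this stationary model is just two numbers: the mean and the
standard deviation of the log scores, the same quantities used for the red
curve above. Here’s a minimal sketch, again assuming the
&lt;code&gt;scores&lt;/code&gt; data frame from before. The names &lt;code&gt;mu&lt;/code&gt; and
&lt;code&gt;sigma&lt;/code&gt; and the 30,000,000 threshold are only picked for
illustration.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;# Stationary lognormal model: the log scores are treated as
# independent draws from one and the same Normal distribution
log_score &lt;- log(scores$score)
mu &lt;- mean(log_score)
sigma &lt;- sd(log_score)

# Median score implied by the model
exp(mu)

# Probability that a single game ends above 30,000,000 points
# (illustrative threshold, not a number from the data)
plnorm(30000000, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Under this model, every game is an independent draw from the same
distribution, no matter how many games I’ve played before.&lt;/p&gt;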
&lt;p&gt;Given that we’ve now nailed the statistical model, I don’t need any more
feedback here. But if you know how I could improve my pinball game,
please don’t hesitate to pester me over at

  &lt;a href=&#34;https://fosstodon.org/@rabaath/112118266985015611&#34;&gt;@rabaath@fosstodon.org&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Why pandas feels clunky when coming from R</title>
      <link>https://sumsar.net/blog/pandas-feels-clunky-when-coming-from-r/</link>
      <pubDate>Tue, 20 Feb 2024 00:00:00 +0100</pubDate>
      
      <guid>https://sumsar.net/blog/pandas-feels-clunky-when-coming-from-r/</guid>
      <description>&lt;p&gt;Five years ago I started a new role and I suddenly found myself, a
staunch R fan, having to code in Python on a daily basis. Working with
data, most of my Python work involved using

  &lt;a href=&#34;https://pandas.pydata.org/&#34;&gt;&lt;code&gt;pandas&lt;/code&gt;&lt;/a&gt;, the Python data frame library,
and initially I found it quite hard and clunky to use, being used to the
&lt;em&gt;silky smooth&lt;/em&gt; API of R’s 
  &lt;a href=&#34;https://www.tidyverse.org/&#34;&gt;&lt;code&gt;tidyverse&lt;/code&gt;&lt;/a&gt;. And
you know what? It still feels hard and clunky, even now, 5 years later!&lt;/p&gt;
&lt;p&gt;But what seems even harder is explaining to “Python people” what they
are missing out on. From their perspective, pandas is this fantastic
tool that makes Data Science in Python possible. And it is a fantastic
tool, don’t get me wrong, but if you, like me, end up in many “pandas is
great, but…”-type discussions and are lacking clear examples to link to,
here’s a somewhat typical example of a simple analysis, built from the
ground up, that flows nicely in R and the tidyverse but that becomes
clunky and complicated using Python and pandas.&lt;/p&gt;
&lt;p&gt;Let’s first step through a short analysis of purchases using R and the
tidyverse. After that we’ll see how the same solution using Python and
pandas compares.&lt;/p&gt;
&lt;h2 id=&#34;analyzing-purchases-in-r&#34;&gt;Analyzing &lt;code&gt;purchases&lt;/code&gt; in R&lt;/h2&gt;
&lt;p&gt;We’ve been given a table of &lt;a href=&#34;https://sumsar.net/blog/pandas-feels-clunky-when-coming-from-r/purchases.csv&#34;&gt;&lt;code&gt;purchases&lt;/code&gt;&lt;/a&gt; with different
&lt;code&gt;amount&lt;/code&gt;s, where the customer could have received a &lt;code&gt;discount&lt;/code&gt; and where
each purchase happened in a &lt;code&gt;country&lt;/code&gt;. Finance now wants to know: How
much do we typically sell in each country? Let’s read in the data and
take a look:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;read_csv&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;purchases.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;head&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 6 × 3
  country amount discount
  &amp;lt;chr&amp;gt;    &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;
1 USA       2000       10
2 USA       3500       15
3 USA       3000       20
4 Canada     120       12
5 Canada     180       18
6 Canada    3100       21
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, without bothering with printing out the intermediate results,
here’s how a quick pipeline could be built up, answering Finance’s
question.&lt;/p&gt;
&lt;p&gt;“How much do we sell..? Let’s take the total sum!”&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;amount &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;“Ah, they wanted it by country…”&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;“And I guess I should deduct the discount.” (&lt;code&gt;#👈/👆/👇&lt;/code&gt; marks lines
that changed/moved)&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; discount)) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;“Oh, and Maria asked me to remove any outliers. Let’s remove everything
10x larger than the median.”&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;median&lt;/span&gt;(amount) &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;10&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; discount))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;“I probably should use the median &lt;em&gt;within&lt;/em&gt; each country. Prices are
quite different across the globe…”&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;                     &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👆&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;median&lt;/span&gt;(amount) &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;10&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👇&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; discount))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 11 × 2
   country   total
   &amp;lt;chr&amp;gt;     &amp;lt;dbl&amp;gt;
 1 Australia   540
 2 Brazil      414
 3 Canada      270
 4 France      450
 5 Germany     513
 6 India       648
 7 Italy       567
 8 Japan       621
 9 Spain       594
10 UK          432
11 USA        8455
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;“And we’re done, let’s go for second breakfast!”&lt;/p&gt;
&lt;h2 id=&#34;analyzing-purchases-in-python&#34;&gt;Analyzing &lt;code&gt;purchases&lt;/code&gt; in Python&lt;/h2&gt;
&lt;p&gt;Now let’s see what this little analysis looks like in
Python and pandas. One complication here is that pandas can be written
in many different styles; it’s not like in the tidyverse where there’s
often one obvious way to do something. Here we’re opting for writing
pandas using the fluent method chaining API, as opposed to using the
more “imperative” approach that results in a lot of repeats of &lt;code&gt;df&lt;/code&gt; and
statements like
&lt;code&gt;df[df[&amp;quot;this&amp;quot;] == &amp;quot;that&amp;quot;] = calc_some(df[&amp;quot;other_thing&amp;quot;])&lt;/code&gt;. We’re also
opting for always returning a table with all the data in the data frame
proper. We don’t want data hidden away in the index (that is, pandas’
really advanced system for row and column names). Having data in the
index is generally annoying when one wants to process the data further
or when turning the data into plots.&lt;/p&gt;
&lt;p&gt;Let’s step through the R version of the analysis once more, this time
writing the corresponding pandas code below each step. As before, &lt;code&gt;#👈/👆/👇&lt;/code&gt; marks lines that
have changed/moved.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id=&#34;reading-in-the-data&#34;&gt;Reading in the data&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#06287e&#34;&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;read_csv&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;purchases.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;head&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 6 × 3
  country amount discount
  &amp;lt;chr&amp;gt;    &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;
1 USA       2000       10
2 USA       3500       15
3 USA       3000       20
4 Canada     120       12
5 Canada     180       18
6 Canada    3100       21
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is basically the same in pandas. So far so good!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#0e84b5;font-weight:bold&#34;&gt;pandas&lt;/span&gt; &lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#0e84b5;font-weight:bold&#34;&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; pd&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;read_csv(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;purchases.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;  country  amount  discount
0     USA    2000        10
1     USA    3500        15
2     USA    3000        20
3  Canada     120        12
4  Canada     180        18
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;how-much-do-we-sell-lets-take-the-total-sum&#34;&gt;“How much do we sell..? Let’s take the total sum!”&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases&lt;span style=&#34;color:#666&#34;&gt;$&lt;/span&gt;amount &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;[1] 17210
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is also similar in pandas:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;sum()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;17210
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(However, note that this method,

  &lt;a href=&#34;https://pandas.pydata.org/docs/reference/api/pandas.Series.sum.html&#34;&gt;&lt;code&gt;pandas.Series.sum()&lt;/code&gt;&lt;/a&gt;,
is not the same as

  &lt;a href=&#34;https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sum.html&#34;&gt;&lt;code&gt;pandas.DataFrame.sum()&lt;/code&gt;&lt;/a&gt;,
or

  &lt;a href=&#34;https://numpy.org/doc/stable/reference/generated/numpy.sum.html&#34;&gt;&lt;code&gt;numpy.sum()&lt;/code&gt;&lt;/a&gt;,
or the built-in 
  &lt;a href=&#34;https://docs.python.org/library/functions.html&#34;&gt;&lt;code&gt;sum&lt;/code&gt;&lt;/a&gt;
function, each of which has different arguments and behaviors. In R,
it’s always the same built-in &lt;code&gt;sum()&lt;/code&gt; function.)&lt;/p&gt;
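&lt;p&gt;To make “different behaviors” a bit more concrete, here is a minimal sketch using made-up numbers (the &lt;code&gt;amounts&lt;/code&gt; series below is hypothetical and not taken from &lt;code&gt;purchases.csv&lt;/code&gt;). It assumes there is a missing value in the data and shows how the four variants treat it:&lt;/p&gt;

```python
import math

import numpy as np
import pandas as pd

# Hypothetical toy values (not from purchases.csv), with one missing entry.
amounts = pd.Series([100.0, 200.0, float("nan")], name="amount")

pandas_total = amounts.sum()              # Series.sum() skips NaN by default
numpy_total = np.sum(amounts.to_numpy())  # numpy.sum() on the raw array propagates NaN
builtin_total = sum(amounts)              # the built-in sum() also propagates NaN

# DataFrame.sum() returns one total per column rather than a single number.
column_totals = amounts.to_frame().sum()

print(pandas_total)               # 300.0
print(math.isnan(numpy_total))    # True
print(math.isnan(builtin_total))  # True
print(column_totals["amount"])    # 300.0
```

&lt;p&gt;In this sketch the two pandas methods quietly skip the missing value, while &lt;code&gt;numpy.sum()&lt;/code&gt; on the raw array and the built-in &lt;code&gt;sum()&lt;/code&gt; both return &lt;code&gt;nan&lt;/code&gt;.&lt;/p&gt;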
&lt;h3 id=&#34;ah-they-wanted-it-by-country&#34;&gt;“Ah, they wanted it by country…”&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 11 × 2
   country   total
   &amp;lt;chr&amp;gt;     &amp;lt;dbl&amp;gt;
 1 Australia   600
 2 Brazil      460
 3 Canada     3400
 4 France      500
 5 Germany     570
 6 India       720
 7 Italy       630
 8 Japan       690
 9 Spain       660
10 UK          480
11 USA        8500
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is also very similar in Python:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(purchases
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;sum()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;country
Australia     600
Brazil        460
Canada       3400
France        500
Germany       570
India         720
Italy         630
Japan         690
Spain         660
UK            480
USA          8500
Name: amount, dtype: int64
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ah, but here we actually need to do more work. The output has now turned
into a

  &lt;a href=&#34;https://pandas.pydata.org/docs/reference/api/pandas.Series.html&#34;&gt;&lt;code&gt;pandas.Series&lt;/code&gt;&lt;/a&gt;,
not a data frame, and &lt;code&gt;country&lt;/code&gt; got moved to the index. We can solve
this by using &lt;code&gt;.reset_index()&lt;/code&gt;. Also, we’re not happy with the &lt;code&gt;amount&lt;/code&gt;
column name, but &lt;code&gt;.sum()&lt;/code&gt; does not allow us to specify a different name.
Instead of &lt;code&gt;.sum()&lt;/code&gt; we can use the &lt;code&gt;.agg()&lt;/code&gt; method to get around this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(purchases
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;agg(total&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;sum&amp;#34;&lt;/span&gt;)) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;reset_index()                &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;      country  total
0   Australia    600
1      Brazil    460
2      Canada   3400
3      France    500
4     Germany    570
5       India    720
6       Italy    630
7       Japan    690
8       Spain    660
9          UK    480
10        USA   8500
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Another thing that’s new here is that the aggregation is now named with
a &lt;code&gt;&amp;quot;sum&amp;quot;&lt;/code&gt; string instead of being called as the &lt;code&gt;sum&lt;/code&gt; method.)&lt;/p&gt;
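&lt;p&gt;As a side note, and not something this post relies on: &lt;code&gt;.groupby()&lt;/code&gt; also takes an &lt;code&gt;as_index=False&lt;/code&gt; argument, which keeps the grouping column out of the index in the first place. The sketch below uses a tiny made-up data frame (&lt;code&gt;toy&lt;/code&gt; is hypothetical, not &lt;code&gt;purchases&lt;/code&gt;) to compare it with the &lt;code&gt;.reset_index()&lt;/code&gt; version:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical toy data, not the purchases.csv used in this post.
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "amount": [100, 200, 50],
})

# The approach used above: aggregate, then move country out of the index.
with_reset = (toy
  .groupby("country")
  .agg(total=("amount", "sum"))
  .reset_index()
)

# Alternative sketch: ask groupby() not to put country in the index at all.
with_flag = (toy
  .groupby("country", as_index=False)
  .agg(total=("amount", "sum"))
)

print(with_reset)
print(with_flag)
```

&lt;p&gt;Both variants should give a plain data frame with a &lt;code&gt;country&lt;/code&gt; and a &lt;code&gt;total&lt;/code&gt; column. Which one reads better is a matter of taste, and it’s yet another way of doing the same thing.&lt;/p&gt;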
&lt;h3 id=&#34;and-i-guess-i-should-deduct-the-discount&#34;&gt;“And I guess I should deduct the discount.”&lt;/h3&gt;
&lt;p&gt;A tiny change in R…&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; discount)) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 11 × 2
   country   total
   &amp;lt;chr&amp;gt;     &amp;lt;dbl&amp;gt;
 1 Australia   540
 2 Brazil      414
 3 Canada     3349
 4 France      450
 5 Germany     513
 6 India       648
 7 Italy       567
 8 Japan       621
 9 Spain       594
10 UK          432
11 USA        8455
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;… but a large change in Python. The &lt;code&gt;.agg()&lt;/code&gt; method can only aggregate
single columns. As this aggregation needs both &lt;code&gt;amount&lt;/code&gt; and &lt;code&gt;discount&lt;/code&gt;, we have to fall back on
&lt;code&gt;.apply()&lt;/code&gt;, which can handle any type of aggregation. And as we want to
avoid a column with the enigmatic name &lt;code&gt;0&lt;/code&gt;, we also have to use
&lt;code&gt;.rename()&lt;/code&gt; to get back to &lt;code&gt;total&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(purchases
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;apply(&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;lambda&lt;/span&gt; df: (df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;discount&amp;#34;&lt;/span&gt;])&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;sum()) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;reset_index()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;rename(columns&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;{&lt;span style=&#34;color:#40a070&#34;&gt;0&lt;/span&gt;: &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;total&amp;#34;&lt;/span&gt;})                            &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;      country  total
0   Australia    540
1      Brazil    414
2      Canada   3349
3      France    450
4     Germany    513
5       India    648
6       Italy    567
7       Japan    621
8       Spain    594
9          UK    432
10        USA   8455
&lt;/code&gt;&lt;/pre&gt;
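&lt;p&gt;For comparison, one way to stay with &lt;code&gt;.agg()&lt;/code&gt; is to first compute the difference as its own column and then aggregate that single column. This is only a sketch of an alternative, not the approach taken in this post, and both the &lt;code&gt;toy&lt;/code&gt; data frame and the &lt;code&gt;net&lt;/code&gt; column name are made up for illustration:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical toy data, not the purchases.csv used in this post.
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "amount": [100, 200, 50],
    "discount": [10, 20, 5],
})

net_totals = (toy
  .assign(net=lambda df: df["amount"] - df["discount"])  # hypothetical helper column
  .groupby("country")
  .agg(total=("net", "sum"))  # now a single-column aggregation again
  .reset_index()
)

print(net_totals)
```

&lt;p&gt;This avoids &lt;code&gt;.apply()&lt;/code&gt; and &lt;code&gt;.rename()&lt;/code&gt;, but at the price of adding &lt;code&gt;.assign()&lt;/code&gt; and a temporary column, so it’s still a bigger change than the one-word edit in R.&lt;/p&gt;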
&lt;h3 id=&#34;oh-and-maria-asked-me-to-remove-any-outliers&#34;&gt;“Oh, and Maria asked me to remove any outliers.”&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;median&lt;/span&gt;(amount) &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;10&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; discount))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 11 × 2
   country   total
   &amp;lt;chr&amp;gt;     &amp;lt;dbl&amp;gt;
 1 Australia   540
 2 Brazil      414
 3 Canada      270
 4 France      450
 5 Germany     513
 6 India       648
 7 Italy       567
 8 Japan       621
 9 Spain       594
10 UK          432
11 USA        1990
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is also a simple change in Python, using &lt;code&gt;.query()&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(purchases
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;query(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount &amp;lt;= amount.median() * 10&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;apply(&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;lambda&lt;/span&gt; df: (df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;discount&amp;#34;&lt;/span&gt;])&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;sum())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;reset_index()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;rename(columns&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;{&lt;span style=&#34;color:#40a070&#34;&gt;0&lt;/span&gt;: &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;total&amp;#34;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;      country  total
0   Australia    540
1      Brazil    414
2      Canada    270
3      France    450
4     Germany    513
5       India    648
6       Italy    567
7       Japan    621
8       Spain    594
9          UK    432
10        USA   1990
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(But why is it called &lt;code&gt;.query()&lt;/code&gt; when it &lt;em&gt;filters&lt;/em&gt;? And why can’t we use
&lt;code&gt;DataFrame.filter()&lt;/code&gt; instead? Ah, that only filters on the index names.
And why do we suddenly have to pass in Python code as a string? Ah, it’s
actually not Python, but a language that’s &lt;em&gt;similar&lt;/em&gt; to Python. Of
course, all these questions have explanations, yet I can still never
really remember what I’m allowed to put in a &lt;code&gt;.query()&lt;/code&gt; string. Instead
of &lt;code&gt;.query()&lt;/code&gt; we could use &lt;code&gt;.loc[]&lt;/code&gt;, but then we need to do a fair bit
of typing:
&lt;code&gt;.loc[lambda df: df[&amp;quot;amount&amp;quot;] &amp;lt;= df[&amp;quot;amount&amp;quot;].median() * 10]&lt;/code&gt;. Compare
that to the R version: &lt;code&gt;filter(amount &amp;lt;= median(amount) * 10)&lt;/code&gt;.)&lt;/p&gt;
&lt;h3 id=&#34;i-probably-should-use-the-median-within-each-country&#34;&gt;“I probably should use the median &lt;em&gt;within&lt;/em&gt; each country”&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# R &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;                     &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👆&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;median&lt;/span&gt;(amount) &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;10&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👇&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; discount))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;# A tibble: 11 × 2
   country   total
   &amp;lt;chr&amp;gt;     &amp;lt;dbl&amp;gt;
 1 Australia   540
 2 Brazil      414
 3 Canada      270
 4 France      450
 5 Germany     513
 6 India       648
 7 Italy       567
 8 Japan       621
 9 Spain       594
10 UK          432
11 USA        8455
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What’s just a swap of two lines in R becomes much more involved in
Python. The reason is that &lt;code&gt;.groupby()&lt;/code&gt; doesn’t return a
&lt;code&gt;pandas.DataFrame&lt;/code&gt; but a &lt;code&gt;pandas.api.typing.DataFrameGroupBy&lt;/code&gt;
object, which doesn’t have the same set of methods as a regular data
frame. In particular, it has neither &lt;code&gt;.query()&lt;/code&gt; nor &lt;code&gt;.loc[]&lt;/code&gt;. There are
two solutions here. The first is to fall back on &lt;code&gt;.apply()&lt;/code&gt;,
this time returning a filtered version of each group. But then we also
need to remove the &lt;code&gt;country&lt;/code&gt; index &lt;em&gt;completely&lt;/em&gt; with
&lt;code&gt;.reset_index(drop=True)&lt;/code&gt;, as the filtered &lt;code&gt;purchases&lt;/code&gt; already has a
&lt;code&gt;country&lt;/code&gt; column:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(purchases
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)                                               &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;apply(&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;lambda&lt;/span&gt; df: df[df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;=&lt;/span&gt; df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;median() &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;10&lt;/span&gt;]) &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;reset_index(drop&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;True&lt;/span&gt;)                                           &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;apply(&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;lambda&lt;/span&gt; df: (df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;discount&amp;#34;&lt;/span&gt;])&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;sum())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;reset_index()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;rename(columns&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;{&lt;span style=&#34;color:#40a070&#34;&gt;0&lt;/span&gt;: &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;total&amp;#34;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;      country  total
0   Australia    540
1      Brazil    414
2      Canada    270
3      France    450
4     Germany    513
5       India    648
6       Italy    567
7       Japan    621
8       Spain    594
9          UK    432
10        USA   8455
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The fact that grouped and regular pandas data frames have different
APIs is a constant source of confusion to me. One example of this is
&lt;code&gt;.filter()&lt;/code&gt;, where &lt;code&gt;DataFrameGroupBy.filter()&lt;/code&gt; does something
&lt;em&gt;completely different&lt;/em&gt; from &lt;code&gt;DataFrame.filter()&lt;/code&gt;. And neither of them
actually filters away values!)&lt;/p&gt;
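&lt;p&gt;As a minimal sketch of that difference, assuming a small made-up data frame (the name &lt;code&gt;toy&lt;/code&gt; and its values are hypothetical and not the &lt;code&gt;purchases&lt;/code&gt; data), the two &lt;code&gt;.filter()&lt;/code&gt; methods behave roughly like this:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical toy data, NOT the purchases dataset used in this post.
toy = pd.DataFrame({
    "country": ["USA", "USA", "UK", "UK"],
    "amount": [100, 300, 20, 30],
})

# DataFrame.filter() selects columns (or index labels) by name.
by_label = toy.filter(items=["amount"])

# DataFrameGroupBy.filter() keeps or drops whole groups, based on a
# function that returns a single boolean per group.
by_group = toy.groupby("country").filter(lambda g: g["amount"].sum() > 100)

# Dropping individual rows based on their values is done with
# .query() or boolean indexing instead.
by_value = toy.query("amount > 25")

print(by_label.columns.tolist())     # ['amount']
print(by_group["country"].tolist())  # ['USA', 'USA']
print(by_value["amount"].tolist())   # [100, 300, 30]
```

&lt;p&gt;Only the last line drops rows based on their own values, which is what &lt;code&gt;filter()&lt;/code&gt; does in dplyr.&lt;/p&gt;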
&lt;p&gt;A second solution is that we first calculate the median &lt;code&gt;amount&lt;/code&gt; per
&lt;code&gt;country&lt;/code&gt; and assign it to each row in &lt;code&gt;purchases&lt;/code&gt;. The upside is now
that we can continue to use &lt;code&gt;.query()&lt;/code&gt;, but at the cost of introducing
both &lt;code&gt;.assign()&lt;/code&gt; and &lt;code&gt;.transform()&lt;/code&gt; into the mix.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;# Python &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(purchases
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;assign(country_median&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;lambda&lt;/span&gt; df:                         &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      df&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;transform(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;median&amp;#34;&lt;/span&gt;)   &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;query(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount &amp;lt;= country_median * 10&amp;#34;&lt;/span&gt;)                   &lt;span style=&#34;color:#60a0b0;font-style:italic&#34;&gt;#👈                   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;groupby(&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;country&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;apply(&lt;span style=&#34;color:#007020;font-weight:bold&#34;&gt;lambda&lt;/span&gt; df: (df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; df[&lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;discount&amp;#34;&lt;/span&gt;])&lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;sum())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;reset_index()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#666&#34;&gt;.&lt;/span&gt;rename(columns&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;{&lt;span style=&#34;color:#40a070&#34;&gt;0&lt;/span&gt;: &lt;span style=&#34;color:#4070a0&#34;&gt;&amp;#34;total&amp;#34;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;      country  total
0   Australia    540
1      Brazil    414
2      Canada    270
3      France    450
4     Germany    513
5       India    648
6       Italy    567
7       Japan    621
8       Spain    594
9          UK    432
10        USA   8455
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compare this with, again, the final R solution:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f0f0f0;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;purchases &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;group_by&lt;/span&gt;(country) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;filter&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;median&lt;/span&gt;(amount) &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#40a070&#34;&gt;10&lt;/span&gt;) &lt;span style=&#34;color:#666&#34;&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#06287e&#34;&gt;summarize&lt;/span&gt;(total &lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#06287e&#34;&gt;sum&lt;/span&gt;(amount &lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt; discount))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This solution is not only shorter but also contains less ‘boilerplate’
code, such as &lt;code&gt;lambda&lt;/code&gt;, &lt;code&gt;reset_index&lt;/code&gt;, etc. The journey to the R
solution was more straightforward and we could build it up one step at
a time. With pandas, we often had to backtrack and switch out parts of
the intermediate solution.&lt;/p&gt;
&lt;h2 id=&#34;so-whats-your-point&#34;&gt;So, what’s your point?&lt;/h2&gt;
&lt;p&gt;My point is that, if you’re a “Python person”, then pandas is a great
tool &lt;em&gt;and&lt;/em&gt; people with extensive R experience may find working with
pandas frustrating for valid reasons. Show them some compassion!&lt;/p&gt;
&lt;p&gt;You might think my &lt;code&gt;purchases&lt;/code&gt; analysis was just a little toy example,
selected to highlight the clunkiness of the pandas API. And yes,
&lt;em&gt;partially&lt;/em&gt;, but my experience is that with larger, real-world code the
problems with the pandas API outlined in this post remain. That is,
pandas feels clunky when coming from R because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The naming of methods and arguments is often confusing (&lt;code&gt;.filter()&lt;/code&gt;
doesn’t filter values. Will &lt;code&gt;.sum(axis=1)&lt;/code&gt; sum the rows or the
columns?)&lt;/li&gt;
&lt;li&gt;Different methods are available for grouped and non-grouped data
frames and methods with the same name can do very different things
(for example &lt;code&gt;DataFrame.filter()&lt;/code&gt; and &lt;code&gt;DataFrameGroupBy.filter()&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Many convenience functions are missing from pandas, which means you’ll
have to code them from scratch. For instance, moving the &lt;code&gt;year&lt;/code&gt; to be
the first column is &lt;code&gt;df |&amp;gt; relocate(year)&lt;/code&gt; in the tidyverse. It’s
&lt;code&gt;df[[&amp;quot;year&amp;quot;] + [col for col in df.columns if col != &amp;quot;year&amp;quot;]]&lt;/code&gt; in
pandas.&lt;/li&gt;
&lt;li&gt;Pandas will constantly move columns into the index, and you’ll have to
work hard to get that data out again. You’ll be typing
&lt;code&gt;.reset_index()&lt;/code&gt; many &lt;em&gt;many&lt;/em&gt; times.&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    
    <item>
      <title>Baking the cake dataset cake</title>
      <link>https://sumsar.net/blog/baking-the-cake-dataset-cake/</link>
      <pubDate>Mon, 12 Feb 2024 00:00:00 +0100</pubDate>
      
      <guid>https://sumsar.net/blog/baking-the-cake-dataset-cake/</guid>
      <description>&lt;p&gt;Now that I’ve got my hands on 
  &lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/&#34;&gt;the source of the cake
dataset&lt;/a&gt; I knew I had to attempt to
bake the cake too. Here, the emphasis is on &lt;em&gt;attempt&lt;/em&gt;, as there’s no way
I would be able to actually replicate 
  &lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/#the-cake-recipes&#34;&gt;the elaborate and
cake-scientifically rigorous
recipe&lt;/a&gt; that Cook
followed in her thesis. Skipping things like beating the eggs exactly
“125 strokes with a rotary beater” or wrapping the grated chocolate “in
waxed paper, while white wrapping paper was used for the other
ingredients”, here’s my version of Cook’s Recipe C, the highest rated
cake recipe in the thesis:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;~~ Frances E. Cook&#39;s best chocolate cake ~~

- 112 g butter (at room temperature, not straight from the fridge!)
- 225 g sugar
- ½ teaspoon vanilla, extract or sugar.
- ¼ teaspoon salt
- 96 g eggs, beaten (that would be two small eggs)
- 57 g dark chocolate (regular dark chocolate, not the 85% masochistic kind)
- 122 g milk (that is, ½ a cup)
- 150 g wheat flour
- 2½ teaspoons baking powder

1. In a bowl mix together the butter, sugar, vanilla, and salt 
   using a hand or stand mixer.
2. Add the eggs and continue mixing for another minute.
3. Melt the chocolate in a water bath or in a microwave oven. 
   Add it to the bowl and mix until it&#39;s uniformly incorporated.
4. Add the milk and mix some more.
5. In a separate bowl combine the flour and the baking powder.
   Add it to the batter, while mixing, until it&#39;s all combined evenly.
6. To a &amp;quot;standard-sized&amp;quot; cake pan (around 22 cm/9 inches in diameter)
   add a coating of butter and flour to avoid cake stickage.
7. Add the batter to the pan and bake in the middle of the oven
   at 225°C (437°F) for 24 minutes.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here are some notes, photos, and data on how the actual cake bake went
down.&lt;/p&gt;
&lt;h2 id=&#34;some-notes-from-a-cake-bake&#34;&gt;Some notes from a cake bake&lt;/h2&gt;
&lt;p&gt;If you do attempt this recipe, I must warn you that this cake is baked
at an unusually high temperature, as this resulted in the best rated
cake in Cook’s thesis. However, at that temperature my cake came out
just a &lt;em&gt;tiny&lt;/em&gt; bit scorched. Otherwise, I do believe this is a &lt;em&gt;fairly&lt;/em&gt;
standard cake recipe.&lt;/p&gt;
&lt;p&gt;But! I could not be satisfied with baking just the one cake recipe
above. As the whole point of Cook’s thesis was to explore the effect of
baking temperature I, of course, had to explore the same! Cook baked 270
different cakes over six different temperatures, but that was too
ambitious for a Saturday afternoon, so I picked just three of those:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;175°C (347°F) for 39 minutes&lt;/li&gt;
&lt;li&gt;200°C (392°F) for 31½ minutes&lt;/li&gt;
&lt;li&gt;225°C (437°F) for 24 minutes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And then I baked the same cake three times. I was planning to make a
nicely staged photo of all the ingredients, but forgot about that
completely, so here’s instead how my real-life messy cake bake looked:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/baking-the-cake-dataset-cake/real-life-cake-bake.jpeg&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Even though I did bake three cakes at different temperatures, I cannot
stress enough that there’s &lt;em&gt;no way&lt;/em&gt; I even came close to replicating a
crumb of Cook’s original study. But, that didn’t stop me from making
some pretend-comparisons with her work. One graph from the original
study I particularly liked, was the photo of actual cakes baked at
different temperatures:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/baking-the-cake-dataset-cake/original-cake-photos.jpeg&#34;    width = &#34;543.6&#34; /&gt;&lt;/p&gt;
&lt;p&gt;And below are the results of my endeavors where, sadly, all my cakes
look like Cook’s no. 1 above.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/baking-the-cake-dataset-cake/my-cake-photos.jpeg&#34;    width = &#34;800&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In 
  &lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/#the-cake-dataset&#34;&gt;the &lt;code&gt;cake&lt;/code&gt;
dataset&lt;/a&gt; the main
outcome variable is the angle at which the cake breaks. But lacking the
advanced breaking angle apparatus Cook used, there was no way I could
get a good cake break angle measure. Well, at least I broke my cakes:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/baking-the-cake-dataset-cake/breaking-cakes.gif&#34;    width = &#34;600&#34; /&gt;&lt;/p&gt;
&lt;p&gt;What Cook did, and what I &lt;em&gt;actually&lt;/em&gt; also could do, was to rate the
three different cakes. As I celebrated my birthday, I had a small panel
of cake eaters readily available. Six participants (average age 38.0,
SD=28.7) were given a nibble of each of the three cakes (baked at 175°C,
200°C, and 225°C). The participants were asked to rate the overall
eating quality of each cake on a scale from 1 (“completely awful”) to 10
(“cake perfection”). After the rating session concluded, the
participants were awarded with more cake.&lt;/p&gt;
&lt;p&gt;Unfortunately, the results were somewhat inconclusive, as the
participants rated all cakes fairly highly, with no clear preference for
cakes baked at higher temperatures:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/baking-the-cake-dataset-cake/index_files/figure-commonmark/unnamed-chunk-2-1.png&#34;    width = &#34;875&#34; /&gt;&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
&lt;em&gt;Expand for the full cake rating dataset&lt;/em&gt;
&lt;/summary&gt;
&lt;pre&gt;&lt;code&gt;subject,age,temperature,score
1,72,175,5
1,72,200,6
1,72,225,6
2,69,175,4
2,69,200,5
2,69,225,6
3,5,175,8
3,5,200,7
3,5,225,9
4,8,175,9
4,8,200,10
4,8,225,8
5,35,175,7
5,35,200,6
5,35,225,7
6,39,175,7
6,39,200,7
6,39,225,7
&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
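&lt;p&gt;If you want to check the numbers behind the plot yourself, here is a small sketch that averages the scores per baking temperature. It uses only the Python standard library, and the function name &lt;code&gt;mean_score_by_temperature&lt;/code&gt; is just something I made up for this illustration:&lt;/p&gt;

```python
import csv
import io
from statistics import mean

# The ratings from the expandable dataset above, copied verbatim.
RATINGS_CSV = """subject,age,temperature,score
1,72,175,5
1,72,200,6
1,72,225,6
2,69,175,4
2,69,200,5
2,69,225,6
3,5,175,8
3,5,200,7
3,5,225,9
4,8,175,9
4,8,200,10
4,8,225,8
5,35,175,7
5,35,200,6
5,35,225,7
6,39,175,7
6,39,200,7
6,39,225,7
"""


def mean_score_by_temperature(csv_text):
    """Return {temperature: mean score}, sorted by temperature."""
    scores = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores.setdefault(int(row["temperature"]), []).append(int(row["score"]))
    return {temp: mean(vals) for temp, vals in sorted(scores.items())}


means = mean_score_by_temperature(RATINGS_CSV)
for temp, avg in means.items():
    print(f"{temp}: {avg:.2f}")
# 175: 6.67
# 200: 6.83
# 225: 7.17
```

&lt;p&gt;The means differ by about half a point on the 10-point scale, which is small compared to how much the individual raters differ from each other (scores range from 4 to 10).&lt;/p&gt;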
&lt;p&gt;But, to give a positive spin to this result, it seems like &lt;em&gt;Frances E.
Cook’s best chocolate cake&lt;/em&gt; recipe results in highly rated cakes at any
baking temperature!&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>The source of the cake dataset</title>
      <link>https://sumsar.net/blog/source-of-the-cake-dataset/</link>
      <pubDate>Sun, 28 Jan 2024 00:00:00 +0100</pubDate>
      
      <guid>https://sumsar.net/blog/source-of-the-cake-dataset/</guid>
      <description>&lt;p&gt;In statistics, there are a number of classic datasets that pop up in examples, tutorials, etc. There&amp;rsquo;s 
  &lt;a href=&#34;https://doi.org/10.1111/1740-9713.01589&#34;&gt;the iris dataset&lt;/a&gt; (just type &lt;code&gt;iris&lt;/code&gt; in your nearest R prompt), 
  &lt;a href=&#34;https://allisonhorst.github.io/palmerpenguins/&#34;&gt;the Palmer penguins&lt;/a&gt; (the modern iris alternative), 
  &lt;a href=&#34;https://hbiostat.org/data/&#34;&gt;the titanic dataset(s)&lt;/a&gt; (I hope you&amp;rsquo;re not a guy in 2nd or 3rd class!), etc. While looking for a dataset to illustrate a simple hierarchical model I stumbled upon another one: The &lt;code&gt;cake&lt;/code&gt; dataset in 
  &lt;a href=&#34;https://CRAN.R-project.org/package=lme4&#34;&gt;the &lt;code&gt;lme4&lt;/code&gt; package&lt;/a&gt; which is described as containing &amp;ldquo;data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures [as] presented in Cook (1938)&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;&amp;rdquo;. For me, this raised a lot of questions: Why measure the breakage angle of chocolate cakes? Why was this data collected? And what were the recipes?&lt;/p&gt;
&lt;p&gt;I assumed the answers to my questions would be found in Cook (1938)&lt;sup id=&#34;fnref1:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; but, after a fair bit of flustered searching, I realized that this scholarly work, despite its obvious relevance to society, was nowhere to be found online. However, I managed to track down that there existed a hard copy at Iowa State University, accessible only to faculty staff.&lt;/p&gt;
&lt;p&gt;The tl;dr: After receiving help from several kind people at Iowa State University, I received a scanned version of Frances E. Cook&amp;rsquo;s Master&amp;rsquo;s thesis, the source of the cake dataset. Here it is:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/cook_1938_chocolate_cake.pdf&#34;&gt;Cook, Frances E. (1938). &lt;em&gt;Chocolate cake: I. Optimum baking temperature&lt;/em&gt;. (Master&amp;rsquo;s thesis, Iowa State College).&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/cook_1938_chocolate_cake.pdf&#34;&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/chocolate-cake-first-page.png&#34;    width = &#34;502&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It contains it all: the background, the details, and the cake recipes! Here are some more details on the cake dataset, how I got help finding its source, and, finally, the cake recipes.&lt;/p&gt;
&lt;h2 id=&#34;the-cake-dataset&#34;&gt;The &lt;code&gt;cake&lt;/code&gt; dataset&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;cake&lt;/code&gt; dataset can be found in 
  &lt;a href=&#34;https://CRAN.R-project.org/package=lme4&#34;&gt;the &lt;code&gt;lme4&lt;/code&gt; package&lt;/a&gt; with the following description:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures. This is a split-plot design with the recipes being whole-units and the different temperatures being applied to sub-units (within replicates). The experimental notes suggest that the replicate numbering represents temporal ordering.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So for each of the $3 \times 6 = 18$ recipe and temperature combinations, Cook made 15 (!) replicates, resulting in a total of $3 \times 6 \times 15 = 270$ cakes/datapoints. Here&amp;rsquo;s the first couple of rows:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:right&#34;&gt;replicate&lt;/th&gt;
&lt;th style=&#34;text-align:left&#34;&gt;recipe&lt;/th&gt;
&lt;th style=&#34;text-align:right&#34;&gt;angle&lt;/th&gt;
&lt;th style=&#34;text-align:right&#34;&gt;temperature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;A&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;42&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;175&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;A&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;46&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;185&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;A&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;47&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;195&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;A&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;39&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;205&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;A&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;53&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;215&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;A&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;42&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;225&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;1&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;B&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;39&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;175&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right&#34;&gt;&amp;hellip;&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;&amp;hellip;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;&amp;hellip;&lt;/td&gt;
&lt;td style=&#34;text-align:right&#34;&gt;&amp;hellip;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
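&lt;p&gt;To make the size of the design concrete, here is a small sketch that enumerates every replicate, recipe, and temperature combination. The factor levels come from the description and the table above; the nesting order is an assumption chosen to mirror the table excerpt, and the variable names are made up for illustration:&lt;/p&gt;

```python
from itertools import product

# Design factors as given in the dataset description and table above.
recipes = ["A", "B", "C"]
temperatures = [175, 185, 195, 205, 215, 225]
replicates = range(1, 16)  # 15 replicates

# One (replicate, recipe, temperature) tuple per baked cake.
# Assumption: temperature varies fastest, as in the table excerpt.
design = list(product(replicates, recipes, temperatures))

print(len(design))  # 270
print(design[0])    # (1, 'A', 175)
print(design[6])    # (1, 'B', 175)
```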
&lt;p&gt;If you want the full dataset without getting &lt;code&gt;lme4&lt;/code&gt; here&amp;rsquo;s &lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/cake.csv&#34;&gt;the cake dataset as a CSV file&lt;/a&gt;. Plotting this dataset we can quickly conclude that the cake breakage angle increases as a function of baking temperature:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/cake-plot.png&#34;    width = &#34;1197&#34; /&gt;&lt;/p&gt;
&lt;p&gt;While the cake dataset is found in &lt;code&gt;lme4&lt;/code&gt;, the original source is Cochran and Cox&amp;rsquo;s book &lt;em&gt;Experimental designs&lt;/em&gt;&lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;. But what&amp;rsquo;s the original &lt;em&gt;original&lt;/em&gt; source? And why measure the cake breakage angle?&lt;/p&gt;
&lt;h2 id=&#34;the-hunt-for-the-source-of-the-cake-dataset&#34;&gt;The hunt for the source of the cake dataset&lt;/h2&gt;
&lt;p&gt;From the &lt;code&gt;lme4&lt;/code&gt; documentation I knew that the &lt;code&gt;cake&lt;/code&gt; dataset came from the study by Cook (1938)&lt;sup id=&#34;fnref2:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; but no amount of Googling, Binging, nor Google Scholaring resulted in any trace of a digital copy.
I did find that physical copies existed at Iowa State University and at Cornell, which presented a problem for me, being physically in Sweden.
There was an option to request that the copy be digitized, but that option was available to Iowa State faculty only.&lt;/p&gt;
&lt;p&gt;Twitter to the rescue, I thought, and fired away a tweet that got a tumbleweed response.
But, final proof for me that Twitter is dying, the same request on Mastodon (
  &lt;a href=&#34;https://fosstodon.org/@rabaath&#34;&gt;come join me!&lt;/a&gt;) was an astounding success!&lt;/p&gt;
&lt;p&gt;
  &lt;a href=&#34;https://fosstodon.org/@rabaath/111767748854754120&#34;&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/mastodon-call.jpeg&#34;    width = &#34;648&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I got many helpful responses, with several pointing me directly at Iowa State staff that might help me out. Like this one from 
  &lt;a href=&#34;https://fosstodon.org/@kbroman&#34;&gt;Karl Broman&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/mastodon-reply.jpeg&#34;    width = &#34;642&#34; /&gt;&lt;/p&gt;
&lt;p&gt;A quick e-mail later and I got this very encouraging e-mail from Dan Nettleton at the Department of Statistics, Iowa State:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/dan-nettleton-reply.jpeg&#34;    width = &#34;699&#34; /&gt;&lt;/p&gt;
&lt;p&gt;He recruited the help of Philip M. Dixon, Department of Statistics, and Megan O’Donnell, Research Data Services Lead, and after a couple of days more I got this from Megan:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/megan-odonnell-reply.jpeg&#34;    width = &#34;703&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;She&lt;/em&gt; (the busy Research Data Services Lead with a looming deadline) is apologizing to &lt;em&gt;me&lt;/em&gt; (the random Swede with an eccentric cake thesis digitization request) that it took a few days to get me everything I asked for!? Still, the feeling of shame for having wasted Megan&amp;rsquo;s time was overshadowed by joy. Attached to the e-mail was, of course, also the full Master&amp;rsquo;s thesis of Frances E. Cook from 1938: &lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/cook_1938_chocolate_cake.pdf&#34;&gt;&lt;em&gt;Chocolate cake: I. Optimum baking temperature&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;highlights-from-_chocolate-cake-i-optimum-baking-temperature_&#34;&gt;Highlights from &lt;em&gt;Chocolate cake: I. Optimum baking temperature&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;Reading the thesis, it&amp;rsquo;s immediately clear that the breakage angle of cakes wasn&amp;rsquo;t the main focus. Instead, Cook was after some &amp;ldquo;accurate scientific information&amp;rdquo; on the optimum baking temperature for chocolate cake.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/introduction.png&#34;    width = &#34;528&#34; /&gt;&lt;/p&gt;
&lt;p&gt;To figure out what was the best chocolate cake, she needed a battery of measures of cake goodness, such as cake tenderness, &lt;em&gt;as measured objectively by its breaking angle&lt;/em&gt;. There were also several subjective measures, as found in the &amp;ldquo;Score Card for Cake&amp;rdquo; on page 50.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/score-card-for-cake.png&#34;    width = &#34;475&#34; /&gt;&lt;/p&gt;
&lt;p&gt;But how was the breaking angle of the cakes measured? In the thesis, we learn that &amp;ldquo;The tenderness of the cake was tested with the breaking angle apparatus as described by Myers (1936)&lt;sup id=&#34;fnref:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt;&amp;rdquo;, but there are no images that show us how it functioned. While I can&amp;rsquo;t find an online trace of Myers (1936)&lt;sup id=&#34;fnref1:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt; I do believe I&amp;rsquo;ve found a description of this very apparatus in Lowe and Nelson (1939)&lt;sup id=&#34;fnref:4&#34;&gt;&lt;a href=&#34;#fn:4&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;4&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/cake-break-apparatus.jpeg&#34;    width = &#34;538.8&#34; /&gt;&lt;/p&gt;
&lt;p&gt;From an outsider perspective, not being active in the field of culinary research myself, the thesis of Cook comes off as being fantastically serious about cake. I especially adore that it includes photographs of all the cakes:&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/cake-photos.jpeg&#34;    width = &#34;453&#34; /&gt;&lt;/p&gt;
&lt;p&gt;But, to be fair, in the photos above, you can clearly see how the baking temperature influences the volume of the cake.&lt;/p&gt;
&lt;h2 id=&#34;the-cake-recipes&#34;&gt;The cake recipes&lt;/h2&gt;
&lt;p&gt;Like in a food blog that has been SEOed to death, here, finally, at the very end, are the cake recipes. I might not be the most experienced cake maker, but this is &lt;em&gt;by far&lt;/em&gt; the most complicated chocolate cake recipe I&amp;rsquo;ve ever seen.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/recipe1.png&#34;    width = &#34;635.5&#34; /&gt;


&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/recipe2.png&#34;    width = &#34;636.5&#34; /&gt;


&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/recipe3.png&#34;    width = &#34;638&#34; /&gt;


&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/recipe4.png&#34;    width = &#34;634&#34; /&gt;


&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/recipe5.png&#34;    width = &#34;633&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Now, for the baking time and temperature above you get a matrix of options.
The answer for which option to pick can be found a bit further down in table XV, which displays the total scores for each option.&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/total-cake-scores.png&#34;    width = &#34;578&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The winner, when considering the dimensions texture, tenderness, velvetiness and eating quality, was Recipe C with a baking temperature of 225°C (437°F) for 24 minutes. I&amp;rsquo;m no cake scientist, but if a linear model is to be believed when extrapolating outside of the range of the dataset (always a good idea) this cake would be &lt;em&gt;delicious&lt;/em&gt; when baked in a pizza oven!&lt;/p&gt;
&lt;p&gt;

&lt;img src=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/images/cake-score-model.png&#34;    width = &#34;633.5&#34; /&gt;&lt;/p&gt;
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;&lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/cook_1938_chocolate_cake.pdf&#34;&gt;Cook, Frances E. (1938). &lt;em&gt;Chocolate cake: I. Optimum baking temperature&lt;/em&gt;. (Master&amp;rsquo;s thesis, Iowa State College).&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref1:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref2:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:2&#34;&gt;
&lt;p&gt;Cochran, W. G., and Cox, G. M. (1957) &lt;em&gt;Experimental designs&lt;/em&gt;, 2nd Ed. New York, John Wiley &amp;amp; Sons.&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:3&#34;&gt;
&lt;p&gt;Myers, Elizabeth. (1936). &lt;em&gt;Plain Cake X. Effect of two temperatures of ingredients at time of combining on fat distribution as determined by microscopical examination&lt;/em&gt;. (Unpublished thesis, Iowa State College)&amp;#160;&lt;a href=&#34;#fnref:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref1:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:4&#34;&gt;
&lt;p&gt;&lt;a href=&#34;https://sumsar.net/blog/source-of-the-cake-dataset/agricultural_research_bulletin_1939_v023_b255.pdf&#34;&gt;Lowe, Belle and Nelson, P. Mabel (1939) &lt;em&gt;The physical and chemical characteristics of lards and other fats in relation to their culinary value. II. Use in plain cake.&lt;/em&gt; Iowa Agricultural Research Bulletin 255.&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref:4&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Get a Git repo where your team can stow their throwaway data science code!</title>
      <link>https://sumsar.net/blog/git-repo-to-stash-throwaway-code/</link>
      <pubDate>Sat, 02 Dec 2023 00:00:00 +0100</pubDate>
      
      <guid>https://sumsar.net/blog/git-repo-to-stash-throwaway-code/</guid>
      <description>&lt;p&gt;When I started working as a Data Scientist nearly ten years ago, the data science team I joined did something I found really strange at first: They had a &lt;strong&gt;single&lt;/strong&gt; GitHub repo where they put &lt;strong&gt;all&lt;/strong&gt; their &amp;ldquo;throwaway&amp;rdquo; code. An R script to produce some plots for a presentation, a Python notebook with a machine learning proof-of-concept, a bash script for cleaning some logs. It all went into the same repo. Initially, this felt sloppy to me, and sure, there are better ways to organize code, but I&amp;rsquo;ve come to learn that not having a single place for throwaway code in a team is far worse. Without a place for throwaway code, what&amp;rsquo;s going to happen is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Some ambitious person on the team will create a new GitHub repo for every single analysis/POC/thing they do, &amp;ldquo;swamping&amp;rdquo; the GitHub namespace.&lt;/li&gt;
&lt;li&gt;Some others will stow their code on the company wiki or drop it in the team Slack channel.&lt;/li&gt;
&lt;li&gt;But most people aren&amp;rsquo;t going to put it anywhere, and we all know that code &amp;ldquo;available on request&amp;rdquo; often isn&amp;rsquo;t available at all.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, in all teams I&amp;rsquo;ve worked in, I&amp;rsquo;ve set up a GitHub repo that looks something like this:&lt;/p&gt;
&lt;p&gt;
  &lt;a href=&#34;https://github.com/rasmusab/ds-exploration-template&#34;&gt;

&lt;img src=&#34;https://sumsar.net/blog/git-repo-to-stash-throwaway-code/data-science-exploration-repo.jpeg&#34;    width = &#34;1722&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;With the following blurb: &lt;em&gt;A place for non-production scripts, notebooks, and other throwaway code. Don&amp;rsquo;t bother with branches and pull requests, unless you want a review, as this is more of a Dropbox folder masquerading as a GitHub repo&lt;/em&gt;. If you want to set up a similar repo, feel free to take a look at 
  &lt;a href=&#34;https://github.com/rasmusab/ds-exploration-template&#34;&gt;the &lt;code&gt;ds-exploration-template&lt;/code&gt; repo over here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And having such a repo has been very useful! It&amp;rsquo;s not the best place to put code, and it does tend to become a bit disorganized after a while, but it is &lt;em&gt;a place&lt;/em&gt; to put code, and one where it&amp;rsquo;s easy to do so. And then, when you get a request that makes you think &amp;ldquo;Ah! I remember that Kristin (who&amp;rsquo;s on parental leave and shouldn&amp;rsquo;t be bothered) did something similar last year!&amp;rdquo; it&amp;rsquo;s really great to be able to go to that repo and find that code.&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>
