# Peter Norvig's Spell Checker in Two Lines of Base R

Peter Norvig, the director of research at Google, wrote a nice essay called How to Write a Spelling Corrector a couple of years ago. That essay explains and implements a simple but effective spelling correction function in just 21 lines of Python. Highly recommended reading! I was wondering how many lines it would take to write something similar in base R. It turns out you can do it in as few as two pretty obfuscated lines:

While it does not work exactly like Norvig’s version, it should produce similar spelling corrections:

So let’s deobfuscate the two-liner slightly (however, the code below might not make sense unless you have read Norvig’s essay first):
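As a rough illustration of the approach, here is a minimal sketch of what a deobfuscated version could look like. It is not necessarily the exact code from this post: the tiny in-memory corpus and the names corpus, word_counts, edit_dist, proposals and correct are assumptions made so that the example runs without downloading anything, whereas Norvig's essay trains on a large text file.

```r
# Tiny stand-in corpus (an assumption for illustration only;
# a real spell checker needs a much larger body of text).
corpus <- c(
  "the quick brown fox jumps over the lazy dog",
  "the dog barks and the fox runs",
  "spelling is hard and the spelling corrector helps"
)

# Lowercase everything, join the lines and split on anything that is not a-z.
raw_text <- paste(tolower(corpus), collapse = " ")
words <- strsplit(raw_text, "[^a-z]+")[[1]]
words <- words[words != ""]

# Count how often each word occurs and keep the words ordered
# from most common to least common.
word_counts <- table(words)
sorted_words <- names(sort(word_counts, decreasing = TRUE))

correct <- function(word) {
  # Edit distance from the input word to every known word.
  edit_dist <- adist(word, sorted_words)
  # Accept words at the smallest distance found, but never more than 2 edits away.
  min_edit_dist <- min(edit_dist, 2)
  proposals <- sorted_words[edit_dist <= min_edit_dist]
  # The first proposal is the most common one; if there are no proposals,
  # fall back to the input word itself.
  c(proposals, word)[1]
}

correct("speling")    # one edit away from a word in the corpus
correct("teh")        # two edits away from a word in the corpus
correct("xylophone")  # nothing within two edits, so the input is returned
```

With a corpus this small the corrections are only as good as the handful of words it contains, so treat the calls at the end as a smoke test rather than a benchmark.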

Some thoughts:

• The main reason the R version is so short is that base R includes the adist function, which computes the edit distance between strings. (A one line spell checker in R is indeed possible using the aspell function :)
• A second reason the R version is so short is that the many vectorized functions in R make it possible to do a lot of work in one line.
• Indeed, the horrible line creating the sorted_words vector would be a perfect target for some magrittr magic.
• The R version does not solve the problem in exactly the same way as Norvig’s code. He maintains the count of each word in the NWORDS variable in order to be able to extract the most probable matching word. This is not necessary in the R code: because sorted_words is already ordered from most to least common, the first matching item will always be the most probable. Still, I believe the two approaches result in the same spelling corrections (but prove me wrong :).
• There are links to implementations in many other languages at the bottom of Norvig’s essay. Looking at the Java version reminds me of my dark Java past and madness like HashMap<Integer, String> candidates = new HashMap<Integer, String>();.
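To make the points about adist and the sorted vector a bit more concrete, here is a small sketch. The candidate words and their counts are made up for illustration, and the names candidates, counts, by_count and by_order are hypothetical. It first shows the distance matrix that adist returns, and then compares picking the candidate with the highest count (roughly Norvig's approach) with simply taking the first match in a frequency-sorted vector.

```r
# adist() returns a matrix of generalized Levenshtein (edit) distances,
# with one row per element of the first argument.
candidates <- c("the", "then", "than", "dog")
d <- adist("thn", candidates)
d

# Made-up word counts, for illustration only.
counts <- c(the = 50, then = 7, than = 9, dog = 3)

# Count-based selection: among words within one edit, pick the most frequent.
close_words <- candidates[d[1, ] <= 1]
by_count <- close_words[which.max(counts[close_words])]

# Order-based selection: sort once by decreasing count,
# then take the first word that is within one edit.
sorted_candidates <- names(sort(counts, decreasing = TRUE))
by_order <- sorted_candidates[adist("thn", sorted_candidates)[1, ] <= 1][1]

by_count == by_order
```

In this example both selections pick the same word, which is the intuition behind dropping the explicit counts once the vocabulary is sorted by frequency.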