Make a Lisp [2] Technical SITREP

Today, approximately five weeks after starting this project, I’ve got a Lisp sophisticated enough to read and evaluate functions that are halfway useful. Exhibit A:

(def! factorial (fn* (n) (if (<= n 1) n (* n (factorial (- n 1))))))

Taking this from the left:

(def! factorial ...) define a symbol called factorial and assign to it the result of …
(fn* (n) ...) …creating a function that takes a single argument n and whose value depends on the result of evaluating the boolean expression…
(<= n 1). If this is true, factorial returns n. Otherwise, the result is calculated by
recursively invoking factorial on (- n 1).

To get to this point, I’ve completed steps 0 to 4 in the Make a Lisp guide I’m following:

step 0: the core READ - EVAL - PRINT - LOOP (REPL) programme structure that characterises Lisp. The result is a programme that reads input from the keyboard and prints it straight back
step 1: Improved READ and PRINT functions that can pick out (and print) distinct tokens in the input. Tokens are the ‘words’ that form Mal instructions, e.g. (, def, factorial, (, etc
step 2: Improved EVAL functionality that can recognise and evaluate arithmetic expressions, e.g. converting (+ 1 2) to 3. We now have a desktop calaculator. By the way, these (…) combinations are called ‘forms’ when discussing Lisp
step 3: Environments that map symbols to values, and thus implement variables. We can now type the form (def a (+ 1 2)) and Lisp will do the calculation and store the result in a. If it reads a again it will look the value (3 here) up and use that instead
step 4: Lisp ‘special forms’ that support conditional checks (if) and function definition.

The source code is on Github.

I admit to cheating slightly by looking at the reference C# implementation (‘C#-Mal’), specifically to understand how to:

Implement Mal types (symbol, list, number, string, etc) using C# classes
Create closures (nameless functions that can be created, associated with Lisp symbols and subsequently invoked).
Use the C# regular expression (‘regex’) mechanism to scan the raw input and convert it into Mal tokens

Regex are a programming tool for recognising patterns in streams of text; they are powerful but the pattern definitions are extremely hard to write - the definitions for Mal tokens is

[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)

But the rest I figured out on my own based on the instructions in the guide and the C# documentation. In my defence, I’ve also made some improvements. Specifically, JKL can handle:

Floating point numbers such as: 1.2, .5, -.5 whereas Mal can only handle integers
A wider range of errors without crashing back into the C# debugger, e.g. bad numbers such as 1.2.3, non-terminating strings, functions with incorrect arguments, etc
Multi-line forms. (READ...) is now invoked until the input line is empty; in Mal it’s called once per line (1-1 with EVAL) and will silently drop all but the first form on a line
Multi-form lines. (READ...) now uses a queue of tokens that is refilled if the tokens run out in the middle of a list, vector or hashmap.
Arithmetic operations (+, -, * and \) that have more than two arguments

The biggest changes compared with C#-Mal relate to regex handling and were prompted by the need to handle decimals. I didn’t find the .Net regex documentation particularly enlightening so checked how C#-Mal did things. It turns out that it uses regex in two places, one to perform gross tokenisation of the input lines (the sample pattern listed above), the second to convert selected tokens into Mal symbols, numbers, keywords and numbers. I ended up completely redoing the latter with a more comprehensive version that could handle decimals and detect more errors.

There are still some quite complex steps involved to get JKL to the point where it can self-host, but I have the basic program framework and understand the range of C# mechanisms involved. So, I think I can now avoid looking at the the C#-Mal implementation of the remaining steps.