Today, approximately five weeks after starting this project, I’ve got a Lisp sophisticated enough to read and evaluate functions that are halfway useful. Exhibit A:
(def! factorial (fn* (n) (if (<= n 1) n (* n (factorial (- n 1))))))
Taking this from the left:
(def! factorial ...)
define a symbol calledfactorial
and assign to it the result of …(fn* (n) ...)
…creating a function that takes a single argumentn
and whose value depends on the result of evaluating the boolean expression…(<= n 1)
. If this is true,factorial
returnsn
. Otherwise, the result is calculated by- recursively invoking
factorial
on(- n 1)
.
To get to this point, I’ve completed steps 0 to 4 in the Make a Lisp guide I’m following:
- step 0: the core
READ
-EVAL
-PRINT
-LOOP
(REPL) programme structure that characterises Lisp. The result is a programme that reads input from the keyboard and prints it straight back - step 1: Improved
READ
andPRINT
functions that can pick out (and print) distinct tokens in the input. Tokens are the ‘words’ that form Mal instructions, e.g.(
,def
,factorial
,(
, etc - step 2: Improved
EVAL
functionality that can recognise and evaluate arithmetic expressions, e.g. converting(+ 1 2)
to3
. We now have a desktop calaculator. By the way, these(
…)
combinations are called ‘forms’ when discussing Lisp - step 3: Environments that map symbols to values, and thus implement variables. We can now type the form
(def a (+ 1 2))
and Lisp will do the calculation and store the result ina
. If it readsa
again it will look the value (3
here) up and use that instead - step 4: Lisp ‘special forms’ that support conditional checks (
if
) and function definition.
The source code is on Github.
I admit to cheating slightly by looking at the reference C# implementation (‘C#-Mal’), specifically to understand how to:
- Implement Mal types (symbol, list, number, string, etc) using C# classes
- Create closures (nameless functions that can be created, associated with Lisp symbols and subsequently invoked).
- Use the C# regular expression (‘regex’) mechanism to scan the raw input and convert it into Mal tokens
Regex are a programming tool for recognising patterns in streams of text; they are powerful but the pattern definitions are extremely hard to write - the definitions for Mal tokens is
[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
But the rest I figured out on my own based on the instructions in the guide and the C# documentation. In my defence, I’ve also made some improvements. Specifically, JKL
can handle:
- Floating point numbers such as:
1.2
,.5
,-.5
whereas Mal can only handle integers - A wider range of errors without crashing back into the C# debugger, e.g. bad numbers such as 1.2.3, non-terminating strings, functions with incorrect arguments, etc
- Multi-line forms.
(READ...)
is now invoked until the input line is empty; in Mal it’s called once per line (1-1 withEVAL
) and will silently drop all but the first form on a line - Multi-form lines.
(READ...)
now uses a queue of tokens that is refilled if the tokens run out in the middle of a list, vector or hashmap. - Arithmetic operations (
+
,-
,*
and\
) that have more than two arguments
The biggest changes compared with C#-Mal relate to regex handling and were prompted by the need to handle decimals. I didn’t find the .Net regex documentation particularly enlightening so checked how C#-Mal did things. It turns out that it uses regex in two places, one to perform gross tokenisation of the input lines (the sample pattern listed above), the second to convert selected tokens into Mal symbols, numbers, keywords and numbers. I ended up completely redoing the latter with a more comprehensive version that could handle decimals and detect more errors.
There are still some quite complex steps involved to get JKL
to the point where it can self-host, but I have the basic program framework and understand the range of C# mechanisms involved. So, I think I can now avoid looking at the the C#-Mal implementation of the remaining steps.