Rylee Alanza Lyman
“Forth” is the name of a stack-based programming language. I’ve been curious about what goes into writing a programming language for a while now, and after a little prompting from a friend, I decided to give writing one a try. The purpose of this post is to talk a little about what the language I wrote, which I’m calling “hither”, is like. In a future post I’d like to talk about some interesting things I learned along the way.
I chose the name “hither” because just like one can “go forth”, one can “come hither”. Naming things is fun, I definitely encourage people to engage their whimsy a little when they do.
You can find the code for hither
on GitHub.
To run it yourself, you’ll need a copy of Zig version 0.12.
I believe that in its current state, as of today (May 21, 2024)
hither
will also run on the development version of Zig 0.13,
but I make no promises.
Anyway, now that you’re all squared away,
you can build and run hither
from the repository root by executing zig build run
.
(By the way, if you’re reading this and want to give hither
a try
but the above isn’t sufficient instructions for you to fill in the blanks,
shoot me an email!
I’d be tickled pink to walk you through it.)
Anyway, the Zig compiler will do its thing for a bit and then hit you with
type some hither code!
>
so now you’re looking at the hither
REPL!
you can type some stuff and hit enter, and the REPL will think about it and tell you what it computed.
for example:
> 2 2 +
4
ok
>
(By the way, apologies if you’re a command line super Amadeus, in which case the REPL interaction might feel a bit constrained, since familiar keyboard interaction patterns from interacting with your shell simply won’t work and will dump strange escape codes instead. I might find the motivation to clean this up, but I also might not.)
Anyway, from this we can learn a few things.
First of all, like Forth, hither uses what is sometimes called “reverse Polish notation”,
a funky way of operating I first learned by playing with my dad’s fancy calculator in the 90s.
Basically rather than “infix” notation, which we’re used to for addition like 2 + 2
,
we put the arguments to the operation before the operand.
This has the advantage of making parsing essentially trivial:
by the time our parser hits the +
, everything is all set up for it to just perform the addition.
Programming languages that operate in this way, by sort of building up longer and longer streams of operations,
are sometimes called concatenative.
So 2 2 +
operates like this: hither has a little “stack”,
a “first in, first out” data structure common to many programming languages.
the hither
interpreter takes in a line of text,
parses it into little chunks I’ll call “words”,
and then pushes each word onto the stack.
Here we have three words, 2
, 2
and +
.
When 2
is pushed onto the stack, nothing happens—it’s just the number 2.
But when +
is pushed, it pops two words from the stack,
checks that they’re numbers, and then pushes their sum onto the stack.
Since that’s the end of the line,
the hither
interpreter checks that there isn’t anything hanging out waiting to be finished on the stack
(secretly it does this by checking the state of a return stack that it also manages).
If everything looks good,
the interpreter prints out the contents of the stack,
so 4
in this case,
and then writes ok
to tell you that the previous operation completed successfully.
From a user’s perspective, hither
supports four basic data types:
integers which we’ve seen,
floating point numbers, like 1.5
,
strings which you enter surrounded by quotation marks as in "this is a string"
,
and words which are whitespace-delimited symbols which don’t parse as strings, integers or numbers:
+
is a word, but so is something like this_is_4-word
.
There are several words predefined for you.
At the time of this writing, they are:
+
, -
, *
, /
, %
, >
, >=
, <
, <=
, ==
, dupe
, swap
, height
, top
, drop
,
and
, or
, not
, print
, ++
, :=
, @
, while
, _
, '
, {
, }
, if
, else
, then
,
exit
, dump
, inspect
and abort
.
I’ll explain some of them, but leave the rest to you to play around with—or to ask me directly!
To define a new word, you can use :=
in combination with either a string or the '
operator,
as in
> 1 ' x :=
ok
> x
1
ok
> 2 "x" :=
ok
> x
2
ok
These are completely identical. Unfortunately, directly writing 1 x :=
won’t work:
the reason is that the interpreter doesn’t know that the :=
operator is coming,
so it attempts to interpret x
as a word.
If x
has been defined already, this succeeds, but puts the value of x
onto the stack,
which doesn’t make sense to assign with :=
.
If x
hasn’t been defined, then we get an error.
That’s why we need to use the '
operator—it effectively tells the interpreter
to push the next token onto the stack without
calling it or even attempting to resolve whether it’s been defined.
To define a word that actually executes some code,
we can use the {
and }
pair.
These start and end a lambda, programmer jargon for an “anonymous function”.
You should think of the curly braces as sort putting what ever comes between them “in quotes”.
Just as doing that in English is different from saying those words directly,
so too in hither
does this “quotation” put a different spin on things.
Rather than executing directly,
the hither
interpreter collects everything inside and packages it up:
> { 1 2 + }
slice: address: 0xfff80, length: 3, type: definition
ok
(On your machine, the actual address you see printed might differ, but the length and type should be the same.)
What this is telling us is that there is something on the stack,
but that thing is a definition, rather than anything else we’ve seen previously.
There are effectively two things you can do with definitions:
call them now or assign them to words to call later.
To call a definition now, you use @
:
> { 1 2 + } @
3
ok
To assign a definition to a word, you use the same syntax we did for variables:
> { 1 2 + } ' three :=
ok
> three
3
ok
hither
Here’s what is apparently an old chestnut in programming interviews:
Print out the numbers from 1 to 100, except when the number is divisible by 3, print “Fizz” instead. When the number is divisible by 5, print “Buzz” instead of the number. When the number is divisible by both 3 and 5, print “FizzBuzz” instead.
There are likely many ways to solve this in hither
even with such a limited palette of options,
but here’s the one I came up with.
I’ll explain how it works after the code.
> { dupe 3 % 0 == if "Fizz" swap then } ' fizz :=
ok
> { dupe 5 % 0 == if "Buzz" swap then } ' buzz :=
ok
> { height 3 == if 2 top ++ else 1 top then print } ' output :=
ok
> 0 ' x :=
ok
> { x fizz buzz output } { x 100 < dupe if x 1 + ' x := then } while
1
2
Fizz
4
Buzz
...
FizzBuzz
91
92
Fizz
94
Buzz
Fizz
97
98
Fizz
Buzz
ok
So there are three main definitions, two of which are nearly identical.
fizz
first duplicates the value on top of the stack with dupe
,
then checks whether it is divisible by 3 with 3 % 0 ==
,
and then if "Fizz" swap then
checks whether the value on the stack is “truthy”
(in hither
all values except 0 are “truthy”)
and if so pushes the string Fizz
onto the stack
and swaps the top two values of the stack.
So for example 9 fizz
operates as follows, with --
separating the stack from the program:
-- 9 fizz
9 -- dupe 3 % 0 == if "Fizz" swap then
9 9 -- 3 % 0 == if "Fizz" swap then
9 9 3 -- % 0 == if "Fizz" swap then
9 0 -- 0 == if "Fizz" swap then
9 0 0 -- == if "Fizz" swap then
9 1 -- if "Fizz" swap then
9 -- "Fizz" swap
9 "Fizz" -- swap
"Fizz" 9 --
output
introduces a couple new words:
height
pushes the current height of the stack onto the stack.
So from an empty stack, we have -- height
becomes 0 --
.
If the height is equal to three,
e.g. when executing 15 fizz buzz
, the stack ends up as "Fizz" "Buzz" 15
,
the statement 2 top
sets the height of the stack to 2,
effectively dropping 15
,
and then ++
joins the two strings on top of the stack.
If the height is not three,
as in 9 fizz buzz
yielding "Fizz" 9
or 1 fizz buzz
yielding 1
,
we set the height to 1.
In either case we print out the result (which has the side effect of clearing the stack).
Next we give ourselves a variable x
.
Finally, we have the while
statement.
This executes a loop in hither
.
It's like this <func> <cond> while
it takes in two values off of the stack, <func>
and <cond>
.
These need not be lambdas, but in practice probably ought to be.
First while
pushes cond
onto the stack (this will call it if it's a lambda),
then checks whether the value at the top of the stack is truthy.
If so, it pushes func
onto the stack, and then repeats.
The first lambda is pretty straightforward.
The second one is a little sneaky:
first it checks whether x
is less than 100
,
and then if so, adds one to x
.
Since this is executed before our <func>
lambda,
that's why you see the numbers from 1 to 100,
even though x
starts at 0
.