“Forth” is the name of a stack-based programming language. I’ve been curious about what goes into writing a programming language for a while now, and after a little prompting from a friend, I decided to give writing one a try. The purpose of this post is to talk a little about what the language I wrote, which I’m calling “hither”, is like. In a future post I’d like to talk about some interesting things I learned along the way.

The name

I chose the name “hither” because just like one can “go forth”, one can “come hither”. Naming things is fun, I definitely encourage people to engage their whimsy a little when they do.

A little demo

You can find the code for hither on GitHub. To run it yourself, you’ll need a copy of Zig version 0.12. I believe that in its current state, as of today (May 21, 2024) hither will also run on the development version of Zig 0.13, but I make no promises.

Anyway, now that you’re all squared away, you can build and run hither from the repository root by executing zig build run.

(By the way, if you’re reading this and want to give hither a try but the above isn’t sufficient instructions for you to fill in the blanks, shoot me an email! I’d be tickled pink to walk you through it.)

Anyway, the Zig compiler will do its thing for a bit and then hit you with

type some hither code!
>

so now you’re looking at the hither REPL! you can type some stuff and hit enter, and the REPL will think about it and tell you what it computed. for example:

> 2 2 +
4
ok
>

(By the way, apologies if you’re a command line super Amadeus, in which case the REPL interaction might feel a bit constrained, since familiar keyboard interaction patterns from interacting with your shell simply won’t work and will dump strange escape codes instead. I might find the motivation to clean this up, but I also might not.)

Anyway, from this we can learn a few things. First of all, like Forth, hither uses what is sometimes called “reverse Polish notation”, a funky way of operating I first learned by playing with my dad’s fancy calculator in the 90s. Basically rather than “infix” notation, which we’re used to for addition like 2 + 2, we put the arguments to the operation before the operand. This has the advantage of making parsing essentially trivial: by the time our parser hits the +, everything is all set up for it to just perform the addition. Programming languages that operate in this way, by sort of building up longer and longer streams of operations, are sometimes called concatenative.

So 2 2 + operates like this: hither has a little “stack”, a “first in, first out” data structure common to many programming languages. the hither interpreter takes in a line of text, parses it into little chunks I’ll call “words”, and then pushes each word onto the stack. Here we have three words, 2, 2 and +. When 2 is pushed onto the stack, nothing happens—it’s just the number 2. But when + is pushed, it pops two words from the stack, checks that they’re numbers, and then pushes their sum onto the stack. Since that’s the end of the line, the hither interpreter checks that there isn’t anything hanging out waiting to be finished on the stack (secretly it does this by checking the state of a return stack that it also manages). If everything looks good, the interpreter prints out the contents of the stack, so 4 in this case, and then writes ok to tell you that the previous operation completed successfully.

Strings and variables

From a user’s perspective, hither supports four basic data types: integers which we’ve seen, floating point numbers, like 1.5, strings which you enter surrounded by quotation marks as in "this is a string", and words which are whitespace-delimited symbols which don’t parse as strings, integers or numbers: + is a word, but so is something like this_is_4-word. There are several words predefined for you. At the time of this writing, they are: +, -, *, /, %, >, >=, <, <=, ==, dupe, swap, height, top, drop, and, or, not, print, ++, :=, @, while, _, ', {, }, if, else, then, exit, dump, inspect and abort. I’ll explain some of them, but leave the rest to you to play around with—or to ask me directly! To define a new word, you can use := in combination with either a string or the ' operator, as in

> 1 ' x :=
ok
> x
1
ok
> 2 "x" :=
ok
> x
2
ok

These are completely identical. Unfortunately, directly writing 1 x := won’t work: the reason is that the interpreter doesn’t know that the := operator is coming, so it attempts to interpret x as a word. If x has been defined already, this succeeds, but puts the value of x onto the stack, which doesn’t make sense to assign with :=. If x hasn’t been defined, then we get an error. That’s why we need to use the ' operator—it effectively tells the interpreter to push the next token onto the stack without calling it or even attempting to resolve whether it’s been defined.

Lambdas

To define a word that actually executes some code, we can use the { and } pair. These start and end a lambda, programmer jargon for an “anonymous function”. You should think of the curly braces as sort putting what ever comes between them “in quotes”. Just as doing that in English is different from saying those words directly, so too in hither does this “quotation” put a different spin on things. Rather than executing directly, the hither interpreter collects everything inside and packages it up:

> { 1 2 + }
slice: address: 0xfff80, length: 3, type: definition
ok

(On your machine, the actual address you see printed might differ, but the length and type should be the same.) What this is telling us is that there is something on the stack, but that thing is a definition, rather than anything else we’ve seen previously. There are effectively two things you can do with definitions: call them now or assign them to words to call later. To call a definition now, you use @:

> { 1 2 + } @
3
ok

To assign a definition to a word, you use the same syntax we did for variables:

> { 1 2 + } ' three :=
ok
> three
3
ok

FizzBuzz in hither

Here’s what is apparently an old chestnut in programming interviews:

Print out the numbers from 1 to 100, except when the number is divisible by 3, print “Fizz” instead. When the number is divisible by 5, print “Buzz” instead of the number. When the number is divisible by both 3 and 5, print “FizzBuzz” instead.

There are likely many ways to solve this in hither even with such a limited palette of options, but here’s the one I came up with. I’ll explain how it works after the code.

> { dupe 3 % 0 == if "Fizz" swap then } ' fizz :=
ok
> { dupe 5 % 0 == if "Buzz" swap then } ' buzz :=
ok
> { height 3 == if 2 top ++ else 1 top then print } ' output :=
ok
> 0 ' x :=
ok
> { x fizz buzz output } { x 100 < dupe if x 1 + ' x := then } while
1
2
Fizz
4
Buzz
...
FizzBuzz
91
92
Fizz
94
Buzz
Fizz
97
98
Fizz
Buzz
ok

So there are three main definitions, two of which are nearly identical. fizz first duplicates the value on top of the stack with dupe, then checks whether it is divisible by 3 with 3 % 0 ==, and then if "Fizz" swap then checks whether the value on the stack is “truthy” (in hither all values except 0 are “truthy”) and if so pushes the string Fizz onto the stack and swaps the top two values of the stack.

So for example 9 fizz operates as follows, with -- separating the stack from the program:

  1. -- 9 fizz
  2. 9 -- dupe 3 % 0 == if "Fizz" swap then
  3. 9 9 -- 3 % 0 == if "Fizz" swap then
  4. 9 9 3 -- % 0 == if "Fizz" swap then
  5. 9 0 -- 0 == if "Fizz" swap then
  6. 9 0 0 -- == if "Fizz" swap then
  7. 9 1 -- if "Fizz" swap then
  8. 9 -- "Fizz" swap
  9. 9 "Fizz" -- swap
  10. "Fizz" 9 --

output introduces a couple new words: height pushes the current height of the stack onto the stack. So from an empty stack, we have -- height becomes 0 --. If the height is equal to three, e.g. when executing 15 fizz buzz, the stack ends up as "Fizz" "Buzz" 15, the statement 2 top sets the height of the stack to 2, effectively dropping 15, and then ++ joins the two strings on top of the stack. If the height is not three, as in 9 fizz buzz yielding "Fizz" 9 or 1 fizz buzz yielding 1, we set the height to 1. In either case we print out the result (which has the side effect of clearing the stack).

Next we give ourselves a variable x. Finally, we have the while statement. This executes a loop in hither. It's like this <func> <cond> while it takes in two values off of the stack, <func> and <cond>. These need not be lambdas, but in practice probably ought to be. First while pushes cond onto the stack (this will call it if it's a lambda), then checks whether the value at the top of the stack is truthy. If so, it pushes func onto the stack, and then repeats.

The first lambda is pretty straightforward. The second one is a little sneaky: first it checks whether x is less than 100, and then if so, adds one to x. Since this is executed before our <func> lambda, that's why you see the numbers from 1 to 100, even though x starts at 0.