Evil#Forth Language Scenic Tour
NOTE: This is not a deep dive into Forth. It’s a whirlwind tour of the syntax and design of the language. You will want to read through the core API documentation if you want a more exhaustive listing of available words.
The language provided by the EvilVM system is a variant of Forth. However, it is not intended to be ANSI standard compliant, and it deviates (in some cases, substantially) from other Forths. This document provides a very basic introduction to this interesting language. Bear in mind that Forth is not very strongly related to any other family of languages, and can seem alien at the start. Stick with it, though, and it becomes more readable and manageable than you might think at first.
Forth is a highly desirable language paradigm for this project because of its fundamental simplicity and austere requirements. When interacting with a live Forth, the user sits at a terminal, submitting text to the input stream, and reading responses on the output stream.
Words are recognized as any sequence of consecutive non-whitespace characters. The Forth interpreter reads one word at a time. It also maintains a data structure called the dictionary – this is a key->value store that maps words to behaviors (technically, words to locations in memory). When the user submits a word, it is looked up in the dictionary and executed. Think of it like a function call.
But if the word isn’t found, the interpreter tries to convert it to a number to put it on the data stack. This is one of two stacks in Forth. Functions do not take formal arguments in Forth, but rather use the contents of the stack. Let’s run a very simple program in Forth:
3 4 + .
This program instructs the Forth interpreter to do several things. First, it puts two numbers on the stack (3 and 4). When the interpreter reads the `+` word, it finds its address in the dictionary and executes it. Its behavior is to take the top two items from the stack, add them, and put the result back on the stack. Finally, the `.` word will take the top item on the stack and print it out as a number. Ideally, if you run this you will see “7” as the result.
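To make the stack discipline concrete, here is a slightly longer expression traced one word at a time. This is only a sketch: it assumes that `( ... )` comments, which appear inside a definition later in this tour, are also accepted at the prompt.

```forth
2        ( stack: 2 )
3        ( stack: 2 3 )
4        ( stack: 2 3 4 )
*        ( stack: 2 12    multiplies the top two items )
+        ( stack: 14      adds the remaining two items )
.        ( stack: empty   prints 14 )
```

Typed on one line this is `2 3 4 * + .`, which is how you would write 2 + 3 * 4 without needing parentheses or precedence rules.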
Forth is more than a calculator, however. You can define new words using the `:` compiler. It will consume input words until a `;` terminates compilation and returns to the outer interpreter. The compiler works similarly to the interpreter, only instead of executing words or putting numbers on the stack, it compiles function calls into a new word’s behavior in the dictionary.
: test 3 4 + . ;
This program defines a new word `test` that does our math from before. We can run this at any time by entering the word `test`. If you are curious to see the contents of `test`, you can use the `see` word. This word will read a word from the input stream, look it up in the dictionary, find its location and length, and issue a hexdump of its contents:
see test
25bce0e3707 49 83 ec 08 49 89 3c 24 bf 03 00 00 00 49 83 ec
25bce0e3717 08 49 89 3c 24 bf 04 00 00 00 49 03 3c 24 49 83
25bce0e3727 c4 08 e8 f1 26 ff ff 4d 3b a7 30 00 00 00 0f 8f
25bce0e3737 d9 df fc ff c3
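If you disassemble this code, you will see how the `:` compiler created this new word. Since I assume that you’re an elite hacker, and not a total noob, I’ll show you what this does. Decoding the bytes by hand: each literal is pushed with `sub r12, 8` / `mov [r12], rdi` / `mov edi, N`, so R12 acts as the data stack pointer and RDI holds the top of the stack. The `+` is inlined as `add rdi, [r12]` / `add r12, 8`, the `e8` instruction is a call to another word (presumably `.`), and the definition ends with a compare of R12 against `[r15+0x30]`, a conditional jump, and a `ret`. It is perhaps not too important to understand all that’s going on, but it’s enough to point out that the code you compile can be inspected and plausibly understood.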
Code does not have to be linear in Forth: there are conditionals and loops as well. Here’s a word that tells us if a number on the stack is even or odd:
: oddness 1 and if ." Odd!\n" else ." Even!\n" then ;
This word introduces several new elements. First is the conditional behavior using the `if ... else ... then` block. The use of the word `then` to conclude the conditional is somewhat unusual, but this is Forth tradition. The phrase `1 and` performs a bit-wise AND of the top stack item with 1, leaving only its lowest bit. Similar to C, zero is considered `false` and all other values are `true`.
Another thing we see here is an operation that prints a string. The `."` word prints a string to the output stream, terminated by the next unescaped `"` character. When I first came to Forth, I thought it was really frustrating that the string “functions” are words just like any others, and there has to be a phantom “space” between `."` and the string’s contents. But, as I grew to understand the utter consistency of the Forth execution model, it grew on me. So, as weird as it seems, I encourage you to embrace it. There are some very beautiful reasons it works this way, trust me!
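As a further sketch of the same ideas, the truth test can be applied to a number directly, since zero is false and anything else is true. The word name `zero-check` is made up for this illustration; everything else uses only words introduced above.

```forth
: zero-check if ." Nonzero!\n" else ." Zero!\n" then ;
0 zero-check
5 zero-check
```

The first call should print “Zero!” and the second “Nonzero!”. Note the space after each `."`: it separates the word from the string and is not part of the output.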
I mentioned loops, so let’s make a word with a loop in it. Maybe we want to see the oddness of a sequence of numbers. We can use a counted loop like this:
: test
  10 0 do
    i . i oddness
  loop
;
A `do` loop like this takes two bounds on the stack (the upper limit first, then the starting value) and sets up a loop. It is analogous to a `for` loop in C. The `i` word is an interesting one: the counted loop keeps its state implicitly, but the counter can be accessed with this word. It’s not a variable, per se, but rather an accessor for counted loop state. There are several of them, for accessing the counters from nested loops:
: silly-loops
  3 0 do ( counter will be k )
    3 0 do ( counter will be j )
      3 0 do ( counter will be i )
        i . j . k .
      loop
      cr
    loop
  loop
;
The `i`, `j`, and `k` accessors put the counter from the current loop, its parent loop, or its grandparent loop on the stack.
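Here is a small sketch that puts two of these accessors to work, printing a 3x3 multiplication table. The name `times-table` is hypothetical, and the loop bounds assume what the examples above suggest: the upper limit comes first and is not itself visited.

```forth
: times-table
  4 1 do              ( outer loop: seen as j from the inner loop )
    4 1 do            ( inner loop: counter is i )
      i j * .
    loop
    cr
  loop
;
times-table
```

If those assumptions hold, each row holds the products for one value of the outer counter: 1 2 3, then 2 4 6, then 3 6 9.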
There are other kinds of loops too. We could implement our oddness tester like this as well:
: test
  10 begin
    dup while
    1 - dup . dup oddness
  repeat
  drop
;
I’ve introduced the `dup` word here, which duplicates the item on top of the stack. For uncounted loops, we have to think about the state of the loop because we’re responsible for it (instead of letting `do` and `loop` do that heavy lifting for us). But there are many times where your conditions for exiting a loop may be complex, so this gives you the flexibility to do uncounted loops. Sometimes loops can be very elegant in Forth because you can put arbitrary code before and after the `while` test.
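To illustrate that last point, here is a sketch with work on both sides of the test. The name `countdown` is made up for this example; it prints the value before testing it, so the final zero is printed too.

```forth
: countdown ( n -- )
  begin
    dup .             ( before the test: print the current value )
    dup while         ( stop once the value reaches zero )
    1 -               ( after the test: step toward zero )
  repeat
  drop                ( discard the leftover zero )
;
3 countdown
```

This should print 3 2 1 0. Compare it with the version above, where the decrement happens before printing and the starting value is never shown.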
You can do some interesting dynamic things with Forth. Let’s take our `oddness` function as an example. It consumes a value, and does something with it. Let’s write a combinator that takes a function as an argument on the stack.
: times 0 do i over execute loop drop ;
' oddness 10 times
I’ve introduced a few words here. One is the `'` word, which reads a word from the input and puts its function pointer on the stack. (Note that this word only works in the interpreter. If you need to quote a word while compiling, use the `[']` word instead!) The word `over` makes a copy of the second item on the stack. E.g., if the stack contained `1 2`, it would make it `1 2 1`. Note also that the upper boundary for the loop is received on the stack too, making this a very flexible combinator. We could define more words that behave like `oddness`, and our program becomes more expressive.
' oddness 10 times
' . 10 times
' { 2 * . } 10 times
Ooo, I threw something interesting in there. I used the curly braces `{` and `}` to define an anonymous function. You can’t define these while compiling a word; this feature is intended for use in the interpreter. See, until now I haven’t explained something about the compiler, though I’ve hinted a little bit. Some words only work in the interpreter, and others only while compiling.
This is because the compiler makes a subtle distinction. Some words are “normal” words, and when encountered by the compiler they just get compiled into a word. But others are what are called “immediate” words. Instead of compiling them, the compiler executes them directly. This gives Forth the ability to extend the behavior of the compiler. You can create new syntax for the language.
But it also means that if your code relies on things that only work in the compiler, then the interpreter becomes less powerful. That’s where `{` comes in. It compiles a temporary function, and can be used to easily run code at the console that you don’t want to compile and name in the dictionary. There’s also an alternative to `}`: if you end your anonymous word with `}!` it will not only finish compiling it, but also execute it right away.
{ 10 0 do banner loop cr }!
Loops have to be compiled, and so they don’t work on their own from the outer interpreter. But using this anonymous word, we can dust off a quick loop and Evil#Forth is none the wiser. This can come in real handy when you’re working interactively.
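For instance, here is a one-off loop that uses only words already covered in this tour (a sketch, nothing more):

```forth
{ 5 0 do i dup * . loop cr }!
```

This should print the squares 0 1 4 9 16 without adding a named word to the dictionary.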
As a quick aside – you might wonder why there is this complexity. Some Forths are what are called “stateful”. These Forths allow you to write words that detect whether the system is currently compiling or interpreting. Sometimes this simplifies things, but even in stateful Forths there are usually lots of words that don’t check the state, and it can be confusing sometimes.
Evil#Forth does not have a detectable state like this, and words are exclusively immediate or not. This simplifies a few things. But it does mean that there are some words that come in pairs so you can use their functionality in either context. You’ve already seen `'` and `[']` as an example of this.
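A short sketch of the pair in use, reusing the `times` combinator from earlier. The word names `square.` and `five-squares` are invented for this example.

```forth
: square. dup * . ;
' square. 5 times
: five-squares ['] square. 5 times ;
five-squares
```

The second line quotes at the prompt with `'`; the third does the same job inside a definition with `[']`. Both should print the squares of 0 through 4.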
And now is a good time to reveal that most of the Evil#Forth language is not built in to the compiler at all. When the compiler bootstraps, it doesn’t even know how to compile `if`s or loops. These are actually immediate word definitions that extend the compiler, delivered after connecting to the server. You can dig into the detailed behavior of these words by looking at the source code in `api/core.fth`.
So far, we’ve only created executable words in the dictionary. But the dictionary is so much more than that. You can use it for storing data as well. Here’s how people do global variables in Forth:
variable color
variable temp
: switch color @ temp @ color ! temp ! ;
: hello ." Hello!\n" ;
: .red red hello clear ;
: .blue blue hello clear ;
: .color color @ execute switch ;
' .red color !
' .blue temp !
{ 10 0 do .color loop }!
New words introduced here are `@` and `!`, which read a value from memory and write a value to memory. To read a value, you put its address on the stack, and call `@`. For setting a value, you first put the new value on the stack, then the address to write it, and call `!`. Thus, writing 0 to a variable would be the phrase `0 <name> !`.
OK, I’m trying not to be too boring with these examples, but there is a bit going on here. We define two variables: `color` and `temp`. A variable is compiled into the dictionary and its behavior is to put its address on the stack. Variables are one QWORD in size (8 bytes, or 64 bits). Every time the `.color` word is executed, it looks up the current value of `color`, executes it, and then swaps it with the value in `temp`. You should see 10 alternating red and blue “Hello!”s, starting with red.
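A smaller sketch of the same read and write pattern, using a hypothetical `counter` variable and a `bump` word that increments it:

```forth
variable counter
0 counter !
: bump counter @ 1 + counter ! ;
bump bump bump
counter @ .
```

After three calls to `bump`, the final line should print 3.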
In most programming languages, global variables are considered bad, but that’s not so much the case in Forth. Some of this is due to the stack-based nature of the language. It can get onerous sometimes to keep track of too many things on the stack – throwing in a variable here and there can help ease the burden. For a really interesting time, one of the samples included with EvilVM adds a syntax to the compiler to support local variables – I won’t cover it here, but it’s worth a read if you’re interested in avoiding maximum discomfort with stack manipulation!
There’s another kind of variable: a `value`. These are like constants in other languages. It’s not generally expected that you’ll change them. Using these can make code a lot cleaner because you don’t have to sprinkle `@` and `!` everywhere:
$40 value PAGE_EXECUTE_READWRITE
: test ." If you want RWX memory, use " PAGE_EXECUTE_READWRITE . cr ;
The `PAGE_EXECUTE_READWRITE` word becomes a stand-in for its value. Using constants can make your code much more readable. It is, in fact, possible to update the value of a constant, though it kind of defeats the purpose of distinguishing between `variable` and `value`. You can update a value like this (note that here we have a distinction between compile- and interpret-time behavior):
10 value maximum
20 to maximum
: test 30 [to] maximum ;
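Putting those pieces together, here is a sketch with a hypothetical value named `retries`, updated once from the prompt and once from inside a definition. It assumes that already-compiled references pick up the new contents, as they do with a conventional `value`.

```forth
10 value retries
: show-retries ." retries = " retries . cr ;
: reset-retries 1 [to] retries ;
show-retries
3 to retries
show-retries
reset-retries
show-retries
```

Under that assumption, the three calls to `show-retries` should report 10, 3, and 1 in turn.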
But what if you want to store lots of data? Maybe a big buffer? Let’s make a table of squares in the next example and see how we might make an array in Forth:
create squares 256 cells allot
{ 256 0 do i dup * squares i cells + ! loop }!
: squared cells squares + @ . ;
55 squared
99 squared
16 squared
Here I’ve introduced `create`. This is a very powerful word, though it may not yet seem obvious. It creates a named entry in the dictionary, and that name’s behavior is just to put the next available spot in the dictionary on the stack. Once we define `squares`, the phrase `256 cells allot` will make space for 256 64-bit values (a “cell”) in the dictionary immediately afterwards. So, we can use this space like an array. If you look at the definition of `squared` you’ll see that it takes an index, calculates how many bytes into the `squares` array it should be, and reads a number.
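As a sketch of reading the table back, here is a hypothetical `sum-squares` word that adds up the first n entries. It relies on the `squares` table filled in above and on a scratch variable named `total`, both names being specific to this example.

```forth
variable total
: sum-squares ( n -- )
  0 total !
  0 do
    i cells squares + @    ( fetch squares[i] )
    total @ + total !      ( add it to the running total )
  loop
  total @ .
;
4 sum-squares
```

The call should print 14, which is 0 + 1 + 4 + 9.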
This isn’t the most convenient way to do an array. There’s another trick I have to introduce, and you’ll see a hint of what joy the `create` word can bring. Let’s make another version of the example above:
create squares 256 cells allot does> swap cells + ;
{ 256 0 do i dup * i squares ! loop }!
: squared squares @ . ;
9 squared
21 squared
12 squared
See how clean that is? It all comes down to the magic of `does>`. This word updates the behavior of the last `create`d word, setting it to whatever gets compiled until `;` is reached. The trick, though, is that this code, when run, will receive on the stack the address of the data area right after `squares`. Using this kind of definition, you can do some magical things, as you can combine local data with behavior, associated to a name in the dictionary. Check some of the samples provided with EvilVM to see some of this in action.
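To see that the pattern is reusable, here is a sketch that builds a second table the same way. The names `cubes` and `cubed` are made up for this example. When `cubes` runs, the stack holds the index with the data address on top of it, and `swap cells +` turns that pair into the address of the indexed cell.

```forth
create cubes 16 cells allot does> swap cells + ;
{ 16 0 do i dup dup * * i cubes ! loop }!
: cubed cubes @ . ;
3 cubed
```

The last line should print 27.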
Finally, let’s do something with Evil#Forth’s FFI. It’s not hard to run Win32 API calls by importing them, wrapping them with Forth words, and just calling them like any other definition. Let’s run the ADVAPI32.DLL function `GetUserName`. We’ll import the DLL and wrap the function like this:
loadlib advapi32.dll
value advapi32
advapi32 2 dllfun GetUserName GetUserNameA
create name 256 allot
variable namelen
256 namelen !
: username name namelen GetUserName if name .cstring cr else .err then ;
username
There are a few things in here to wrap your head around. `loadlib` is a special word that will read its parameter as a string immediately after it’s invoked. As such, nothing else can appear on that line but the name of the DLL. It puts the HANDLE for that library on the stack, and we assign it to a constant `value`. We then use the `dllfun` word to create a Forth word that wraps the `GetUserNameA` function in that DLL, and the Forth word is named `GetUserName`. The `2` indicates how many arguments that function needs to pull off the stack when it runs.
The next definitions are dependent on Microsoft’s design for this function. We need a character buffer of at least 256 bytes, and a variable that stores its length. And the result of the function call is a BOOL, which happens to be compatible with truth tests in Forth, so we can just test for error. I’ve also introduced `.cstring`, which prints out a NULL-terminated C-style string, and `.err`, which will print information about a Win32 error when encountered.
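The same recipe should carry over to other functions of this shape. As a sketch, here is the KERNEL32.DLL function GetComputerNameA, which also takes a buffer and a pointer to its size and returns a BOOL. The Forth names `k32`, `hostname`, `hostlen`, and `computername` are invented for this example, and whether the core API already provides a handle for kernel32.dll is not something this tour says.

```forth
loadlib kernel32.dll
value k32
k32 2 dllfun GetComputerName GetComputerNameA
create hostname 256 allot
variable hostlen
256 hostlen !
: computername hostname hostlen GetComputerName if hostname .cstring cr else .err then ;
computername
```

The function overwrites `hostlen` with the number of characters it wrote, so store 256 into it again before calling `computername` a second time.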
There’s plenty more – for a list of all the words you get in the core API, check out its documentation here.
More Forth Reading
Evil#Forth is not ANSI standard, and does differ substantially from other Forths. But much of its core functionality is very similar. You can find some excellent resources for learning Forth programming elsewhere on the Internet. I’d recommend these as starting points:
- Starting Forth: the quintessential intro to Forth
- Thinking Forth: a more advanced book on Forth
- Forth Dimensions: Forth magazine from the 80s-90s