FAQ
OK, I’ll be honest, I don’t receive questions frequently. So this document is really to answer the sorts of questions I’d expect ought to be frequent. It’s really a slush space for notes on odd cases, or describing EvilVM’s error messages (which are often terse to save space / complexity).
Compiler Errors
How do I understand exception errors?
EvilVM uses the Windows structured exception handler (SEH) to catch exceptions and recover. But, since it’s not possible to properly instrument SEH when the code is compiled dynamically, exception handling is somewhat simplistic. When an exception happens, you’ll see something like this:
Input State:
Line Number: 83
Last word: SCREENSHOT
Context:
Exception RIP: 4025a2
In word: UNKNOWN
Last call: UNKNOWN
Exception RSP: 60fcd0
Context PTR: 60f5c0
Exception Record:
Exception Code: 80000003
Exception Flags: 0
The “Input State” gives you some clue about where either the interpreter or the compiler was when the exception happened. The line number is reset to 1 every time the server sends code to the compiler, so if you just sent a file, this number should correspond to where in the input file the error occurred. The “last word” field tells you the last word that was consumed and parsed by either the interpreter or the compiler. A very common error is to mistype a name in a source file, so these two together can help you quickly determine what went wrong if that happens.
In the “context” area, EvilVM tells you where `rip` was at the time of the exception. If this falls within a defined dictionary word’s extent, you should see content in the “In word” field. The “Last call” field comes from the value of `rsp` at the point of the exception. This may not always be valid (for example, if you’re using `>r` and related words), but if it is in a dictionary word, it’ll be there too. Another possibility is that the error occurred in ASM code in the original shellcode (e.g., the transport, outer interpreter, etc.) – in this case, this field will show “shellcode”. If your exception comes from calls into external DLLs, or the values are otherwise not intelligible, they will read “UNKNOWN”, as there’s not much EvilVM can do here.
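The lookup behind the “In word” field can be pictured as a range search over word extents. What follows is only a Python model of that idea, not EvilVM’s code: the `Word` record, its fields, and the sample addresses are all hypothetical, and the real dictionary layout may differ. The “shellcode” case is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical record for one dictionary word (not EvilVM's real layout)."""
    name: str
    start: int   # address of the first byte of the word's code
    length: int  # number of bytes the word's code occupies


def word_containing(address: int, words: List[Word]) -> Optional[str]:
    """Return the name of the word whose extent covers address, if any."""
    for word in words:
        if word.start <= address < word.start + word.length:
            return word.name
    return None


def describe(address: int, words: List[Word]) -> str:
    """Mimic the report: a word name when one matches, otherwise UNKNOWN."""
    return word_containing(address, words) or "UNKNOWN"


# Made-up extents purely for illustration.
sample = [Word("SCREENSHOT", 0x402500, 0x80), Word("dump", 0x402600, 0x40)]
print(describe(0x402510, sample))  # inside SCREENSHOT's extent
print(describe(0x4025A2, sample))  # falls in the gap between the sample words
```

An address that lands in no known extent, such as one inside an external DLL, simply falls through to “UNKNOWN”.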
When an exception occurs, the entire CONTEXT struct is retained, and can be accessed at the indicated address. Refer to Microsoft’s docs for info about what’s in there.
There is also an exception code (`80000003` above). These are NTSTATUS codes, which usually have some useful meaning. I don’t want to include a full dictionary of status codes and error messages, as that would become quite large. But you can look these up in various places, such as on Microsoft’s site. These will often provide extra flavor, though sometimes they won’t appear on Microsoft’s list because some libraries have their own codes. Check docs if you’re off the beaten path.
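Even without a full lookup table, the bit layout of an NTSTATUS value says something useful: the top two bits are the severity, bit 29 marks customer-defined (non-Microsoft) codes, bits 16 to 27 are the facility, and the low 16 bits are the code itself. Here is a small Python sketch for decoding the hex value EvilVM prints; the function name and the tiny `KNOWN` table are my own illustration, not part of EvilVM.

```python
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Two well-known codes only; a real lookup table is much larger.
KNOWN = {
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def decode_ntstatus(code: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    code &= 0xFFFFFFFF
    return {
        "severity": SEVERITY[code >> 30],        # bits 30-31
        "customer": bool((code >> 29) & 1),      # bit 29: set for non-Microsoft codes
        "facility": (code >> 16) & 0xFFF,        # bits 16-27
        "code": code & 0xFFFF,                   # bits 0-15
        "name": KNOWN.get(code, "unknown"),
    }


# The value is pasted as printed in the exception report (hex, no prefix).
print(decode_ntstatus(int("80000003", 16)))
```

A set customer bit is a good hint that the code comes from a library rather than from Microsoft’s list.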
Note that when EvilVM goes to throw an exception on purpose, it will usually do so with an `int3` instruction (hex value `cc`), which will show up as an exception code `0x80000003`. That also means it’ll function as a breakpoint in a debugger, so that can sometimes be a handy way to figure out what’s going on.
I see `Fixing 'here'; dictionary may be inconsistent now!` in an error, what’s that mean?
This one was an annoying bug that came up every so often while I was fleshing out some of the compiler’s core features, especially when defining data structures in the dictionary. It can happen that code somehow corrupts the `here` pointer, which is pivotal to correctly processing the dictionary. Here’s a quick example of corrupting the `here` pointer:
\ constant has 1 f
1024 value BUFSIZE
\ but is used with 2 f's
create buffer BUFFSIZE allot
Note that there is a typo, so when `BUFFSIZE` is run by the interpreter, it can’t be found, and no proper size is put on the stack. So when `allot` goes to make space, it will take whatever is on the stack, which might be a really bad value. The result can be an invalid `here` pointer, and thus anything that ever touches it (including the exception pretty printer, annoyingly, due to its dependence on `pad` for printing numbers) will generate exceptions.
This is avoided by adding guards to the `allot` word, so at least the basic case can’t happen. But other situations can corrupt the `here` pointer, so there’s a failsafe. The `.exception` pretty printer does a sanity check to ensure that `here` falls somewhere within the allocated dictionary space. If not, it will estimate a “valid” pointer by going to the `last` defined word, and using its length field to derive a new, safe value.
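To make the failsafe concrete, here is a rough Python model of the check. Every name and parameter is hypothetical, and I am assuming the repaired value is simply the start of the last word plus its length field; the real `.exception` code works on the actual dictionary structures and may compute this differently.

```python
from typing import Tuple


def repair_here(here: int, dict_base: int, dict_size: int,
                last_start: int, last_length: int) -> Tuple[int, bool]:
    """Return (here, repaired) after a bounds check on the dictionary pointer.

    All parameters are illustrative stand-ins for values EvilVM keeps in its
    global variable table and dictionary headers.
    """
    if dict_base <= here < dict_base + dict_size:
        # Pointer is inside the allocated dictionary space: leave it alone.
        return here, False
    # Assumed fallback: the first byte after the most recently defined word.
    return last_start + last_length, True


# A sane pointer passes through untouched.
print(repair_here(0x401000, 0x400000, 0x10000, 0x400F00, 0x40))
# A wild pointer is replaced with the estimate from the last word.
print(repair_here(0xDEADBEEF, 0x400000, 0x10000, 0x400F00, 0x40))
```

The estimate is only as good as the length field it relies on.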
This can still produce an unstable condition, though, because not all defined words have a sensible length field. The length field is actually just the length of the behavior, so if you have used `create` together with `allot` to make space in the dictionary, the recovered pointer may land inside that data, and you could have some inconsistent data in your future.
Nevertheless, the system tries to do its best to stay interactive. If you encounter this error, and are concerned about stability, you might consider issuing a `forget` to rewind to the last `mark` point, and reloading code from there.
Compiler Conventions
How is the Forth environment state mapped to registers / global variables?
This will be of interest to you if you’re trying to debug broken code, if you’re writing syntax extensions in immediate words, or trying to make sense of exceptions or code disassembly. The following table highlights the important parts of the current runtime state in the EvilVM environment:
Object | Meaning | Notes |
---|---|---|
rdi | Top item on data stack | |
r12 | Pointer to second item on stack | |
r15 | Pointer to base of global variable table | |
here | Pointer to next available byte in dictionary | |
last | Most recently defined word in dictionary | set at ; |
this | Current word in dictionary | set at : or create |
dict | Pointer to base of dictionary | |
entrypoint | Start address of EvilVM shellcode | spawn a thread here for fun |
base | Current numerical base for numbers | reading and printing |
The global variables listed above are accessed at offsets into the global variable table (pointed to by `r15`). You can find these offsets (and lots of other interesting global variables) in the file at `agent/table.asm`. IO layers or other optional components may add to this table, so the offsets may or may not be consistent from one assembly to another. All global variables are of `QWORD` size. You can inspect them at runtime as follows:
0 glob 512 dump
clamp to 512
60fce0 00 00 00 00 00 00 00 00 00 00 75 49 fa 7f 00 00
60fcf0 2c 1b 7e 49 fa 7f 00 00 b8 01 7e 49 fa 7f 00 00
60fd00 a0 34 7e 49 fa 7f 00 00 60 9d 76 49 fa 7f 00 00
60fd10 00 ff 02 00 00 00 00 00 e9 1b 40 00 00 00 00 00
60fd20 00 00 00 00 00 00 00 00 89 2a 40 00 00 00 00 00
60fd30 ab 25 40 00 00 00 00 00 7b 32 80 00 00 00 00 00
. . .
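Since every global is a QWORD, each 16-byte row of that dump holds two variables, stored little-endian. If you want to read them as numbers, a small host-side Python helper like this one does the job; it is my own sketch, not part of EvilVM, and which slot holds which variable still has to come from `agent/table.asm`.

```python
import struct


def parse_dump_line(line: str):
    """Turn one 'address byte byte ...' dump row into (address, [qwords])."""
    fields = line.split()
    address = int(fields[0], 16)
    raw = bytes(int(b, 16) for b in fields[1:])
    count = len(raw) // 8  # ignore any trailing partial QWORD
    qwords = list(struct.unpack("<%dQ" % count, raw[:count * 8]))
    return address, qwords


# First row of the dump shown above.
row = "60fce0 00 00 00 00 00 00 00 00 00 00 75 49 fa 7f 00 00"
address, qwords = parse_dump_line(row)
for index, value in enumerate(qwords):
    print("%x: %016x" % (address + 8 * index, value))
```

For that first row this gives a zero QWORD at `60fce0` and `00007ffa49750000` at `60fce8`.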
All other registers are fair game for custom assembly. Bear in mind that this also means that they’re all volatile, and should generally be considered caller-saved registers. The core API uses `rax`, `rsi`, `rcx`, and `rbx` quite frequently. Either save them in the caller for maximum safety, or conduct thorough testing before relying on any special use of CPU registers.