Spine - Compiled stack-based language implemented in C then Rust
2024-03-06
A compiler for a simple stack-based language that outputs ELF executables containing x86_64 machine code produced by handwritten code (no assembly involved), and it works! It was first written in C, then a rewrite in Rust with better abstractions was started.
GitHub repository of the C version
GitHub repository of the Rust attempt
Features
- Generates working ELF executables containing x86_64 machine code.
- ELF header generator.
- Machine code generator for a subset of x86_64 instructions, with no assembly involved.
- Data segment, code segment, labels with offsets corrected during final ELF generation.
- Simple stack-based language (C version only for now):
  - One-character instructions with no whitespace separation required, for golfing purposes.
  - Control flow with ifs, loops, callbacks and named functions.
  - Classic stack manipulation instructions.
  - Support for two local variables (only two, so that declaring them takes fewer characters, for golfing), which can be “runtime-shadowed” or not in each function or callback.
  - A call stack separate from the data stack (even though x86_64 has only one hardware stack for both).
  - Pointer dereferencing read and write operations.
  - A syscall instruction, which gives a lot of freedom since any Linux system call can be made directly.
Gallery
Here is a Spine program (C implementation) that prints argc, all of argv and then all of envp (the environment variables), all of which are on the stack at the start of program execution on Linux. N is a function that prints an integer in base 10, S prints a Spine string (a pointer and a length, which is what Spine string literals push), and Z prints a null-terminated string (a C string, which is how argv and envp are provided).
Here are some Spine named functions. N differs from the version above in that it uses the local variables (h and v) to make the code shorter. S takes advantage of the fact that the write syscall takes the length of the string as an argument, so it needs only a single syscall. M allocates memory from the OS via the mmap syscall (one of the intended uses of this syscall).
Why?
For fun!
This took some learning: about the x86_64 architecture, about how assembly instructions are represented as x86_64 machine code (known to be quite complex in some areas), about the 64-bit ELF executable format with its many fields and headers, and about how to work directly with machine code (not even assembly) and debug it.