Spine - Compiled stack-based language implemented in C then Rust

2024-03-06

Spine is a compiler that turns a simple stack-based language into ELF executables. The x86_64 machine code is emitted by a handwritten generator (no assembler involved), and it works! The project started in C, then a rewrite in Rust with better abstractions was begun.

GitHub repository of the C version
GitHub repository of the Rust attempt

Features

  • Generates working ELF executables with generated x86_64 machine code.
    • ELF headers generator.
    • Machine code generator for some x86_64 instructions, no assembly.
    • Data segment, code segment, labels with offsets corrected during final ELF generation.
  • Simple stack-based language, C version only for now:
    • One-character instructions with no whitespace separation required, for golfing purposes.
    • Control flow with ifs, loops, callbacks and named functions.
    • Classic stack manipulation instructions.
    • Support for two local variables (only two, so that declaring them takes fewer characters, for golfing), which can be “runtime-shadowed” or not in each function or callback.
    • A call stack kept separate from the data stack (even though x86_64 only has one stack for both).
    • Pointer dereferencing read and write operations.
    • Syscall instruction, which provides lots of freedom.
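To make the "machine code generator" and "labels with offsets corrected" features more concrete, here is a minimal Rust sketch of the general technique. It is not taken from either Spine repository: `CodeBuf` and all of its methods are hypothetical names chosen for illustration, and it only handles labels inside a single code buffer (no data segment, no ELF wrapping). The instruction encodings themselves (`48 B8` for `mov rax, imm64`, `50` for `push rax`, `0F 05` for `syscall`, `E9` for `jmp rel32`) are the standard x86_64 ones.

```rust
/// Hypothetical code buffer: raw bytes plus jump targets to patch later.
struct CodeBuf {
    bytes: Vec<u8>,
    /// (offset of a 4-byte displacement field, label id)
    fixups: Vec<(usize, usize)>,
    /// labels[id] is Some(offset) once the label has been placed.
    labels: Vec<Option<usize>>,
}

impl CodeBuf {
    fn new() -> Self {
        CodeBuf { bytes: Vec::new(), fixups: Vec::new(), labels: Vec::new() }
    }

    /// Reserves a label id whose position is not known yet.
    fn new_label(&mut self) -> usize {
        self.labels.push(None);
        self.labels.len() - 1
    }

    /// Binds a label to the current end of the buffer.
    fn place_label(&mut self, id: usize) {
        self.labels[id] = Some(self.bytes.len());
    }

    /// mov rax, imm64: REX.W prefix, opcode B8 (+0 for rax), 8 immediate bytes.
    fn mov_rax_imm64(&mut self, imm: u64) {
        self.bytes.extend_from_slice(&[0x48, 0xB8]);
        self.bytes.extend_from_slice(&imm.to_le_bytes());
    }

    /// push rax: single byte 50.
    fn push_rax(&mut self) {
        self.bytes.push(0x50);
    }

    /// syscall: 0F 05.
    fn syscall(&mut self) {
        self.bytes.extend_from_slice(&[0x0F, 0x05]);
    }

    /// jmp rel32: E9 followed by a placeholder displacement, patched in finish().
    fn jmp(&mut self, label: usize) {
        self.bytes.push(0xE9);
        self.fixups.push((self.bytes.len(), label));
        self.bytes.extend_from_slice(&[0; 4]);
    }

    /// Patches every recorded displacement. A rel32 displacement is relative
    /// to the address of the byte that follows the jump instruction.
    fn finish(mut self) -> Result<Vec<u8>, String> {
        for &(at, label) in &self.fixups {
            let target = match self.labels[label] {
                Some(offset) => offset as i64,
                None => return Err(format!("label {} was never placed", label)),
            };
            let rel = target - (at as i64 + 4);
            if rel < i32::MIN as i64 || rel > i32::MAX as i64 {
                return Err(format!("jump to label {} does not fit in rel32", label));
            }
            self.bytes[at..at + 4].copy_from_slice(&(rel as i32).to_le_bytes());
        }
        Ok(self.bytes)
    }
}
```

Recording a fixup instead of computing the displacement immediately is what allows forward jumps: the target offset simply does not exist yet when the jump is emitted. A real generator would presumably apply the same idea to addresses in the data segment once the final ELF layout is known.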

Gallery

Here is a Spine program (C implementation) that prints argc, all of argv and then all of envp (the environment variables), all of which are on the stack at the start of program execution on Linux. N is a function that prints an integer in base 10, S prints a Spine string (a pointer and a length, which is what Spine string literals push), and Z prints a null-terminated string (a C string, which is how the argv and envp entries are provided).

Image of a Spine program printing argc, argv and envp.
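For readers unfamiliar with what a Linux process finds on its stack at entry, the layout the program above walks through can be modelled as follows. This is a Rust sketch over a plain slice of 64-bit words standing in for the stack, not Spine code: `InitialStack` and `parse_initial_stack` are hypothetical names, and the auxiliary vector that follows envp on a real stack is ignored.

```rust
/// Hypothetical view of the words found at the stack pointer on entry:
/// argc, then argc argv pointers, a 0 word, the envp pointers, a 0 word.
struct InitialStack<'a> {
    argc: u64,
    argv: &'a [u64],
    envp: &'a [u64],
}

/// Splits a model of the initial stack into its three parts.
/// Returns None if the layout is truncated or a terminator is missing.
fn parse_initial_stack(words: &[u64]) -> Option<InitialStack<'_>> {
    // The first word is argc.
    let (&argc, rest) = words.split_first()?;
    let n = argc as usize;

    // argv holds exactly argc pointers and is followed by a null word.
    if rest.len() <= n || rest[n] != 0 {
        return None;
    }
    let argv = &rest[..n];

    // envp has no explicit count: it runs until the next null word.
    let after_argv = &rest[n + 1..];
    let env_len = after_argv.iter().position(|&w| w == 0)?;

    Some(InitialStack { argc, argv, envp: &after_argv[..env_len] })
}
```

Each entry of `argv` and `envp` is the address of a null-terminated string, which is why a routine like Z has to scan for the terminating zero byte, whereas S already receives the length.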

Here are some Spine named functions. N differs from the version above in that it uses the local variables (h and v) to make the code shorter. S relies on the fact that the write syscall takes the length of the string, so a single syscall is enough. M allocates memory from the OS via the mmap syscall (one of the intended uses of this syscall).

Image of some Spine named functions.
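As a point of comparison for N and S, here is one common way to print an integer in base 10 with a single write, sketched in Rust. It is not a translation of the Spine code shown in the image: `format_decimal` and `print_decimal` are hypothetical names, and the approach (filling a fixed buffer from the end, then handing over a pointer-and-length slice) is only an assumption about the general shape of such a routine.

```rust
use std::io::Write;

/// Writes the decimal digits of `n` at the end of `buf` and returns them.
/// Digits come out least significant first, so the buffer is filled
/// backwards. 20 bytes are enough for the largest u64 value.
fn format_decimal(mut n: u64, buf: &mut [u8; 20]) -> &[u8] {
    let mut start = buf.len();
    loop {
        start -= 1;
        buf[start] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[start..]
}

/// Formats `n` and passes the whole digit slice to the writer at once,
/// instead of emitting one character per call.
fn print_decimal<W: Write>(out: &mut W, n: u64) -> std::io::Result<()> {
    let mut buf = [0u8; 20];
    let digits = format_decimal(n, &mut buf);
    out.write_all(digits)
}
```

Calling `print_decimal(&mut std::io::stdout(), 1234)` would be the typical use. The slice passed to `write_all` plays the same role as a Spine string: an address and a length, which is exactly what the write syscall expects.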

Why?

For fun!

This took some learning about the x86_64 architecture, the machine code encoding of x86_64 instructions (which is known to be quite complex in some areas), the 64-bit ELF executable format with its many fields and headers, and how to work with and debug raw machine code (not even assembly).
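To give an idea of the "many fields" involved, here is a Rust sketch that lays out the 64-byte ELF64 file header for a little-endian x86_64 executable. The function name `elf64_header` and its two parameters are hypothetical, and the sketch assumes the program headers directly follow the file header and that there are no section headers. It does not reflect how Spine's own ELF generator is structured.

```rust
/// Builds a 64-byte ELF64 file header (little-endian, x86_64, executable).
/// Assumes program headers start right after this header (offset 64)
/// and that the file carries no section headers.
fn elf64_header(entry: u64, phnum: u16) -> Vec<u8> {
    let mut h = Vec::with_capacity(64);
    h.extend_from_slice(&[0x7F, b'E', b'L', b'F']); // e_ident: magic
    h.extend_from_slice(&[2, 1, 1, 0]); // 64-bit, little-endian, version 1, System V ABI
    h.extend_from_slice(&[0; 8]); // e_ident: padding
    h.extend_from_slice(&2u16.to_le_bytes()); // e_type: ET_EXEC
    h.extend_from_slice(&62u16.to_le_bytes()); // e_machine: EM_X86_64
    h.extend_from_slice(&1u32.to_le_bytes()); // e_version
    h.extend_from_slice(&entry.to_le_bytes()); // e_entry: virtual address of first instruction
    h.extend_from_slice(&64u64.to_le_bytes()); // e_phoff: program headers follow this header
    h.extend_from_slice(&0u64.to_le_bytes()); // e_shoff: no section headers
    h.extend_from_slice(&0u32.to_le_bytes()); // e_flags
    h.extend_from_slice(&64u16.to_le_bytes()); // e_ehsize: size of this header
    h.extend_from_slice(&56u16.to_le_bytes()); // e_phentsize: size of one program header
    h.extend_from_slice(&phnum.to_le_bytes()); // e_phnum
    h.extend_from_slice(&0u16.to_le_bytes()); // e_shentsize
    h.extend_from_slice(&0u16.to_le_bytes()); // e_shnum
    h.extend_from_slice(&0u16.to_le_bytes()); // e_shstrndx
    h
}
```

The entry address can only be filled in once the position of the code segment in memory is decided, which is one reason offsets have to be corrected at the very end of generation.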