Implementation/Bootstrap Compiler

From Pit

When inventing a self-hosting language, you inevitably run into this problem: how do I compile the compiler, written in Pit, for the first time, when no executable version of the Pit compiler exists yet? The bootstrap compiler is a dirty hack to compile the first "real" version of the Pit compiler. I chose to write it in Perl, because Perl is an order of magnitude nicer than C for handling data structures such as symbol tables. Nothing matters except getting the job done, because as soon as it is done, I'm going to write a new compiler in Pit and throw away all the work I just did on the bootstrap compiler. After that, I'll never look back. Even when targeting a new architecture down the road, we'll cross-compile using the real Pit compiler. The bootstrap compiler is a discouraging project, because it feels like a waste of time to write something that will be used just once, but the real compiler cannot be built without it.

The bootstrap compiler is a dirty hack. The code is terrible, partially because I didn't really know what I was doing when I started, and partially because I put absolutely no effort into maintainability. Please don't judge my coding style based on the bootstrap compiler. It contains many kludges and bad approaches that I'd never use in a normal project, since this code will be thrown away as soon as the real compiler is written.

General approach

  • pc: This is a short program to dispatch the other programs.
    • pit-pasm: This translates Pit into pasm, my pseudo-assembly intermediate language. Pasm is architecture-independent and describes the program in a way that translates into native assembly relatively easily.
    • pasm-nasm: This translates pasm into nasm-style x86-32 assembly.
    • cc: The system's C compiler is used for linking. It does not actually compile any part of the Pit code; it simply provides a fairly portable command line for linking modules with system libraries.
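To make the pipeline concrete, here is a hypothetical sketch, written in C rather than the Perl the bootstrap compiler uses, of the command lines a dispatcher like pc might build for a single module. The stage names come from the list above; the file extensions, the separate nasm assembly step, and the -f elf32 and -m32 flags are assumptions for a 32-bit x86 ELF target, not pc's actual behavior.

```c
#include <stdio.h>
#include <string.h>

#define STAGES 4
#define CMD_MAX 256

/* Hypothetical: fill cmds[] with one shell command per pipeline stage for
 * the given module name.  Every format string uses the module name twice.
 * Returns 0 on success, -1 if a command would not fit in CMD_MAX bytes. */
static int build_pipeline(const char *module, char cmds[STAGES][CMD_MAX])
{
    const char *fmt[STAGES] = {
        "pit-pasm %s.pit > %s.pasm",    /* Pit -> pasm (assumed file names) */
        "pasm-nasm %s.pasm > %s.asm",   /* pasm -> nasm-style x86-32 assembly */
        "nasm -f elf32 -o %s.o %s.asm", /* assemble into an ELF object file */
        "cc -m32 -o %s %s.o",           /* link only; cc compiles no Pit code */
    };

    for (int i = 0; i < STAGES; i++) {
        int n = snprintf(cmds[i], CMD_MAX, fmt[i], module, module);
        if (n < 0 || n >= CMD_MAX)
            return -1;                  /* module name too long for the buffer */
    }
    return 0;
}
```

A driver could then pass each string to system() in order and stop at the first nonzero exit status, so a failure in an early stage never reaches the linker.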

Linking to Libc

Linking to libc is a bit tricky. XXX: Wikify text below. It was moved out of a source file.

// This is part of our wrapper for libc.  Each wrapper has two parts,
// a C function and a Pit function.  The C function exports a function
// prototype that Pit can deal with, and the Pit function implements the
// visible system library interface.
// Be careful of differences between the function calling conventions of C and
// Pit.  Specifically:
// In C, size_t matches the pointer size on the platforms we target, but int
// often does not.  In Pit, int must match the pointer size, so there's no
// need for size_t.  Never never never use int in a function prototype called
// from Pit, because it won't work on 64-bit systems where int is only 32 bits
// (for example, GCC on x86-64).  Use size_t instead, even if you aren't
// really representing a memory size.
// C only supports input parameters to functions, so where Pit uses inout or
// out parameters, pass a pointer instead.
// We can't use return values, because Pit's calling convention differs from
// C on how they're returned.  Instead, pass a pointer to a variable to store
// the return value in.
// C cannot pass exceptions to Pit.  Return an error code and check for it in
// the Pit function.
// C does not allow dots in symbol names.  Pit uses dots to separate the
// namespaces a symbol is in from the symbol's casual name.  This means all
// C functions are, from Pit's point of view, in the root namespace.  They
// need to be prototyped with a leading dot, for example, "$.mem_malloc".
// Remember that Pit cannot understand the C preprocessor, so all macros must
// be resolved in the C function.  Define constants for macros that resolve to
// a simple number, and define functions for more complex macros.
// Some functions in libc are already simple enough that they can be called
// directly; free() is one of these.  However, beware of C functions that some
// systems may implement only as macros.  They must be wrapped in a C function.
// errno is a particularly common and evil example.  It has three problems.
// First, its type is int, and Pit has no way of knowing what size this C
// implementation thinks an int is (see above size_t discussion).  Second, on
// systems that support threading, errno must be defined as a preprocessor
// macro so that each thread can have its own copy.  Third, all the values
// assigned to errno are preprocessor macros, which Pit has no way of
// interpreting.  To implement this, we define a constant symbol in C for each
// errno value and assign the value of the preprocessor macro to it.  Whenever
// a libc function reports an error, we report the errno back through a pointer
// parameter. When there is no error, that parameter is assigned zero.
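As a minimal sketch of these conventions (not the project's actual wrapper code), the C half of the "$.mem_malloc" wrapper mentioned above might look like this. The signature, the parameter names, and the use of ENOMEM as a fallback error code are assumptions: every integer is a size_t, the result comes back through an out-pointer instead of a C return value, and failure is reported as an error code for the Pit half to check.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical C half of the wrapper Pit prototypes as "$.mem_malloc".
 * The C name has no dots, so Pit sees it in the root namespace. */
void mem_malloc(size_t size, void **result, size_t *error)
{
    void *p;

    errno = 0;
    p = malloc(size);
    *result = p;

    /* malloc(0) may legitimately return NULL, so only a failed nonzero
     * request counts as an error.  POSIX malloc sets errno to ENOMEM on
     * failure; fall back to ENOMEM in case this libc does not. */
    if (p == NULL && size != 0)
        *error = (size_t)(errno != 0 ? errno : ENOMEM);
    else
        *error = 0;
}
```

The Pit half would call this, test the error value, and raise a Pit exception if it is nonzero. Releasing the memory needs no wrapper, since free() can be called directly.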
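The macro and errno rules can be sketched the same way. This is a hypothetical example assuming a POSIX system: the symbol names (errno_ENOENT, file_O_RDONLY, file_S_ISDIR, file_open) are invented for illustration, and open() stands in for any libc call that reports failures through errno. Simple numeric macros become constant objects, a function-like macro becomes a real function, and errno is copied into an out-parameter immediately after the call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Each errno value is a preprocessor macro, so copy it into a real object.
 * A file-scope const object in C has external linkage, so Pit can link
 * against these (prototyped as "$.errno_ENOENT", and so on). */
const size_t errno_ENOENT = ENOENT;
const size_t errno_EACCES = EACCES;
const size_t errno_EEXIST = EEXIST;

/* A macro that resolves to a simple number: export it as a constant. */
const size_t file_O_RDONLY = O_RDONLY;

/* A function-like macro: wrap it in a function with an out-parameter. */
void file_S_ISDIR(size_t mode, size_t *result)
{
    *result = S_ISDIR((mode_t)mode) ? 1 : 0;
}

/* Wrapper around open().  The descriptor comes back through *fd; *error
 * receives errno on failure and 0 on success, so Pit never touches errno. */
void file_open(const char *path, size_t flags, size_t *fd, size_t *error)
{
    int r = open(path, (int)flags);

    if (r < 0) {
        *fd = 0;
        *error = (size_t)errno;   /* read errno before any other libc call */
    } else {
        *fd = (size_t)r;
        *error = 0;
    }
}
```

On the Pit side, the wrapper would compare the error value against constants such as $.errno_ENOENT and raise the matching exception, without ever needing to know how this C implementation defines errno.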