Serene on the LLVM

#Languages | #Serene - 2021-04-13

As you may know, I'm trying to build my new programming language. After a ton of study and many experiments, I finally decided which platform I'll target for Serene. Here are the history and the rationale behind this decision.

A little bit of history

After the initial effort on choosing the right platform, I studied the GraalVM a bit and experimented with it. While it's a nice tool and I see a bright future for it, I wasn't happy with some aspects of it. The most important one is the fact that Oracle is behind it (Why? Well, don't open that door :D), and there are some other technical reasons which I'll get to later. So I looked around again and re-evaluated my choices. I came across the LLVM. Previously I didn't pay much attention to the LLVM because I was blinded by the GraalVM and the fact that both work the same way in theory. I mean, with either of them we need to create the compiler frontend and they take care of the backend for us (more or less). Initially, one of the reasons why I picked the GraalVM over the LLVM was its support for the LLVM itself, and it seemed obvious that later on we could bridge the LLVM world to Serene's world via GraalVM. But it was quite the opposite.

This time, I looked into the LLVM more thoroughly and boy, I was (and still am) impressed: well designed tools and libraries to build a compiler. Compared to the GraalVM it is very mature, well documented and quite modular. Aaand using the LLVM I can still use GraalVM via its support for LLVM IR. Long story short, the more I read about the LLVM the more I got obsessed with it. So I decided to move away from GraalVM and start playing with LLVM.

The challenge of the language again

Moving away from the GraalVM meant I had to choose a host language again. While the official language of the LLVM is C++, I tried to avoid it since I'm not skilled enough in C++. So I ran a series of experiments (all of which are available in dedicated branches on the repo) and tried Rust, C, C++ (first attempt) and Golang. I wrote the parser and an interpreter as an experiment, and also to evaluate the facilities of each language when it comes to working with the LLVM API. After many iterations, I ended up using Golang to create an interpreter with an FFI interface so we can write the compiler in Serene itself.

At the same time I started a journey into mathematics to learn more about different type systems in theory and the different options that we might have for Serene (I'll write about that separately in the future). Most of my day went to my studies and I felt really good. But I always had a voice in my head that kept bugging me about MLIR. I had watched a few introductory talks on it before and had a rough idea of what it is and what it does. In order to shut that voice up, I decided to look it up and read more about it while I was blocked on my math studies, and to my surprise, it totally blew me away. MLIR is such a brilliant tool. It's made out of the experience gained in making several languages and compilers, and it follows some conventional and well designed principles for building intermediate representation languages.

After I read more and more about MLIR, which by the way is a sub project of the LLVM, I still firmly believed that I should use Golang to create an interpreter as a bootstrap language and then provide an FFI interface via the interpreter to use MLIR's C API to interact with it. How naive I was.

During the course of my study on MLIR, I came across a beautiful thing called TableGen. It's part of the LLVM and, in general, it's designed to generate C++ based on some description. It's a generic tool which developers write backends for in order to generate code for specific purposes, and in the case of MLIR, to generate IR dialects. The way MLIR utilizes TableGen to generate dialects and the majority of their operations and types is truly amazing. It makes the cumbersome task of making a multi-layer IR quite straightforward. MLIR singlehandedly changed my mind about the approach I want to take to build the compiler. All of a sudden C++ seemed like a reliable option. So I decided to give it a go. I revived the old C++ branch, forked it into a branch called mlir and started to work with it a bit. I made a prototype and enhanced it. After a lot of consideration I finally decided to merge the mlir branch into master and move the Golang implementation into its own branch, golang-impl.

I'm cleaning up the C++ implementation at the moment, and I'll be adding a semantic analysis phase to the compiler. I'm aiming for a minimal lambda calculus implementation that wires everything up in its most minimal state, as a foundation to build upon.

I'll also write another essay that covers, in more detail, the technical aspects of why LLVM and MLIR are great for our use cases.