Simple Memory Manager using libssm
-As you may know, I’m working on a programming language called Serene. Serene has a runtime library that provides an interface to some crucial functionalities that any program might need at runtime.
One of such functionalities, is memory management. In languages like C that are notorious for manual memory management, the norm is to allocate memory when you require it, and free that allocation when you don’t need it any more. While it sounds easy, it gets out of control rapidly, causing all sort of issues. From security risks, to resource gobbling programs and so on.
It’s one of the reasons people tend to avoid C. Also, one of the reasons (among many others) languages like C++, and Rust exist today. To address this painful problem that C exposes the programmer to. But like many other issues in life, people might have different interpretation and perspectives on an issue. For example, if we step back and think about, the nature of the issue, we might be able to find a better solution that will fit our needs.
The main problem here is basic, memory is a finite resource. But the scale of this issue changed over time. In the past, the amount of memory available to process was different from today. I remember my first computer had a memory unit about 256Mb or so. However, my main computer today has 192Gb of memory. It’s still a finite resource, but the scale is different. That being said, I’m talking about the general case, we still have devices that operate on low power that has very limited memory by design.
Naturally, to save such a valuable resource, C programmers tend to be cautious about their usage. That’s why they carefully allocate memory and free it as soon as possible. But that process is quite challenging to handle, as mentioned above. Programmers had come up with different approaches to deal with this challenge. Garbage collectors, RAII style memory management, Memory pools, Arena allocators and so on. Each have their pros and cons, definitely, and their use cases too. Arena allocation, is my favourite and the one I chose to implement in Serene’s runtime. It’s simple, easy to understand, easy to implement, and fast relative to other solutions.
The core premise of arena allocation, is to allocate a big (relative to your program) block of memory called an Arena, and you allocate memory chunks in that Arena by bumping a simple pointer, which is pretty cheap to do. You will never free the memory by chunk. When you’re done, you will have to free the Arena entirely. In modern operating systems, you don’t even have to do that. OS’s are smart enough to handle these types of memory leaks.
Arena allocation shines specially in use cases where the program has distinct life cycle check points. For instance, imagine a web server that allocate an Arena per incoming request, and free the Arena when the response is generated.
Let’s not forget that, this approach has downsides too. It’s not perfect. But knowing about the limitations allows us to utilize it better. Here is a list of common issues or limits that apply to Arena allocation (with possible solutions):
- Memory fragmentation: It is a general memory management issue that applies to Arena allocation as well. But in this case, we have it localized per Arena.
- Complicated lifetime: Arena allocation is not suitable for use cases that the lifetimes of memory residence are not linear or not simple enough to untangle.
- Memory waste: A typical problem with Arena allocation is that we might allocate a 128kb Arena, for example, and only use 20kb of it. The rest of 128kb will be unavailable for allocation until we free the Arena. This used to be a major issue again this method in the past, but with memory scaling up in modern times this issue has lost its importance (again in general cases). Another way to reduce the impact of this problem is to choose the Arena size based on your usage by profiling and so on. But at the end of the day, it’s a risk that you should evaluate and see whether you are willing to take it.
- Concurrency: Is not an issue really but more of a concern. You should know how to handle Arena allocation in a concurrent setup. For example, you can allocate Arenas per thread, or protect the Arena with spin locks and other forms of synchronization. In Serene, I synchronize on chunk allocation and since Serene is an immutable language, even on low level, I don’t need to be worried about multiple threads changing chunks data on runtime in parallel.
There are other less important issues as well, but the golden rule to remember, is that, the more predictable use case you have the more suitable Arena allocation can be.
libsmm
While there are a few arena allocation libs out there already, and it’s pretty easy to write one, I implemented one for Serene’s runtime that is generic enough to work in any program. That’s why I ended up extracting it from Serene’s code base and used it in a few projects later. To make it easier for myself, I created a dedicated library called libsmm and extracted the code and moved it into that repo. I had to downgrade the code from C23 to C11 for compatibility reasons as well.
It’s a super simple library. You can just copy and paste the smm directory to your project and build it as part of your code, or build it separately and link it in.
Using Nix
If you’re a Nix user, then you can use libsmm’s flake file as an input to your flake file and call it a day. Just use the default package and you’re good to go. Here is an example:
{
description = "foo";
inputs = {
nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
systems.url = "github:nix-systems/default";
flake-parts.url = "github:hercules-ci/flake-parts";
libsmm = {
url = "git+https://git.sr.ht/~lxsameer/libsmm";
inputs.nixpkgs.follows = "nixpkgs";
};
};
outputs = inputs:
inputs.flake-parts.lib.mkFlake { inherit inputs; } {
systems = import inputs.systems;
imports = [
inputs.treefmt-nix.flakeModule
];
perSystem = { config, pkgs, self', system, ... }:
let
libsmm = inputs.libsmm.outputs.packages.${system}.default;
in
{
# Same thing applies to derivations as well.
devShells.default = pkgs.mkShell {
nativeBuildInputs = (with pkgs;[
ninja
cmake
meson
pkg-config
llvmPackages_latest.clang-tools
]);
buildInputs = (with pkgs;[
libsmm
]);
};
};
};
}
Build it from source
You would need Meson and Ninja to build libsmm, as follows:
meson setup build
meson compile -C build
meson install -C build
Using libsmm in your project
libsmm exposes its required configuration via pkg-config, so as long as your build tool supports pkg-config you can easily integrate it into your pipeline. Here is an example using Meson:
smm_dep = dependency('libsmm', static: true)
foo = executable('foo',
sources: ['main.c', ...],
include_directories: include_directories('./src/'),
dependencies: [smm_dep],
)
Implementation overview and the API
The main data structure in libsmm is ssm_t that acts as the sole memory manager for your program. It owns the memory blocks and the synchronization means and controls access to memory.
libsmm operates in block level. The default block size of libsmm is 128Kib and can be customized via the DESIRED_BLOCK_SIZE definition. By default, libsmm will allocation just one block of memory at initialization time that is called immortal block, and is dedicated to allocation of memory for immortal value that live forever as far as ssm_t concerns.
libsmm blocks are really chains of blocks, if a block runs out of space, libsmm will automatically allocate a new block and attach it into the same chain as the previous block and keep allocating in the new block.
In order to start allocating memory, you need to allocate a block first. A block is the libsmm’s term for an Arena. Except for the immortal block, libsmm assigns an ID to all the root blocks. And whenever you need to allocate memory, you must pass the block id around to let libsmm know which block to use.
One final concept that we have to talk about is the provider. By default, libsmm uses the standard allocator to allocate and free blocks. Users can change this behaviour and initialization type by passing an instance of the smm_memory_provider_t data structure. This structure contains two fields that are function pointers to allocate and release operations of your memory provider, if you are using anything beside malloc and free(likes of jemalloc and mimalloc). Here is an example how to create a new provider:
#include <smm/smm.h>
static void *foo_allocator(size_t size, size_t alignment) {
// This is an example do NOT just copy and paste it
return aligned_alloc(alignment, size);
#endif
}
static void stdlib_releaser(void *ptr) { free(ptr); }
static smm_memory_provider_t foo_provider = {
.allocate = &foo_allocator,
.release = &foo_releaser,
};
Initialization
The first thing you need to do is to initialize your ssm_t instance. Here is a simple example that you can copy and paste most of the time:
#include <smm/smm.h>
// passing NULL as the provider will force libsmm to use the default provider
smm_t *mm = smm_init(NULL);
// Or you can pass a provider
smm_t *mm = smm_init(&foo_provider);
You need to initialize libsmm as early as possible, I always do it in my main.
Moreover, you have to shut it down when terminating your program gracefully to free up resources.
Now, to start allocating ephemeral memory, first you need to allocate a block chain, by just allocating the root block via the smm_allocate_block function like:
// mm is of type `smm_t *`
smm_block_id_t b = smm_allocate_block(mm);
You will need that block ID until you want it’s time to free the entire chain via the smm_release_block function.
With the block ID at hand, you can allocate memory via the smm_allocate_in_block_aligned function or the smm_allocate_in_block macro. Here is an example:
// mm is of type `smm_t *`
smm_block_id_t bid = smm_allocate_block(mm);
// `alignof` will be available though `smm.h` if you are not on C23 yet.
foo_t *ptr = (foo_t *)smm_allocate_in_block_aligned(mm, bid, sizeof(foo_t), alignof(foo_t));
// or via macro
foo_t *ptr = smm_allocate_in_block(mm, bid, foo_t);
// And the immortal variants which will allocate memory in the immortal section and do not need block id
foo_t *ptr = (foo_t *)smm_immortal_allocate_aligned(mm, sizeof(foo_t), alignof(foo_t));
// via macro
foo_t *ptr = smm_immortal_allocate(mm, foo_t);
As you can see it’s pretty easy and straight forward to use libsmm. Just bear in mind that the minimum C standard requirement for libsmm is C11.
For more information on the design and the internal have a look at the smm.h file, which is heavily annotated.
Conclusion
Arena allocation trades flexibility for clarity and speed. When lifetimes are well-understood, it’s one of the simplest and fastest memory management strategies available and libsmm is a small, easy to use and simple implementation that can be useful for such use cases.