For today's deep-dive into the dark side of the C programming language, we will be exploiting C's gloriously unsafe memory model to implement an immutable string type which offers shared reference lifetime semantics while masquerading as a standard C string, retaining compatibility with builtin <string.h> routines.
On the left is the API for my GblStringRef type, which is part of my libGimbal framework that supports the Sega Dreamcast, Nintendo Gamecube, Sony PSP and PSVita, Windows, MacOS, Linux, iOS, Android, and WebAssembly (and god knows what else). On the right, you can see a little test program that illustrates its usage and behavior.
This type of reference-counted string is extremely useful for when you have multiple potential owners of the same string with a lifetime that isn't even necessarily deterministic, because it allows you to share the same heap-allocated dynamic string among many owners and only release the memory back to the heap once ALL owners have released their references to the shared resource.
Big C-based frameworks such as GTK's GLib offer such strings, and it can be considered loosely equivalent to a std::shared_ptr<const std::string> in C++ or an Arc<String> from Rust... So let's go ahead and steal this shit for ourselves!
How is it implemented? Well, rather than simply calling malloc() with the length of the string to be allocated from the heap (+1 for the NUL terminator), we over-allocate the requested memory chunk to include an additional "header" immediately before the starting address of the string, where we will be storing extra information.
Within the header, I personally store an atomic_size_t that serves as an atomic reference counter for thread-safety, along with the length of the string, so we can have it cached and do not have to rely on strlen() to figure it out dynamically.
Now, the key is to return the freshly allocated memory segment from the GblStringRef_create() routine to the user at the offset where the string itself is stored, not at the beginning of the allocated memory segment containing our header, which is now hidden just before it.
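Here's a minimal sketch of that allocation trick. The names StrHeader and strref_create() are made up for illustration and are not libGimbal's actual API; the header layout is only an assumption based on the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the real GblStringRef layout may differ. */
typedef struct {
    atomic_size_t refCount; /* number of live owners */
    size_t        length;   /* cached length, excluding the NUL terminator */
} StrHeader;

/* Hypothetical stand-in for GblStringRef_create(). */
static const char *strref_create(const char *src) {
    assert(src != NULL);
    size_t length = strlen(src);

    /* One allocation: header, then the characters, then the NUL terminator. */
    StrHeader *header = malloc(sizeof *header + length + 1);
    if (header == NULL)
        return NULL;

    atomic_init(&header->refCount, 1); /* the caller holds the first reference */
    header->length = length;

    /* The payload starts immediately after the header; that is what we return. */
    char *chars = (char *)(header + 1);
    memcpy(chars, src, length + 1);
    return chars;
}
```

Since the caller only ever sees the address of the characters, the result can go straight into strlen(), strcmp(), printf("%s"), and friends. The one thing the caller must never do is pass it to free(), because it is not the address malloc() returned.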
This means that, from the user's perspective, they can treat the pointer as though it's a regular const char[] array, giving them something that is ergonomic and is compatible with the regular C string routines found within the standard library.
Then, when the user calls any other API routines on the string, passing us back the starting address of the character array, we simply subtract the size of the metadata header struct, where we stashed our reference counter and string length, from that address to work with the entire allocated segment behind the scenes!
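The pointer arithmetic for getting back to the header can be sketched like this. Again, StrHeader, strref_header(), and strref_length() are hypothetical names, and the struct is an assumed layout rather than libGimbal's real one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical header, assumed to sit directly before the characters. */
typedef struct {
    atomic_size_t refCount;
    size_t        length;
} StrHeader;

/* Step back from the first character to the hidden header. */
static StrHeader *strref_header(const char *str) {
    assert(str != NULL);
    return (StrHeader *)(str - sizeof(StrHeader));
}

/* O(1) length lookup using the cached value instead of strlen(). */
static size_t strref_length(const char *str) {
    return strref_header(str)->length;
}
```

This only works for pointers that really were handed out with a header in front of them. Feed it a plain string literal or a stack buffer and it will happily read whatever garbage lives before it, which is exactly the kind of memory-unsafety we signed up for.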
When the user calls GblStringRef_ref() on an existing GblStringRef*, we increment its atomic reference counter and return the same pointer back to them. Then, each time they call GblStringRef_unref(), we decrement the atomic reference counter, until it hits zero, at which point the user has released the final reference, and we should free the underlying heap allocation.
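A rough sketch of the reference counting, under the same assumptions as before: StrHeader, strref_ref(), and strref_unref() are illustrative names, not the real GblStringRef API. Returning the remaining count from the unref routine is my own choice here, made to keep the behavior easy to observe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header, assumed to sit directly before the characters. */
typedef struct {
    atomic_size_t refCount;
    size_t        length;
} StrHeader;

static StrHeader *strref_header(const char *str) {
    return (StrHeader *)(str - sizeof(StrHeader));
}

/* Take another reference: bump the counter, hand back the same pointer. */
static const char *strref_ref(const char *str) {
    assert(str != NULL);
    atomic_fetch_add(&strref_header(str)->refCount, 1);
    return str;
}

/* Drop a reference; frees the whole allocation when the last one goes away.
 * Returns the number of references still outstanding. */
static size_t strref_unref(const char *str) {
    assert(str != NULL);
    StrHeader *header = strref_header(str);
    size_t previous = atomic_fetch_sub(&header->refCount, 1);
    assert(previous > 0); /* more unrefs than refs is a caller bug */
    if (previous == 1)
        free(header); /* free the address malloc() gave us, not the string */
    return previous - 1;
}
```

Note that atomic_fetch_sub() returns the value from before the decrement, so seeing 1 means we just released the final reference. Checking that returned value, rather than re-reading the counter afterwards, is what keeps two threads from both deciding they are the last owner.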
That's it for today's deep-dive into the dark arts. Stay tuned for more memory-unsafe, unholy abominations of the C programming language!
github.com/gyrovorbis/libgim…