Restartable Sequences: The Secret of System Programming

Introduction

Restartable sequences (rseq) remain one of the lesser-known tools in systems programming. Introduced in Linux 4.18 in 2018, they make it possible to build thread-safe, per-CPU data structures without locks and without atomic instructions on the fast path. Most developers are still unaware of their potential, but those working with high-core-count processors are already reaping the benefits.

Why Restartable Sequences?

Modern processors, like the Ampere Altra with up to 128 cores or the AMD Threadripper Pro 7995WX with 96 cores, offer impressive processing power. However, harnessing it means keeping threads from contending for the same locks and cache lines. Restartable sequences address this by letting each thread work on data that belongs to the CPU it is currently running on.

Consider a $160 Raspberry Pi 5 with just 4 cores: restartable sequences can make malloc() 3 times faster than giving each thread its own dlmalloc instance. On a more powerful machine, like the $4,834 System76 Thelio Astra with the Altra CPU, the improvement jumps to 34 times, and on the Threadripper Pro it reaches 43 times. The gain is far larger on machines with many cores.

How Do Restartable Sequences Work?

When the Cosmopolitan C runtime creates a thread on a Linux system, it makes an rseq() system call that registers a 32-byte region of TLS (Thread-Local Storage) memory with the kernel. The kernel then writes the current CPU number into this region whenever the thread is rescheduled. This makes operations like sched_getcpu() cheap: a single mov instruction is enough to fetch the value.
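As a rough illustration of reading the CPU number from the rseq area, here is a minimal sketch. It does not use Cosmopolitan. It assumes glibc 2.35 or later, which registers an rseq area for every thread and exposes its location through `__rseq_offset` and `__rseq_size` in `<sys/rseq.h>`, and a compiler that provides `__builtin_thread_pointer()`. The function name `current_cpu` is hypothetical. When those assumptions do not hold, the sketch falls back to sched_getcpu().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          // exposes sched_getcpu() in <sched.h>
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

// Use glibc's rseq area only when both the header and the builtin exist.
#if defined(__has_include) && defined(__has_builtin)
#  if __has_include(<sys/rseq.h>) && __has_builtin(__builtin_thread_pointer)
#    include <sys/rseq.h>    // assumption: glibc >= 2.35
#    define HAVE_GLIBC_RSEQ 1
#  endif
#endif

// Hypothetical helper: returns the CPU the calling thread is running on.
int current_cpu(void) {
#ifdef HAVE_GLIBC_RSEQ
    if (__rseq_size > 0) {   // zero means registration did not happen
        // The rseq area lives at a fixed offset from the thread pointer.
        volatile struct rseq *rs = (volatile struct rseq *)
            ((char *)__builtin_thread_pointer() + __rseq_offset);
        int cpu = (int)rs->cpu_id;   // negative while unregistered
        if (cpu >= 0)
            return cpu;
    }
#endif
    // Fallback path: ask the C library instead.
    int cpu = sched_getcpu();
    assert(cpu >= 0);
    return cpu;
}
```

The value is only a hint: the thread can migrate right after the read, which is why updates to per-CPU data still need a restartable critical section or another safeguard.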

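This approach is particularly useful in multi-core environments where lock contention and wait times become bottlenecks. A thread can use the CPU number to pick a per-CPU slot and update it inside a short critical section that it announces to the kernel through the same rseq area. If the thread is preempted, migrated, or interrupted by a signal before the section commits, the kernel redirects it to an abort handler so the sequence can be restarted. Because no other thread runs on that CPU in the meantime, the update needs no lock.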
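The per-CPU layout that makes this possible can be sketched in portable C. This is not a real restartable sequence, since the critical section itself requires architecture-specific assembly. Instead, the sketch below substitutes a relaxed atomic add on the slot of the current CPU, which stays correct even if the thread migrates between the sched_getcpu() call and the update. With rseq, that atomic add could become a plain load, add, and store. The names `percpu_counter`, `counter_add`, `counter_sum`, `MAX_CPUS`, and `CACHE_LINE` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          // exposes sched_getcpu() in <sched.h>
#endif
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CPUS   1024      // hypothetical upper bound on CPU numbers
#define CACHE_LINE 64        // assumed cache line size in bytes

// One slot per CPU, each on its own cache line to avoid false sharing.
struct percpu_counter {
    struct {
        _Alignas(CACHE_LINE) atomic_uint_fast64_t value;
    } slot[MAX_CPUS];
};

// Add n to the slot of the CPU this thread is currently running on.
void counter_add(struct percpu_counter *c, uint64_t n) {
    int cpu = sched_getcpu();
    assert(cpu >= 0);
    // Stand-in for the rseq critical section: a relaxed atomic add.
    atomic_fetch_add_explicit(&c->slot[cpu % MAX_CPUS].value, n,
                              memory_order_relaxed);
}

// Sum all slots; the result is exact once writers have finished.
uint64_t counter_sum(struct percpu_counter *c) {
    uint64_t total = 0;
    for (int i = 0; i < MAX_CPUS; i++)
        total += atomic_load_explicit(&c->slot[i].value,
                                      memory_order_relaxed);
    return total;
}
```

A zero-initialized `static struct percpu_counter` is ready to use. Writers rarely touch the same cache line, so increments scale with the number of cores, while reads pay the cost of walking every slot.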

Real-world Use Cases

Take the example of optimizing matrix multiplication. Thanks to restartable sequences, it is possible to efficiently parallelize calculations without the usual overheads of synchronization. Consequently, projects in the AI domain have been able to achieve substantial performance gains.

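Another use case is memory management in libraries like tcmalloc and jemalloc, along with the glibc runtime, which have already adopted these sequences. tcmalloc, for example, uses them for its per-CPU caches, and glibc 2.35 and later registers an rseq area for every thread.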

The Future of Restartable Sequences

Although their implementation currently requires custom assembly code, it is likely that restartable sequences will soon be natively supported by all operating systems and integrated into major system programming languages. This would pave the way for a rewrite of data structure libraries to take advantage of these optimizations.

Conclusion

Restartable sequences are a promising innovation that can transform how we design multi-core systems. For developers ready to invest in high-performance hardware, they present a unique opportunity to boost application performance.
