Can LLMs Model Real-World Systems in TLA+?

Introduction

Large Language Models (LLMs) are changing how engineers approach many tasks, and formal modeling is no exception. TLA+ is a formal specification language widely used to describe and verify the behavior of concurrent and distributed systems, and LLMs can now generate TLA+ specifications on request. The open question is whether those specifications model real-world systems faithfully, that is, whether they capture how a particular implementation actually behaves.

LLMs and TLA+

LLMs such as GPT-4 and Claude have been trained on vast corpora that include publicly available TLA+ examples. Asked to "write a Raft specification in TLA+", they can produce a specification that looks correct at first glance. However, a recent study by the Specula team found that such output is often a near-verbatim reproduction of an existing specification, like the one from the Raft paper, rather than a model of how a specific system such as Etcd actually implements the protocol.
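To make "near-verbatim reproduction" concrete, the sketch below scores how much of a generated specification's token sequence matches a reference specification, using Python's standard `difflib.SequenceMatcher`. This is a crude heuristic for illustration only, not the method used by the Specula team or by SysMoBench. The function names `tla_tokens` and `overlap` are hypothetical, and the tokenizer only approximates TLA+ lexical rules.

```python
import difflib
import re
from typing import List


def tla_tokens(spec: str) -> List[str]:
    """Split TLA+ text into crude tokens, ignoring comments and whitespace (hypothetical helper)."""
    # Drop (* block comments *); nested block comments are NOT handled.
    spec = re.sub(r"\(\*.*?\*\)", " ", spec, flags=re.DOTALL)
    # Drop \* line comments up to the end of the line.
    spec = re.sub(r"\\\*.*", " ", spec)
    # Identifiers/keywords become whole tokens; any other non-space character is its own token.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\S", spec)


def overlap(generated: str, reference: str) -> float:
    """Similarity in [0, 1] of two specs' token sequences; 1.0 means identical tokens."""
    matcher = difflib.SequenceMatcher(
        None, tla_tokens(generated), tla_tokens(reference), autojunk=False
    )
    return matcher.ratio()


# Usage: a "generated" spec that only adds a comment to the reference scores 1.0.
reference = "Next == \\/ Timeout \\/ RequestVote"
generated = "\\* written by an LLM\nNext == \\/ Timeout \\/ RequestVote"
print(overlap(generated, reference))  # 1.0
```

A score near 1.0 suggests the output mostly restates the reference. A low score proves little on its own: renaming variables defeats this check, and a specification can be original yet still unfaithful to the target system.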

SysMoBench: A Tool for Evaluating LLMs

To measure whether LLMs can produce original specifications rather than verbatim copies, the SysMoBench benchmark was developed. SysMoBench presents eleven systems to LLMs and automatically evaluates the TLA+ specifications they generate in four phases:

Syntax Phase: Checks whether the specification parses without errors.
Runtime Phase: Checks whether the TLC model checker can run the specification without errors.
Conformance Phase: Checks whether execution traces collected from the real implementation are consistent with the behaviors the model allows.
Invariant Phase: Checks whether the specification satisfies key safety and liveness properties of the system.
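The Runtime and Invariant phases rely on TLC, which explores the reachable states of a specification and checks each one against the stated invariants. As a rough illustration of that idea (not TLC itself, and not SysMoBench's harness), the following Python sketch runs a breadth-first search over a hypothetical two-process lock whose "test" and "set" steps are separate, and returns a shortest trace that violates mutual exclusion. The model and all names (`next_states`, `mutual_exclusion`, `check`) are invented for this example.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical state: (pc of process 0, pc of process 1, lock held?)
State = Tuple[str, str, bool]

INIT: State = ("idle", "idle", False)


def next_states(s: State) -> Iterator[State]:
    """Next-state relation of a deliberately buggy lock: test and set are separate steps."""
    pcs, lock = list(s[:2]), s[2]
    for i in range(2):
        pc = pcs[i]
        if pc == "idle" and not lock:  # test: the lock looks free
            new_pc, new_lock = "checked", lock
        elif pc == "checked":  # set: take the lock and enter the critical section
            new_pc, new_lock = "cs", True
        elif pc == "cs":  # release the lock
            new_pc, new_lock = "idle", False
        else:
            continue  # process i is blocked in this state
        updated = pcs.copy()
        updated[i] = new_pc
        yield (updated[0], updated[1], new_lock)


def mutual_exclusion(s: State) -> bool:
    """Invariant: the two processes are never in the critical section together."""
    return not (s[0] == "cs" and s[1] == "cs")


def check(
    init: State,
    nxt: Callable[[State], Iterable[State]],
    inv: Callable[[State], bool],
) -> Optional[List[State]]:
    """BFS over reachable states; returns a shortest counterexample trace, or None."""
    parent: Dict[State, Optional[State]] = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if not inv(s):
            trace: List[State] = []
            cur: Optional[State] = s
            while cur is not None:  # walk parent links back to the initial state
                trace.append(cur)
                cur = parent[cur]
            return trace[::-1]
        for t in nxt(s):
            if t not in parent:  # visit each state once
                parent[t] = s
                queue.append(t)
    return None


trace = check(INIT, next_states, mutual_exclusion)
if trace is not None:
    for step, state in enumerate(trace):
        print(step, state)
```

Because the search is breadth-first, the first violating state it dequeues is at minimal depth, so the reported trace is as short as possible. Here that means both processes pass the "test" step before either one "sets" the lock.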

Challenges in Modeling with LLMs

One of the main challenges LLMs face is abstracting a faithful model from a complex implementation. To model Etcd, for instance, it is not enough to know the basic principles of Raft; the model must reflect how Etcd splits its behavior into atomic actions and how its state evolves from one step to the next. This capacity for abstraction and synthesis is what separates mere transcription from true formal modeling.
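Trace conformance shows why action granularity matters. In the sketch below, a hypothetical illustration rather than Etcd's actual code or SysMoBench's checker, a logged implementation trace bumps a node's term in one step and changes its role in the next. A paper-style model that does both in a single atomic Timeout action rejects that trace, while a model whose actions are split the same way as the implementation's accepts it. As in TLA+, stuttering steps (where the state does not change) are always allowed. The states, actions, and function names are all invented for this example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical per-node state observed in logs: (role, term)
State = Tuple[str, int]
NextRel = Callable[[State, State], bool]


def coarse_next(s: State, t: State) -> bool:
    """Paper-style model: Timeout makes the node a candidate AND bumps its term atomically."""
    role, term = s
    timeout = role in ("follower", "candidate") and t == ("candidate", term + 1)
    become_leader = role == "candidate" and t == ("leader", term)
    return timeout or become_leader


def fine_next(s: State, t: State) -> bool:
    """Model refined to match an implementation that bumps the term before switching role."""
    role, term = s
    increment_term = role in ("follower", "candidate") and t == (role, term + 1)
    become_candidate = role == "follower" and t == ("candidate", term)
    become_leader = role == "candidate" and t == ("leader", term)
    return increment_term or become_candidate or become_leader


def first_nonconforming_step(trace: List[State], nxt: NextRel) -> Optional[int]:
    """Index i of the first transition trace[i] -> trace[i+1] the model forbids, or None."""
    for i, (s, t) in enumerate(zip(trace, trace[1:])):
        if s != t and not nxt(s, t):  # an unchanged state is a stuttering step, always allowed
            return i
    return None


# Hypothetical trace logged by an implementation.
impl_trace: List[State] = [("follower", 1), ("follower", 2), ("candidate", 2), ("leader", 2)]

print(first_nonconforming_step(impl_trace, coarse_next))  # 0
print(first_nonconforming_step(impl_trace, fine_next))  # None
```

Real trace validation is harder than this step-by-step comparison. Logs usually record only part of the state, so a checker has to search for model behaviors consistent with what was observed.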

Towards More Accurate Modeling

Progress in LLMs is bringing more autonomous and accurate modeling of real-world systems within reach. For LLMs to become reliable modeling tools, however, they must do more than recall published specifications: they need to incorporate information about the target system, learn from existing systems, and generate models that account for the specifics of each implementation.

Conclusion

LLMs hold immense potential to transform the way we model complex systems. However, they still need to overcome real hurdles to become reliable TLA+ modeling tools, above all abstracting system-specific models from real implementations instead of reproducing existing specifications. Benchmarks like SysMoBench help us better understand their capabilities and address their shortcomings.