
Specl progress: symbolic checking, 1.7x faster than TLC, more example specs, and a reference manual

Tags: specl, formal-verification, model-checking, distributed-systems, performance, symbolic

A week ago I announced Specl — a specification language and model checker for concurrent and distributed systems. There has been a lot of progress since then. This post covers what changed, how Specl compares to existing tools, what's still missing, and what's coming next.

Specl now has a full documentation book at specl.danieltisdall.com — covering installation, tutorials, language reference, model checker reference, worked examples, and architecture details.

TL;DR: symbolic checking modes, big performance gains on BFS, and extended documentation and toolchain usability. A design principle throughout: the toolchain auto-detects and applies the best checking strategies by default — symbolic cascading, partial order reduction, symmetry reduction — with a full set of super-user flags to override every decision.

Symbolic model checking

The biggest addition: a Z3-backed symbolic verification backend. This enables checking specs with parameters too large for explicit-state BFS to enumerate.

Specl's symbolic engine supports three techniques:

  • Bounded model checking (BMC): unroll the transition relation for k steps and check invariants at each depth. Good for finding bugs in bounded traces.
  • k-induction: prove that the invariant holds for k base steps and that the inductive step generalizes — if the invariant holds for k consecutive states, it holds for the next. When both pass, the invariant holds for all reachable states. Works when the invariant is k-inductive (often k=2 suffices).
  • IC3/PDR via CHC solving: use Z3's Spacer engine to compute inductive invariants automatically. No depth bound required.
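To make the k-induction obligations concrete, here is a minimal sketch in Rust. Everything here is hypothetical for illustration — a toy counter system, k=1, and brute-force enumeration over a finite domain; the real engine discharges the same two obligations symbolically through Z3 rather than by enumeration.

```rust
// Hypothetical toy system: a counter starts at 0 and steps by 2
// until it saturates at 10. Invariant: the counter is always even.
fn init() -> u32 { 0 }
fn next(x: u32) -> u32 { if x < 10 { x + 2 } else { x } }
fn inv(x: u32) -> bool { x % 2 == 0 }

/// 1-induction by enumeration. An SMT solver proves the same two
/// obligations symbolically, for *every* state, not just an enumerated
/// domain:
///   base: inv holds in the initial state
///   step: inv(s) implies inv(next(s)) for any state s
fn one_inductive(domain: impl Iterator<Item = u32>) -> bool {
    let base = inv(init());
    let step = domain.filter(|&s| inv(s)).all(|s| inv(next(s)));
    base && step
}
```

For k > 1 the step obligation assumes the invariant on k consecutive states before concluding it for the next one, which is what lets k-induction prove invariants that are not inductive on their own.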

A portfolio mode (--portfolio) runs BMC, k-induction, and IC3 in parallel and takes the first result. By default, --symbolic runs a smart cascade: tries induction first (cheapest), falls back to k-induction with CTI (counterexample-to-induction) learning, then IC3, then BMC. For most specs, the cascade finds the right technique in milliseconds. Each technique is also available individually — --inductive, --k-induction K, --bmc, --ic3 — for when you know what you want. The --golem flag swaps Z3's Spacer engine for the external Golem CHC solver, which sometimes succeeds where Z3 doesn't. --spacer-profile tunes Z3's aggressiveness (fast, default, thorough).

Concrete example: Specl can verify Paxos with N=10 replicas, MaxBallot=10, and 3 values via k-induction in ~90 seconds — parameters where BFS would need to explore trillions of states.

Performance optimizations

The explicit-state model checker received a major performance overhaul. The cumulative effect on a Raft benchmark (2M states, --fast --bfs):

|                       | Before | After | Speedup |
|-----------------------|--------|-------|---------|
| Single-threaded       | 9.74s  | 5.40s | 1.80x   |
| Parallel (16 threads) | 2.48s  | 2.00s | 1.24x   |

Some of the techniques behind this:

NaN-boxed values. Runtime values are now 8 bytes each. Integers and booleans are stored inline in the NaN payload of an f64 — no heap allocation, no pointer chasing. Collections (dicts, sets, sequences) use the remaining NaN space for a tagged pointer. This eliminated a major source of allocation and indirection.
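The idea can be sketched in a few lines. The tag layout below is purely illustrative, not Specl's actual encoding: an integer is packed into the payload bits of a quiet NaN, so it fits in the same 8 bytes as a raw f64.

```rust
// A quiet NaN has all exponent bits set plus the top mantissa bit;
// the remaining bits are free payload. Tag layout here is hypothetical.
const QNAN: u64 = 0x7ff8_0000_0000_0000;
const TAG_INT: u64 = 0x0001_0000_0000_0000; // hypothetical integer tag

fn box_int(i: i32) -> u64 {
    // Low 32 bits hold the integer; no heap allocation involved.
    QNAN | TAG_INT | (i as u32 as u64)
}

fn is_boxed_int(bits: u64) -> bool {
    bits & (QNAN | TAG_INT) == (QNAN | TAG_INT)
}

fn unbox_int(bits: u64) -> i32 {
    debug_assert!(is_boxed_int(bits));
    bits as u32 as i32 // truncate back to the payload
}

// Plain f64s pass through as their own bit pattern; a real NaN-boxer
// must also canonicalize genuine NaN results so they can't be mistaken
// for boxed values.
fn box_f64(x: f64) -> u64 { x.to_bits() }
```

Collections get the same treatment with a different tag and a pointer in the payload, which is the "tagged pointer" case mentioned above.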

Superinstructions. The bytecode VM's peephole optimizer fuses common multi-opcode patterns into single instructions. For example, VarParamDictGetIntEq fuses "load variable, load parameter, dict lookup, load integer, compare" into one operation. The optimizer profiles execution during a warmup phase and re-optimizes hot paths. There are currently ~15 superinstructions covering dict lookups, guard patterns, and collection operations.
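A peephole pass of this kind can be sketched as follows. The opcode set and the five-op pattern are simplified stand-ins for Specl's real bytecode, but the shape of the transformation is the same: scan for the pattern, emit one fused opcode in its place.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Op {
    LoadVar(u8),
    LoadParam(u8),
    DictGet,
    LoadInt(i64),
    Eq,
    // Fused superinstruction: one dispatch instead of five.
    VarParamDictGetIntEq(u8, u8, i64),
}

/// Single peephole pass over a bytecode sequence.
fn fuse(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < ops.len() {
        match &ops[i..] {
            [Op::LoadVar(v), Op::LoadParam(p), Op::DictGet, Op::LoadInt(n), Op::Eq, ..] => {
                out.push(Op::VarParamDictGetIntEq(*v, *p, *n));
                i += 5; // consume the whole pattern
            }
            _ => {
                out.push(ops[i].clone());
                i += 1;
            }
        }
    }
    out
}
```

The win is fewer dispatch loops and fewer stack pushes/pops on a path the checker executes millions of times per second.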

LTSmin-style tree compression. State storage uses a tree-structured table where each variable value is stored in a shared tree node. States that differ in only a few variables share most of their storage. This reduces memory usage and improves cache locality for the visited-state set.
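The sharing can be illustrated with a toy interning table: a state vector is folded pairwise, each pair interned once, until a single root index remains. This is a simplification — real LTSmin-style tables keep leaf and node index spaces separate and support decompression back to the full vector.

```rust
use std::collections::HashMap;

/// Toy tree-compressed state store. Two states that differ in one
/// variable share every subtree that doesn't contain that variable.
#[derive(Default)]
struct TreeTable {
    intern: HashMap<(u32, u32), u32>,
}

impl TreeTable {
    fn node(&mut self, pair: (u32, u32)) -> u32 {
        let next_id = self.intern.len() as u32;
        *self.intern.entry(pair).or_insert(next_id)
    }

    /// Fold a state vector down to one root index.
    /// (Sketch caveat: node ids and leaf values share one space here,
    /// so leaf values should not collide with small ids.)
    fn store(&mut self, mut layer: Vec<u32>) -> u32 {
        while layer.len() > 1 {
            layer = layer
                .chunks(2)
                .map(|c| if c.len() == 2 { self.node((c[0], c[1])) } else { c[0] })
                .collect();
        }
        layer[0]
    }
}
```

Storing the same state twice returns the same root, and storing a state that differs in one variable adds only the nodes on the path from that variable to the root.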

AtomicFPSet with triangular probing. The --fast mode's fingerprint set uses a lock-free open-addressing hash table with triangular probing (probe distance grows quadratically, better clustering than linear probing) and prefetch hints. Dynamic resizing prevents stalls when the table fills.
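A single-threaded sketch of the probing scheme follows; the real AtomicFPSet is lock-free over atomics, prefetches, and resizes, none of which is shown here. Zero is reserved as the empty marker in this sketch.

```rust
/// Open-addressing fingerprint set with triangular probing: the i-th
/// probe jumps i slots further, so the offset from the home slot is
/// 1 + 2 + ... + i, a triangular number. With a power-of-two table
/// this sequence visits every slot, and it disperses clustered keys
/// faster than linear probing.
struct FpSet {
    slots: Vec<u64>, // 0 = empty; fingerprints are assumed nonzero
    mask: u64,
}

impl FpSet {
    fn with_pow2_capacity(cap: usize) -> Self {
        assert!(cap.is_power_of_two());
        FpSet { slots: vec![0; cap], mask: (cap - 1) as u64 }
    }

    /// Returns true if the fingerprint was newly inserted.
    fn insert(&mut self, fp: u64) -> bool {
        let mut idx = fp & self.mask;
        for i in 1..=self.mask + 1 {
            match self.slots[idx as usize] {
                0 => { self.slots[idx as usize] = fp; return true; }
                s if s == fp => return false,     // already visited
                _ => idx = (idx + i) & self.mask, // triangular step
            }
        }
        panic!("table full");
    }
}
```

In the concurrent version each slot becomes an `AtomicU64` claimed with a compare-and-swap, which is what makes the structure lock-free.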

Other optimizations. mimalloc as the global allocator. Profile-guided optimization (PGO) with diverse training workloads. Incremental fingerprint hashing (only re-hash variables that changed). Reusable VM buffers to eliminate per-evaluation allocations. Action sorting by guard cost for faster rejection.
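The incremental fingerprint idea deserves a quick sketch. If each variable contributes an independent per-index hash and the contributions combine by XOR, then changing one variable needs just two hash calls instead of a full rehash. The mixer below is a splitmix64-style stand-in, not necessarily Specl's exact hash.

```rust
/// Hypothetical per-variable mixer (splitmix64-style finalizer).
fn var_hash(var_idx: u64, value: u64) -> u64 {
    let mut x = var_idx.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ value;
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Full fingerprint: XOR of every variable's contribution.
fn full_fingerprint(state: &[u64]) -> u64 {
    state.iter().enumerate()
        .fold(0, |fp, (i, &v)| fp ^ var_hash(i as u64, v))
}

/// Incremental update after one variable changes: XOR out the old
/// contribution, XOR in the new one. O(1) instead of O(|state|).
fn update_fingerprint(fp: u64, var_idx: u64, old: u64, new: u64) -> u64 {
    fp ^ var_hash(var_idx, old) ^ var_hash(var_idx, new)
}
```

Because XOR is its own inverse, the updated value is bit-for-bit identical to rehashing the whole state, so deduplication results are unaffected.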

Comparison to existing tools

Explicit-state: Specl vs TLC on Raft

Across a wide range of checking workloads, I've observed Specl to be faster than TLC. A good showcase is a Raft spec adapted from Vanlightly: raft.specl, raft.tla

The matching distinct-state counts confirm the two specs explore the same state space:

|                 | Specl         | TLC           |
|-----------------|---------------|---------------|
| Distinct states | 166,400,302   | 166,400,302   |
| Wall time       | 7m 04s        | 11m 56s       |
| Throughput      | 392K states/s | 233K states/s |
| Threads         | 16            | 16            |

Specl is 1.7x faster wall-clock. Both tools used 16 threads on 16 cores. Both used BFS. Neither used symmetry reduction.

Methodology notes. Specl's --fast mode stores only 64-bit fingerprints — it trades trace reconstruction for throughput. TLC's default mode uses 64-bit fingerprints for deduplication but also serializes full states to its DiskStateQueue for BFS traversal, enabling trace reconstruction on violations. TLC is doing strictly more I/O work per state, which accounts for some of the performance difference. Both tools ran in their recommended configurations for large state spaces. If a counterexample is found, Specl should be re-run without --fast to see the details of the trace.

A larger example catalogue

Many more examples have been added to the repo.

A few highlights:

  • The EPaxos spec toggles between the buggy and fixed recovery logic — BFS finds Pierre Sutra's published bug automatically.
  • The Redlock spec automatically finds Martin Kleppmann's published attack on the Redlock distributed locking algorithm.
  • The Raft spec (386 lines) is a production-grade model verified against the same state space as its TLA+ counterpart.

Toolchain additions

Beyond the checker, the CLI now has 11 commands. The most notable: specl translate converts existing TLA+ specs to Specl. specl watch re-checks on every save. specl perf-guide analyzes your spec and suggests performance optimizations. specl simulate generates random execution traces for quick exploration. specl test batch-checks all .specl files in a directory for CI integration. The full command reference is in the book.

Output formats: human-readable (default), JSON (for CI), ITF (Apalache-compatible traces), Mermaid (sequence diagrams), and DOT (Graphviz state graphs).

Auto-detection and super-user flags

A design goal: you should be able to run specl check myspec.specl and get sensible results without understanding the internals. The checker auto-detects when partial order reduction and symmetry reduction will help (based on action independence analysis and Dict[0..N, T] patterns), and enables them automatically. --symbolic auto-cascades through techniques from cheapest to most powerful. --no-auto disables all auto-detection for benchmarking or when you want full manual control.

For the explicit-state checker, the super-user flags cover storage: --fast (fingerprint-only, ~10x less memory, auto-reruns with full storage on violations), --bloom (probabilistic fixed-memory, unsound but useful for exploration), --collapse (per-variable interning, 2-5x savings), --tree (LTSmin-style hierarchical compression, 3-10x savings); search strategy: --directed (priority queue toward invariant violations), --swarm N (parallel instances with shuffled action orders), --incremental (cache fingerprints to disk, skip previously explored states on re-runs); reduction: --por (partial order reduction, 2-10x state reduction), --symmetry (collapse equivalent process permutations, up to N!); and scoping: --check-only (focus on named invariants), --max-states, --max-depth, --max-time, --memory-limit.

For the symbolic checker, beyond the individual technique flags (--bmc, --inductive, --k-induction, --ic3, --portfolio, --golem), there are tuning knobs: --seq-bound (max sequence length in the SMT encoding), --spacer-profile (Z3 aggressiveness), --bmc-depth, and --timeout.

The tool is intended to be plugged into LLM workflows — the feedback and CLI are designed to help an LLM without prior Specl knowledge write and check specs effectively.

Language server

The LSP now has 20+ features: go-to-definition, find references, rename, semantic highlighting, inlay hints (parameter names at call sites), call hierarchy, code lens (reference counts), signature help, folding ranges, quick fixes for undefined variables, code actions for generating init/action/invariant templates, and range formatting.

Language changes

  • Semicolons replace and as statement separators: x = 1; y = 2 instead of x = 1 and y = 2.

  • key in dict membership operator.

  • Auxiliary invariants as proof helpers for symbolic checking.

  • Default values for constants, to reduce boilerplate.

  • view declarations define computed read-only state — projections of the full state that appear in counterexample traces. This is the Specl equivalent of TLA+'s VIEW operator: you declare what matters for understanding a trace without cluttering the spec with extra variables. Views are also used for symmetry reduction (defining the canonical observation of state).


What's missing

These are tracked as open issues:

Language features. Various expressiveness gaps, such as higher-order collection operations: filter, map, fold (#55).

Bugs. Several fixes are needed, mainly in the symbolic modes: portfolio mode can hang (#58), symmetry reduction can report spurious violations (#56), and --check-only is incompatible with --symbolic (#57). Also missing: support for Set[Seq[T]] (#35).

Benchmarking. A symbolic tooling comparison against Quint, Apalache, and similar tools, on both performance and feature set.

Non-goals

Some things are intentionally out of scope. These could be revisited if there's demand, but they are not on the roadmap:

  • Temporal properties (liveness, fairness). I'd happily add this in the future if there's demand, but safety properties are the more typical use case.
  • JIT compilation (Cranelift or similar), i.e. compiling the spec to native code. It could be done in the future, but it's a big architectural departure.

The manual

The documentation is now a full mdBook site at specl.danieltisdall.com, covering installation, tutorials, the language reference, the model checker reference, worked examples, and architecture details.

Try it

git clone https://github.com/danwt/specl.git
cd specl/specl
cargo install --path crates/specl-cli
specl check examples/showcase/raft.specl -c N=2 -c MaxVal=1 -c MaxElections=2 -c MaxRestarts=0 -c MaxLogLen=3 --no-deadlock --bfs --fast

GitHub: danwt/specl

Book: specl.danieltisdall.com