Configuring the SimulationBuilder
- The Minimal Builder
- Adding Processes
- Iteration Control
- Seed Control
- Tags for Role Distribution
- Invariants
- Chaos and Attrition
- Randomized Network
- Putting It All Together
The SimulationBuilder is the glue. It takes your Process, your Workload, your invariants, and your chaos configuration, and wires them into a runnable simulation.
The Minimal Builder
The simplest possible simulation has one workload and runs once:
#![allow(unused)]
fn main() {
let report = SimulationBuilder::new()
.workload(KvWorkload::new(100, keys))
.run()
.await;
}
This creates a single workload at IP 10.0.0.1, runs it with a random seed, and produces a SimulationReport. No processes, no chaos, no multiple iterations. Useful for smoke testing, but not for finding bugs.
Adding Processes
To test a client-server system, add processes alongside the workload:
#![allow(unused)]
fn main() {
let report = SimulationBuilder::new()
.processes(3, || Box::new(KvServer))
.workload(KvWorkload::new(100, keys))
.run()
.await;
}
The builder creates 3 server processes at 10.0.1.1 through 10.0.1.3 and one workload at 10.0.0.1. The workload finds server IPs through ctx.topology().all_process_ips().
Iteration Control
One iteration is not enough. Different seeds produce different scheduling orders, different random choices, different failure patterns. You need hundreds or thousands of iterations to find bugs hiding in rare interleavings.
Fixed count runs a specific number of iterations:
#![allow(unused)]
fn main() {
.set_iterations(100)
// or equivalently:
.set_iteration_control(IterationControl::FixedCount(100))
}
Time limit runs until a wall-clock deadline:
#![allow(unused)]
fn main() {
.set_time_limit(Duration::from_secs(60))
}
Each iteration gets a different seed, producing a different execution. The seeds are deterministic and derived from the iteration manager, so the same configuration always explores the same seeds.
Seed Control
When a simulation fails on a specific seed, you need to reproduce it. Use set_debug_seeds() to run exactly those seeds:
#![allow(unused)]
fn main() {
SimulationBuilder::new()
.processes(3, || Box::new(KvServer))
.workload(KvWorkload::new(100, keys))
.set_debug_seeds(vec![42, 7891])
.run()
.await;
}
This runs exactly 2 iterations with seeds 42 and 7891. Combined with RUST_LOG=error, this is the primary debugging workflow: find the failing seed in the report, reproduce it in isolation, add logging, find the bug.
Tags for Role Distribution
When your distributed system has roles, tags assign them to processes:
#![allow(unused)]
fn main() {
SimulationBuilder::new()
.processes(5, || Box::new(ConsensusNode))
.tags(&[
("role", &["leader", "follower"]),
("dc", &["east", "west", "eu"]),
])
}
Tags distribute round-robin. Process 0 gets role=leader, dc=east. Process 1 gets role=follower, dc=west. Process 2 gets role=leader, dc=eu. And so on, wrapping around.
Inside a Process, read tags via ctx.topology().my_tags().get("role"). Inside a Workload, query the tag registry: ctx.topology().ips_tagged("role", "leader") returns the IPs of all leader processes.
Invariants
Invariants run after every simulation event. They check cross-workload properties that must hold at all times, not just at the end.
Trait-based invariant:
#![allow(unused)]
fn main() {
struct AgreementInvariant;
impl Invariant for AgreementInvariant {
fn name(&self) -> &str { "agreement" }
fn check(&self, state: &StateHandle, _sim_time_ms: u64) {
if let Some(model) = state.get::<ConsensusModel>("consensus_model") {
for (slot, values) in &model.committed_values {
let unique: HashSet<_> = values.iter().collect();
assert_always!(unique.len() <= 1, "agreement violated");
}
}
}
}
// Register on builder:
.invariant(AgreementInvariant)
}
Closure-based invariant for simpler cases:
#![allow(unused)]
fn main() {
.invariant_fn("key_count_bounded", |state, _time| {
if let Some(model) = state.get::<KvModel>("kv_model") {
assert_always!(model.len() <= 1000, "too many keys");
}
})
}
Invariants read from the StateHandle, which workloads write to via ctx.state().publish(). This is how the test driver communicates its reference model to the invariant checker.
Chaos and Attrition
Real distributed systems do not just run cleanly. Servers crash, networks partition, and then things have to recover. The builder models this with chaos_duration:
#![allow(unused)]
fn main() {
use moonpool_sim::Attrition;
SimulationBuilder::new()
.processes(3, || Box::new(KvServer))
.workload(KvWorkload::new(200, keys))
.chaos_duration(Duration::from_secs(30))
.attrition(Attrition {
max_dead: 1,
prob_graceful: 0.3,
prob_crash: 0.5,
prob_wipe: 0.2,
recovery_delay_ms: Some(1000..5000),
grace_period_ms: Some(2000..4000),
})
.set_iterations(100)
.run()
.await;
}
The simulation lifecycle:
-
Chaos phase (30 simulated seconds): Workloads run concurrently with fault injectors. Attrition randomly kills and restarts processes, respecting
max_deadto avoid killing everything at once. -
Workload completion: After chaos ends, faults stop and the system continues until all workloads finish. Workloads should be finite (do N operations, or sleep for a sim-time duration, then return).
-
Settle: The orchestrator drains remaining events. If the system does not settle within 30 seconds (sim time), the test fails with diagnostics, surfacing cleanup bugs like leaked tasks or unclosed connections.
-
Check: The
check()methods run inside the event loop, so network RPCs work normally.
max_dead: 1 means at most one process is down at any time. The probability weights control the mix of graceful shutdowns (shutdown token fired, grace period) versus instant crashes (no warning, connections abort).
Randomized Network
For additional chaos, enable randomized network configuration:
#![allow(unused)]
fn main() {
.random_network()
}
This varies latency, packet delay distributions, and other network parameters per iteration, based on the seed. Without this flag, the network uses default configuration (consistent, low-latency).
Putting It All Together
A production-grade simulation configuration looks like this:
#![allow(unused)]
fn main() {
let report = SimulationBuilder::new()
.processes(3, || Box::new(KvServer))
.tags(&[("role", &["primary", "replica"])])
.workload(KvWorkload::new(500, keys.clone()))
.invariant(ConservationLaw)
.chaos_duration(Duration::from_secs(30))
.attrition(Attrition {
max_dead: 1,
prob_graceful: 0.3,
prob_crash: 0.5,
prob_wipe: 0.2,
recovery_delay_ms: None,
grace_period_ms: None,
})
.random_network()
.set_iterations(100)
.run()
.await;
}
The builder takes care of the rest: creating the simulated world, assigning IPs, seeding the RNG, running the orchestration loop, collecting metrics, and producing the report.