
Fault Reference

A consolidated quick reference to every fault moonpool-sim can inject, organized by category. For detailed explanations and examples, see Network Faults, Storage Faults, and Attrition: Process Reboots.

Every fault listed below is automatically emitted to the "sim:faults" event timeline as a SimFaultEvent. Invariants can read these to correlate application behavior with infrastructure faults.
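As a rough illustration of that correlation, the sketch below checks that every application error follows some injected fault within a time window. FaultRecord, AppError and unexplained_errors are hypothetical stand-ins written for this example; the real SimFaultEvent type and the timeline API are not shown here.

```rust
// Hypothetical types for illustration; moonpool-sim's SimFaultEvent and its
// timeline API have their own shapes, which are not reproduced here.
#[derive(Debug, Clone, PartialEq)]
enum FaultKind {
    ConnectionClosed,
    BitFlip,
    Partition,
}

#[derive(Debug, Clone)]
struct FaultRecord {
    at_ms: u64,
    kind: FaultKind,
}

// An error the workload observed at the application level.
#[derive(Debug, Clone)]
struct AppError {
    at_ms: u64,
    description: String,
}

// Invariant sketch: every application error must follow some injected fault
// by at most `window_ms`. Returns the errors that no fault can explain.
fn unexplained_errors<'a>(
    faults: &[FaultRecord],
    errors: &'a [AppError],
    window_ms: u64,
) -> Vec<&'a AppError> {
    errors
        .iter()
        .filter(|e| {
            !faults
                .iter()
                .any(|f| f.at_ms <= e.at_ms && e.at_ms - f.at_ms <= window_ms)
        })
        .collect()
}
```

A non-empty result points at an application failure with no infrastructure cause nearby, which is usually worth investigating first.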

Unless a row states otherwise, the defaults below are the values in ChaosConfiguration::default() and StorageConfiguration::default(). When using random_for_seed(), these values are instead randomized per seed within documented ranges.

Network Faults

Configured via ChaosConfiguration (nested under NetworkConfiguration::chaos).

Connection Failures

| Fault | Config Field | Default | Real-World Scenario |
|---|---|---|---|
| Random connection close | random_close_probability | 0.001% | Reconnection logic, message redelivery, connection pooling |
| Asymmetric close | random_close_explicit_ratio | 30% explicit (FIN), 70% silent (RST) | Half-closed sockets, FIN vs RST handling |
| Close cooldown | random_close_cooldown | 5s | Prevents cascading failures after a close event |
| Connect failure | connect_failure_mode | Probabilistic (50% refused, 50% hang) | Connection establishment retries, timeout handling |
| Connect failure probability | connect_failure_probability | 50% | Ratio of failed vs hanging connections |
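To make the interaction between probability, explicit ratio and cooldown concrete, here is a hypothetical per-connection model of the three random_close_* fields. RandomCloser and CloseKind are names invented for this sketch, and the order of checks (cooldown first, then probability, then close kind) is an assumption about how such a model could work, not a description of moonpool-sim's code.

```rust
// Minimal deterministic PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    // Uniform float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

#[derive(Debug, PartialEq)]
enum CloseKind {
    Explicit, // the table's "explicit (FIN)" case
    Silent,   // the table's "silent (RST)" case
}

// Hypothetical model of random_close_probability, random_close_explicit_ratio
// and random_close_cooldown; not moonpool-sim's implementation.
struct RandomCloser {
    probability: f64,    // 0.001% would be 0.00001
    explicit_ratio: f64, // 30% would be 0.3
    cooldown_ms: u64,    // 5s would be 5_000
    last_close_ms: Option<u64>,
    rng: SplitMix64,
}

impl RandomCloser {
    // Called once per I/O operation; Some(kind) means "close this connection now".
    fn on_io(&mut self, now_ms: u64) -> Option<CloseKind> {
        if let Some(last) = self.last_close_ms {
            if now_ms.saturating_sub(last) < self.cooldown_ms {
                return None; // still cooling down after the previous close
            }
        }
        if self.rng.next_f64() >= self.probability {
            return None;
        }
        self.last_close_ms = Some(now_ms);
        if self.rng.next_f64() < self.explicit_ratio {
            Some(CloseKind::Explicit)
        } else {
            Some(CloseKind::Silent)
        }
    }
}
```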

Latency and Congestion

| Fault | Config Field | Default | Real-World Scenario |
|---|---|---|---|
| Latency distribution | latency_distribution | Uniform | P99/P99.9 tail latency testing |
| Slow latency spike | slow_latency_probability | 0.1% (bimodal mode only) | GC pauses, cross-datacenter hops |
| Slow latency multiplier | slow_latency_multiplier | 10x normal | Magnitude of tail latency spikes |
| Write clogging | clog_probability / clog_duration | 0%, 100-300ms | Backpressure handling, flow control |
| Clock drift | clock_drift_enabled / clock_drift_max | enabled, 100ms | Lease expiration, distributed consensus, TTL handling |
| Buggified delay | buggified_delay_probability / buggified_delay_max | 25%, 100ms | Race conditions, timing-dependent bugs |
| Handshake delay | handshake_delay_enabled / handshake_delay_max | enabled, 10ms | TLS negotiation, connection startup overhead |
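The bimodal mode can be pictured as a uniform base latency with a rare multiplicative spike. This sketch assumes a base range (min_us/max_us) that the table does not specify, and LatencyModel is a hypothetical type, not the latency_distribution setting itself.

```rust
// Minimal deterministic PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    // Uniform float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Hypothetical bimodal latency model; min_us/max_us are assumed parameters.
struct LatencyModel {
    min_us: u64,
    max_us: u64,
    bimodal: bool,
    slow_probability: f64, // 0.1% would be 0.001
    slow_multiplier: u64,  // 10x
}

impl LatencyModel {
    fn sample_us(&self, rng: &mut SplitMix64) -> u64 {
        // Uniform base latency in [min_us, max_us]; modulo bias is negligible here.
        let span = self.max_us - self.min_us + 1;
        let base = self.min_us + rng.next_u64() % span;
        // In bimodal mode a small fraction of samples become slow spikes.
        if self.bimodal && rng.next_f64() < self.slow_probability {
            base * self.slow_multiplier
        } else {
            base
        }
    }
}
```

With a 0.1% spike probability, roughly one sample in a thousand lands in the slow mode, so assertions about spikes belong at P99.9 rather than P99.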

Network Partitions

| Fault | Config Field | Default | Real-World Scenario |
|---|---|---|---|
| Random partition | partition_probability | 0% | Split-brain, quorum loss, leader election |
| Partition duration | partition_duration | 200ms-2s | Recovery time after network heal |
| Partition strategy | partition_strategy | Random | Random / UniformSize / IsolateSingle patterns |

Manual partition methods are also available on SimWorld: partition_pair(), partition_send_from(), partition_recv_to().
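One way to picture these manual partitions is a table of blocked directed links consulted before each delivery. PartitionTable and its methods are hypothetical; the directional meaning given to partition_send_from() and partition_recv_to() here (drop all outgoing traffic from a host, drop all incoming traffic to a host) is inferred from the method names and may not match their real signatures.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, Ipv4Addr};

// Hypothetical partition table; not SimWorld's internal representation.
#[derive(Default)]
struct PartitionTable {
    blocked_links: HashSet<(IpAddr, IpAddr)>, // directed (from, to) pairs
    send_blocked: HashSet<IpAddr>,            // hosts whose outgoing traffic is dropped
    recv_blocked: HashSet<IpAddr>,            // hosts whose incoming traffic is dropped
}

impl PartitionTable {
    // Cut both directions between two hosts.
    fn block_pair(&mut self, a: IpAddr, b: IpAddr) {
        self.blocked_links.insert((a, b));
        self.blocked_links.insert((b, a));
    }

    fn block_send_from(&mut self, ip: IpAddr) {
        self.send_blocked.insert(ip);
    }

    fn block_recv_to(&mut self, ip: IpAddr) {
        self.recv_blocked.insert(ip);
    }

    // Heal every partition at once.
    fn heal(&mut self) {
        self.blocked_links.clear();
        self.send_blocked.clear();
        self.recv_blocked.clear();
    }

    // A message is delivered only if no rule blocks its direction.
    fn can_deliver(&self, from: IpAddr, to: IpAddr) -> bool {
        !self.blocked_links.contains(&(from, to))
            && !self.send_blocked.contains(&from)
            && !self.recv_blocked.contains(&to)
    }
}

fn host(last: u8) -> IpAddr {
    IpAddr::V4(Ipv4Addr::new(10, 0, 0, last))
}
```

Asymmetric rules such as block_recv_to are what produce one-way reachability, a case symmetric partitions never exercise.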

Data Integrity

| Fault | Config Field | Default | Real-World Scenario |
|---|---|---|---|
| Bit flips | bit_flip_probability | 0.01% | CRC/checksum validation, data corruption detection |
| Flip range | bit_flip_min_bits / bit_flip_max_bits | 1-32 bits | Power-law distribution of corruption severity |
| Flip cooldown | bit_flip_cooldown | 0 (no cooldown) | Rate-limiting corruption events |
| Partial writes | partial_write_max_bytes | 1000 bytes | TCP fragmentation, message framing |
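The sketch below shows the kind of corruption these fields describe and why checksums are the matching defense: it draws a flip count from a 1/k power law between the min and max bits, flips that many distinct bits, and checks the result with CRC-32. The 1/k weighting is an assumption (the table only says "power-law"), and the helper names are hypothetical.

```rust
use std::collections::HashSet;

// Minimal deterministic PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

// CRC-32 (IEEE polynomial, reflected form), computed bit by bit for brevity.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Assumed power law: P(k) proportional to 1/k over [min_bits, max_bits],
// so single-bit flips are most common and wide flips are rare.
fn sample_flip_count(rng: &mut SplitMix64, min_bits: u32, max_bits: u32) -> u32 {
    assert!(min_bits >= 1 && min_bits <= max_bits);
    let total: f64 = (min_bits..=max_bits).map(|k| 1.0 / k as f64).sum();
    let mut target = rng.next_f64() * total;
    for k in min_bits..=max_bits {
        target -= 1.0 / k as f64;
        if target < 0.0 {
            return k;
        }
    }
    max_bits // only reached through floating-point rounding
}

// Flip `count` distinct bits so that no flip cancels another.
fn flip_bits(data: &mut [u8], rng: &mut SplitMix64, count: u32) {
    let total_bits = data.len() as u64 * 8;
    assert!(count as u64 <= total_bits);
    let mut chosen = HashSet::new();
    while chosen.len() < count as usize {
        chosen.insert(rng.next_u64() % total_bits);
    }
    for bit in chosen {
        data[(bit / 8) as usize] ^= 1u8 << (bit % 8);
    }
}
```

CRC-32 catches every single-bit error; wider scattered flips are detected with high probability but not with certainty, which is one reason to exercise the full flip range.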

Half-Open Connections

| Fault | Method | Real-World Scenario |
|---|---|---|
| Peer crash simulation | simulate_peer_crash() | TCP keepalive, heartbeat detection, silent failures |
| Half-open error detection | should_half_open_error() | Timeout-based failure detection |
| Stable connection exemption | mark_connection_stable() | Exempt supervision channels from chaos |
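Because a silently crashed peer sends nothing, application code can only detect it by noticing silence. A minimal timeout-based detector, assuming millisecond simulated time and a hypothetical HeartbeatDetector type, looks like this:

```rust
// Hypothetical timeout-based failure detector. After a silent peer crash no
// FIN or RST arrives, so elapsed silence is the only remaining signal.
struct HeartbeatDetector {
    timeout_ms: u64,
    last_heard_ms: u64,
}

impl HeartbeatDetector {
    fn new(now_ms: u64, timeout_ms: u64) -> Self {
        HeartbeatDetector { timeout_ms, last_heard_ms: now_ms }
    }

    // Any message from the peer counts as proof of life.
    fn on_message(&mut self, now_ms: u64) {
        self.last_heard_ms = self.last_heard_ms.max(now_ms);
    }

    // Suspect the peer once the silence exceeds the timeout.
    fn peer_suspected(&self, now_ms: u64) -> bool {
        now_ms.saturating_sub(self.last_heard_ms) > self.timeout_ms
    }
}
```

The choice of timeout_ms trades detection speed against falsely suspecting peers that are merely slow, which the latency and delay faults above will probe.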

Storage Faults

Configured via StorageConfiguration. All fault probabilities default to 0% and must be enabled explicitly or via random_for_seed(). Storage faults are scoped per process: StorageState holds a global config plus optional per-process overrides in per_process_configs. Use SimWorld::set_process_storage_config(ip, config) to assign different fault profiles to individual processes.
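The per-process scoping amounts to an override lookup with a global fallback. The sketch below models that idea with hypothetical names (FaultProfile, ScopedStorageFaults); it is not the real StorageState layout, and only two of the probability fields are mirrored.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

// Hypothetical subset of a storage fault profile.
#[derive(Clone, Debug, PartialEq)]
struct FaultProfile {
    read_fault_probability: f64,
    write_fault_probability: f64,
}

// One global profile plus optional per-process overrides keyed by IP.
struct ScopedStorageFaults {
    global: FaultProfile,
    per_process: HashMap<IpAddr, FaultProfile>,
}

impl ScopedStorageFaults {
    // Mirrors the (ip, config) arguments of SimWorld::set_process_storage_config.
    fn set_process_config(&mut self, ip: IpAddr, profile: FaultProfile) {
        self.per_process.insert(ip, profile);
    }

    // An override wins; otherwise the process sees the global profile.
    fn profile_for(&self, ip: IpAddr) -> &FaultProfile {
        self.per_process.get(&ip).unwrap_or(&self.global)
    }
}
```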

| Fault | Config Field | Default | Real-World Scenario |
|---|---|---|---|
| Read corruption | read_fault_probability | 0% | ECC failures, DRAM bit flips, media degradation |
| Write corruption | write_fault_probability | 0% | Bad sectors, controller bugs, disk full |
| Crash fault (torn writes) | crash_fault_probability | 0% | Power loss mid-I/O, crash consistency |
| Misdirected write | misdirect_write_probability | 0% | Firmware bugs, wrong block written |
| Misdirected read | misdirect_read_probability | 0% | Controller errors, wrong block read |
| Phantom write | phantom_write_probability | 0% | Drive lies about durability |
| Sync failure | sync_failure_probability | 0% | fsync fails, disk full |
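Crash faults, phantom writes and sync failures all hinge on the gap between data that has been written and data that is durable. This hypothetical SimFile model (an assumption, not moonpool-sim's storage engine) keeps the two apart, so a crash can keep an arbitrary prefix of unsynced bytes.

```rust
// Minimal deterministic PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

// Outcome a fault injector might choose for one sync call.
#[derive(Clone, Copy)]
enum SyncOutcome {
    Durable, // data really reaches the medium
    Phantom, // phantom write: success reported, nothing persisted
    Fail,    // sync failure: an error is returned
}

// Hypothetical file: `durable` survives a crash, `pending` may not.
struct SimFile {
    durable: Vec<u8>,
    pending: Vec<u8>,
}

impl SimFile {
    fn new() -> Self {
        SimFile { durable: Vec::new(), pending: Vec::new() }
    }

    fn write(&mut self, data: &[u8]) {
        self.pending.extend_from_slice(data);
    }

    fn sync(&mut self, outcome: SyncOutcome) -> Result<(), &'static str> {
        match outcome {
            SyncOutcome::Durable => {
                self.durable.append(&mut self.pending);
                Ok(())
            }
            SyncOutcome::Phantom => Ok(()), // the lie: pending data stays volatile
            SyncOutcome::Fail => Err("sync failed"),
        }
    }

    // What a reader sees before any crash: everything written so far.
    fn contents(&self) -> Vec<u8> {
        let mut all = self.durable.clone();
        all.extend_from_slice(&self.pending);
        all
    }

    // Power loss: an arbitrary prefix of unsynced data survives (a torn write).
    fn crash(&mut self, rng: &mut SplitMix64) {
        let keep = (rng.next_u64() % (self.pending.len() as u64 + 1)) as usize;
        self.durable.extend_from_slice(&self.pending[..keep]);
        self.pending.clear();
    }
}
```

Recovery code tested against this kind of model must tolerate a log whose tail is cut at an arbitrary byte, not just at record boundaries.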

Per-Process Storage Operations

| Method | Parameters | Description |
|---|---|---|
| SimWorld::set_process_storage_config(ip, config) | IpAddr, StorageConfiguration | Set per-process fault config (overrides global) |
| SimWorld::simulate_crash_for_process(ip, close_files) | IpAddr, bool | Simulate power loss: torn writes, optional file close |
| SimWorld::wipe_storage_for_process(ip) | IpAddr | Delete all storage owned by the process |
| SimWorld::storage_provider(ip) | IpAddr | Create a SimStorageProvider scoped to this process |
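Scoping by IP is what lets a per-process wipe remove one process's data without touching its neighbors. A hypothetical shared disk keyed by (owner IP, path) shows the idea; SharedSimDisk and its methods are invented for this sketch and are not the SimStorageProvider API.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

// Hypothetical disk shared by all simulated processes; every file is keyed
// by its owner's IP, so the same path in two processes is two files.
#[derive(Default)]
struct SharedSimDisk {
    files: HashMap<(IpAddr, String), Vec<u8>>,
}

impl SharedSimDisk {
    fn write(&mut self, owner: IpAddr, path: &str, data: &[u8]) {
        self.files.insert((owner, path.to_string()), data.to_vec());
    }

    fn read(&self, owner: IpAddr, path: &str) -> Option<&[u8]> {
        self.files.get(&(owner, path.to_string())).map(|v| v.as_slice())
    }

    // Delete everything owned by `owner` and report how many files went away.
    fn wipe_process(&mut self, owner: IpAddr) -> usize {
        let before = self.files.len();
        self.files.retain(|(ip, _), _| *ip != owner);
        before - self.files.len()
    }
}
```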

Storage Performance Simulation

Storage also simulates realistic performance characteristics independent of fault injection.

| Parameter | Config Field | Default | Description |
|---|---|---|---|
| IOPS | iops | 25,000 | I/O operations per second limit |
| Bandwidth | bandwidth | 150 MB/s | Maximum throughput |
| Read latency | read_latency | 50-200 µs | Per-read operation delay |
| Write latency | write_latency | 100-500 µs | Per-write operation delay |
| Sync latency | sync_latency | 1-5 ms | Per-sync/flush delay |
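How these limits combine is not spelled out above. Assuming a simple model where each operation costs the larger of its IOPS slot and its transfer time, plus the sampled per-operation latency, and taking MB as 10^6 bytes, the arithmetic looks like this (DiskPerf is hypothetical):

```rust
// Hypothetical service-time model: an operation is limited by whichever is
// slower, the IOPS budget or the bandwidth budget.
struct DiskPerf {
    iops: u64,
    bandwidth_bytes_per_s: u64, // 150 MB/s taken as 150_000_000 bytes/s
}

impl DiskPerf {
    // Minimum time, in microseconds, the device is busy with one operation.
    fn service_time_us(&self, bytes: u64) -> u64 {
        let per_op_us = 1_000_000 / self.iops;
        let transfer_us = bytes * 1_000_000 / self.bandwidth_bytes_per_s;
        per_op_us.max(transfer_us)
    }

    // Total delay once the sampled per-operation latency is added on top.
    fn total_delay_us(&self, bytes: u64, latency_us: u64) -> u64 {
        self.service_time_us(bytes) + latency_us
    }
}
```

Under this model and the defaults, a 4 KiB operation is IOPS-bound at 40 µs, while a 1 MiB write is bandwidth-bound at about 6,990 µs before latency is added.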

Process Lifecycle Faults

Configured via Attrition (built-in) or custom FaultInjector implementations.

| Fault | Mechanism | Behavior |
|---|---|---|
| Graceful reboot | RebootKind::Graceful | Signal shutdown token, wait grace period (default 2-5s), force kill, restart after recovery delay (default 1-10s) |
| Crash reboot | RebootKind::Crash | Immediate task abort, all connections reset, restart after recovery delay |
| Crash + wipe | RebootKind::CrashAndWipe | Crash behavior + immediate wipe of all persistent storage owned by the process (scoped by IP) |
| Continuous attrition | Attrition config | Random reboots during chaos phase with weighted prob_graceful/prob_crash/prob_wipe and max_dead limit |
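Continuous attrition needs two decisions per step: whether another process may go down, and which kind of reboot to apply. The sketch below makes both, treating prob_graceful/prob_crash/prob_wipe as normalized weights and max_dead as a cap; the selection logic and the AttritionSketch and RebootChoice names are assumptions, not the Attrition implementation.

```rust
// Minimal deterministic PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Hypothetical mirror of the three reboot kinds in the table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RebootChoice {
    Graceful,
    Crash,
    CrashAndWipe,
}

struct AttritionSketch {
    prob_graceful: f64,
    prob_crash: f64,
    prob_wipe: f64,
    max_dead: usize,
}

impl AttritionSketch {
    fn pick(&self, currently_dead: usize, rng: &mut SplitMix64) -> Option<RebootChoice> {
        if currently_dead >= self.max_dead {
            return None; // never take down more than max_dead processes at once
        }
        let total = self.prob_graceful + self.prob_crash + self.prob_wipe;
        if total <= 0.0 {
            return None;
        }
        // Weights are normalized, so they need not sum to 1.
        let r = rng.next_f64() * total;
        if r < self.prob_graceful {
            Some(RebootChoice::Graceful)
        } else if r < self.prob_graceful + self.prob_crash {
            Some(RebootChoice::Crash)
        } else {
            Some(RebootChoice::CrashAndWipe)
        }
    }
}
```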

Configuration Presets

| Preset | Description |
|---|---|
| NetworkConfiguration::random_for_seed() | All chaos parameters randomized per seed for comprehensive testing |
| NetworkConfiguration::fast_local() | 1-10 µs latencies, all chaos disabled |
| ChaosConfiguration::disabled() | Zero probability for every fault category |
| StorageConfiguration::random_for_seed() | Randomized faults (0.001%-0.1%), varied IOPS (10K-100K), varied bandwidth (50-500 MB/s) |
| StorageConfiguration::fast_local() | 1M IOPS, 1 GB/s bandwidth, 1 µs latencies, all faults disabled |
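The value of the random_for_seed() presets is that the randomized configuration is a pure function of the seed, so a failing seed can be replayed exactly. A hypothetical sketch that draws three storage knobs from the ranges quoted in the table; the linear draws and the StorageKnobs and knobs_for_seed names are assumptions, and the real distributions may differ.

```rust
// Minimal deterministic PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Hypothetical subset of storage knobs.
#[derive(Debug, Clone, PartialEq)]
struct StorageKnobs {
    iops: u64,
    bandwidth_mb_s: u64,
    fault_probability: f64,
}

// Everything is derived from the seed alone, so the same seed always
// reproduces the same fault profile.
fn knobs_for_seed(seed: u64) -> StorageKnobs {
    let mut rng = SplitMix64(seed);
    let iops = 10_000 + rng.next_u64() % (100_000 - 10_000 + 1);
    let bandwidth_mb_s = 50 + rng.next_u64() % (500 - 50 + 1);
    // Linear draw between 0.001% (1e-5) and 0.1% (1e-3).
    let fault_probability = 1e-5 + rng.next_f64() * (1e-3 - 1e-5);
    StorageKnobs { iops, bandwidth_mb_s, fault_probability }
}
```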

See Configuration Reference for the complete builder API and all configuration types.