Backoff and Reconnection

When a connection fails, the worst thing a peer can do is immediately retry. If ten peers all lose their connections at the same moment (say, after a network partition heals), and all of them retry instantly, they will overwhelm the destination with simultaneous connection attempts. This creates a reconnection storm that can be worse than the original failure.

Exponential Backoff

Moonpool peers use exponential backoff on reconnection, following FoundationDB’s pattern (FlowTransport.actor.cpp:892-897). The ReconnectState tracks the current delay and doubles it after each failure, up to a configured maximum:

```rust
// After a failed attempt, double the delay, but never beyond the configured cap
let next_delay = std::cmp::min(
    state.reconnect_state.current_delay * 2,
    config.max_reconnect_delay,
);
state.reconnect_state.current_delay = next_delay;
```
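To see what the doubling rule produces in practice, here is a minimal sketch that expands it into a full delay schedule. It assumes the first retry waits exactly the initial delay and that every later delay follows the same `min(current * 2, max)` update shown above; the function name `backoff_schedule` is hypothetical and not part of Moonpool.

```rust
use std::time::Duration;

/// Expands the doubling rule into the first `attempts` reconnect delays.
/// Hypothetical helper; assumes the first retry waits `initial`.
fn backoff_schedule(initial: Duration, max: Duration, attempts: usize) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(attempts);
    // Clamp in case a misconfigured initial delay already exceeds the cap.
    let mut current = initial.min(max);
    for _ in 0..attempts {
        delays.push(current);
        // Same rule as above: double, but never beyond the cap.
        // saturating_mul avoids an overflow panic for very large delays.
        current = current.saturating_mul(2).min(max);
    }
    delays
}
```

With the default 100ms / 30s settings the delays run 100ms, 200ms, 400ms, ..., 25.6s, and the tenth consecutive failure is the first to hit the 30s cap, because 100ms × 2^9 = 51.2s would exceed it.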

The default PeerConfig starts with a 100ms initial delay and caps at 30 seconds:

```rust
// Values used by the default PeerConfig
PeerConfig {
    initial_reconnect_delay: Duration::from_millis(100), // first backoff delay
    max_reconnect_delay: Duration::from_secs(30),        // cap on the doubling
    max_queue_size: 1000,
    connection_timeout: Duration::from_secs(5),
    max_connection_failures: None, // Unlimited retries
    monitor: Some(MonitorConfig::default()),
}
```

On a successful connection, the backoff delay resets to the initial value and the failure counter resets to zero, so the peer meets the next disruption with a clean slate.
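The failure path and the success path can be sketched together as a small state machine. `BackoffConfig`, the fields of `ReconnectState`, the `Next` enum, and the rule that a peer gives up once `failure_count` reaches `max_connection_failures` are all assumptions made for illustration; Moonpool's actual types carry more state and may apply the limit differently.

```rust
use std::time::Duration;

/// Hypothetical subset of the peer configuration used by this sketch.
struct BackoffConfig {
    initial_reconnect_delay: Duration,
    max_reconnect_delay: Duration,
    max_connection_failures: Option<u32>, // None = unlimited retries
}

/// Hypothetical reconnect state: current delay plus consecutive failures.
struct ReconnectState {
    current_delay: Duration,
    failure_count: u32,
}

#[derive(Debug, PartialEq)]
enum Next {
    RetryAfter(Duration),
    GiveUp,
}

impl ReconnectState {
    fn new(config: &BackoffConfig) -> Self {
        ReconnectState { current_delay: config.initial_reconnect_delay, failure_count: 0 }
    }

    /// After a failed attempt: report how long to wait, then double the delay up to the cap.
    fn on_failure(&mut self, config: &BackoffConfig) -> Next {
        self.failure_count += 1;
        // Assumed semantics: stop once the failure limit (if any) is reached.
        if let Some(limit) = config.max_connection_failures {
            if self.failure_count >= limit {
                return Next::GiveUp;
            }
        }
        let wait = self.current_delay;
        self.current_delay = std::cmp::min(self.current_delay * 2, config.max_reconnect_delay);
        Next::RetryAfter(wait)
    }

    /// After a successful connection: back to the initial delay and zero failures.
    fn on_success(&mut self, config: &BackoffConfig) {
        self.current_delay = config.initial_reconnect_delay;
        self.failure_count = 0;
    }
}
```

Keeping the reset in one place means a connection that flaps after a long outage starts again at the short initial delay rather than inheriting a 30s wait from the previous incident.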

Why It Matters in Simulation

Without backoff, simulation tests that inject network failures produce degenerate behavior. The event queue fills with connection attempts that all fail, each failure spawns another immediate retry, and the simulation spends all its time processing reconnection events instead of making progress on actual workload logic.

With backoff, the chaos engine can sever connections freely. Peers back off, the event queue stays manageable, and when connections are restored, peers reconnect in a staggered pattern instead of all at once, avoiding a thundering herd.
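A back-of-the-envelope count makes the event-queue argument concrete. The sketch counts how many connection attempts one peer makes while its destination is unreachable for 60s. It assumes, hypothetically, that a failed attempt costs 1ms of simulated time when retrying immediately, and uses the default 100ms / 30s backoff otherwise; the helper names are illustrative only.

```rust
use std::time::Duration;

/// Hypothetical helper: default-profile delay before retry number `i` (0-based),
/// i.e. 100 ms doubled `i` times and capped at 30 s.
fn exponential(i: u32) -> Duration {
    Duration::from_millis(100)
        .saturating_mul(2u32.saturating_pow(i))
        .min(Duration::from_secs(30))
}

/// Hypothetical helper: immediate retry, assuming each failed attempt
/// itself takes 1 ms of simulated time.
fn immediate(_i: u32) -> Duration {
    Duration::from_millis(1)
}

/// Counts the attempts one peer makes while the destination is unreachable
/// for `outage`. The first attempt happens at t = 0; after attempt k the
/// peer waits `wait(k)` before trying again.
fn attempts_during_outage(outage: Duration, mut wait: impl FnMut(u32) -> Duration) -> u32 {
    let mut t = Duration::ZERO;
    let mut attempts: u32 = 0;
    while t < outage {
        t += wait(attempts);
        attempts += 1;
    }
    attempts
}
```

Under these assumptions one peer makes 60,000 attempts without backoff and 10 with it (at 0s, 0.1s, 0.3s, ..., 25.5s and 51.1s). Across ten peers that is 600,000 connection events versus 100.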

You can use assert_sometimes_each! to track backoff depth across simulation runs, ensuring you exercise multiple levels of the exponential curve:

```rust
// Example: track that different backoff depths are reached
assert_sometimes_each!(
    "backoff_depth",
    [("attempt", failure_count)]
);
```
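The exact semantics of `assert_sometimes_each!` are defined by Moonpool. The sketch below only illustrates the underlying idea with a hypothetical `EachCoverage` tracker: record every backoff depth observed across runs, then list the expected depths that never occurred. With the default profile, depths 1 through 10 cover the whole doubling region, since the tenth failure is the first to hit the 30s cap.

```rust
use std::collections::BTreeMap;

/// Hypothetical coverage tracker for a "sometimes each" style check:
/// every expected value should be observed in at least one run.
/// This is an illustration, not Moonpool's implementation.
#[derive(Default)]
struct EachCoverage {
    hits: BTreeMap<u32, u64>, // observed value -> number of times seen
}

impl EachCoverage {
    /// Record one observation, e.g. the current failure count.
    fn record(&mut self, value: u32) {
        *self.hits.entry(value).or_insert(0) += 1;
    }

    /// Expected values that were never recorded, in the order given.
    fn missing(&self, expected: impl IntoIterator<Item = u32>) -> Vec<u32> {
        expected.into_iter().filter(|v| !self.hits.contains_key(v)).collect()
    }
}
```

A non-empty `missing(1..=10)` after a batch of seeds suggests the chaos settings never kept a connection down long enough to reach the deeper levels.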

Profile Presets

Different network environments need different backoff tuning. PeerConfig provides presets:

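| Profile | Initial delay | Max delay | Queue size | Connection timeout | Max failures |
|---------|---------------|-----------|------------|--------------------|--------------|
| Default | 100 ms | 30 s | 1000 | 5 s | Unlimited |
| Local | 10 ms | 1 s | 100 | 500 ms | 10 |
| WAN | 500 ms | 60 s | 5000 | 30 s | Unlimited |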
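The preset table can be expressed as constructors. This is a sketch under assumptions: the field names follow the `PeerConfig` literal above, but the constructor names `local` and `wan`, the field types, and the helper `failures_until_cap` are hypothetical, and the `monitor` field is omitted.

```rust
use std::time::Duration;

/// Hypothetical mirror of the presets (monitor field omitted).
#[derive(Debug, Clone, PartialEq)]
struct PeerConfig {
    initial_reconnect_delay: Duration,
    max_reconnect_delay: Duration,
    max_queue_size: usize,
    connection_timeout: Duration,
    max_connection_failures: Option<u32>, // None = unlimited retries
}

impl Default for PeerConfig {
    fn default() -> Self {
        PeerConfig {
            initial_reconnect_delay: Duration::from_millis(100),
            max_reconnect_delay: Duration::from_secs(30),
            max_queue_size: 1000,
            connection_timeout: Duration::from_secs(5),
            max_connection_failures: None,
        }
    }
}

impl PeerConfig {
    /// Local network: fast retries, small queue, give up after 10 failures.
    fn local() -> Self {
        PeerConfig {
            initial_reconnect_delay: Duration::from_millis(10),
            max_reconnect_delay: Duration::from_secs(1),
            max_queue_size: 100,
            connection_timeout: Duration::from_millis(500),
            max_connection_failures: Some(10),
        }
    }

    /// WAN: slower backoff, larger queue, generous timeout.
    fn wan() -> Self {
        PeerConfig {
            initial_reconnect_delay: Duration::from_millis(500),
            max_reconnect_delay: Duration::from_secs(60),
            max_queue_size: 5000,
            connection_timeout: Duration::from_secs(30),
            max_connection_failures: None,
        }
    }

    /// 1-based index of the first consecutive failure whose delay equals the cap.
    /// Assumes a non-zero initial delay (a zero delay would never grow).
    fn failures_until_cap(&self) -> u32 {
        let mut delay = self.initial_reconnect_delay;
        let mut n = 1;
        while delay < self.max_reconnect_delay {
            delay = delay.saturating_mul(2).min(self.max_reconnect_delay);
            n += 1;
        }
        n
    }
}
```

The helper shows how the curves differ in shape: the default profile reaches its cap on the 10th consecutive failure, while Local and WAN both reach theirs on the 8th, which for Local leaves little room before its 10-failure limit.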

For simulation tests, the default profile works well. The chaos engine can buggify the actual delays through the TimeProvider, stretching or shortening them to explore timing-sensitive code paths.
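Buggified delays are only useful if they stay reproducible. The sketch below shows one way a simulated time source could stretch or shorten a requested delay with a seeded pseudo-random generator, so the same seed always replays the same timings. `BuggifiedDelays`, the xorshift64 generator, and the [0.5, 2.0) scaling range are assumptions chosen for illustration, not Moonpool's TimeProvider API.

```rust
use std::time::Duration;

/// Hypothetical delay perturber: a deterministic, seeded stand-in for the
/// kind of buggification a simulated time source can apply.
struct BuggifiedDelays {
    state: u64, // xorshift64 state; must never be zero
}

impl BuggifiedDelays {
    fn new(seed: u64) -> Self {
        // xorshift64 gets stuck at zero, so map a zero seed to a fixed constant.
        BuggifiedDelays { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    /// One step of Marsaglia's xorshift64 (shifts 13, 7, 17).
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Scales `requested` by a pseudo-random factor in [0.5, 2.0).
    fn adjust(&mut self, requested: Duration) -> Duration {
        // Use the top 53 bits to build a uniform f64 in [0, 1).
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        let factor = 0.5 + 1.5 * unit;
        requested.mul_f64(factor)
    }
}
```

Because the generator is seeded, a run that exposes a timing bug can be replayed exactly by reusing its seed.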

The Sim Book