Backoff and Reconnection

When a connection fails, the worst thing a peer can do is immediately retry. If ten peers all lose their connections at the same moment (say, after a network partition heals), and all of them retry instantly, they will overwhelm the destination with simultaneous connection attempts. This creates a reconnection storm that can be worse than the original failure.

Exponential Backoff

Moonpool peers use exponential backoff on reconnection, following FoundationDB’s pattern (FlowTransport.actor.cpp:892-897). The ReconnectState tracks the current delay and doubles it after each failure, up to a configured maximum:

let next_delay = std::cmp::min(
    state.reconnect_state.current_delay * 2,
    config.max_reconnect_delay,
);
state.reconnect_state.current_delay = next_delay;

The default PeerConfig starts with a 100ms initial delay and caps at 30 seconds:

PeerConfig {
    initial_reconnect_delay: Duration::from_millis(100),
    max_reconnect_delay: Duration::from_secs(30),
    max_queue_size: 1000,
    connection_timeout: Duration::from_secs(5),
    max_connection_failures: None, // Unlimited retries
    monitor: Some(MonitorConfig::default()),
}

On a successful connection, the backoff resets to the initial delay. The failure counter resets to zero. The peer is ready for the next disruption with a clean slate.
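The doubling, the cap, and the reset on success can be put together in one small sketch. This is not Moonpool's implementation: `BackoffConfig` is a hypothetical stand-in for the two delay fields of `PeerConfig`, the `ReconnectState` below is a simplified stand-in with assumed `on_failure` and `on_success` methods, and the choice to wait the current delay first and double afterwards is an assumption.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the two reconnect delay fields of `PeerConfig`.
struct BackoffConfig {
    initial_reconnect_delay: Duration,
    max_reconnect_delay: Duration,
}

/// Simplified stand-in for `ReconnectState` (method names are assumptions).
struct ReconnectState {
    current_delay: Duration,
    failure_count: u32,
}

impl ReconnectState {
    fn new(config: &BackoffConfig) -> Self {
        Self {
            current_delay: config.initial_reconnect_delay,
            failure_count: 0,
        }
    }

    /// Record a failed attempt. Returns the delay to wait before the next
    /// attempt, then doubles the stored delay up to the configured maximum.
    fn on_failure(&mut self, config: &BackoffConfig) -> Duration {
        let wait = self.current_delay;
        self.failure_count += 1;
        self.current_delay = std::cmp::min(
            self.current_delay.saturating_mul(2),
            config.max_reconnect_delay,
        );
        wait
    }

    /// Record a successful connection: back to the initial delay, zero failures.
    fn on_success(&mut self, config: &BackoffConfig) {
        self.current_delay = config.initial_reconnect_delay;
        self.failure_count = 0;
    }
}

fn main() {
    let config = BackoffConfig {
        initial_reconnect_delay: Duration::from_millis(100),
        max_reconnect_delay: Duration::from_secs(30),
    };
    let mut state = ReconnectState::new(&config);

    for _ in 0..11 {
        let wait = state.on_failure(&config);
        println!("failure {:>2}: wait {:?}", state.failure_count, wait);
    }

    state.on_success(&config);
    println!(
        "after success: next wait {:?}, failures {}",
        state.current_delay, state.failure_count
    );
}
```

With the default values, the waits in this sketch run 100 ms, 200 ms, 400 ms, and so on up to 25.6 s, then stay at the 30 s cap from the tenth failure onward.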

Why It Matters in Simulation

Without backoff, simulation tests that inject network failures produce degenerate behavior. The event queue fills with connection attempts that all fail, each failure spawns another immediate retry, and the simulation spends all its time processing reconnection events instead of making progress on actual workload logic.

With backoff, the chaos engine can sever connections freely. Peers back off, the event queue stays manageable, and when connectivity returns, peers retry at progressively wider intervals rather than in a tight loop, which dampens thundering herd effects.
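To get a rough sense of the difference in event volume, the sketch below counts how many reconnect attempts a single peer makes during a 60-second outage. It is a back-of-the-envelope model, not Moonpool code: `count_attempts` is a hypothetical helper, every attempt is assumed to fail instantly, and a fixed 100 ms retry interval stands in for "immediate" retry (a true zero-delay retry would never advance simulated time).

```rust
use std::time::Duration;

/// Count the reconnect attempts one peer makes during `outage`.
/// Each attempt is assumed to fail instantly; the peer then waits `delay`,
/// which is multiplied by `factor` after every failure and capped at `max`.
/// `factor = 1` models a fixed retry interval, `factor = 2` models doubling.
fn count_attempts(outage: Duration, initial: Duration, max: Duration, factor: u32) -> u32 {
    assert!(!initial.is_zero(), "a zero delay would never advance time");
    let mut elapsed = Duration::ZERO;
    let mut delay = initial;
    let mut attempts = 0;
    loop {
        // Wait out the current delay before trying again.
        elapsed += delay;
        if elapsed >= outage {
            // The outage is over; the next attempt would succeed.
            return attempts;
        }
        // This attempt happens during the outage and fails.
        attempts += 1;
        delay = std::cmp::min(delay.saturating_mul(factor), max);
    }
}

fn main() {
    let outage = Duration::from_secs(60);
    let initial = Duration::from_millis(100);
    let max = Duration::from_secs(30);

    println!("fixed 100ms retry: {} attempts", count_attempts(outage, initial, max, 1));
    println!("exponential backoff: {} attempts", count_attempts(outage, initial, max, 2));
}
```

Under these assumptions the fixed-interval peer makes 599 failed attempts during the outage, while the backoff peer makes 9. Multiply that by the number of peers in a simulated cluster and the effect on the event queue becomes clear.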

You can use assert_sometimes_each! to track backoff depth across simulation runs, ensuring you exercise multiple levels of the exponential curve:

// Example: track that different backoff depths are reached
assert_sometimes_each!(
    "backoff_depth",
    [("attempt", failure_count)]
);

Profile Presets

Different network environments need different backoff tuning. PeerConfig provides presets:

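| Profile | Initial Delay | Max Delay | Queue Size | Timeout | Max Failures |
|---------|---------------|-----------|------------|---------|--------------|
| Default | 100ms         | 30s       | 1000       | 5s      | Unlimited    |
| Local   | 10ms          | 1s        | 100        | 500ms   | 10           |
| WAN     | 500ms         | 60s       | 5000       | 30s     | Unlimited    |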
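As a way to compare the presets, the sketch below encodes the table values and computes how many consecutive failures each profile needs before its delay reaches the cap. The `BackoffProfile` struct, its constructor names (`default_profile`, `local`, `wan`), and the `doublings_to_cap` helper are all hypothetical; the actual preset API on `PeerConfig` may look different.

```rust
use std::time::Duration;

/// Hypothetical mirror of the preset table; not the real `PeerConfig`.
struct BackoffProfile {
    name: &'static str,
    initial_reconnect_delay: Duration,
    max_reconnect_delay: Duration,
    max_queue_size: usize,
    connection_timeout: Duration,
    /// `None` means unlimited retries.
    max_connection_failures: Option<u32>,
}

impl BackoffProfile {
    fn default_profile() -> Self {
        Self {
            name: "Default",
            initial_reconnect_delay: Duration::from_millis(100),
            max_reconnect_delay: Duration::from_secs(30),
            max_queue_size: 1000,
            connection_timeout: Duration::from_secs(5),
            max_connection_failures: None,
        }
    }

    fn local() -> Self {
        Self {
            name: "Local",
            initial_reconnect_delay: Duration::from_millis(10),
            max_reconnect_delay: Duration::from_secs(1),
            max_queue_size: 100,
            connection_timeout: Duration::from_millis(500),
            max_connection_failures: Some(10),
        }
    }

    fn wan() -> Self {
        Self {
            name: "WAN",
            initial_reconnect_delay: Duration::from_millis(500),
            max_reconnect_delay: Duration::from_secs(60),
            max_queue_size: 5000,
            connection_timeout: Duration::from_secs(30),
            max_connection_failures: None,
        }
    }

    /// Number of doublings needed for the delay to reach the configured cap.
    fn doublings_to_cap(&self) -> u32 {
        let mut delay = self.initial_reconnect_delay;
        let mut doublings = 0;
        while delay < self.max_reconnect_delay {
            delay = std::cmp::min(delay.saturating_mul(2), self.max_reconnect_delay);
            doublings += 1;
        }
        doublings
    }
}

fn main() {
    for profile in [
        BackoffProfile::default_profile(),
        BackoffProfile::local(),
        BackoffProfile::wan(),
    ] {
        println!(
            "{}: cap after {} doublings, queue {}, timeout {:?}, max failures {:?}",
            profile.name,
            profile.doublings_to_cap(),
            profile.max_queue_size,
            profile.connection_timeout,
            profile.max_connection_failures,
        );
    }
}
```

With the table values, the default profile reaches its 30 s cap after nine doublings, while Local and WAN each reach theirs after seven. For Local this means the delay sits at the 1 s cap for the last few of its ten permitted failures.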

For simulation tests, the default profile works well. The chaos engine can buggify the actual delays through the TimeProvider, stretching or shortening them to explore timing-sensitive code paths.