
connection: add per-peer exponential reconnection backoff #951

Draft
Jolah1 wants to merge 2 commits into lightningdevkit:main from Jolah1:peer-reconnection-backoff

Jolah1 commented Jun 26, 2026

Closes #918.

Summary

Today the reconnection loop retries every disconnected peer on a fixed PEER_RECONNECTION_INTERVAL (60s). For a peer that is persistently unreachable this means a reconnect attempt every minute indefinitely, which is wasteful and noisy. This adds per-peer exponential backoff so repeated failures are spaced out, while a successful connection (or a user-initiated connect) resets a peer back to the base interval.

What this does

  • Introduces PeerReconnectState in connection.rs, tracked per peer in a Mutex<HashMap<PublicKey, PeerReconnectState>> on ConnectionManager.
  • Each consecutive failure schedules the next retry at the current backoff and doubles it, capped at PEER_RECONNECTION_MAX_INTERVAL (30 min).
  • The reconnect loop gates each peer on is_reconnect_due rather than attempting every peer every tick.
  • Backoff state is cleared on a successful connect (do_connect_peer), on explicit disconnect(), and when a peer is removed after its last channel closes.
  • Already-connected peers have their state cleared on each tick, so an inbound reconnection also resets backoff.
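The following is a minimal sketch of a per-peer record with a doubling, capped backoff, to make the state machine concrete. It assumes the base and cap values stated above. The field name `backoff`, the method `record_failure`, and placing `is_reconnect_due` as a method on the state are assumptions for illustration. None of this is the PR's actual code.

```rust
use std::time::{Duration, Instant};

// Assumed values, mirroring the constants described in this PR.
const PEER_RECONNECTION_INTERVAL: Duration = Duration::from_secs(60);
const PEER_RECONNECTION_MAX_INTERVAL: Duration = Duration::from_secs(30 * 60);

/// Hypothetical per-peer backoff record (field names are illustrative).
#[derive(Debug, Clone)]
struct PeerReconnectState {
    /// Delay that will be applied after the next failure.
    backoff: Duration,
    /// Earliest instant at which another attempt should be made.
    next_retry_at: Option<Instant>,
}

impl PeerReconnectState {
    fn new() -> Self {
        Self { backoff: PEER_RECONNECTION_INTERVAL, next_retry_at: None }
    }

    /// Record a failed attempt at `now`: schedule the retry at the current
    /// backoff, then double the backoff, clamped to the cap.
    fn record_failure(&mut self, now: Instant) {
        self.next_retry_at = Some(now + self.backoff);
        self.backoff = std::cmp::min(self.backoff * 2, PEER_RECONNECTION_MAX_INTERVAL);
    }

    /// A peer with no recorded failure is always due.
    fn is_reconnect_due(&self, now: Instant) -> bool {
        match self.next_retry_at {
            Some(at) => now >= at,
            None => true,
        }
    }
}
```

In the manager, these records would presumably live in the Mutex-guarded map described above. Clearing a peer's state then amounts to removing its entry, so the next failure starts again from the base interval.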

Tests

  • reconnect_state_doubles_until_capped - verifies doubling and the cap.
  • reconnect_state_schedules_relative_to_failure_time - verifies the next retry is scheduled relative to the failure instant.
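The schedule that reconnect_state_doubles_until_capped checks can be worked through by hand. The sketch below assumes the 60s base and 30 min cap stated above and lists the delay used after each consecutive failure. `retry_delays` is a hypothetical helper, not a function from the PR.

```rust
use std::time::Duration;

// Assumed base and cap, as described in this PR.
const BASE: Duration = Duration::from_secs(60);
const CAP: Duration = Duration::from_secs(30 * 60);

/// Delays applied after each of the first `n` consecutive failures:
/// the current backoff is used, then doubled and clamped to the cap.
fn retry_delays(n: usize) -> Vec<Duration> {
    let mut backoff = BASE;
    let mut delays = Vec::with_capacity(n);
    for _ in 0..n {
        delays.push(backoff);
        backoff = std::cmp::min(backoff * 2, CAP);
    }
    delays
}
```

For seven consecutive failures this yields 60s, 120s, 240s, 480s and 960s, followed by 1800s twice. The sixth failure is the first to hit the cap, because 960s doubled (1920s) exceeds 30 min. The loop only wakes every 60s, so each attempt actually lands on the first tick at or after its scheduled time.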

A note on "configurable"

The issue title says "configurable." This PR keeps the base and cap as pub(crate) constants (matching the existing reconnection-interval constant) rather than exposing them on Config. I'm happy to wire them into Config as a follow-up (or in this PR) if reviewers prefer; I wanted to keep the surface minimal first.
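If the base and cap were moved onto Config, one possible shape is a small struct whose Default reproduces the current constants. This is purely hypothetical: `ReconnectBackoffConfig`, its field names, and `is_valid` are not part of the PR or of ldk-node's existing Config.

```rust
use std::time::Duration;

/// Hypothetical user-facing backoff settings (not an existing ldk-node type).
#[derive(Debug, Clone, PartialEq)]
pub struct ReconnectBackoffConfig {
    pub base_interval: Duration,
    pub max_interval: Duration,
}

impl Default for ReconnectBackoffConfig {
    /// Matches the constants this PR introduces: 60s base, 30 min cap.
    fn default() -> Self {
        Self {
            base_interval: Duration::from_secs(60),
            max_interval: Duration::from_secs(30 * 60),
        }
    }
}

impl ReconnectBackoffConfig {
    /// A zero base never grows under doubling, and a cap below the base
    /// would make the schedule non-monotonic; reject both.
    pub fn is_valid(&self) -> bool {
        !self.base_interval.is_zero() && self.max_interval >= self.base_interval
    }
}
```

A validity check of this kind matters once the values are user-supplied. A zero base interval would stay zero after doubling, which would reintroduce the retry-every-tick behavior this PR removes.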

Dependency / draft status

Marking this as draft: it builds on #895, which relocates last-channel peer removal into the event.rs ChannelClosed handler. Once #895 lands, I will rebase, move the clear_reconnect_state call alongside the relocated removal, and pick up the now-async remove_peer .awaits from #919. I will un-draft after #895 is merged.

Jolah1 and others added 2 commits June 2, 2026 11:13
Previously, the background reconnection task retried every persisted peer
on a fixed 60s interval with no backoff, so an unreachable peer was retried
indefinitely at the same cadence, causing log spam and wasted work. This became
more visible after lightningdevkit#895 retained peers across force-closes so that
channel_reestablish recovery can run.

Track per-peer reconnect state in ConnectionManager: on failure, double the
retry interval up to PEER_RECONNECTION_MAX_INTERVAL (30 min); on success
(including user-initiated connects), clear the state so a subsequent drop
retries promptly. The 60s tokio::time::interval is kept as the wakeup,
gated per-peer by next_retry_at, since lightningdevkit#588's inline-sleep form does not
generalize to N peers. Backoff state is in-memory and resets on restart;
a fresh post-restart attempt is the correct behavior. State is also
cleared when a peer is removed from the persisted store.

Closes lightningdevkit#918.
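The commit message describes keeping the 60s interval as the wakeup and gating each peer by next_retry_at. Below is a synchronous sketch of a single tick of such a loop. It rests on several assumptions: `u32` stands in for PublicKey, the `try_connect` closure stands in for the real async connect call, and `reconnect_tick` and `State` are hypothetical names rather than the PR's code.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

// Assumed base and cap, as described in this PR.
const BASE: Duration = Duration::from_secs(60);
const CAP: Duration = Duration::from_secs(30 * 60);

/// Stand-in for PublicKey in this sketch.
type PeerId = u32;

#[derive(Debug)]
struct State {
    backoff: Duration,
    next_retry_at: Instant,
}

/// One wakeup of a hypothetical reconnect loop. Returns the peers attempted.
fn reconnect_tick<F>(
    now: Instant,
    persisted: &[PeerId],
    connected: &HashSet<PeerId>,
    states: &mut HashMap<PeerId, State>,
    mut try_connect: F,
) -> Vec<PeerId>
where
    F: FnMut(PeerId) -> bool,
{
    let mut attempted = Vec::new();
    for &peer in persisted {
        if connected.contains(&peer) {
            // Covers inbound reconnections: the next drop starts from BASE.
            states.remove(&peer);
            continue;
        }
        // Peers with no recorded failure are always due.
        let due = states.get(&peer).map_or(true, |s| now >= s.next_retry_at);
        if !due {
            continue;
        }
        attempted.push(peer);
        if try_connect(peer) {
            states.remove(&peer);
        } else {
            let s = states.entry(peer).or_insert(State { backoff: BASE, next_retry_at: now });
            s.next_retry_at = now + s.backoff;
            s.backoff = std::cmp::min(s.backoff * 2, CAP);
        }
    }
    attempted
}
```

Because the map holds plain in-memory state, dropping it on restart gives every peer a fresh first attempt, which matches the behavior the commit message argues for.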

ldk-reviews-bot commented Jun 26, 2026


👋 Hi! This PR is now in draft status.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!


Successfully merging this pull request may close these issues:

  • Add configurable reconnection backoff for persisted peers

2 participants