
tests: add xfail-strict reproducer for funder-side stuck CHANNELD_AWAITING_LOCKIN #9113

Open
ksedgwic wants to merge 3 commits into ElementsProject:master from ksedgwic:stuck-awaiting-lockin-test

Conversation

Collaborator

ksedgwic commented May 5, 2026

When a node is the funder of a channel whose funding tx never confirms (broadcast rejected at ATMP, evicted from mempool, or simply never broadcast), the channel record stays in CHANNELD_AWAITING_LOCKIN indefinitely. CLN already implements the BOLT 2 fundee-side forget rule (PR #1468, --max-funding-unconfirmed-blocks, default 2016) but has no equivalent on the funder side.
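To make the asymmetry concrete, here is a minimal toy model of the forget rule as described above. It is not CLN's code: `PendingChannel`, `first_seen_height`, and `should_forget` are hypothetical names, and the exact boundary comparison is an assumption.

```python
from dataclasses import dataclass

DEFAULT_MAX_FUNDING_UNCONFIRMED = 2016  # default cited in the PR description


@dataclass
class PendingChannel:
    we_are_funder: bool       # True if this node funded the channel
    first_seen_height: int    # hypothetical: block height when the channel was opened
    funding_confirmed: bool = False


def should_forget(chan: PendingChannel, tip_height: int,
                  max_unconfirmed: int = DEFAULT_MAX_FUNDING_UNCONFIRMED) -> bool:
    """Toy version of the fundee-side forget rule; not CLN's actual logic."""
    if chan.funding_confirmed:
        return False
    if chan.we_are_funder:
        # The gap this PR documents: no timeout applies on the funder side.
        return False
    # Assumption: the real implementation may use a different boundary (> vs >=).
    return tip_height - chan.first_seen_height >= max_unconfirmed


fundee = PendingChannel(we_are_funder=False, first_seen_height=100)
funder = PendingChannel(we_are_funder=True, first_seen_height=100)
print(should_forget(fundee, 100 + 2016))   # fundee side times out
print(should_forget(funder, 100 + 50000))  # funder side never does
```

In this model the funder-side record survives no matter how many blocks pass, which is the behavior the reproducer pins down.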

The test asserts the desired post-fix behavior (state has moved beyond CHANNELD_AWAITING_LOCKIN) and is marked @pytest.mark.xfail(strict=True) so:

  • CI reports XFAIL today (acceptable; documents the open bug)
  • When the bug is fixed, the test reports XPASS, which strict=True promotes to a hard failure to alert the dev to remove the marker.
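The two outcomes above follow from how pytest reports a test carrying an `xfail` marker. The sketch below is a simplified pure-Python model of that outcome table (it does not import pytest, and `pytest_outcome` is a hypothetical helper name), intended only to show why `strict=True` matters here.

```python
def pytest_outcome(body_passed: bool, xfail: bool = False, strict: bool = False) -> str:
    """Simplified model of the outcome pytest reports for a single test."""
    if not xfail:
        return "PASSED" if body_passed else "FAILED"
    if not body_passed:
        # Expected failure: reported as XFAIL and does not fail the run.
        return "XFAIL"
    # Unexpected pass: strict=True turns it into a failure, otherwise it is XPASS.
    return "FAILED" if strict else "XPASS"


# Today: the bug exists, so the assertion in the test body fails.
print(pytest_outcome(body_passed=False, xfail=True, strict=True))  # XFAIL
# After a fix: the body passes, and strict=True makes CI fail until the marker is removed.
print(pytest_outcome(body_passed=True, xfail=True, strict=True))   # FAILED
```

Without `strict=True`, a fixed bug would surface only as a quiet XPASS and the stale marker could linger.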

Reproduces #9112

…ITING_LOCKIN

When a node is the funder of a channel whose funding tx never confirms
(broadcast rejected at ATMP, evicted from mempool, or simply never
broadcast), the channel record stays in CHANNELD_AWAITING_LOCKIN
indefinitely. CLN already implements the BOLT 2 fundee-side forget
rule (PR ElementsProject#1468, --max-funding-unconfirmed-blocks, default 2016) but
has no equivalent on the funder side.

The test asserts the desired post-fix behavior (state has moved beyond
CHANNELD_AWAITING_LOCKIN) and is marked @pytest.mark.xfail(strict=True)
so:

  - CI reports XFAIL today (acceptable; documents the open bug)
  - When the bug is fixed, the test reports XPASS, which strict=True
    promotes to a hard failure to alert the dev to remove the marker.

Changelog-None
…LATERAL

Same root cause as the previous test_funder_stuck_no_funding_confirm
(funding tx unbroadcastable/unconfirmable, no funder-side cleanup).
This variant covers the second symptom: when the operator (or an
automation like CLBOSS's spenderp) issues `close` on the AWAITING_LOCKIN
channel, CLN transitions to AWAITING_UNILATERAL and tries to broadcast
a commitment tx that spends the (non-existent) funding output. That
commit tx can never confirm either, so the channel record now sits
stuck in AWAITING_UNILATERAL indefinitely.

Stops l2 before close to force unilateral and avoid mutual close
racing in.

Marked xfail-strict so the bug is documented without breaking CI.

Changelog-None
Collaborator Author

ksedgwic commented May 5, 2026

Adds a second xfail-strict reproducer for the same root cause:
test_funder_stuck_close_before_funding_confirm.

If the operator (or an automation like CLBOSS's spenderp) issues
close while the funder is in CHANNELD_AWAITING_LOCKIN with an
unbroadcastable funding tx, the channel transitions to
AWAITING_UNILATERAL and CLN tries to broadcast a commitment tx that
spends the (non-existent) funding output. That commit tx can never
confirm either, so the channel record sits stuck in
AWAITING_UNILATERAL indefinitely.

Same setup as the first test (mock_rpc on sendrawtransaction +
dev-max-funding-unconfirmed-blocks=10); diverges by stopping l2 and
calling close with unilateraltimeout=1 to force AWAITING_UNILATERAL,
then asserts the channel is no longer in that state after THRESHOLD+5
blocks.
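The shape of that assertion can be sketched with a toy stand-in for the funder's channel record. This is not the test in the PR and does not use pyln-testing: `ToyFunderChannel` and `channel_left_state` are hypothetical, and the class models only the buggy behavior described above.

```python
THRESHOLD = 10  # mirrors dev-max-funding-unconfirmed-blocks=10 from the test setup


class ToyFunderChannel:
    """Hypothetical stand-in for the funder's channel record; models the bug only."""

    def __init__(self) -> None:
        self.state = "CHANNELD_AWAITING_LOCKIN"
        self.blocks_seen = 0

    def close(self) -> None:
        # With the peer stopped, close falls back to a unilateral close.
        if self.state == "CHANNELD_AWAITING_LOCKIN":
            self.state = "AWAITING_UNILATERAL"

    def mine(self, n: int) -> None:
        # The funding tx never reached the chain, so the commitment tx that
        # spends it cannot confirm either; new blocks change nothing.
        self.blocks_seen += n


def channel_left_state(chan: ToyFunderChannel, stuck_state: str, blocks: int) -> bool:
    """Mine `blocks` blocks, then report whether the channel left `stuck_state`."""
    chan.mine(blocks)
    return chan.state != stuck_state


chan = ToyFunderChannel()
chan.close()
# The desired post-fix result is True; the toy model, like the bug, yields False.
print(channel_left_state(chan, "AWAITING_UNILATERAL", THRESHOLD + 5))
```

A check of this shape fails while the bug exists, which is what the xfail-strict marker expects.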

Kept as a separate commit so it can be split off if maintainers prefer
to address the two stuck states independently.

The two existing funder-stuck reproducers (test_funder_stuck_no_funding_confirm,
test_funder_stuck_close_before_funding_confirm) demonstrate the channel
record stays in CHANNELD_AWAITING_LOCKIN / AWAITING_UNILATERAL while
the funding tx is merely unbroadcastable (censored at the proxy).

That leaves a hole in the policy argument: as long as the funding
inputs remain unspent, the funding tx could in principle still
confirm, so the state-machine wait is defensible.

This new test removes that hole.  After the channel reaches
CHANNELD_AWAITING_LOCKIN with a censored funding tx, we:

  1. Capture the funding tx hex via the proxy mock.
  2. Force-unreserve the funding inputs (the funding-tx reservation
     is ~2016 blocks, so we explicitly pass a large reserve= value
     to push reserved_til below current height).
  3. Spend the same UTXOs in a separate withdraw tx that DOES land
     on chain (the proxy mock forwards non-funding-tx broadcasts).
  4. Mature the double-spend 100 blocks past confirmation, matching
     Bitcoin's coinbase maturity rule (the canonical reorg-safe depth).
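The height arithmetic behind steps 2 and 4 can be sketched as follows. This is a toy model under stated assumptions, not CLN's wallet code: `ToyUtxo`, `double_spend_is_mature`, the starting height, and the "reserved while `reserved_til` is above the tip" comparison are all illustrative.

```python
from dataclasses import dataclass

FUNDING_RESERVATION = 2016  # approximate funding-tx reservation cited in step 2
COINBASE_MATURITY = 100     # depth used in step 4


@dataclass
class ToyUtxo:
    reserved_til: int  # height until which the wallet treats the input as reserved

    def is_reserved(self, tip: int) -> bool:
        # Assumption: the input counts as reserved while reserved_til is above the tip.
        return self.reserved_til > tip

    def unreserve(self, reserve: int) -> None:
        # Models passing reserve=<blocks>: shorten the reservation by that many blocks.
        self.reserved_til -= reserve


def double_spend_is_mature(confirm_height: int, tip: int) -> bool:
    # True once the tip is at least COINBASE_MATURITY blocks past the confirming block.
    return tip - confirm_height >= COINBASE_MATURITY


tip = 110  # hypothetical chain height when the channel was funded
utxo = ToyUtxo(reserved_til=tip + FUNDING_RESERVATION)

utxo.unreserve(10)                           # a small reduction is not enough
still_reserved = utxo.is_reserved(tip)

utxo.unreserve(FUNDING_RESERVATION)          # a large reserve= value clears it
now_spendable = not utxo.is_reserved(tip)

print(still_reserved, now_spendable)
```

Only once the reservation has dropped to or below the current height can the same inputs be selected for the competing withdraw tx.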

At this point the funding tx is provably and permanently invalid;
no Bitcoin convention treats the spend as still reversible.  Yet
CLN keeps the channel record stuck in CHANNELD_AWAITING_LOCKIN.

The test is marked xfail-strict, like its siblings.  Once a fix
exists, the test will report XPASS, which strict=True promotes to a
hard failure to alert the developer to remove the marker.

Reproduces ElementsProject#9112 with a stronger demonstration than the existing
tests.
@ksedgwic
Collaborator Author

@ddustin I added the "double-spent funding tx" modification we discussed; the channel stays stuck even though the funding tx can never confirm ...
