fix: stop genservers gracefully #319
Conversation
Greptile Summary: This PR adds graceful shutdown to the ethlambda node by stopping the actors and HTTP servers before the process exits.
Confidence Score: 3/5. Safe to merge for typical usage, but a race condition in the shutdown signalling path can cause an indefinite hang. One P1 finding (notify_waiters race that silently drops shutdown signals and causes indefinite hangs) and one P2 (no shutdown timeout). The P1 drives the score below the 4/5 ceiling. Key file: bin/ethlambda/src/main.rs — specifically the notify_waiters() call and the unbounded join awaits in the shutdown sequence.
| Filename | Overview |
|---|---|
| bin/ethlambda/src/main.rs | Adds graceful shutdown via Arc and actor context().stop(); the notify_waiters() call can silently drop notifications if spawned tasks haven't registered their waiters yet, causing an indefinite hang on shutdown. |
| crates/net/rpc/src/lib.rs | Adds a generic shutdown future parameter to both server functions and wires it into axum's with_graceful_shutdown(); clean and idiomatic change with no issues. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Main
    participant BlockChain
    participant P2P
    participant ApiServer
    participant MetricsServer
    Main->>BlockChain: spawn actor
    Main->>P2P: spawn actor
    Main->>ApiServer: tokio::spawn (with shutdown_future)
    Main->>MetricsServer: tokio::spawn (with shutdown_future)
    Main->>Main: ctrl_c().await
    Main->>BlockChain: context().stop()
    Main->>P2P: context().stop()
    Main->>ApiServer: shutdown_notify.notify_waiters()
    Main->>MetricsServer: shutdown_notify.notify_waiters()
    Main->>BlockChain: join().await
    BlockChain-->>Main: done
    Main->>P2P: join().await
    P2P-->>Main: done
    Main->>ApiServer: api_handle.await
    ApiServer-->>Main: done
    Main->>MetricsServer: metrics_handle.await
    MetricsServer-->>Main: done
    Main->>Main: info!("Shutdown complete")
```
Review comment on bin/ethlambda/src/main.rs, line 238:
**`notify_waiters()` can silently drop shutdown signals**
`Notify::notify_waiters()` only wakes futures that are *currently registered* as waiters at the moment of the call — it does not store a permit for futures that haven't been polled yet. The shutdown futures inside the spawned tasks (`async move { X.notified().await }`) don't register themselves until axum polls the shutdown future for the first time, which happens asynchronously after `tokio::spawn`. If `ctrl_c` fires before the spawned tasks have had a chance to run, both notifications are silently dropped and `api_handle.await` / `metrics_handle.await` will block indefinitely — the process never exits.
The clearest fix is two separate `Arc<Notify>` handles each called with `notify_one()`, which stores a permit that survives until consumed:
```rust
let api_shutdown_notify = Arc::new(Notify::new());
let metrics_shutdown_notify = Arc::new(Notify::new());
// ... pass clones into spawned tasks ...
api_shutdown_notify.notify_one();
metrics_shutdown_notify.notify_one();
```
Alternatively, a `tokio::sync::watch` channel with a stored value avoids this race entirely for any number of listeners.
---
Review comment on bin/ethlambda/src/main.rs, lines 240-243:
**No timeout on shutdown awaits**
All four `await` calls — `blockchain_ref.join()`, `p2p_ref.join()`, `api_handle.await`, and `metrics_handle.await` — are unbounded. If any actor gets stuck in a long-running handler or the HTTP servers hold open long-lived connections, the process will hang indefinitely. Consider wrapping the shutdown sequence in `tokio::time::timeout` so the node can force-exit after a reasonable deadline (e.g., 30 s) even if something stalls.
Reviews (1): Last reviewed commit: "fix: stop genservers gracefully"
MegaRedHand left a comment:
Hey @dicethedev! Thank you for your contribution! I left you some comments.
Also, remember to sign your commits, as explained in our CONTRIBUTING.md
Force-pushed e22256b to 1c20bdf
Force-pushed 1c20bdf to 28a7f6f
```rust
// Handle multiple Ctrl+C signals:
// 1st Ctrl+C: graceful shutdown
// 2nd Ctrl+C: force exit immediately
tokio::spawn(async move {
    loop {
        tokio::signal::ctrl_c().await.ok();
        let count = signal_count_clone.fetch_add(1, Ordering::SeqCst) + 1;
        if count == 1 {
            info!("Shutdown signal received, stopping actors and servers...");
        } else {
            warn!("Force shutdown requested, exiting immediately");
            std::process::exit(1);
        }
    }
});

// Wait for first signal
loop {
    if signal_count.load(Ordering::SeqCst) > 0 {
        break;
    }
    tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
}
```
Let's remove the atomic and busy-loop:
```rust
// 1st ctrl+c: start graceful shutdown
tokio::signal::ctrl_c().await.ok();
info!("Shutdown signal received, stopping actors and servers...");
tokio::spawn(async move {
    let ctrlcs_to_forced_shutdown = 3;
    // This can be turned into a loop
    tokio::signal::ctrl_c().await.ok();
    warn!("Graceful shutdown in progress. Press ctrl+C 2 more times to force ungraceful shutdown");
    tokio::signal::ctrl_c().await.ok();
    warn!("Graceful shutdown in progress. Press ctrl+C 1 more time to force ungraceful shutdown");
    tokio::signal::ctrl_c().await.ok();
    info!("Forced ungraceful shutdown...");
    std::process::exit(1);
});
let blockchain_ref = blockchain.actor_ref().clone();
// etc.
```
MegaRedHand left a comment:
Left some more comments
```rust
    address: SocketAddr,
    store: Store,
    aggregator: AggregatorController,
    shutdown: tokio_util::sync::CancellationToken,
```
Let's import this at the top of the file:
```diff
-    shutdown: tokio_util::sync::CancellationToken,
+    shutdown: CancellationToken,
```
```diff
-pub async fn start_metrics_server(address: SocketAddr) -> Result<(), std::io::Error> {
+pub async fn start_metrics_server(
+    address: SocketAddr,
+    shutdown: tokio_util::sync::CancellationToken,
```
🗒️ Description
This PR implements graceful shutdown for the ethlambda node's actors and HTTP servers. Previously, the node would abruptly terminate on Ctrl+C without waiting for in-flight operations to complete.
Now, on shutdown signal:
- actors are stopped via their `stop()` method
- HTTP servers terminate via axum's `with_graceful_shutdown()`

These changes ensure that in-flight operations complete before the process exits. This prevents data corruption and improves reliability.
🔗 Related Issues or PRs
Closes #195