
feat: merge-train/spartan#22980

Merged
PhilWindle merged 84 commits into next from merge-train/spartan
May 13, 2026

Conversation

@AztecBot

@AztecBot AztecBot commented May 6, 2026

Collaborator

BEGIN_COMMIT_OVERRIDE
fix(test): warp L1 forward when proposer scan hits EpochNotStable (#22967)
test(e2e): fail epochs tests on proposer-rollup-check-failed (#22965)
fix: grafana switch to aztec_status="proposed" (#22978)
chore: update benchmark scraper (#22984)
test(e2e): migrate simple epoch tests to pipelining (#22973)
chore: remove top-level yarn.lock (#22987)
refactor(archiver)!: unify L2BlockSource checkpoint lookups via query objects (#22933)
fix(sequencer): bounded sweep instead of event scan for governance proposal check (#22989)
fix(docs): allow webapp-tutorial yarn install to populate empty lockfile in CI (#23000)
test(e2e): enable pipelining in l1-reorgs and mbps redistribution tests (#23009)
fix(archiver): restore pending block height metric under pipelining (#22994)
chore(p2p): remove skipped validation result option (#23034)
refactor(p2p)!: remove slow tx collection flow (#22878)
chore(spartan): add next-net-clone environment config (#22995)
chore(sequencer): add context to proposer-rollup-check-failed logs (#23071)
test(e2e): wait for archiver sync before asserting pipelining (#22997)
refactor(node-rpc)!: remove deprecated AztecNode methods and L2BlockSource tip helpers (#22934)
feat(p2p): detect and track announce IP changes at runtime (#22405)
test: mark tx_stats_bench 10 TPS as flake-retryable on merge-train/spartan (#23083)
fix(sequencer): bind vote-only multicalls to target slot under pipelining (#23090)
feat(sequencer): build optimistically across pruning epoch boundary (#23056)
fix(sequencer): use chainTipsOverride.pending for log context (#23098)
test(e2e): relax post-boundary slot assertion in epochs_proof_at_boundary (#23108)
fix(bb-prover): pool long-lived bb verifier processes instead of spawning per-call (#23093)
fix(sequencer): anchor fee asset price modifier to predicted parent (#23113)
chore: error log when L1 head timestamp drifts (#22947)
fix(sequencer): override full parent checkpoint cell in pipelined simulation (#23073)
test(e2e): enable pipelining on missed l1 slot test (#23068)
fix: more robust metrics reporting in IRM monitor (#23038)
fix: preserve LMDB slashing protection (#23145)
test(e2e): enable pipelining on p2p tests (#23070)
fix(archiver): move L2 tips cache refresh out of write transactions (#23110)
test(e2e): fix data_withholding_slash flake by freezing L1 across restart (#23162)
fix(validator): include proposed checkpoint out-hashes when validating checkpoint proposals (#23119)
refactor(config): drop nested config option, flatten l1Contracts (#23143)
test(e2e): bump bash TIMEOUT for e2e_p2p/add_rollup to match jest 20m (#23177)
fix(p2p): chunk archive of mined txs on block finalization (A-969) (#23085)
fix(p2p): stream tx pool hydration to bound startup memory (A-968) (#23086)
chore: remove orphan --archiver flag usages from start invocations (#23186)
feat(ci): daily merge-train/spartan stale-PR notifier (#23189)
fix: preserve contract artifact permissions (#23174)
fix(ci3): accept slashes in /list/<path:key> for merge-train history (#23160)
feat(ci): route merge-train/spartan flake notifications to #team-alpha-ci (#23219)
fix(cheat-codes): wait for post-warp L2 block in warpL2TimeAtLeastTo (#23213)
feat: slash attesters signing over bad checkpoints (#23180)
refactor(prover-client): split orchestrator into sub-tree + top-tree pair (#22996)
fix(srs): retry transient CRS HTTP downloads with exponential backoff (#23244)
refactor(p2p): remove old reqresp mode (#23158)
docs(sequencer-client): rewrite top-level and timing READMEs (#23149)
fix(aztec-node): include upcoming checkpoint's L1 to L2 messages in simulatePublicCalls (#23163)
END_COMMIT_OVERRIDE

spypsy and others added 6 commits April 16, 2026 11:32
## Summary

- Keep `getPublicIp()` at startup so the ENR always has a valid IP from the start
- Enable discv5 `enrUpdate` with `addrVotesToUpdateEnr: 1` and faster pings (10s) when `queryForIp` is enabled, so PONG votes can correct the IP at runtime if it changes (e.g. residential ISP, Cloud NAT rotation)
- Bridge discv5 IP changes to libp2p's AddressManager so peers see updated addresses
- Have the bootnode explicitly `addEnr()` on discovery to fix routing table gaps where nodes were never inserted
- Improve P2P observability: log KAD table state in peer manager heartbeats, log ENR additions with multiaddrs, log config at startup
- Small change to deploy scripts that lets us specify a full Aztec image to deploy on a network, rather than just `aztecprotocol/aztec:<tag>`

Fixes [A-310](https://linear.app/aztec-labs/issue/A-310/p2p-query-for-ip-should-detect-ip-changes)

Co-authored-by: Alex Gherghisan <alexghr@users.noreply.github.com>
Co-authored-by: danielntmd <162406516+danielntmd@users.noreply.github.com>
…2967)

## Motivation

The `e2e_epochs/epochs_missed_l1_publish` test fails intermittently when
its proposer-discovery scan looks too far into the future. The L1 rollup
contract reverts with `ValidatorSelection__EpochNotStable` for any epoch
whose randao sample timestamp is still ahead of `block.timestamp`, and
the test was scanning up to 60 slots (~15 epochs at the test's epoch
duration) ahead, well past the queryable horizon.

## Approach

Wrap the proposer scan in a retry loop that catches `EpochNotStable`,
warps L1 forward by one epoch, and re-queries the same candidate. After
each warp the scan also re-anchors the candidate to keep the +4 slot
margin from the new "now", so subsequent steps (the warp to `slotZero`
and sequencer start-up) still have headroom.

## Changes

- **end-to-end (tests)**: Replace the bounded `for` loop in
`epochs_missed_l1_publish.test.ts` with a try/catch retry that warps L1
on `EpochNotStable`.
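
The retry described in the Approach section can be sketched roughly as follows. This is a minimal illustration, not the test's actual code: `ProposerScanDeps`, `findSlotWithProposer`, `MAX_WARPS`, and `MAX_STEPS` are hypothetical names, and the real test talks to the rollup contract and cheat codes rather than an injected interface.

```typescript
// Hypothetical dependencies; the real test uses the rollup contract and L1 cheat codes.
interface ProposerScanDeps {
  /** Proposer for a slot; throws an error mentioning EpochNotStable past the queryable horizon. */
  getProposerAt(slot: number): Promise<string>;
  /** Current L1 slot ("now"). */
  getCurrentSlot(): Promise<number>;
  /** Warps L1 forward by one epoch. */
  warpL1ByOneEpoch(): Promise<void>;
}

const SLOT_MARGIN = 4; // the "+4 slot margin" from the description
const MAX_WARPS = 20; // assumed safety cap on warps
const MAX_STEPS = 1000; // assumed safety cap on total loop iterations

function isEpochNotStable(err: unknown): boolean {
  return err instanceof Error && err.message.includes('EpochNotStable');
}

async function findSlotWithProposer(deps: ProposerScanDeps, wanted: string): Promise<number> {
  let candidate = (await deps.getCurrentSlot()) + SLOT_MARGIN;
  let warps = 0;
  for (let step = 0; step < MAX_STEPS; step++) {
    try {
      const proposer = await deps.getProposerAt(candidate);
      if (proposer.toLowerCase() === wanted.toLowerCase()) {
        return candidate;
      }
      candidate += 1;
    } catch (err) {
      // Anything other than EpochNotStable (or too many warps) is a real failure.
      if (!isEpochNotStable(err) || warps >= MAX_WARPS) {
        throw err;
      }
      warps += 1;
      await deps.warpL1ByOneEpoch();
      // Re-anchor so the candidate keeps its margin from the new "now",
      // then re-query the same candidate on the next iteration.
      const floor = (await deps.getCurrentSlot()) + SLOT_MARGIN;
      if (candidate < floor) {
        candidate = floor;
      }
    }
  }
  throw new Error('proposer not found within the step budget');
}
```

Injecting the dependencies keeps the loop testable without a chain; the caps are there only so a misconfigured scan fails loudly instead of spinning.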
`proposer-rollup-check-failed` sequencer errors were being ignored in some epochs tests. This removes that exemption: the error should not happen, and if it does, it warrants analysis.
@socket-security

socket-security Bot commented May 6, 2026


Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | npm/@types/node@20.19.41 | 100 | 100 | 81 | 96 | 100 |

View full report

spalladino and others added 7 commits May 6, 2026 16:12
… objects (#22933)

## Motivation

Clean up the checkpoint side of `L2BlockSource`. PR #22809 already
collapsed the block-side API into 4 query-shaped methods over 2 return
types; the checkpoint surface was left with the pre-refactor sprawl (9
narrow methods over 4 return shapes, parallel by-number / by-range /
by-epoch entrypoints, and a wire-level alias that conflated proposed and
confirmed checkpoints). This change applies the same simplification.

Fixes A-979

## Approach

`L2BlockSource` checkpoint methods reduce to 4 query-shaped readers
(`getCheckpoint`, `getCheckpoints`, `getCheckpointData`,
`getCheckpointsData`) over 2 return shapes (`PublishedCheckpoint`,
`CheckpointData`), plus a polymorphic
`getProposedCheckpointData(query?)` for the proposed-only path. Three
new query types live next to `BlockQuery`/`BlocksQuery`. On-disk format
and `BlockStore` primitives are unchanged — the simplification is at the
API boundary. The public RPC's `getCheckpoint` keeps the same wire
signature but gains a confirmed→proposed fallback (for
`{number}`/`{slot}`/`'proposed'` lookups) and `BadRequestError` guards
for incompatible `include*` flags.

## API surface change

### Methods removed from `L2BlockSource`

`getCheckpoints(from, limit)`, `getCheckpointData(n)`,
`getCheckpointDataRange(from, limit)`, `getCheckpointsForEpoch(epoch)`,
`getCheckpointsDataForEpoch(epoch)`, `getCheckpointNumberBySlot(slot)`,
`getLastCheckpoint()`, `getLastProposedCheckpoint()`. Dead methods on
`data_source_base` also removed: `getCheckpointHeader`,
`getLastBlockNumberInCheckpoint`, `getSynchedCheckpointNumber`.

### Methods added to `L2BlockSource`

```ts
getCheckpoint(query: CheckpointQuery): Promise<PublishedCheckpoint | undefined>
getCheckpoints(query: CheckpointsQuery): Promise<PublishedCheckpoint[]>
getCheckpointData(query: CheckpointQuery): Promise<CheckpointData | undefined>
getCheckpointsData(query: CheckpointsQuery): Promise<CheckpointData[]>
getProposedCheckpointData(query?: ProposedCheckpointQuery): Promise<ProposedCheckpointData | undefined>

type CheckpointQuery         = { number } | { slot } | { tag: 'checkpointed' | 'proven' | 'finalized' }
type CheckpointsQuery        = { from, limit } | { epoch }
type ProposedCheckpointQuery = { number } | { slot } | { tag: 'proposed' }
```

### Public RPC (`AztecNode`) wire-level changes

- `getCheckpointsDataForEpoch(epoch)` removed;
`getCheckpointsData(query: CheckpointsQuery)` added (range or epoch).
- `'latest'` removed from `CheckpointParameter`.
- `'proposed'` semantics changed: previously aliased to "latest
L1-confirmed checkpoint" (a documented foot-gun); now
`getCheckpoint('proposed')` strictly targets the proposed-checkpoint
store, and `getCheckpointNumber('proposed')` returns the proposed-tip
number with confirmed fallback.
- `getCheckpoint({ number }) / ({ slot })` now check confirmed first
then fall back to proposed; tag-based lookups (`'checkpointed'` /
`'proven'` / `'finalized'`) do not fall back.
- `getCheckpoint('proposed', { includeL1PublishInfo: true |
includeAttestations: true })` and the same flags on a by-number/by-slot
lookup that resolves to a proposed entry now throw `BadRequestError`
(proposed checkpoints have no L1 publish info or attestations).
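
The fallback and guard rules above can be illustrated with a small in-memory sketch. Everything here is a simplification for illustration: `Stores`, `Lookup`, `getCheckpointSketch`, and the local `BadRequestError` class are hypothetical stand-ins, the real `CheckpointParameter` wire shape may differ, and "latest proposed" is assumed to be the last entry in the proposed store.

```typescript
class BadRequestError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BadRequestError';
  }
}

interface CheckpointEntry { number: number; slot: number }
type ConfirmedTag = 'checkpointed' | 'proven' | 'finalized';
type Lookup = { number: number } | { slot: number } | { tag: ConfirmedTag | 'proposed' };
interface IncludeFlags { includeL1PublishInfo?: boolean; includeAttestations?: boolean }
interface Resolved { source: 'confirmed' | 'proposed'; entry: CheckpointEntry }

// Hypothetical in-memory stand-in for the archiver's confirmed and proposed stores.
interface Stores {
  confirmed: CheckpointEntry[];
  proposed: CheckpointEntry[];
  tips: Record<ConfirmedTag, number>;
}

function assertNoConfirmedOnlyFlags(flags: IncludeFlags): void {
  if (flags.includeL1PublishInfo || flags.includeAttestations) {
    throw new BadRequestError('proposed checkpoints have no L1 publish info or attestations');
  }
}

function getCheckpointSketch(stores: Stores, lookup: Lookup, flags: IncludeFlags = {}): Resolved | undefined {
  if ('tag' in lookup) {
    if (lookup.tag === 'proposed') {
      // Strictly targets the proposed store; confirmed-only flags are rejected up front.
      assertNoConfirmedOnlyFlags(flags);
      const latest = stores.proposed[stores.proposed.length - 1];
      return latest ? { source: 'proposed', entry: latest } : undefined;
    }
    // Tag-based confirmed lookups never fall back to proposed.
    const tipNumber = stores.tips[lookup.tag];
    const tip = stores.confirmed.find(c => c.number === tipNumber);
    return tip ? { source: 'confirmed', entry: tip } : undefined;
  }
  const byNumber = 'number' in lookup ? lookup.number : undefined;
  const bySlot = 'slot' in lookup ? lookup.slot : undefined;
  const matches = (c: CheckpointEntry) => c.number === byNumber || c.slot === bySlot;

  // By-number and by-slot lookups check confirmed first, then fall back to proposed.
  const confirmed = stores.confirmed.find(matches);
  if (confirmed) {
    return { source: 'confirmed', entry: confirmed };
  }
  const proposed = stores.proposed.find(matches);
  if (!proposed) {
    return undefined;
  }
  assertNoConfirmedOnlyFlags(flags);
  return { source: 'proposed', entry: proposed };
}
```

Note that in this sketch the guard on a by-number or by-slot lookup fires only once the lookup has actually resolved to a proposed entry, matching the wording of the last bullet.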

### Types kept

`CheckpointData`, `CommonCheckpointData` (structural base of
`CheckpointData` / `ProposedCheckpointInput`), `ProposedCheckpointData`,
`ProposedCheckpointInput`, `PublishedCheckpoint`, `Checkpoint`. No
structural-type deletions.

Migration guidance for wallet/SDK consumers is in
`docs/docs-developers/docs/resources/migration_notes.md`.

## Changes

- **stdlib**: New query types (`CheckpointQuery`, `CheckpointsQuery`,
`ProposedCheckpointQuery`) + Zod schemas in `block/l2_block_source.ts`.
`'latest'` literal removed from `interfaces/checkpoint_parameter.ts`.
`NormalizedCheckpointDispatch` type for the server's parameter
normalizer. `ArchiverApiSchema` and `AztecNode` schema updated.
`computeL2ToL1MembershipWitness` switched to the new query shape.
- **archiver**: `data_source_base` adds `resolveCheckpointQuery` /
`resolveCheckpointsQuery` mirroring the block-side helpers, implements
the 4 confirmed methods plus the polymorphic proposed lookup.
`BlockStore` adds `getProposedCheckpointBySlot(slot)`. `MockArchiver`
and `mock_l2_block_source` updated to match the new interface.
- **aztec-node**: `server.ts` adds the confirmed→proposed fallback flow
with the two `BadRequestError` guards in `getCheckpoint`, sources all
tips from a single `getL2Tips()` call in `getCheckpointNumber`, and
routes the public RPC through the new internal methods. New
pure-projection helper `projectProposedToCheckpointResponse` in
`block_response_helpers.ts`.
- **consumer migrations**: prover-node (collapses two checkpoint fetches
into one `getCheckpoints({ epoch })`), world-state, slasher, sequencer
(`checkpoint_proposal_job`, `sequencer`), validator
(`proposal_handler`), `L2BlockStream`, pxe `block_stream_source`,
telemetry wrapper, and 10 e2e files updated to the new query shapes.
- **tests**: 48 new `it()` blocks covering each query discriminant, the
throw guards, the confirmed→proposed fallback, the polymorphic
`getProposedCheckpointData` dispatch, and
`BlockStore.getProposedCheckpointBySlot`.
- **docs**: `migration_notes.md` updated with the breaking changes for
downstream wallet/SDK consumers.
…oposal check (#22989)

## Motivation

`hasPayloadBeenProposed` (now `hasActiveProposalWithPayload`) used
`eth_getLogs` over the rollup's full L1 deployment range to find prior
`PayloadSubmitted` events. On long-lived rollups that range exceeds
typical RPC provider block-range caps and the call times out, silently
breaking the sequencer's "stop signaling for an already-proposed
payload" logic. The previous in-memory cache also permanently
blacklisted any payload it saw as proposed once, which is wrong: each
round on `EmpireBase` is independent and the same payload can
legitimately be re-signaled and re-submitted after a prior proposal
becomes Dropped/Rejected/Expired/Executed.

## Approach

Replace the log scan with a bounded view-call sweep over
`Governance.proposals`. The sweep walks newest -> oldest using
`proposalCount`, unwraps each proposal's `GSEPayload` via
`getOriginalPayload()`, and treats only
`Pending`/`Active`/`Queued`/`Executable` as "in an active proposal" --
terminal states allow re-signaling. The descent has a hard early-stop on
the protocol-wide proposal lifetime cap (`4 *
ConfigurationLib.TIME_UPPER = 360 days`), which is safe regardless of
per-proposal frozen configs because every config field is bounded by
`TIME_UPPER` on-chain. Two in-memory caches absorb the per-call cost
over time: terminal proposals (provably immutable on-chain) and wrapper
-> original payload unwraps (immutable bytecode).
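
A rough sketch of the sweep, under stated assumptions: `GovernanceReader`, `ProposalSketch`, and `ActiveProposalSweep` are hypothetical names standing in for the real `ReadOnlyGovernanceContract` surface, `state` is assumed to already be the live (time-derived) state, and `createdAt` is an assumed field holding the proposal's creation timestamp in seconds.

```typescript
type ProposalState =
  | 'Pending' | 'Active' | 'Queued' | 'Executable'
  | 'Executed' | 'Rejected' | 'Dropped' | 'Expired';

interface ProposalSketch { payload: string; state: ProposalState; createdAt: number }

// Hypothetical read-only view over Governance; not the real contract wrapper.
interface GovernanceReader {
  getProposalCount(): Promise<number>;
  getProposal(id: number): Promise<ProposalSketch>;
  /** Unwraps a GSEPayload; rejects for proposeWithLock-style proposals with no wrapper. */
  getOriginalPayload(wrapper: string): Promise<string>;
}

const ACTIVE_STATES = new Set<ProposalState>(['Pending', 'Active', 'Queued', 'Executable']);
// 4 * TIME_UPPER = 360 days, per the description above.
const MAX_PROPOSAL_LIFETIME_SECONDS = 360 * 24 * 60 * 60;

class ActiveProposalSweep {
  private readonly terminal = new Map<number, ProposalSketch>(); // terminal proposals are immutable
  private readonly unwrapped = new Map<string, string>(); // wrapper -> original payload
  private readonly gov: GovernanceReader;

  constructor(gov: GovernanceReader) {
    this.gov = gov;
  }

  async hasActiveProposalWithPayload(payload: string, nowSeconds: number): Promise<boolean> {
    const count = await this.gov.getProposalCount();
    // Walk newest -> oldest.
    for (let id = count - 1; id >= 0; id--) {
      const proposal = this.terminal.get(id) ?? (await this.gov.getProposal(id));
      // Hard early stop: nothing older than the lifetime cap can still be active.
      if (nowSeconds - proposal.createdAt > MAX_PROPOSAL_LIFETIME_SECONDS) {
        break;
      }
      if (!ACTIVE_STATES.has(proposal.state)) {
        this.terminal.set(id, proposal);
        continue;
      }
      const original = await this.unwrap(proposal.payload);
      if (original !== undefined && original.toLowerCase() === payload.toLowerCase()) {
        return true;
      }
    }
    return false;
  }

  private async unwrap(wrapper: string): Promise<string | undefined> {
    const cached = this.unwrapped.get(wrapper);
    if (cached !== undefined) {
      return cached;
    }
    try {
      const original = await this.gov.getOriginalPayload(wrapper);
      this.unwrapped.set(wrapper, original);
      return original;
    } catch {
      return undefined; // no wrapper to unwrap: skip this proposal
    }
  }
}
```

Only terminal proposals are cached, so a proposal that is still live is re-read on every call and its state transition is picked up as soon as it happens.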

## Changes

- **ethereum/contracts/governance**: New
`hasActiveProposalWithPayload(payload)` and `getProposalCount()` on
`ReadOnlyGovernanceContract`. Inlines a minimal `IProposerPayload` ABI
(just `getOriginalPayload`) to avoid generating a full artifact. Handles
`proposeWithLock`-style proposals (no GSEPayload wrapper) by catching
the unwrap revert and skipping.
- **ethereum/contracts/governance (types)**: Adds explicit types
(`Proposal`, `ProposalConfiguration`, `GovernanceConfiguration`,
`ProposeWithLockConfiguration`, `Ballot`) and maps the viem return
shapes of `getProposal` / `getConfiguration` onto them. `Proposal` now
carries both `cachedState` (raw stored) and `state` (live, time-derived
from `getProposalState`); `getProposal` issues both reads in parallel so
callers don't need a separate state RPC.
- **ethereum/contracts/governance (caching)**: Adds two memoization
layers on `ReadOnlyGovernanceContract`. Proposals are cached when
`state` is in any of the four terminal phases
(Executed/Rejected/Dropped/Expired) -- once terminal the entire struct
is provably immutable on-chain. Wrapper unwraps are keyed by wrapper
address and cached forever (deployed bytecode is immutable).
`GovernanceProposerContract` already memoizes its `getGovernance()`, so
the same `ReadOnlyGovernanceContract` instance (and its caches) is
reused across slots in the sequencer publisher.
- **ethereum/contracts/governance_proposer**: Drops the event-based
`hasPayloadBeenProposed`. Adds a memoized `getGovernance()` accessor and
a thin `hasActiveProposalWithPayload` delegate that resolves the
Governance address via the on-chain registry lookup.
- **ethereum/contracts/empire_base**: Removes `hasPayloadBeenProposed`
from `IEmpireBase` -- it's a Governance concern, not a generic empire
concern (slasher doesn't need it).
- **sequencer-client/publisher**: Removes the permanent
`payloadProposedCache` so the publisher re-checks every slot, allowing
re-signaling once a prior proposal is terminal. Switches the failure
mode from fail-closed to fail-open (a flaky L1 endpoint should not
silence governance participation; a duplicate signal is harmless).
Narrows the helper's `base` param from `IEmpireBase` to
`GovernanceProposerContract` since this code path is governance-only.
- **ethereum/contracts (tests)**: New `hasActiveProposalWithPayload`
describe block hitting a real anvil-deployed Governance. Impersonates
the `governanceProposer`, calls `Governance.propose` directly, and
etches hand-rolled mock wrapper bytecode at chosen addresses to drive
(wrapper, original) pairs. Covers: empty governance, live match, no
match, terminal state via warp, reverting wrapper
(proposeWithLock-style), descent past unrelated proposals,
case-insensitive match, and the 360-day hard cutoff via warp. Also adds
a sync-guard describe block that probes `Governance.updateConfiguration`
via impersonated `eth_call` to assert each of
`votingDelay`/`votingDuration`/`executionDelay`/`gracePeriod` accepts
`TIME_UPPER` and rejects `TIME_UPPER + 1` -- if those caps change
on-chain, this trips and `MAX_PROPOSAL_LIFETIME_SECONDS` must be
revisited.
- **sequencer-client/publisher (tests)**: Replaces the cache test with a
"re-checks each call so re-signaling resumes after terminal" test.
Updates the RPC-failure semantics test from fail-closed to fail-open.
…ile in CI (#23000)

## Summary

Fixes the `docs` build failure on `merge-train/spartan` (CI run
[25449092262](https://github.com/AztecProtocol/aztec-packages/actions/runs/25449092262),
log [27a4351a1e5e3568](http://ci.aztec-labs.com/27a4351a1e5e3568)).

## Problem

`validate-webapp-tutorial` in `docs/examples/bootstrap.sh` intentionally
starts each run with an empty `yarn.lock`, then runs `yarn install` to
populate it from the `link:` paths it just wrote into `package.json`. In
CI, Yarn 4 auto-enables `--immutable` when it detects `CI=1`, so the
install fails with `YN0028 (frozen lockfile exception)` because
populating an empty lockfile counts as modifying it.

```
➤ YN0028: │ The lockfile would have been modified by this install, which is explicitly forbidden.
➤ YN0000: · Failed with errors in 6s 829ms
ERROR: Contract artifact not found at /home/aztec-dev/aztec-packages/docs/target/pod_racing_contract-PodRacing.json
```

(The "Contract artifact not found" line is a downstream symptom — the
script doesn't run with `set -e`, so after `yarn install` fails it
continues into the artifact check and reports a misleading error.)

## Fix

Set `YARN_ENABLE_IMMUTABLE_INSTALLS=false` for that one `yarn install`
call, since populating the lockfile is the intended behaviour.

## Verification

Reproduced locally: `CI=true yarn install` against the webapp-tutorial
fails with `YN0028`; with `YARN_ENABLE_IMMUTABLE_INSTALLS=false` it
succeeds.

ClaudeBox log: https://claudebox.work/s/a1863de35053b544?run=1
@spalladino spalladino requested a review from a team as a code owner May 6, 2026 18:24

@ludamad ludamad left a comment

Collaborator


🤖 Auto-approved

@AztecBot AztecBot added this pull request to the merge queue May 6, 2026
@AztecBot

AztecBot commented May 6, 2026

Collaborator Author

🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 6, 2026
AztecBot and others added 4 commits May 7, 2026 04:17
…22994)

## Motivation

The `aztec.archiver.block_height` series with no status attribute
(rendered as the "Pending chain" line on the network, prover, and
fisherman Grafana dashboards) stopped being published a couple of weeks
ago. With pipelining enabled every checkpoint arriving from L1 already
has its blocks in the proposed store, so the L1 synchronizer always took
the new promotion fast path introduced in #22716, leaving
`checkpointsToAdd` empty and skipping the metric call.

## Approach

Record the checkpointed block-height metrics across all valid
checkpoints in the batch instead of only the ones routed through
`addCheckpoints`, so the promoted checkpoint contributes too. The
duration is averaged over the full batch since `addCheckpoints` performs
the work for both paths in a single transaction.

## Changes

- **archiver (`l1_synchronizer.ts`)**: Move the
`processNewCheckpointedBlocks` call to use `validCheckpoints` rather
than `checkpointsToAdd`, restoring the empty-status `block_height`,
`checkpoint_height`, `sync_block_count`, and `sync_per_checkpoint`
series under pipelining.

---------

Co-authored-by: Alex Gherghisan <alexghr@users.noreply.github.com>
@spalladino spalladino added the claudebox Owned by claudebox. it can push to this PR. label May 12, 2026
spalladino and others added 4 commits May 12, 2026 14:19
…23186)

## Motivation

The top-level `--archiver` flag was removed from `aztec start`, but
several scripts, Helm/Terraform values, and docs still pass it. Leaving
these in place would break node and prover startup once they pick up the
new CLI.

## Approach

Grepped the repo for bare `--archiver` (excluding nested
`--archiver.<option>` flags, which are still valid) and removed every
occurrence from start commands, docs, and the bot CLI handler. Also
dropped the now-stale check for an `archiver` option in `start_bot.ts`
and a stray comment in `aztec_start_options.ts`.

## Changes

- **docker-compose.yml**: drop `--archiver` from the node entrypoint
- **spartan (helm + terraform values)**: remove `--archiver` from
`aztec-node`, `aztec-validator`, `aztec-prover-stack`, and the
`full-node`, `rpc`, `archive`, `blob-sink` terraform values; update
`aztec-node/README.md` examples and options table
- **yarn-project/aztec**: drop `archiver` from the unsupported-flags
check in `start_bot.ts`; remove stale comment in
`aztec_start_options.ts`
- **docs/docs-operate**: drop `--archiver` from the
node/prover/sequencer setup, troubleshooting, and CLI reference pages;
reword the reference prose to use `--archiver.blobSinkUrl` as the
example

Versioned snapshots under `docs/network_versioned_docs/version-v4.2.0/`
are intentionally left untouched.
When the open merge-train/spartan PR has been open >24h, post a one-line
alert to #team-alpha. The cron fires once per day, so the channel sees
at most one notification per stuck day. Silent on healthy days.

## Files

- `ci3/merge_train_stale_check` — bash script: queries the open PR for a
merge-train branch, computes age from `created_at`, and posts a
`:warning:` Slack message via `ci3/slack_notify` if age >=
`$STALE_HOURS` (default 24).
- `.github-new/workflows/merge-train-stale-check.yml` — daily schedule
(`7 9 * * *`, 09:07 UTC) + `workflow_dispatch`. Calls the script for
`merge-train/spartan` → `#team-alpha`.

## ⚠ Move workflow into `.github/workflows/` before merging

The workflow is under `.github-new/` because this session was not
started with the `ci-allow` prefix (the prefix needs to be the first
token of the prompt; mid-message `ci-allow` was not picked up by the
session parser, so `.github/` was still blocked). Before merging, move
the file:

    git mv .github-new/workflows/merge-train-stale-check.yml .github/workflows/merge-train-stale-check.yml

Scheduled workflows only execute from the default branch, so the
notifier only starts firing once it has landed on `next`.

## Behaviour

| State of the open `merge-train/spartan` PR | Action |
|---|---|
| No open PR (just merged, awaiting auto-recreate) | Silent — no Slack post. |
| Open < 24 h (`STALE_HOURS`) | Silent — within expected merge window. |
| Open ≥ 24 h | One `:warning:` line to `#team-alpha` with PR link + `mergeable_state`. |
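
The decision table above amounts to a small pure function. The actual implementation is a bash script; the TypeScript below is only an illustrative sketch, and `OpenPr`, `staleAlert`, and the message wording are hypothetical.

```typescript
// Hypothetical shape of the open PR as returned by the GitHub API.
interface OpenPr {
  url: string;
  createdAt: string; // ISO 8601 timestamp, i.e. the PR's `created_at`
  mergeableState: string;
}

const MS_PER_HOUR = 60 * 60 * 1000;

/** Returns the Slack line to post, or undefined when the notifier should stay silent. */
function staleAlert(pr: OpenPr | undefined, now: Date, staleHours: number = 24): string | undefined {
  if (pr === undefined) {
    return undefined; // no open PR: just merged, awaiting auto-recreate
  }
  const ageHours = (now.getTime() - Date.parse(pr.createdAt)) / MS_PER_HOUR;
  if (ageHours < staleHours) {
    return undefined; // within the expected merge window
  }
  return `:warning: merge-train PR open for ${Math.floor(ageHours)}h: ${pr.url} (mergeable_state: ${pr.mergeableState})`;
}
```

Keeping the threshold as a parameter mirrors the `STALE_HOURS` override.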

## Reuse

Other teams can wire in their own merge-train by adding a job that calls
`./ci3/merge_train_stale_check <branch> <channel>`. Threshold and base
branch are overridable via `STALE_HOURS` / `BASE_BRANCH` env vars.

## Motivation

Driven by a Slack request: merge-train/spartan PR #22980 has been stuck on conflicts for ~6 days with no automated notification.

ClaudeBox log: https://claudebox.work/s/e4b1d8ae8d5c867b?run=2

---------

Co-authored-by: Santiago Palladino <santiago@aztec-labs.com>
- Preserve existing JSON file mode when stamping Aztec versions into
Noir contract artifacts.
- Prevent release images from containing root-only-readable account
artifacts used by Spartan deploy jobs.
@AztecBot AztecBot added this pull request to the merge queue May 12, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 12, 2026
AztecBot and others added 8 commits May 12, 2026 15:44
…story (#23160)

## Summary

Test logs print a **History** link like
`…/list/history_<hash>_<TARGET_BRANCH>`. For
`TARGET_BRANCH=merge-train/spartan` the URL contains a `/`, and Flask's
default `/list/<key>` converter only matches a single path segment, so
the link 404s. Percent-encoding (`%2F`) doesn't help: WSGI (gunicorn)
URL-decodes `PATH_INFO` per PEP 3333, so by the time Werkzeug routes the
request, the `%2F` is already a `/`.

Fix: change the route to `/list/<path:key>`, which matches slashes. This
makes the existing history links work — and recovers all data already
written to Redis under keys like `history_<hash>_merge-train/spartan`
(history tracking for `merge-train/*` is already enabled by the existing
condition in `ci3/run_test_cmd`).
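
To make the difference concrete, the two converters can be approximated with regular expressions. This is only a rough TypeScript analogue for illustration (the dashboard itself is a Flask app); the regexes and `matchKey` are assumptions, not Werkzeug's actual implementation.

```typescript
// Approximation of `/list/<key>`: the default converter matches one path segment.
const singleSegmentRoute = /^\/list\/([^/]+)$/;
// Approximation of `/list/<path:key>`: the path converter also accepts slashes.
const pathRoute = /^\/list\/(.+)$/;

function matchKey(route: RegExp, pathInfo: string): string | undefined {
  const m = route.exec(pathInfo);
  return m ? m[1] : undefined;
}

// The server decodes the path before routing, so a percent-encoded slash
// reaches the router as a literal "/".
const decodedPath = decodeURIComponent('/list/history_abc_merge-train%2Fspartan');

const withDefaultConverter = matchKey(singleSegmentRoute, decodedPath); // no match
const withPathConverter = matchKey(pathRoute, decodedPath); // full key, slash included
const slashFreeKey = matchKey(pathRoute, '/list/history_abc_next'); // still matches
```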

This commit reverts the earlier producer-side sanitization in
`ci3/run_test_cmd` / `ci3/exec_test`. Doing it in the producer would
leave existing entries orphaned under the old slash keys; the
dashboard-side fix avoids the split.

Note: dashboard changes ship via `ci3/dashboard/deploy.sh` (manual rsync
+ `systemctl restart rkapp`).

Reproducer: http://ci.aztec-labs.com/54e749c45512a629 → click
**History**.

Background:
https://gist.github.com/AztecBot/33fcdd84eba7b273d3f67dfd2ad6be8f

## Test plan

- [ ] After `ci3/dashboard/deploy.sh`, the History link on a test run on
a `merge-train/*` PR resolves to the existing list.
- [ ] Existing `/list/<key>` URLs without slashes (e.g. `…_next`,
`…_v4`) continue to work.
…a-ci (#23219)

Santiago caught that channel routing alone wasn't enough — today no
flake notifications fire for `merge-train/spartan` at all. The
`merge-train/spartan` PR's full test suite runs on `pull_request` events
(label `ci-full-no-test-cache`), where `REF_NAME=merge-train/spartan`
and `is_merge_queue=0`, so the existing `slack_notify_flake=1` trigger
never matches. Only the rare `merge_group` runs would have qualified.

Two changes in `ci3/run_test_cmd`:

1. Extend the trigger so `slack_notify_flake=1` also fires when
`REF_NAME == merge-train/spartan` (mirroring the existing
`backport-to-v2-staging` case).
2. Add a `flake_slack_channel` resolver that maps the PR head branch to
a Slack channel: `merge-train/spartan` → `#team-alpha-ci`
(`C0B3EFDPT7B`); everything else falls through to `slack_notify`'s
default `#aztec3-ci`. The resolver uses `REF_NAME` directly for
`pull_request` runs and falls back to `gh pr view <num>` (parsed from
`gh-readonly-queue/<base>/pr-<num>-<sha>`) for `merge_group` runs.
Result is cached in `/tmp` so parallel tests on the same EC2 instance
share a single resolution.
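
The resolver logic in item 2 can be sketched as below. The real code is bash inside `ci3/run_test_cmd`; this TypeScript version is an illustration only, `parseMergeQueuePr` and `resolveFlakeChannel` are hypothetical names, and `lookupHeadBranch` stands in for the `gh pr view <num>` call (the `/tmp` caching is omitted).

```typescript
const DEFAULT_CHANNEL = '#aztec3-ci';
const CHANNEL_BY_HEAD_BRANCH = new Map<string, string>([['merge-train/spartan', '#team-alpha-ci']]);

/** Extracts the PR number from a `gh-readonly-queue/<base>/pr-<num>-<sha>` ref, if it is one. */
function parseMergeQueuePr(refName: string): number | undefined {
  const m = /^gh-readonly-queue\/.+\/pr-(\d+)-[0-9a-f]+$/.exec(refName);
  return m ? Number(m[1]) : undefined;
}

function resolveFlakeChannel(
  refName: string,
  lookupHeadBranch: (prNumber: number) => string | undefined,
): string {
  const prNumber = parseMergeQueuePr(refName);
  // pull_request runs: REF_NAME is already the head branch.
  // merge_group runs: resolve the head branch from the PR number.
  const headBranch = prNumber === undefined ? refName : lookupHeadBranch(prNumber);
  if (headBranch === undefined) {
    return DEFAULT_CHANNEL;
  }
  return CHANNEL_BY_HEAD_BRANCH.get(headBranch) ?? DEFAULT_CHANNEL;
}
```

For example, `resolveFlakeChannel('merge-train/spartan', () => undefined)` resolves to `#team-alpha-ci`, while any unmapped branch falls through to the default channel.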

Design notes and rationale:
https://gist.github.com/AztecBot/2d706371f8dcb7386880859d69a90435
…23213)

## Summary

Fixes the merge-queue failure in
`e2e_blacklist_token_contract/shielding` ([CI
run](http://ci.aztec-labs.com/d5485e6652b3f32a)) where every test fails
in `applyMint` with `Invalid tx: Invalid expiration timestamp`.

## Root cause

`warpL2TimeAtLeastTo` (introduced in #22084) calls `eth.warp` followed
by `node.mineBlock()`. The sequencer's polling loop captures
`nowSeconds`/`slot` at the top of each `work()` cycle. An in-flight
cycle that started just before the warp will mine an L2 block at the
*pre-warp* slot — L1 sync prunes that block from the canonical chain,
but it lingers in local world state and the PXE anchors subsequent txs
against it. With `MAX_TX_LIFETIME == CHANGE_ROLES_DELAY == 86400s`, the
resulting `expiration_timestamp` lands exactly on the post-warp slot
boundary and the validator rejects the tx as soon as the wall-clock
crosses to the next slot.

## Fix

After `eth.warp`, retry `mineBlock` until the latest L2 block's slot is
at or past the slot corresponding to the warped timestamp. The first
`mineBlock` may return a stale block produced by an in-flight cycle; the
next triggers a fresh sequencer cycle that reads the post-warp time and
builds a block at the post-warp slot. Subsequent txs then anchor against
a fresh block whose `expiration_timestamp` is well in the future.
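
The shape of the fix can be sketched as follows. This is a simplified illustration, not the cheat-codes implementation: `WarpDeps`, `warpAndWaitForFreshBlock`, and `maxAttempts` are hypothetical, with `warpL1To` standing in for `eth.warp`, `mineBlock` for `node.mineBlock()`, and `getLatestBlockSlot` for a read of the latest block's slot.

```typescript
// Hypothetical dependencies standing in for the eth cheat codes and the node.
interface WarpDeps {
  warpL1To(timestamp: number): Promise<void>;
  mineBlock(): Promise<void>;
  getLatestBlockSlot(): Promise<number>;
  /** Maps an L1 timestamp to its L2 slot. */
  slotAt(timestamp: number): number;
}

/** Warps, then mines until the latest L2 block is at or past the warped slot. Returns attempts used. */
async function warpAndWaitForFreshBlock(
  deps: WarpDeps,
  targetTimestamp: number,
  maxAttempts: number = 10, // assumed cap so a stuck sequencer fails the test instead of hanging
): Promise<number> {
  await deps.warpL1To(targetTimestamp);
  const targetSlot = deps.slotAt(targetTimestamp);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // The first mined block may come from an in-flight cycle that read the pre-warp time.
    await deps.mineBlock();
    const latestSlot = await deps.getLatestBlockSlot();
    if (latestSlot >= targetSlot) {
      return attempt;
    }
  }
  throw new Error(`no block at or past slot ${targetSlot} after ${maxAttempts} attempts`);
}
```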

The signature of `warpL2TimeAtLeastTo`/`warpL2TimeAtLeastBy` widens from
`AztecNodeDebug` to `AztecNode & AztecNodeDebug` so we can read the
latest block via `getBlockData('latest')`. All current callers already
type their node as the intersection.

This re-applies the diagnosis from the prior #22796 (which never
merged), adapted to the current `getBlockData('latest')` API.

Full analysis:
https://gist.github.com/AztecBot/67815cbe3c3f853d97ec3345dfb0c985

## Test plan

- `e2e_blacklist_token_contract/shielding` (originally failing)
- `e2e_blacklist_token_contract/{access_control,burn,minting,transfer_*,unshielding}`
— share `applyBaseSetup` → `crossTimestampOfChange`
- `e2e_contract_updates`
- `composed/e2e_cheat_codes` (verifies the type-widening change still
resolves the methods correctly)

ClaudeBox log: https://claudebox.work/s/28594b4dc64f1cd0?run=1
…pair (#22996)

Introduces a sub-tree + top-tree orchestrator pair that decomposes the
existing single-class proving orchestrator along the natural
state-coupling boundary — per-checkpoint block-level work vs.
epoch-level top-tree work — while leaving every existing API on the
legacy `EpochProver` / `ProvingOrchestrator` / `EpochProvingState` path
untouched. The prover-node and e2e tests build unchanged; this PR is
purely additive in surface area, with structural refactors on
`ProvingOrchestrator` to share scheduling and top-tree drivers with the
new `TopTreeOrchestrator`.

## What's new

- **`CheckpointSubTreeOrchestrator`**
(`checkpoint-sub-tree-orchestrator.ts`): extends `ProvingOrchestrator`,
single-checkpoint by construction. Drives chonk-verifier / base / merge
/ block-root / block-merge for one checkpoint and resolves a
`SubTreeResult` instead of escalating to the checkpoint root — the
parent's `checkAndEnqueueCheckpointRootRollup` is overridden to
short-circuit. The constructor calls `super.startNewEpoch(epoch, 1,
empty challenges)` to set up a single-checkpoint mini-epoch; the count
and challenges are never read because the override prevents the parent's
finalize / root path from running.

- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained
driver from checkpoint-root through epoch-root rollup. Takes
per-checkpoint block-proof promises and pipelines its hint chain against
them. Cancellation surfaces as `TopTreeCancelledError` so callers can
distinguish reorg-driven cancel from a genuine proving failure.

- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch
shared cache for chonk-verifier proofs. Survives sub-tree cancellation
so a tx that gets reorged out and re-appears in a replacement checkpoint
reuses the cached proof.

- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base owning
the `SerialQueue` deferred-job lifecycle, the `pendingProvingJobs`
controller list, and a unified `deferredProving<S, T>(state, request,
callback, isCancelled?)` submit envelope. The minimal `ProvingStateLike`
contract is just `verifyState()` + `reject(reason)`.

- **`TopTreeProvingScheduler`** (`top-tree-proving-scheduler.ts`):
extends `ProvingScheduler` and holds the checkpoint-merge, padding, and
root-rollup drivers (plus tree-walking helpers) shared by both
orchestrators. Wraps circuit calls via a `wrapCircuitCall` hook
(orchestrator overrides for spans; top-tree leaves identity) and
resolves via an `onRootRollupComplete` hook to bridge the two states'
differing `resolve` signatures. The per-checkpoint root driver stays
subclass-specific because input-building flows differ.

- **`EpochProverFactory` interface on `ProverClient`**: new factory
methods `createEpochProvingContext(epochNumber)`,
`createCheckpointSubTreeOrchestrator(...)`, and
`createTopTreeOrchestrator()`. A single shared
`BrokerCircuitProverFacade` is owned by `ProverClient` and shared across
every orchestrator.
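
As an illustration of the per-epoch proof cache idea behind `EpochProvingContext`, here is a minimal sketch. It is not the actual class: `ChonkProofCacheSketch`, `getOrCompute`, and the eviction-on-failure policy are assumptions made for the example.

```typescript
// Hypothetical per-epoch cache of in-flight and completed proofs, keyed by tx hash.
class ChonkProofCacheSketch<Proof> {
  readonly epochNumber: number;
  private readonly proofs = new Map<string, Promise<Proof>>();

  constructor(epochNumber: number) {
    this.epochNumber = epochNumber;
  }

  /** Returns the cached proof promise for a tx, or starts the proof and caches it. */
  getOrCompute(txHash: string, compute: () => Promise<Proof>): Promise<Proof> {
    const cached = this.proofs.get(txHash);
    if (cached !== undefined) {
      return cached;
    }
    const pending = compute();
    this.proofs.set(txHash, pending);
    // Assumed policy: drop failed proofs so a later checkpoint can retry them.
    pending.catch(() => {
      if (this.proofs.get(txHash) === pending) {
        this.proofs.delete(txHash);
      }
    });
    return pending;
  }
}
```

Because the cache lives at the epoch level rather than inside a sub-tree, cancelling one sub-tree leaves the entries in place, and a tx that re-appears in a replacement checkpoint gets the same promise back.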

## What changes in existing code

- `ProvingOrchestrator` extends `TopTreeProvingScheduler`; the inline
broker-job submit envelope, queue lifecycle, and the top-tree-section
drivers are inherited. `cancel()` delegates the queue-recreate +
abort-jobs logic to `resetSchedulerState(this.cancelJobsOnStop)`. Three
internal methods (`getOrEnqueueChonkVerifier`,
`checkAndEnqueueBaseRollup`, `checkAndEnqueueCheckpointRootRollup`)
become `protected` so the sub-tree can override them; `provingState` and
`provingPromise` likewise become `protected` so the sub-tree can hook
the parent's failure stream onto `subTreeResult`. No public API change
on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the
sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and
`getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each
call spawns its own `BrokerCircuitProverFacade`); the new factory
methods share a `getFacade()` set up in `start()` and torn down in
`stop()`.

`EpochProver`, `EpochProverManager`, `ServerEpochProver`,
`EpochProvingState`, the integration tests in `orchestrator_*.test.ts`,
`bb_prover_full_rollup.test.ts`, and `stdlib/interfaces/*` are all
unchanged from `merge-train/spartan` — the prover-node and e2e tests
continue to build against the existing `EpochProver` API. Migrating the
prover-node onto the new factories (and the deferred-finalize flow that
goes with optimistic proving) is the follow-up PR.

## Test plan

- 261 prover-client tests pass (full `yarn workspace
@aztec/prover-client test`).
- `yarn build` clean against current merge-train/spartan (modulo the
pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
@alexghr alexghr added this pull request to the merge queue May 13, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 13, 2026
AztecBot and others added 7 commits May 13, 2026 11:59
…#23244)

## Why

`bb::srs::http_download` made a single HTTP request and threw on any
error — including transient ones like `Could not establish connection`.
With 10 parallel grind shards in `merge-queue-heavy` and `parallel
--halt now,fail=1` in `ci.sh`, a single CDN blip on one shard kills the
whole merge-train run.

That's what failed the most recent merge-train/spartan MQ attempt:
`CrsFactory.Bn254CompressedChunkHashFirstChunk` on shard `x5-full` of
run
[25795648831](https://github.com/AztecProtocol/aztec-packages/actions/runs/25795648831),
with `HTTP request failed for
http://crs.aztec-cdn.foundation/g1_compressed.dat: Could not establish
connection`. No code regression — pure network flake.

## What

Add a bounded retry loop inside `http_download`:

- Up to 3 attempts total
- Retry on connection-class errors (`!res`) and on transient HTTP status
(5xx, 429); don't retry on other 4xx
- After the first failure, tighten the per-attempt connect/read timeouts
to 5s (down from 30s/60s) so retries don't burn the original timeout
budget twice
- Exponential backoff between retries (1s, 2s)
- `vinfo` per retry; throw with attempt count on terminal failure
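
The policy in the list above can be sketched as follows. This is a TypeScript illustration, not the actual C++ change to `bb::srs::http_download`. The names `isRetryable`, `timeoutsForAttempt`, `backoffMs`, and `downloadWithRetry`, and the injected `fetchOnce` and `sleep` parameters, are hypothetical. It also assumes that "30s/60s" means a 30s connect and 60s read timeout on the first attempt.

```typescript
const MAX_ATTEMPTS = 3;

interface Timeouts { connectMs: number; readMs: number; }
// `status` is undefined when no response arrived (connection-class error).
interface FetchResult { status?: number; body?: Uint8Array; }

// Retry on connection-class errors, 5xx, and 429; never on other 4xx.
function isRetryable(status: number | undefined): boolean {
  if (status === undefined) return true;
  return (status >= 500 && status < 600) || status === 429;
}

// First attempt keeps the original timeouts (assumed 30s connect / 60s read);
// every later attempt is tightened to 5s.
function timeoutsForAttempt(attempt: number): Timeouts {
  return attempt === 1
    ? { connectMs: 30_000, readMs: 60_000 }
    : { connectMs: 5_000, readMs: 5_000 };
}

// Exponential backoff after the n-th failed attempt: 1s, then 2s.
function backoffMs(failedAttempts: number): number {
  return 1000 * 2 ** (failedAttempts - 1);
}

async function downloadWithRetry(
  url: string,
  fetchOnce: (url: string, timeouts: Timeouts) => Promise<FetchResult>,
  sleep: (ms: number) => Promise<void> = ms => new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<Uint8Array> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const res = await fetchOnce(url, timeoutsForAttempt(attempt));
    if (res.status !== undefined && res.status >= 200 && res.status < 300 && res.body) {
      return res.body;
    }
    const reason = res.status === undefined ? 'no response' : `HTTP ${res.status}`;
    if (!isRetryable(res.status) || attempt === MAX_ATTEMPTS) {
      throw new Error(`download of ${url} failed after ${attempt} attempt(s): ${reason}`);
    }
    // Stands in for the per-retry `vinfo` log line.
    console.info(`attempt ${attempt} failed (${reason}), retrying`);
    await sleep(backoffMs(attempt));
  }
  throw new Error('unreachable');
}
```

Injecting `fetchOnce` and `sleep` keeps the policy testable without a network or real delays, which is a design choice of this sketch rather than something the PR describes.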

## Latency budget

Retry-induced extra latency (beyond the first attempt) is bounded:

```
backoff 1s + retry 5s + backoff 2s + retry 5s = 13s   (< 15s)
```

Well within the 600s test timeout, and small enough that a fully-down
CDN fails fast rather than dragging out the grind shard.

WASM path is untouched — it still throws immediately, same as before.
- Removes use of old reqresp method `sendBatchRequest`.
- Lifts code from `proposal_tx_collector.ts` to FastTxCollection.

Testing

- Tests for TxCollection were using the old mechanism so they had to be
migrated.
- I had a good fight with tests for TxCollection because I wanted to
keep things clean without going too much into the p2p network, but still
test something. I ended up making some internals of FastTxCollection
protected and using them in the test. This is ugly, but was already
partially being done. Hopefully we can improve on it with a bigger
refactor.
## Motivation

The top-level `sequencer-client/README.md` was years out of date — it
still referred to single-block-per-slot building and made no mention of
proposer pipelining or the multi-block checkpoint model. The
timing-model README still documented both pipelined and non-pipelined
scheduling even though the non-pipelined mode is about to be removed.
New contributors (human or AI) lacked the context they need to make
changes to block building.

## Approach

Rewrote the top-level README from scratch following the package's
`readme-writer` guidelines: slots / blocks / checkpoints, proposed vs
checkpointed chain, an architecture diagram, the `Sequencer` work loop,
`CheckpointProposalJob` lifecycle, per-block loop pseudocode, the
`SequencerPublisher` Multicall3 bundling and `sendRequestsAt` semantics,
events, configuration reference, and failure modes. Trimmed
`src/sequencer/README.md` to cover only the pipelined timing model with
formulas grounded in `PipelinedCheckpointTimingModel` and a corrected 72
s / 8 s walkthrough. Ran `/codex` for a critical review and fixed all
flagged issues (last-sub-slot-is-not-cooldown, event-emit timing, config
env-var names, attestation-deadline nuance, `insufficient-valid-txs`
handling, publisher `preCheck` semantics).

## Changes

- **sequencer-client**: Replaced `README.md` with an architecture-first
rewrite covering pipelining (build slot vs target slot, depth bound of
2, parent-invalidation discard), the per-slot job lifecycle, the
publisher's Multicall3 flow, and the full config reference.
- **sequencer-client (sequencer)**: Replaced `src/sequencer/README.md`
with a pipelining-only timing model. Documents `timeReservedAtEnd`,
`maxNumberOfBlocks`, per-state deadlines, proposer-vs-committee parallel
timeline, and timing-variation handling.
…imulatePublicCalls (#23163)

## Motivation

`simulatePublicCalls` forked the world state at the latest synced block
but never inserted the L1 to L2 messages that are added at the start of
the next checkpoint, so simulations were missing those messages whenever
the next block fell in a new checkpoint.

## Approach

If the last proposed block is also the last block of the last proposed
checkpoint, then no proposed block sits beyond that checkpoint, so the
next block will land in a new checkpoint. In that case we add the L1 to
L2 messages to the world-state fork before simulating.
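
A minimal sketch of that boundary check, assuming a simplified view of the chain tips. The `ProposedTips` shape and the `nextBlockStartsNewCheckpoint` function are hypothetical names for illustration and do not reflect the actual archiver or world-state APIs.

```typescript
// Hypothetical, simplified view of the proposed chain tips.
interface ProposedTips {
  // Number of the last proposed block.
  lastProposedBlockNumber: number;
  // Number of the last block in the last proposed checkpoint, if any.
  lastCheckpointLastBlockNumber?: number;
}

// True when the last proposed block closes its checkpoint, which means the
// next block opens a new checkpoint and needs the pending L1 to L2 messages
// inserted into the fork before simulation.
function nextBlockStartsNewCheckpoint(tips: ProposedTips): boolean {
  return (
    tips.lastCheckpointLastBlockNumber !== undefined &&
    tips.lastProposedBlockNumber === tips.lastCheckpointLastBlockNumber
  );
}
```

With tips of `{ lastProposedBlockNumber: 12, lastCheckpointLastBlockNumber: 12 }` the check passes and the messages would be inserted. With block 13 proposed on top of a checkpoint ending at 12, the next block continues the current checkpoint and the fork is left as is.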
@PhilWindle PhilWindle enabled auto-merge May 13, 2026 15:00
@PhilWindle PhilWindle added this pull request to the merge queue May 13, 2026
Merged via the queue into next with commit 25fa2cc May 13, 2026
24 checks passed
Labels

ci-full-no-test-cache ci-no-squash claudebox Owned by claudebox. it can push to this PR.

7 participants