
Feat: Batched Execution, improved benchmarks, better gas optimization#71

Merged: sudo-owen merged 35 commits into main from claude/batched-from-main on Jun 2, 2026

Conversation

@sudo-owen
Collaborator

No description provided.

sudo-owen and others added 30 commits May 28, 2026 16:17
All per-turn-mutable fields (winner/flags/activeMonIndex/lastExecTs/turnId) packed into slot 1
so per-turn BattleData mutations coalesce to one SSTORE (main wrote 2: turnId in slot 0).
turnId uint64->uint16, lastExecuteTimestamp uint48->uint40. Engine field access unchanged
(Solidity handles the new slots). Verified: EngineTest 50/50, InlineEngineGasTest 3/3.
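
To make the packing concrete, here is a minimal Python model of a single 256-bit slot. Only the uint16 turnId and uint40 lastExecuteTimestamp widths come from the commit message; the other widths, the field order, and the `pack`/`unpack` helper names are assumptions for illustration and need not match the real BattleData layout.

```python
# Model of packing per-turn-mutable fields into one 256-bit storage slot.
# turnId (16 bits) and lastExecuteTimestamp (40 bits) come from the commit
# message; every other width and the field order are ASSUMED.
FIELDS = [  # (name, bit width), packed from the low bits upward
    ("winner", 8),
    ("flags", 8),
    ("activeMonIndex", 16),
    ("lastExecuteTimestamp", 40),
    ("turnId", 16),
]
assert sum(width for _, width in FIELDS) <= 256  # must fit one slot


def pack(values: dict) -> int:
    """Pack all fields into one integer word, rejecting overflowing values."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word: int) -> dict:
    """Recover every field from a packed word."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out
```

Because every field lives in the same word, mutating any subset of them during a turn still ends in one slot write, which is the coalescing the commit describes.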

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ness

- Engine.executeBatchedTurns: loops _executeInternal with DIRECT storage (no transient shadow);
  EVM warm-slot discount amortizes cold SLOADs across the single tx for free. + getStorageKey /
  getSubmitContext, resetCallContext clears per-turn transients.
- SignedCommitManager: moveBuffer + bufferCounters + submitTurnMoves (SINGLE-SIG: msg.sender ==
  committer, revealer sig pins the committer move hash) + executeBuffered + pack/unpack helpers.
- Structs.TurnSubmission (no committer sig). IEngine surface. test/abstract/BatchHelper (single-sig).
- RealMonReplayGasTest: faithful 26-turn real-game replay via SetupMons reuse; asserts
  legacy==batched end state and reports production-faithful steady-state gas.

Result (real 26-turn game, vm.cool steady-state): clean-batched 4,584,625 vs main 5,277,953
= -693,328 (-13.1%); clean-legacy ~= main (no regression). Equivalence verified.
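
The warm-slot amortization behind this number can be sketched with the EIP-2929 SLOAD prices. The two constants are protocol values; the turn and slot counts in the usage note are made-up inputs, and the model assumes every turn reads the same slots once each, which is a simplification of the real engine.

```python
# EIP-2929 storage read costs in gas (protocol constants).
COLD_SLOAD = 2100
WARM_SLOAD = 100


def sload_gas_separate_txs(turns: int, slots_per_turn: int) -> int:
    """Each turn is its own tx, so every slot is cold again on first touch."""
    return turns * slots_per_turn * COLD_SLOAD


def sload_gas_one_tx(turns: int, slots_per_turn: int) -> int:
    """All turns share one tx: slots are cold on the first turn, warm after."""
    if turns == 0:
        return 0
    first_turn = slots_per_turn * COLD_SLOAD
    later_turns = (turns - 1) * slots_per_turn * WARM_SLOAD
    return first_turn + later_turns
```

With the hypothetical inputs of 26 turns and 10 slots per turn, the separate-tx model pays 546,000 gas in reads and the one-tx model pays 46,000. The measured saving is smaller because reads are only part of each turn's cost.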

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
p0 submits both their move and the CPU's move (computed client-side) in one tx; engine executes
directly, skipping getCPUContext (dozen+ cold SLOADs) + calculateMove every CPU turn. Trust model:
lying only weakens the CPU against p0 (PvE self-handicap); msg.sender == p0 binding unchanged.
Build + CPU suites pass (BetterCPU 52, FairCPU/OkayCPU).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ort.py)

Parses a battle desync report (teams + per-turn moveIndex/salt/extraData) into the Solidity
team monIds + per-slot MonStats + deduped Turn[] sequence for a RealMonReplayGasTest-style
faithful replay. Verified: output reproduces the hand-written 26-turn test data byte-for-byte
from the raw fixture, so any real prod game becomes a gas + equivalence regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Batching reduces gas ~13% vs main on a real 26-turn game (equivalence-verified); the prior branch's
transient shadow was counterproductive (the EVM already amortizes warm slots for free); methodology +
remaining A2 follow-up documented.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…=committer)

Drop the committer signature; bind the committer via msg.sender + the revealer sig (which pins
(battleKey, turnId, committerMoveHash)). Saves ~3.6k/turn on the legacy fallback (clean-legacy
5,296,078 -> 5,201,946). Rewrote the ~15 dual-sig test sites + 4 security tests to the single-sig
model (unilateral-revealer -> NotCommitter; third-party relay -> NotCommitter; committer-move-
changed -> InvalidSignature; replay still prevented by turnId binding). All 506 tests pass.
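
A rough Python model of the single-sig checks follows. It is a sketch only: HMAC-SHA256 stands in for the ECDSA signature and sha256 stands in for keccak256, so the verifier here holds the revealer's secret, which the real contract would not. The function names, the 8-byte turnId encoding, and the field order in the digest are assumptions.

```python
import hashlib
import hmac


class NotCommitter(Exception):
    """Raised when the caller is not the committer."""


class InvalidSignature(Exception):
    """Raised when the revealer signature does not match the submission."""


def revealer_digest(battle_key: bytes, turn_id: int, committer_move_hash: bytes) -> bytes:
    # The revealer signs (battleKey, turnId, committerMoveHash).
    # sha256 is a stand-in for keccak256 (ASSUMED for this sketch).
    return hashlib.sha256(battle_key + turn_id.to_bytes(8, "big") + committer_move_hash).digest()


def sign(secret: bytes, digest: bytes) -> bytes:
    # HMAC is a stand-in for an ECDSA signature (ASSUMED for this sketch).
    return hmac.new(secret, digest, hashlib.sha256).digest()


def submit_turn(sender, committer, revealer_secret, battle_key, turn_id, committer_move, revealer_sig):
    # Binding 1: only the committer may submit (msg.sender == committer).
    if sender != committer:
        raise NotCommitter()
    # Binding 2: the revealer signature pins this exact committer move and turn.
    move_hash = hashlib.sha256(committer_move).digest()
    expected = sign(revealer_secret, revealer_digest(battle_key, turn_id, move_hash))
    if not hmac.compare_digest(expected, revealer_sig):
        raise InvalidSignature()
    return True
```

In this model a third-party relay fails the sender check, a swapped committer move fails the signature check, and a signature reused on a later turn fails because turnId is part of the signed digest.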

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…asure helper

- Extract FullyOptimizedInlineGasTest from InlineEngineGasTest.sol into its own file
  (one contract per file; matches the already-separate snapshot JSONs). Behavior-preserving
  (both suites pass, snapshots unchanged).
- test/abstract/GasMeasure.sol: shared production-faithful gas measurement — per-tx cold
  accounting (vm.cool) + a deterministic storage-access tally (cold/warm SLOAD, SSTORE tiers),
  with _snapScenario() recording tally + cold-per-tx gas. Basis for converting the gas tests
  off the all-warm gasleft span (which masks cold-access regressions).
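
The kind of tally described in the last bullet can be illustrated with a small Python classifier. This is a sketch of the idea, not GasMeasure.sol itself: the access-record shape and counter names are invented here, a nonzero-to-zero write is lumped into the nz->nz bucket, and SSTORE tiers are judged against the value immediately before the write.

```python
from collections import Counter


def tally(accesses):
    """Classify one tx window of storage accesses.

    accesses: iterable of ("sload", slot) or ("sstore", slot, old, new).
    The record shape is hypothetical, chosen for this sketch.
    """
    seen = set()       # slots already touched in this window (the warm set)
    counts = Counter()
    for access in accesses:
        op, slot = access[0], access[1]
        if op == "sload":
            counts["warm_sload" if slot in seen else "cold_sload"] += 1
        elif op == "sstore":
            old, new = access[2], access[3]
            if old == new:
                counts["noop_sstore"] += 1
            elif old == 0:
                counts["z_to_nz_sstore"] += 1
            else:
                # Simplification: nonzero->zero is counted here as well.
                counts["nz_to_nz_sstore"] += 1
        else:
            raise ValueError(f"unknown access kind: {op}")
        seen.add(slot)
    return counts
```

Resetting `seen` per window is what models the per-tx cold start: a slot that was warm at the end of one turn is cold again at the start of the next.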

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e subset InlineEngineGasTest

- FullyOptimizedInlineGasTest (the production stack: inline validation/RNG/stamina + SignedMatchmaker
  + single-sig dual-signed) now measures each battle per-turn as a cold-start tx (vm.cool) with a
  deterministic storage-access tally (cold/warm SLOAD, z->nz/nz->nz/no-op SSTORE) + cold-per-tx gas.
  Cool+tally is gated inside _fastTurn/_fastSwitchReveal by _measuring, so battle spans just wrap
  with _beginMeasure()/_endMeasure() (no per-turn churn). Setup spans dropped (gasleft polluted by
  the tally's cumulative memory); storage-reuse now asserted via Battle3 < Battle1 cold gas
  (Battle3 replays Battle1 but reuses storage: zToNz 30 -> 4).
- Removed InlineEngineGasTest (+ snapshot): a strict subset of FullyOptimized's optimizations
  (inline validation only, on the slower commit-reveal flow).
- _tally sizes its dedup scratch to the actual per-window access count (was a fixed 8192 array,
  which OOG'd when called once per turn across a battle).

All 503 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uction stack)

EngineGasTest benchmarked the external-validator + commit-reveal config, which isn't what we ship
(production uses inline validation + dual-signed). FullyOptimizedInlineGasTest (new GasMeasure
format) is the production-faithful gas tracker. All 498 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…regen config

The replay was measuring DefaultRuleset(new StaminaRegen()) — the SLOW external stamina-regen
path. Production uses INLINE_STAMINA_REGEN_RULESET (engine-internal regen), which avoids the
per-round-end/after-move reentrant calls (getPlayerSwitchForTurnFlag, getMoveDecisionForBattleState,
stamina getMonStateForBattle). Switching the replay to the prod config:

  clean-legacy : 5,201,946 -> 4,624,316  (inline saves ~577k, ~11%)
  clean-batched: 4,583,171 -> 4,106,467  (inline saves ~477k, ~10%)
  batching delta under prod config: 517,849 (~11.2%)

Inline regen alone saves more than batching does. The old external-regen main baseline (5,277,953)
is no longer comparable and the misleading 'batched < MAIN' line is dropped. The reentrant-read
breakdown that was steering optimizations must be redone against this config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merges the stale OPT_PLAN.md (described the abandoned transient-shadow / dual-sig / CPU-state-hint
design) and ANALYSIS_BATCHED_GAS.md (external-regen baseline) into one accurate record:
- Corrected production-faithful headline (inline config): legacy 4,624,316 / batched 4,106,467.
- Inline stamina regen was the dominant win (~11%, config not code); batching ~11.2% on top.
- Documents what was tried and rejected with measured reasons: transient shadow (-94k), #4 no-op
  SSTORE guard (regressed every scenario), #6 transient-reset trim (breaks equivalence), delegatecall
  moves (no SLOAD saving + storage-corruption risk).
- Ranks remaining opportunities honestly; flags the CPU/single-player one-tx batch-submit as the
  biggest remaining lever (gated on whether the no-batch constraint is PvP-fairness-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert the IMoveSet single-effect scans (DeepFreeze frostbite, IronWall/Somniphobia idempotency,
GildedRecovery status-removal, NightTerrors terror-count + sleep-check, Baselight import) from
getEffects()+in-memory-scan to the targeted engine.getEffectData() finder, which locates the effect
internally and returns (exists, index, data) without materializing the full EffectInstance[] array.

Semantically identical (494 tests pass, real-game equivalence holds). getEffectData rollout saves
~49k/game on the real replay (legacy), dominated by Baselight's move-facing level reads. Ability
activateOnSwitch scans are NOT converted — those abilities are inline-encoded and never make the
external call. MegaStarBlast left as-is (scans for Overclock by address AND data, not a first-match).
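
The difference between the two access patterns can be sketched in Python. The `Effect` tuple, the string addresses, and the function names are illustrative stand-ins; only the (exists, index, data) return shape is taken from the commit message.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Effect(NamedTuple):
    address: str  # effect contract address (a plain string in this sketch)
    data: int     # opaque per-effect data word


def get_effects(stored: Iterable[Effect]) -> List[Effect]:
    # Old pattern: materialize every effect, then the caller scans the copy.
    return list(stored)


def get_effect_data(stored: Iterable[Effect], target: str) -> Tuple[bool, int, int]:
    # Targeted finder: stop at the first match and return (exists, index, data)
    # without building the full array.
    for index, effect in enumerate(stored):
        if effect.address == target:
            return True, index, effect.data
    return False, 0, 0
```

A first-match finder like this only fits callers that need a single effect keyed by address, which is consistent with MegaStarBlast (matching on address and data) being left on the scan path.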

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mons faceted)

Step-0 measurement for PLAN_STARTBATTLE.md. Measures Engine.startBattle via DefaultMatchmaker
confirmBattle against the REAL GachaTeamRegistry with a facet on every mon (ALICE faceted via
assignExp->assignFacets; CPU phantom via setOpponentTeam), so getTeams pays the full facet-delta fold
the mock registry hid. Reports per-account SSTORE/SLOAD (engine = team-store/clear, registry =
getTeams) for COLD (fresh key) and STEADY (recycled key) regimes.

Findings: getTeams = 87 registry SLOADs (~180k, regime-independent); team-store = ~56 engine SSTOREs
(z->nz cold ~1.25M / nz->nz recycled-different-team ~157k / ~0 same-team). Realistic prod startBattle
~420k. Both L1 (shrink Mon record) and L2 (registry facet-fold cache) are ~160-180k levers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ARM-DIFF/WARM-SAME)

Optimize for the warm steady state, not cold. Distinct per-mon data + a CPU-team swap between
battles so the recycled config is genuinely overwritten. Key measured result:

  COLD (first-ever key)        : 1,420,947   (64 z->nz)
  WARM, recycled key, team diff:   268,934   (32 nz->nz)
  WARM, recycled key, team same:   268,934   ( 8 nz->nz)

WARM-DIFF == WARM-SAME exactly: in the warm regime the SSTORE value tier (nz->nz vs no-op) is NOT
the cost driver — per-slot EIP-2929 cold-access (~2.1k/slot first-touch) is. Recycling already
collapses z->nz (22k) to ~2.2k, making nz->nz and no-op indistinguishable. Implication: only
reducing the NUMBER of slots touched helps in warm (stable-team-cache ~-200k; fixed-size moves
~-17k); value/no-op-guard tricks save ~0. Confirms why the #4 no-op guard flopped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
startBattle called validateMatch twice (p0, then p1); the 2nd call re-read the proposal/storageKey
slots (warm) and paid a 2nd dispatch. Batch into one call that reads the proposal pair once and checks
both players. The IMatchmaker signature changes from validateMatch(battleKey, player) to
validateMatch(battleKey, p0, p1); updated DefaultMatchmaker (real check), SignedMatchmaker + CPU
(stubs), and the engine call site.

Measured on StartBattleGasTest: warm startBattle 268,934 -> 267,637 (~-1.3k); cold 1,420,947 ->
1,419,654. 495 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Post-mortem in GAS_OPTIMIZATION.md §5. The cache traded a ~180k startBattle
SSTORE saving for per-read monId indirection + facet fold on the execute hot
path (~16 _getTeamMon calls/turn, each its own cold-start tx). The per-battle
Mon[] is read-hot, not write-once, so the dedup's saving is paid back on every
read. Micro-benchmark (TeamMonReadGasTest, faithful layout, cold-started):
execute +11.3k/turn (full cache, field accessors) vs ~0 (hybrid). The net saving is
only ~44k over a 12-turn battle (~180k - 12 x 11.3k), it turns net-negative past
roughly 16 turns, and it breaks the gas-neutral-on-hot-paths constraint.

The implementation arc (slice1a/1b-i/0F/1b-ii, tip 4da60a2) was reset to
7ea210d (validateMatch batch kept). Per-team facets (2125e22) was a cache
enabler reverted to per-mon; cherry-pickable as a standalone feature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
executeBatchedTurns loops _executeInternal, which emitted EngineExecute (LOG2,
~1,125 gas) on every sub-turn. The whole batch is one tx and clients can
reconstruct per-turn detail from the N MonMoves logs in it, so one
EngineExecute per batch suffices. Thread emitExecuteEvent through
_executeInternal: the single-execute paths (execute / executeWithMoves /
executeWithSingleMove) emit as before; the batch emits once after the loop,
iff >=1 sub-turn ran.
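
The flag threading can be modeled in a few lines of Python. The function names mirror the commit message, but the log list and tuple shapes are hypothetical, and the model logs MonMoves on every sub-turn, ignoring the no-submission and forced-switch cases.

```python
# LOG2 cost excluding data bytes: 375 base + 375 per topic, two topics.
LOG2_GAS = 375 + 2 * 375


def execute_internal(turn, logs, emit_execute_event: bool) -> None:
    logs.append(("MonMoves", turn))  # per-turn detail is always logged here
    if emit_execute_event:
        logs.append(("EngineExecute", turn))


def execute(turn, logs) -> None:
    # Single-execute path: emits as before.
    execute_internal(turn, logs, emit_execute_event=True)


def execute_batched_turns(turns, logs) -> None:
    ran = 0
    for turn in turns:
        execute_internal(turn, logs, emit_execute_event=False)
        ran += 1
    # One EngineExecute per batch, and only if at least one sub-turn ran.
    if ran >= 1:
        logs.append(("EngineExecute", "batch"))
```

`LOG2_GAS` evaluates to 1,125, matching the per-sub-turn figure quoted in the message.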

RealMonReplay (26-turn game, prod inline config), clean A/B:
  batched 4,016,511 -> 3,988,426  (-28,085, ~0.7%)
  legacy  4,560,142 -> 4,560,142  (unchanged control)

Indexer note: batched EngineExecute is now once-per-tx, not per-sub-turn. Turn
detail remains in the per-turn MonMoves logs (which still fire per sub-turn,
except on no-submission / forced-switch turns).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e trim

Current-branch-accurate (commit fdb60d9): legacy 4,560,142, batched 3,988,426
(batching saves 571,716 / ~12.5%). Add both wins to the §2 shipped table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the once-per-batch trim: the batched flow now emits zero
EngineExecute. Single-execute / legacy paths (execute / executeWithMoves /
executeWithSingleMove) still emit it per turn; batched clients reconstruct
per-turn detail from the batch's N MonMoves logs in the same tx.

RealMonReplay (26-turn game, prod inline): batched 3,988,426 -> 3,987,258
(-1,168 more; -29,253 total vs the original per-turn emit). Legacy unchanged
at 4,560,142.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Measures submit-and-execute-everything-in-one-tx for single-player/CPU: build
all entries up front, call engine.executeBatchedTurns once (the same loop the
batched drain runs), 1 TX_BASE. End-state equivalence asserted vs legacy.

RealMonReplay (26-turn game, prod inline config):
  LEGACY  4,560,376
  BATCHED 3,987,258
  ONE-TX  2,878,407   (-1,108,851 vs batched / -28%; -1,681,969 vs legacy / -37%)

Win is the collapsed submit side: 25 buffering txs + their sig/getSubmitContext
overhead -> one call (~525k is just 25 x 21k tx-base). Execution is identical
(same executeBatchedTurns), so end state matches legacy byte-for-byte.
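
The percentages and the tx-base share can be rechecked with a short Python calculation. The three gas figures are copied from the table above; the `saving` helper is just a convenience for this check.

```python
TX_BASE = 21_000  # intrinsic gas charged per transaction

# RealMonReplay figures from the table above (26-turn game, prod inline config).
LEGACY, BATCHED, ONE_TX = 4_560_376, 3_987_258, 2_878_407


def saving(before: int, after: int):
    """Return (absolute gas saved, percent saved rounded to one decimal)."""
    delta = before - after
    return delta, round(100 * delta / before, 1)


# Gas removed purely by dropping 25 buffering transactions.
tx_base_removed = 25 * TX_BASE
# What is left of the one-tx win after tx-base is accounted for.
other_submit_overhead = saving(BATCHED, ONE_TX)[0] - tx_base_removed
```

Of the 1,108,851 gas saved versus batched, 525,000 is tx-base and the remaining 583,851 is the rest of the collapsed submit side.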

Measurement calls executeBatchedTurns directly (pranked as moveManager); a
production CPUMoveManager wrapper is a small follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…indable)

§6 #1: ONE-TX 2,878,407 vs BATCHED 3,987,258 vs LEGACY 4,560,376. Moves via
calldata -> transient (no move SSTORE / buffer / commitment). Parked: RNG =
keccak(player salts) makes the one-tx flow intrinsically grindable; documents
the (a) reward-economy vs (b) on-chain-CPU+prevrandao options for later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parallel per-mon catalog + synthesis of all 13 mons' custom moves/abilities.
Verdict: a pure fixed-op VM is insufficient; the target is a bounded-Turing
core (fixed ALU incl. keccak + wrapping mul/exp, bounded loops, scratch memory,
persistent effect state, EffectStep lifecycle dispatcher, ~21 engine syscalls +
DAMAGE/STAT_BOOST macro-opcodes). ~40% trivial / ~45% moderate / ~15% hard.
Real ceiling = StatBoosts (unchecked-wrap exponentiation over the live effect
set; 12+ clients). Gates: macro-opcode feasibility + keccak/turn economics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…alt, CPU salt 0)

p0 submits the whole game (their moves + the off-chain-computed CPU moves) plus a
per-turn salt in ONE call; the manager decodes the compact 19-byte/turn stream
([p0Move 1 | p0Extra 2 | p0Salt 13 | p1Move 1 | p1Extra 2]), forces the CPU salt
to 0, and drives the existing executeBatchedTurns. CPU-only by construction (PvP
uses SignedCommitManager). Collapses the N per-turn submit txs into one.
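
A decoder for the 19-byte-per-turn layout might look like the following Python sketch. The field widths and order come from the message above; big-endian byte order, the dictionary keys, and the `encode_turn` helper are assumptions made for the example.

```python
TURN_BYTES = 19  # 1 + 2 + 13 + 1 + 2


def encode_turn(p0_move: int, p0_extra: int, p0_salt: int, p1_move: int, p1_extra: int) -> bytes:
    """Pack one turn as [p0Move 1 | p0Extra 2 | p0Salt 13 | p1Move 1 | p1Extra 2]."""
    return (
        bytes([p0_move])
        + p0_extra.to_bytes(2, "big")   # byte order is ASSUMED
        + p0_salt.to_bytes(13, "big")   # 13 bytes = 104 bits
        + bytes([p1_move])
        + p1_extra.to_bytes(2, "big")
    )


def decode_game(stream: bytes) -> list:
    """Split a whole-game stream into per-turn records."""
    if len(stream) % TURN_BYTES != 0:
        raise ValueError("stream length must be a multiple of 19 bytes")
    turns = []
    for offset in range(0, len(stream), TURN_BYTES):
        t = stream[offset:offset + TURN_BYTES]
        turns.append({
            "p0Move": t[0],
            "p0Extra": int.from_bytes(t[1:3], "big"),
            "p0Salt": int.from_bytes(t[3:16], "big"),
            "p1Move": t[16],
            "p1Extra": int.from_bytes(t[17:19], "big"),
            "p1Salt": 0,  # the CPU salt is never read from calldata
        })
    return turns
```

Forcing `p1Salt` to 0 in the decoder, rather than carrying it in the stream, is what keeps the per-turn record at 19 bytes.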

Test (test_executeGame_decodesPerTurnSalts): runs a 3-turn CPU game through it and
asserts the emitted MonMoves salts match the packed calldata (low 104 bits = player
salt, high = CPU salt 0) and the game advanced — verifying the manager-layer decode
that RealMonReplay option 3 (which calls executeBatchedTurns directly) never touches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sketches the VM under the premise that StatBoosts + AttackCalculator are engine
syscalls: programs-not-contracts run by a fixed interpreter (the ZK circuit),
side-effects only via ~21 engine syscalls + DAMAGE/STAT_BOOST macro-opcodes,
bounded-Turing core (wrapping mul/exp, bounded loops, scratch memory), EffectStep
lifecycle scheduler, hard cases via explicit opcodes or a kind:custom escape
hatch. Phased plan (inline -> off-chain interp -> transpile+equiv -> on-chain
interp -> ZK) gated on macro-opcode + keccak/turn economics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
§7 validation verdict: the restricted VM is honest only for the ~25-35%
state-machine slice; damage/boost/status (~80%) need DAMAGE(190LOC)/STAT_BOOST
(635LOC) subsystem-opcodes + a general KECCAK. The keccak is load-bearing
decorrelation (mixRngForAttacker, effect-trigger reroll), not lazy stream-split.
§8 no-keccak variation: entropy-stream RNG (decorrelation native), bit-packed KV
keys, Poseidon state commitment -> keccak-free native circuit, eliminating the
dominant ZK cost. §9 1-tx CPU gains: on-chain 2.88M -> ~0.5M (verify+settle) +
DA cut, against now-cheap proving. Gated on STAT_BOOST/DAMAGE circuit cost +
the VolatilePunch decorrelation prototype.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sudo-owen and others added 5 commits June 1, 2026 10:29
* Inline StatBoosts into the Engine

Replace the standalone StatBoosts effect contract with native Engine
functions (addStatBoost / addKeyedStatBoost / removeStatBoost /
removeKeyedStatBoost / clearAllStatBoosts), collapsing the ~10-15
cross-contract call round-trips each boost application used to make
(getEffects, getMonStatsForBattle, getGlobalKV, addEffect/editEffect/
removeEffect, updateMonState x5, setGlobalKV) into direct storage access.

Design mirrors the existing inline-StaminaRegen precedent:
- Boost sources stay in the per-mon effect mappings under a new
  STAT_BOOST_ADDRESS sentinel (stepsBitmap = STAT_BOOST_STEPS = 0x8020),
  and the aggregated snapshot stays in globalKV. Both already recycle
  storage across battles via the MappingAllocator-managed storageKey, so
  no new storage is introduced.
- _runSingleEffect grows a sentinel branch that handles the temp-boost
  switch-out natively instead of an external IEffect call.
- Boost source keying is still derived from msg.sender, so behavior is
  identical (callers now invoke the Engine directly during execute).
- Pure pack/aggregate math lifted verbatim into src/lib/StatBoostLib.sol.

All 16 src callers (Overclock, BurnStatus, FrostbiteStatus, UpOnly,
SaviorComplex, EternalGrudge, Chronoffense, Deadlift, HoneyBribe,
Initialize, TripleThink, ActusReus, Interweaving, Tinderclaws, Loop,
Renormalize) drop the StatBoosts immutable and call the Engine directly.
Deploy scripts updated (SetupMons regenerated; EngineAndPeriphery no
longer deploys StatBoosts). StatBoosts.sol deleted.

Gas (production-faithful, regenerated snapshots):
- RealMonReplay 26-turn game: legacy 4,520,297 -> 4,403,999 (-116k);
  batched -> 3,751,425 (-120k); one-tx -> 2,641,245 (-121k).
- FullyOptimizedInline battles: -79k / -135k / -83k cold gas, fewer SLOADs.
- Minor non-stat-boost regressions (StartBattle +130, EngineOptimization
  +6.3k, Matchmaker +~130) from the larger Engine selector table / code
  size; far outweighed by the stat-boost-path savings.

All 498 tests pass.

* Stat boosts: drop the globalKV snapshot, telescope off monState

The aggregated boosted-stats snapshot in globalKV only existed because the
old StatBoosts contract was external and could not read back the absolute
boosted value to telescope deltas. Inside the Engine that constraint is
gone: the stat-boost system is the sole writer of the 5 stat-delta fields,
so the live monState delta already IS the previous boost contribution
(cleared sentinel == 0). _applyStatBoosts now computes the new boosted stat
and feeds (new - base - currentDelta) through _updateMonStateInternal,
keeping OnUpdateMonState parity, with no snapshot read/write.
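
The telescoping step can be shown with a tiny Python model. The `MonStat` class and its method names are hypothetical; the only thing taken from the commit is the formula new - base - currentDelta and the premise that the stat-boost system is the sole writer of the delta.

```python
def telescoped_delta(base: int, current_delta: int, new_boosted: int) -> int:
    """Delta to feed into the state update: new - base - currentDelta."""
    return new_boosted - base - current_delta


class MonStat:
    """One stat of one mon: an immutable base plus a live delta."""

    def __init__(self, base: int):
        self.base = base
        self.delta = 0  # cleared state is treated as 0, as in the commit

    def apply_boosted(self, new_boosted: int) -> int:
        step = telescoped_delta(self.base, self.delta, new_boosted)
        self.delta += step  # stand-in for _updateMonStateInternal
        return step

    @property
    def effective(self) -> int:
        return self.base + self.delta
```

After each call `effective` equals the newly computed boosted stat, so the live delta doubles as the record of the previous boost contribution and no separate snapshot is needed.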

Removing the snapshot also eliminates its zero->nonzero globalKV write and
the globalKV key-buffer bookkeeping (~20k each on first touch per battle).

Also collapse the switch-out path from O(n^2) to O(n): the first temp entry
hit during OnMonSwitchOut now drops every temp source on the mon and
re-aggregates the survivors in a single pass (siblings get tombstoned, so
the loop's tombstone guard skips them) instead of recomputing per instance.

Gas vs the prior (snapshotted) inlining:
- FullyOptimizedInline battles: -89.6k / -71.6k / -10.2k coldGas;
  -11 / -16 / -11 SSTOREs per battle.
- RealMonReplay 26-turn game: legacy 4,403,999 -> 4,380,318;
  batched -> 3,733,733; one-tx -> 2,623,569.
Total vs the original external StatBoosts: Fast_Battle1/2/3 -169k / -206k /
-93k coldGas (-10% / -12.6% / -7.6%).

All 498 tests pass.

* Forbid direct stat-delta writes; document inlined stat boosts

Guard the external updateMonState so the 5 stat deltas (Speed..SpecialDefense,
contiguous enum values 2-6) can only be changed through add/removeStatBoost.
Direct writes now revert with StatRequiresStatBoost, so the boost aggregation
(which telescopes off the live monState delta) can't be silently clobbered.
Hp/Stamina/IsKnockedOut/ShouldSkipTurn stay writable. The internal stat-boost
path keeps using _updateMonStateInternal directly and is unaffected.
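
A Python model of the guard is sketched below. The enum name `StatField`, the order of the three middle stats, and the numeric values of the non-stat fields are assumptions; the commit only fixes Speed..SpecialDefense as the contiguous values 2-6.

```python
from enum import IntEnum


class StatField(IntEnum):
    # Speed=2 and SpecialDefense=6 follow the commit; everything else,
    # including the order of the middle three, is ASSUMED.
    Hp = 0
    Stamina = 1
    Speed = 2
    Attack = 3
    Defense = 4
    SpecialAttack = 5
    SpecialDefense = 6
    IsKnockedOut = 7
    ShouldSkipTurn = 8


class StatRequiresStatBoost(Exception):
    """Raised when a stat delta is written outside the stat-boost API."""


def update_mon_state(state: dict, field: StatField, delta: int) -> None:
    # Because the five stat deltas are contiguous, one range check covers them.
    if StatField.Speed <= field <= StatField.SpecialDefense:
        raise StatRequiresStatBoost(field.name)
    state[field] = state.get(field, 0) + delta
```

The contiguous enum range is what lets the guard be a single pair of comparisons rather than five equality checks.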

Test mocks that wrote stat deltas directly now go through the stat-boost API:
- ReduceSpAtkMove: SpecialAttack -1 via a 10% Divide on base 10 (still fires
  OnUpdateMonState, so the heal-on-update test is unchanged).
- TempStatBoostEffect: Temp +100% Attack multiply on apply (base 1 -> delta 1);
  dropped its OnMonSwitchOut hook since the Engine clears temp boosts natively.
- OneTurnStatBoost: both hooks apply a merged +100% Attack multiply (delta +3),
  asserting both ran; updated the assertion accordingly.
Added DirectStatWriteMove + test_directStatWriteIsRejected to lock in the guard.

CLAUDE.md: new "Stat Boosts (inlined into the Engine)" section documenting the
API, sentinel storage, msg.sender keying, the no-snapshot telescoping, and the
ownership invariant; removed the stale StatBoosts.sol references.

Guard cost is +252 gas on the external-updateMonState path (ExternalStaminaRegen);
stat-boost battles unaffected. All 499 tests pass.

* Attack path: reuse ctx defender types instead of re-reading storage

_dispatchStandardAttackInternal re-resolved the defender Mon and re-read
stats.type1/type2 for the TypeCalcLib effectiveness calls, even though
_getDamageCalcContextInternal already loaded ctx.defenderType1/defenderType2.
Use the context fields instead, dropping a _getTeamMon resolve + SLOAD per
damaging move.

Gas: FullyOptimizedInline battles -1,118 / -1,173 / -1,212 coldGas
(-5 totalSload each); RealMonReplay 26-turn game -2,982 across legacy/
batched/one-tx; EngineOptimization flat (-6). All 499 tests pass.

(Also evaluated reusing tombstoned effect slots for new stat-boost sources;
dropped it — the benefit is invisible in realistic battles and the extra
branches added ~+500 gas of codegen ripple to unrelated warm-path scenarios.)

* Gate step pipelines on a listener union, not effect count

The PreDamage / AfterDamage / OnUpdateMonState pipelines were spun up
whenever the mon had ANY effect (count > 0), then iterated and found no
listener. OnUpdateMonState (Dreamcatcher) and PreDamage (Adaptor) have a
single listener each game-wide, yet OnUpdateMonState fires on every
stat-boost delta / stamina regen / heal and PreDamage on every damage
event — so in the common battle (neither mon present) the whole
abi.encode + _runEffects shell + per-effect scan was pure waste. Stat-boost
entries living in the effect list made the count gate even leakier.

Track a per-battle uint16 `playerEffectStepsUnion` (OR of every player
effect's stepsBitmap, set in _addEffectInternal, reset at startBattle) and
gate each pipeline on `union & (1<<step)` AND the per-mon count. It's an
over-approximation — never cleared on removal — so it can only run a
pipeline that finds nothing (as today), never skip a live listener. All
listener-effect adds (incl. inline abilities) route through
_addEffectInternal, so the bit is always set before combat.
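
The gate can be modeled in Python as follows. The `Battle` class, its method names, and the step numbers in the usage note are hypothetical; the logic follows the description above: OR each stepsBitmap into a 16-bit union on add, never clear it on removal, and require both the union bit and a nonzero per-mon count.

```python
class Battle:
    def __init__(self):
        self.player_effect_steps_union = 0  # reset at startBattle
        self.effects = {}                   # mon id -> list of stepsBitmap values

    def add_effect(self, mon: int, steps_bitmap: int) -> None:
        self.effects.setdefault(mon, []).append(steps_bitmap)
        # OR the bitmap into the per-battle union (kept to 16 bits).
        self.player_effect_steps_union |= steps_bitmap & 0xFFFF

    def remove_effect(self, mon: int, steps_bitmap: int) -> None:
        # The union is deliberately left untouched: it over-approximates.
        self.effects[mon].remove(steps_bitmap)

    def should_run_pipeline(self, mon: int, step: int) -> bool:
        has_listener = (self.player_effect_steps_union & (1 << step)) != 0
        has_effects = len(self.effects.get(mon, [])) > 0
        return has_listener and has_effects
```

For example, with an effect listening on a hypothetical step 5, a pipeline for step 3 is skipped even though the mon has effects. After that effect is removed and an unrelated one is added, the step 5 pipeline still runs and finds nothing, which is the safe direction of the over-approximation.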

Gas:
- RealMonReplay 26-turn game: legacy 4,377,336 -> 4,236,874 (-140k);
  batched -> 3,547,621 (-183k); one-tx -> 2,437,457 (-183k).
- FullyOptimizedInline battles: -76k / -66k / -78k coldGas,
  -134 / -116 / -134 totalSload each.
- Small regressions from the added union SLOAD where nothing is skipped:
  EngineOptimization +132/+371, StartBattle +221 — dwarfed by the wins.

All 499 tests pass (Xmon/Dreamcatcher, Nirvamma/Adaptor, AfterDamage
listeners, OnUpdateMonState heal all still fire).

---------

Co-authored-by: Claude <noreply@anthropic.com>
@sudo-owen sudo-owen merged commit 1a8be82 into main Jun 2, 2026
1 check passed
@sudo-owen sudo-owen deleted the claude/batched-from-main branch June 2, 2026 20:50