Skip to content

fix(tests): retry rmdir to clear Windows EBUSY flake in updater-integration#7728

Merged
JohnMcLear merged 2 commits into
developfrom
fix/windows-ebusy-updater-integration
May 11, 2026
Merged

fix(tests): retry rmdir to clear Windows EBUSY flake in updater-integration#7728
JohnMcLear merged 2 commits into
developfrom
fix/windows-ebusy-updater-integration

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

  • The Windows backend-test job has been intermittently red since Clean update path #7607 with EBUSY: resource busy or locked, rmdir against the temp-repo dirs created in src/tests/backend/specs/updater-integration.ts. Failing case most often: "crash-loop guard: bootCount=3 forces immediate rollback" — but every it() in the file is exposed.
  • Each test builds a tmp git repo via execSync('git …') and cleans up in try…finally with fs.rm(dir, {recursive: true, force: true}). On Windows, git child processes can briefly hold file handles after exit (NTFS lazy-release / antivirus scan / pack files), so the first rmdir hits EBUSY. fs.rm's default maxRetries is 0 — no recovery, test errors.
  • Hoist cleanup to a single cleanupTmp() helper that passes maxRetries: 10, retryDelay: 100. Built-in fs.rm capability since Node 14.14. On Linux/macOS it is a no-op (nothing to retry); on Windows it absorbs the transient lock.
  • No production code touched.

Test plan

  • cd src && npx mocha --import=tsx --timeout 120000 tests/backend/specs/updater-integration.ts → 5/5 pass locally on Linux.
  • pnpm ts-check reports no new errors for this file (pre-existing plugin_packages errors unrelated).
  • CI: Windows backend-test job (with + without plugins, Node 22/24/25) clean across a few runs to confirm the flake is gone.

Notes

The brief also flagged a separate "shutdown hook leak" line that prints right after the rmdir error. That message is a consequence of the assertion failure propagating before the test's own server shuts down cleanly; once the rmdir succeeds, the leak warning goes away. Separate investigation deferred.

🤖 Generated with Claude Code

…ration

The Windows backend-test job has been intermittently red on `crash-loop
guard: bootCount=3 forces immediate rollback` (and other cases in
`updater-integration.ts`) with:

  Error: EBUSY: resource busy or locked, rmdir
  'C:\Users\RUNNER~1\AppData\Local\Temp\updater-it-...'

Each `it()` builds a temp git repo via `execSync('git ...')` and cleans
up in a `try…finally` with `fs.rm(dir, {recursive: true, force: true})`.
On Windows, git child processes can briefly hold file handles after
exit (NTFS lazy-release / antivirus scan / pack-file handles), so the
first rmdir attempt hits EBUSY. `fs.rm`'s default `maxRetries` is 0, so
there is no recovery and the test errors out.

Hoist the cleanup to a single `cleanupTmp()` helper that passes
`maxRetries: 10, retryDelay: 100` (a built-in `fs.rm` capability since
Node 14.14). On Linux/macOS this is a no-op — there's nothing to retry.
On Windows it absorbs the transient lock.

No production code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Fix Windows EBUSY flake in updater-integration tests with retry logic

🐞 Bug fix 🧪 Tests

Grey Divider

Walkthroughs

Description
• Introduce cleanupTmp() helper with retry logic for Windows EBUSY flake
• Configure maxRetries: 10, retryDelay: 100 to handle transient file locks
• Replace all direct fs.rm() calls with new helper in test cleanup blocks
• Resolves intermittent test failures in updater-integration on Windows
Diagram
flowchart LR
  A["Windows git process<br/>holds file handles"] -->|transient lock| B["fs.rm fails<br/>with EBUSY"]
  C["New cleanupTmp()<br/>helper"] -->|maxRetries: 10| D["Retry rmdir<br/>on Windows"]
  D -->|success| E["Test cleanup<br/>completes"]
  B -.->|before fix| F["Test fails"]
  E -.->|after fix| G["Test passes"]
Loading

Grey Divider

File Changes

1. src/tests/backend/specs/updater-integration.ts 🐞 Bug fix +11/-5

Add retry-enabled cleanup helper for Windows EBUSY errors

• Added cleanupTmp() helper function with maxRetries: 10, retryDelay: 100 configuration
• Replaced 6 direct fs.rm() calls with cleanupTmp(dir) in test cleanup blocks
• Added explanatory comment about Windows file handle retention and NTFS lazy-release behavior
• No changes to test logic or production code

src/tests/backend/specs/updater-integration.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 11, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

Grey Divider

Qodo Logo

Windows CI failure on this branch surfaced a *second* flake in the same
file (`crash-loop guard: bootCount=3 forces immediate rollback`):

  TypeError: Cannot read properties of undefined (reading 'execution')
    at updater-integration.ts:230

`checkPendingVerification` kicks off `performRollback` as fire-and-forget
(`void performRollback(s, deps).catch(...)`), and the test waits a flat
250 ms before asserting `states.at(-1)!.execution.status === 'rolled-back'`.
On Linux 250 ms is plenty. On Windows, git checkout + spawned-process
bookkeeping regularly push past that — so `saveState` hasn't fired yet
and `states` is empty.

This race was previously masked: the test's `finally` ran `fs.rm`,
which threw EBUSY against handles still held by the in-flight rollback,
and JS's "finally-throws-override-try-throws" semantics meant mocha
reported the EBUSY rather than the underlying TypeError. The retry-rm
patch on this branch unmasked it.

Replace the flat sleep with condition-based polling (25 ms tick, 10 s
ceiling) for a terminal state (`rolled-back` | `rollback-failed`). The
existing `assert.equal(... 'rolled-back')` still runs, giving a clean
diff if rollback landed on the failure side instead. Linux runtime
drops 329 ms → 104 ms because the poll exits as soon as the state
lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear JohnMcLear merged commit b3faeff into develop May 11, 2026
43 checks passed
@JohnMcLear JohnMcLear deleted the fix/windows-ebusy-updater-integration branch May 11, 2026 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant