Deflake the session-level timeout test with trio's virtual clock#2788
Conversation
The session-level read timeout also governs the initialize handshake sent by Client.__aenter__, so the test's 50ms value doubled as a real-time deadline for the in-process handshake. Under CI load the handshake tail exceeds 50ms (observed max ~190ms on a saturated windows runner), which failed the test before its body ran -- three times in the week since it landed, twice on windows and once on ubuntu, always on the 3.12/locked matrix cell. Instead of widening the margin and paying for it in real wait time, run this one test on trio's MockClock with autojump: virtual time advances only when every task is blocked, so the handshake can never time out no matter how slow the runner, and the blocked tool call hits its deadline the moment the run goes idle. The test keeps its original timeout value and snapshot, is immune to scheduler stalls by construction, and the file now runs in milliseconds.
There was a problem hiding this comment.
LGTM — test-only deflake of a single timeout test using trio's virtual clock; the only finding is a one-word README wording nit.
Extended reasoning...
Overview
This PR modifies a single test file, tests/interaction/lowlevel/test_timeouts.py, switching test_session_level_timeout_applies_to_every_request from a real-clock 50ms timeout to trio's MockClock(autojump_threshold=0) via a per-test anyio_backend parametrization. The timeout value, assertion, and snapshot are unchanged; the module docstring and inline comments are updated to explain the approach. No production code is touched.
Security risks
None. This is a test-only change with no impact on library code, transports, auth, or any runtime behavior shipped to users.
Level of scrutiny
Low. The change is small, self-contained, and follows an established anyio/trio testing pattern (per-test backend parametrization with a mock clock). The PR description documents the flake history, the reasoning for why the effectively-zero-timeout pattern used by the file's other tests cannot apply here, and stress-test results (500 iterations under MockClock, 30 consecutive file runs). trio is already a dev dependency, so no new dependency is introduced.
Other factors
The bug-hunting pass surfaced only a documentation nit: tests/interaction/README.md still refers to these as "real-clock timeout tests", which is now slightly imprecise for the session-level test. That is a one-word follow-up and does not affect correctness, so it does not block approval. The remaining two tests in the file are unchanged and still run on asyncio.
| # A session-level timeout cannot use the effectively-zero pattern above: it also governs the | ||
| # initialize handshake, which must complete before the blocked tool call can wait the timeout | ||
| # out in full. Any real-clock margin is a bet against CI scheduler stalls (a 50ms value lost | ||
| # that bet in CI; the in-process handshake tail reaches ~190ms on a loaded windows runner), so | ||
| # this test runs on trio's virtual clock instead. With autojump, time advances only when every | ||
| # task is blocked: the handshake always has a runnable task and therefore cannot time out no | ||
| # matter how slow the runner, and once the tool call blocks on the never-answered request the | ||
| # run goes idle and the clock jumps straight to the deadline — deterministic, with no real wait. | ||
| @requirement("protocol:timeout:session-default") | ||
| @pytest.mark.parametrize( | ||
| "anyio_backend", | ||
| [pytest.param(("trio", {"clock": MockClock(autojump_threshold=0)}), id="trio-mockclock")], | ||
| ) | ||
| async def test_session_level_timeout_applies_to_every_request() -> None: |
There was a problem hiding this comment.
🟡 Nit: tests/interaction/README.md (line 64) still describes these as "the real-clock timeout tests" when explaining why they skip the transport-parametrized connect fixture, but after this change the session-level test runs on trio's MockClock virtual clock rather than the real clock. Consider rewording that phrase (e.g. just "the timeout tests") so the README stays consistent with the updated docstring and comments here.
Extended reasoning...
What's stale: tests/interaction/README.md lines 62–66 list the test groups that don't use the transport-parametrized connect fixture, and one of them is described as "the real-clock timeout tests (the timeout machinery is transport-independent and must not race transport latency)". That phrasing was accurate before this PR, when all three tests in test_timeouts.py ran against the real clock with effectively-zero or 50ms timeouts.
Why this PR makes it inaccurate: This change moves test_session_level_timeout_applies_to_every_request onto trio's MockClock(autojump_threshold=0) via the per-test anyio_backend parametrization. Concretely: the test no longer waits any real time at all — the handshake runs on virtual time that only advances when every task is blocked, and the 0.05s deadline is reached by an autojump, not by the real clock. So of the three timeout tests in the file, only the two per-request tests (test_request_timeout_fails_the_pending_call, test_session_serves_requests_after_timeout) still run on the real clock; the README's collective label "real-clock timeout tests" no longer covers the file.
Step-by-step: (1) A reader sees "real-clock timeout tests" in the README and opens tests/interaction/lowlevel/test_timeouts.py. (2) The module docstring now says the session-level test "runs on trio's virtual clock instead", and the comment block above test_session_level_timeout_applies_to_every_request (lines 90–97) explains the MockClock approach in detail. (3) The README and the file now describe the same tests with contradictory clock semantics — exactly the kind of drift the PR was otherwise careful to avoid, since it updated both the module docstring and the inline comments.
Why nothing else catches it: the README is prose; nothing enforces consistency between it and the test file, so the drift will persist until someone notices it manually.
Addressing the counter-argument: one reviewer noted that the README sentence's real purpose — explaining why the timeout tests skip the connect fixture — remains fully valid, and that two of the three tests still use the real clock, so this is a single slightly-imprecise adjective. That's all true, which is why this is filed as a nit rather than a blocking issue: the substantive rationale (transport-independence, not racing transport latency) is unchanged and arguably strengthened. But the PR's own description checks "I have added or updated documentation as needed" as incomplete, and the change deliberately updated every other piece of documentation describing these tests; keeping the suite README in step is a one-word fix.
How to fix: in tests/interaction/README.md line 64, drop or adjust the adjective — e.g. "the timeout tests (the timeout machinery is transport-independent and must not race transport latency; the session-level one runs on trio's virtual clock)" or simply "the timeout tests (...)".
test_session_level_timeout_applies_to_every_requestflaked three times in its first week. This switches it to trio's virtual clock so it is deterministic by construction, with no real waiting.Motivation and Context
The test sets
Client(server, read_timeout_seconds=0.05). A session-level read timeout applies to every request that doesn't set its own — including theInitializeRequestsent insideClient.__aenter__— so the in-process handshake had to beat 50ms of real clock before the test body even ran. Under CI load it sometimes didn't, and the test died in__aenter__withTimed out while waiting for response to InitializeRequest:Looping the in-memory handshake on a CPU-saturated
windows-latestrunner (3.12, locked) puts its tail at p99 ≈ 70ms and max ≈ 190ms — 4.3% of handshakes exceeded 50ms; even idle, the max was 26ms. So the existing value was a standing bet against scheduler stalls, and any wider real-clock margin pays for itself in wall-clock time on every run.The file's other tests are deterministic because the timed request can never be answered (the handler blocks forever), so an effectively-zero timeout always wins. The session-default test can't use that trick: the same knob governs the handshake, which must succeed first.
Running this one test on trio's
MockClock(autojump_threshold=0)removes the bet instead of widening it. Virtual time only advances when every task is blocked: during the handshake some task is always runnable, so the handshake cannot time out no matter how slow the runner, and once the tool call blocks on the never-answered request the run goes idle and the clock jumps straight to the deadline. The timeout value and snapshot stay as they were; the file now runs in milliseconds. trio is already a dev dependency, and the per-testanyio_backendparametrization keeps the rest of the suite on asyncio.How Has This Been Tested?
tests/interaction/suite passes (529 tests, ~5s).time.monotonichas 15.625ms granularity, sofail_after(1e-6)actually lets the handshake win most races (and 3.13+ switched to QPC, flipping the behavior). Virtual time sidesteps clock granularity entirely.Breaking Changes
None — test-only.
Types of changes
Checklist
Additional context
All three flakes hit the 3.12/locked matrix cell. For the two Windows hits there's a plausible amplifier: on 3.12, asyncio's clock is
GetTickCount64-based, so a 50ms deadline computed from a floor-quantized clock can fire up to ~15.6ms early in real time.The CI tracebacks above are also a live reproduction of #2114: the
MCPErrorfrom the failed initialize surfaces fromClient.__aenter__double-wrapped inExceptionGroups.AI Disclaimer