fix(client): propagate HTTP transport errors to caller (#2110)#2483
Draft
fede-kamel wants to merge 2 commits intomodelcontextprotocol:mainfrom
Draft
fix(client): propagate HTTP transport errors to caller (#2110)#2483fede-kamel wants to merge 2 commits intomodelcontextprotocol:mainfrom
fede-kamel wants to merge 2 commits intomodelcontextprotocol:mainfrom
Conversation
…rotocol#2110) Previously, `sse_client` and `streamable_http_client` post paths caught exceptions with a bare `except Exception` and only logged them. HTTP errors (401/403/404/5xx) and network errors silently disappeared, leaving the caller blocked indefinitely on `read_stream.receive()`. Changes: - Move the error handling inside `_send_message` (sse) and `handle_request_async` (streamable_http) so exceptions are caught before anyio's task group wraps them in a BaseExceptionGroup. Narrow the catch to `httpx.HTTPError` and forward the exception to `read_stream_writer` — the pattern already used by `stdio.py` and `websocket.py`. - In `_handle_post_request`, preserve the upstream HTTP status code in `ErrorData.data = {"http_status": N}` for both the 404 session- terminated branch and the generic >=400 branch, and include the status in `error.message` for the generic branch. The JSON-RPC code mapping is unchanged; callers that inspected `error.message` before still work. Tests: - Unit: sse and streamable_http post-error propagation via httpx.MockTransport, parametrized over 401/404/500 plus a network- error case. - Integration: real uvicorn subprocess serving a configurable failing POST endpoint; asserts caller receives the error within a bounded timeout (proving the no-hang behavior end-to-end) and that the HTTP status is preserved in `error.data`. - Negative control confirmed: all 10 test cases fail against the pre-fix code (they hit the fail_after deadlines) and pass with the fix applied. Coverage stays at 100.00%; `strict-no-cover` clean. Refs: modelcontextprotocol#2110
… 3.11 On Python 3.11 the `async with streamable_http_client(...)` context manager's `__aexit__` has a coverage-instrumentation quirk (noted in AGENTS.md) where statements immediately after the `async with` block are reported as uncovered even when the test itself passes. Move the `assert isinstance(received, httpx.ConnectError)` inside the block so coverage hits it deterministically across the 3.10–3.14 matrix. Verified locally against 3.11.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix for #2110: in
sse_clientandstreamable_http_client, HTTP and network errors during POST were caught by a bareexcept Exceptionand only logged — callers blocked forever onread_stream.receive()because the failure was never delivered through the stream.This PR:
_send_message(SSE) andhandle_request_async(StreamableHTTP) so exceptions are caught before anyio's task group wraps them in aBaseExceptionGroup. A narrowexcept httpx.HTTPErrorforwards the exception toread_stream_writer, mirroring the pattern already used bystdio.pyandwebsocket.py.ErrorData.data = {"http_status": N}for both the 404 session-terminated branch and the generic ≥400 branch in_handle_post_request, and includes the status inerror.messagefor the generic branch. JSON-RPC code mapping is unchanged.Status: draft — awaiting
ready for worklabel on #2110 per CONTRIBUTING.md. Code is verified locally; opening as draft so review can start when maintainers are ready. Happy to close and wait if you'd prefer.Test plan
Unit (
httpx.MockTransport, fast / deterministic)test_sse_client_propagates_post_http_error_to_caller[401|404|500]— parametrized over three status codes. Uses a_LongLivedSSEStreamhelper to emit the endpoint event and keep the SSE GET open (mirrors real server behaviour sosse_readerdoesn't tear down the stream before the POST path runs).test_streamable_http_client_propagates_post_network_error_to_caller—MockTransporthandler raiseshttpx.ConnectError, asserts the exception surfaces onread_stream.Integration (real uvicorn subprocess)
test_sse_client_real_server_surfaces_post_http_error[401|404|500]— realSseServerTransporton GET/sse, customRouteon POST/messages/returning the configured status. Real httpx client against127.0.0.1:<ephemeral>. Asserts realhttpx.HTTPStatusError.response.status_codesurfaces viaread_streamwithinfail_after(5).test_sse_client_real_server_handles_no_route_match— server has NO/messages/POST route (starlette emits a natural 404). Distinct from the deliberate-status case.test_streamable_http_client_real_server_preserves_http_status[500|401|404]— real uvicorn/mcpreturning the configured status. Asserts:error.codemapping (500/401 →INTERNAL_ERROR, 404 →INVALID_REQUEST"Session terminated")error.data == {"http_status": <N>}(the newly-preserved HTTP status)error.messagefor non-404 cases.test_streamable_http_client_real_server_surfaces_network_error— binds an ephemeral port, never starts a server on it. Realhttpx.ConnectErrorarrives on read stream (not a hang).Negative control (proof of regression guard)
I reverted the fix with
git stash push src/mcp/client/sse.py src/mcp/client/streamable_http.pyand re-ran all 10 substantive test cases. All 10 failed against unfixedmain— the pre-fix code hit everyfail_after(3)/fail_after(5)deadline waiting for a response that never arrived. With the fix restored, all 10 pass. Before/after logs available on request.Local CI equivalent
Plus
ruff check,ruff format,pyright, andpre-commiton the changed files — all clean.Scope deliberately excluded
test_stdio.pyflaky test (the thing that sank PR fix(client): propagate HTTP transport exceptions to caller #2124) — not touched here. Out of scope.ClientSessionlevel — happy to add in this PR or a follow-up if preferred; transport-level coverage is already thorough.Relationship to prior PRs
BaseExceptionGroupwrapping doesn't swallow it, and (b) preserving HTTP status inErrorData.datarather than via a new exception type (minimal, non-breaking).Closes #2110 (if scope is accepted — happy to remove this line).