Skip to content

fix(client): propagate HTTP transport errors to caller (#2110)#2483

Draft
fede-kamel wants to merge 2 commits intomodelcontextprotocol:mainfrom
fede-kamel:fix/2110-http-error-propagation
Draft

fix(client): propagate HTTP transport errors to caller (#2110)#2483
fede-kamel wants to merge 2 commits intomodelcontextprotocol:mainfrom
fede-kamel:fix/2110-http-error-propagation

Conversation

@fede-kamel
Copy link
Copy Markdown

Summary

Fix for #2110: in sse_client and streamable_http_client, HTTP and network errors during POST were caught by a bare except Exception and only logged — callers blocked forever on read_stream.receive() because the failure was never delivered through the stream.

This PR:

  • Moves the error handling inside _send_message (SSE) and handle_request_async (StreamableHTTP) so exceptions are caught before anyio's task group wraps them in a BaseExceptionGroup. A narrow except httpx.HTTPError forwards the exception to read_stream_writer, mirroring the pattern already used by stdio.py and websocket.py.
  • Preserves the upstream HTTP status code in ErrorData.data = {"http_status": N} for both the 404 session-terminated branch and the generic ≥400 branch in _handle_post_request, and includes the status in error.message for the generic branch. JSON-RPC code mapping is unchanged.

Status: draft — awaiting ready for work label on #2110 per CONTRIBUTING.md. Code is verified locally; opening as draft so review can start when maintainers are ready. Happy to close and wait if you'd prefer.

Test plan

Unit (httpx.MockTransport, fast / deterministic)

  • test_sse_client_propagates_post_http_error_to_caller[401|404|500] — parametrized over three status codes. Uses a _LongLivedSSEStream helper to emit the endpoint event and keep the SSE GET open (mirrors real server behaviour so sse_reader doesn't tear down the stream before the POST path runs).
  • test_streamable_http_client_propagates_post_network_error_to_callerMockTransport handler raises httpx.ConnectError, asserts the exception surfaces on read_stream.

Integration (real uvicorn subprocess)

  • test_sse_client_real_server_surfaces_post_http_error[401|404|500] — real SseServerTransport on GET /sse, custom Route on POST /messages/ returning the configured status. Real httpx client against 127.0.0.1:<ephemeral>. Asserts real httpx.HTTPStatusError.response.status_code surfaces via read_stream within fail_after(5).
  • test_sse_client_real_server_handles_no_route_match — server has NO /messages/ POST route (starlette emits a natural 404). Distinct from the deliberate-status case.
  • test_streamable_http_client_real_server_preserves_http_status[500|401|404] — real uvicorn /mcp returning the configured status. Asserts:
    • correct JSON-RPC error.code mapping (500/401 → INTERNAL_ERROR, 404 → INVALID_REQUEST "Session terminated")
    • error.data == {"http_status": <N>} (the newly-preserved HTTP status)
    • the status code appears in error.message for non-404 cases.
  • test_streamable_http_client_real_server_surfaces_network_error — binds an ephemeral port, never starts a server on it. Real httpx.ConnectError arrives on read stream (not a hang).

Negative control (proof of regression guard)

I reverted the fix with git stash push src/mcp/client/sse.py src/mcp/client/streamable_http.py and re-ran all 10 substantive test cases. All 10 failed against unfixed main — the pre-fix code hit every fail_after(3) / fail_after(5) deadline waiting for a response that never arrived. With the fix restored, all 10 pass. Before/after logs available on request.

Local CI equivalent

$ ./scripts/test
...
1180 passed, 98 skipped, 1 xfailed in ~20s
TOTAL   19845   0   1506   0  100.00%
✅ No lines wrongly marked with 'pragma: no cover'

Plus ruff check, ruff format, pyright, and pre-commit on the changed files — all clean.

Scope deliberately excluded

Relationship to prior PRs

Closes #2110 (if scope is accepted — happy to remove this line).

…rotocol#2110)

Previously, `sse_client` and `streamable_http_client` post paths caught
exceptions with a bare `except Exception` and only logged them. HTTP
errors (401/403/404/5xx) and network errors silently disappeared,
leaving the caller blocked indefinitely on `read_stream.receive()`.

Changes:
- Move the error handling inside `_send_message` (sse) and
  `handle_request_async` (streamable_http) so exceptions are caught
  before anyio's task group wraps them in a BaseExceptionGroup.
  Narrow the catch to `httpx.HTTPError` and forward the exception to
  `read_stream_writer` — the pattern already used by `stdio.py` and
  `websocket.py`.
- In `_handle_post_request`, preserve the upstream HTTP status code in
  `ErrorData.data = {"http_status": N}` for both the 404 session-
  terminated branch and the generic >=400 branch, and include the
  status in `error.message` for the generic branch. The JSON-RPC code
  mapping is unchanged; callers that inspected `error.message` before
  still work.

Tests:
- Unit: sse and streamable_http post-error propagation via
  httpx.MockTransport, parametrized over 401/404/500 plus a network-
  error case.
- Integration: real uvicorn subprocess serving a configurable failing
  POST endpoint; asserts caller receives the error within a bounded
  timeout (proving the no-hang behavior end-to-end) and that the
  HTTP status is preserved in `error.data`.
- Negative control confirmed: all 10 test cases fail against the
  pre-fix code (they hit the fail_after deadlines) and pass with the
  fix applied.

Coverage stays at 100.00%; `strict-no-cover` clean.

Refs: modelcontextprotocol#2110
… 3.11

On Python 3.11 the `async with streamable_http_client(...)` context
manager's `__aexit__` has a coverage-instrumentation quirk (noted in
AGENTS.md) where statements immediately after the `async with` block
are reported as uncovered even when the test itself passes.

Move the `assert isinstance(received, httpx.ConnectError)` inside the
block so coverage hits it deterministically across the 3.10–3.14
matrix. Verified locally against 3.11.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP transport swallows non-2xx status codes causing client to hang

1 participant