Python: Fix structured-output corruption by joining Message.text without spaces (#4897)

t-anjan wants to merge 3 commits.

Conversation
------------
Bug
---
`Message.text` aggregates the message's text contents with `" ".join(...)`.
For natural language that's mostly fine, but a streamed response is delivered
as multiple `TextContent` deltas — joining them with spaces injects spurious
whitespace mid-token. Downstream this manifests as:
- structured-output JSON corruption (a path like `"report_lines/sofp.json"`
becomes `"re port_ lines/sof p.jso n"` and Pydantic rejects it),
- intermittent field-validation failures observed in production with the
OpenAI-compatible chat client (OpenRouter → Gemini) — `{"action ": ...}`
with a trailing space in a key, `{"readiness": "not_started "}` with a
trailing space in a value,
- inconsistent text between the streaming and non-streaming code paths
(the streaming path coalesces multiple text contents into one and so
emits no extra spaces; the non-streaming path went through `" ".join` and
therefore did).
Why fix `Message.text` rather than `ChatResponse.value` / `AgentResponse.value`
---
A previous attempt patched only the `.value` properties so that structured
parsing would bypass `Message.text`. That helps the parsing call site but
leaves every other consumer broken — anything that reads `message.text` or
`response.text` directly (logging, telemetry, downstream serializers,
provider implementations) still sees the corrupted string. Fixing
`Message.text` itself is the root-cause fix and removes the inconsistency
between the streaming and non-streaming paths.
Multiple `TextContent` blocks are almost always a streaming-coalescing
artefact — one logical token stream that happened to be split across deltas.
Any spacing the model wanted is already inside the chunks, so concatenating
without an extra separator preserves what the model emitted.
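The same holds for natural language: a delta boundary can split a word, and the chunks already carry whatever spacing the model produced (boundaries below are illustrative):

```python
# A streaming boundary can fall mid-word; the model's own spacing is
# already inside the chunks, so no separator is needed on reassembly.
chunks = ["The quar", "terly report ", "is ready."]

assert "".join(chunks) == "The quarterly report is ready."     # what the model said
assert " ".join(chunks) == "The quar terly report  is ready."  # word split + double space
```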
Changes
-------
- `Message.text` now uses `"".join(...)` and the docstring explains why.
- `test_chat_message_contents` and the two `test_chat_response_updates_to_chat_response_*`
tests are updated to reflect the new behaviour (the assertions previously
baked in the spurious extra space).
- New regression test `test_chat_message_text_concatenates_without_injecting_spaces`
builds a JSON value split across three deltas and asserts both
`message.text` and `ChatResponse.value` round-trip correctly.
- `test_workflow_as_agent_yield_output_with_list_of_chat_messages` is updated
so the non-streaming and streaming assertions agree (both now expect
`"thirdfourth"` — previously they disagreed).
Test plan
---------
- `uv run pytest packages/core/tests --ignore=packages/core/tests/core/test_observability.py`
→ 2622 passed.
- `uv run pytest packages/core/tests/core/test_observability.py packages/anthropic/tests`
→ 269 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR addresses structured-output parsing failures caused by inserted whitespace between split text chunks, but the implementation changes the global Message.text concatenation behavior rather than only the structured parsing path.
Changes:
- Updates `Message.text` to concatenate `TextContent` blocks without separators.
- Adjusts existing tests to expect direct chunk concatenation.
- Adds a structured-output parsing regression test using split JSON chunks.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `python/packages/core/agent_framework/_types.py` | Changes `Message.text` concatenation behavior and documentation. |
| `python/packages/core/tests/core/test_types.py` | Updates text concatenation expectations and adds split-JSON structured output coverage. |
| `python/packages/core/tests/workflow/test_workflow_agent.py` | Updates workflow response text expectations for separate text contents. |
```diff
         shows up as field-validation failures downstream.
         """
-        return " ".join(content.text for content in self.contents if content.type == "text")  # type: ignore[misc]
+        return "".join(content.text for content in self.contents if content.type == "text")  # type: ignore[misc]
```
Thanks @copilot — your scope concern is the right thing to flag on first pass. Pushing back on the substance, with line references so a reviewer can verify quickly:
The "natural-language segments get merged" risk doesn't survive contact with the coalescing pipeline. `_finalize_response` (`_types.py:1976-1980`) calls `_coalesce_text_content` (`_types.py:1944`) on every message after both streaming and non-streaming construction, merging adjacent same-type `TextContent` blocks into one via `Content.__add__` (`_types.py:1429`) — which is itself already separator-free. So in any caller that reads `.text` from a finalized response message, the join separator is irrelevant because there's only one element to join. I audited the in-repo consumers (`foundry/_memory_provider.py`, `foundry/_agent.py`, `gemini/_chat_client.py`, `redis/_context_provider.py`, `core/_compaction.py`, `core/_harness/_memory.py`, `github_copilot/_agent.py`, `claude/_agent.py`, `orchestrations/_concurrent.py`, `azurefunctions/_workflow.py`, `durabletask/_shim.py`, `ag-ui/_client.py`, `lab/tau2/_message_utils.py`) and every one of them reads `.text` post-finalization.
The separator-free behaviour is what the rest of the codebase already does for streamed content. Any change to make `Message.text` agree with `_coalesce_text_content` on multi-`TextContent` messages is necessarily a switch to `"".join` — they're the same operation, expressed twice.
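As an illustrative sketch of the coalescing precedent (hand-rolled stand-in types, not the library's actual classes), the separator-free merge looks like:

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    """Stand-in for a TextContent-style block (illustrative only)."""
    text: str

    def __add__(self, other: "TextBlock") -> "TextBlock":
        # Separator-free merge, mirroring the Content.__add__ behaviour
        # described above.
        return TextBlock(self.text + other.text)

def coalesce(blocks: list[TextBlock]) -> list[TextBlock]:
    """Merge adjacent text blocks into one, as a finalize step would."""
    merged: list[TextBlock] = []
    for block in blocks:
        if merged:
            merged[-1] = merged[-1] + block
        else:
            merged.append(block)
    return merged

parts = [TextBlock('{"readi'), TextBlock('ness": "not_started"}')]
(single,) = coalesce(parts)
assert single.text == '{"readiness": "not_started"}'
```

Once coalescing has run, `.text` only ever joins a single element, so the separator choice is moot on that path — the fix matters for messages built without coalescing.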
Why the narrower .value-only fix is genuinely insufficient, not just stylistically different:
- Streaming and non-streaming paths currently disagree on `.text`. `_coalesce_text_content` only fires from `from_updates` / `from_update_generator`, so the streaming path produces a single coalesced `TextContent` and `.text` returns it intact. Non-streaming providers (e.g. `openai/_chat_client.py:1921-1928`) build a `Message` directly from N `output_text` items with no coalescing, so `.text` was hitting `" ".join` on a multi-element list. The pre-fix `test_workflow_as_agent_yield_output_with_list_of_chat_messages` literally asserted `["thirdfourth"]` (streaming, line 536) and `["third fourth"]` (non-streaming, line 544) on the same logical input — the test itself was a screenshot of the bug. Fixing only `.value` leaves that disagreement in `.text` untouched.
- `.value`-only doesn't help anyone reading `.text` directly. Logging, telemetry, downstream serializers, the `ChatResponse.text` / `__str__` paths (`_types.py:2289-2292`, `2312`), and providers that pre-build messages would all keep producing whitespace-corrupted strings.
- The existing `.value` fallback already routes through `Message.text` (`_parse_structured_response_value(self.text, ...)` at `_types.py:2308`, `2572`). A `.value`-only patch that bypasses `self.text` introduces a second concatenation pathway that can drift from `Message.text`'s behaviour over time. Fixing the source is structurally simpler.
Updated tests are not regressions; they were the bug specification.
- `test_chat_message_contents` previously asserted `Message.text == "Hello, how are you? I'm fine, thank you!"` for `["Hello, how are you?", "I'm fine, thank you!"]` — i.e. it asserted that `Message.text` would inject a space between two complete sentences. The model never produced that space.
- `test_chat_response_updates_to_chat_response_multiple` asserted `"I'm doing well,  thank you!"` (two spaces) for chunks `["I'm doing well, ", "thank you!"]` — i.e. the trailing space inside the chunk plus an injected space from `" ".join`.
- `test_workflow_as_agent_yield_output_with_list_of_chat_messages` is the streaming/non-streaming disagreement above.
In each case the new assertion is what the model actually emits.
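Concretely, assuming the fixtures use those strings verbatim, the before/after assertions reduce to plain joins:

```python
# test_chat_message_contents fixtures: two complete sentences.
contents = ["Hello, how are you?", "I'm fine, thank you!"]
assert " ".join(contents) == "Hello, how are you? I'm fine, thank you!"  # old: space injected
assert "".join(contents) == "Hello, how are you?I'm fine, thank you!"    # new: raw model output

# test_chat_response_updates_to_chat_response_multiple fixtures.
chunks = ["I'm doing well, ", "thank you!"]
assert " ".join(chunks) == "I'm doing well,  thank you!"  # old: two spaces
assert "".join(chunks) == "I'm doing well, thank you!"    # new: the single space the model emitted
```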
Real-world impact (already in the PR body): `Message.text` corruption surfaces in production via OpenRouter → Gemini structured output, where the OpenAI-compatible chat client builds a `TextContent` per `output_text` item on the non-streaming path and OpenRouter forwards Gemini text in multiple items. Failures show as `Field required ... input_value={'action ': ...}` (space injected into a JSON key) or `Input should be 'not_started', ... input_value='not_started '` (space injected into a value). The `.value`-only fix would address the parsing call site for these but leave the same corrupted string reaching telemetry, logs, and any downstream `.text` consumer.
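Both production failures reproduce with plain `json` once a delta boundary lands next to the affected token (the boundaries here are assumed for illustration):

```python
import json

# Failure 1: boundary right after the key text -> trailing space inside the key.
key_deltas = ['{"action', '": "request_evidence"}']
assert json.loads(" ".join(key_deltas)) == {"action ": "request_evidence"}
assert json.loads("".join(key_deltas)) == {"action": "request_evidence"}

# Failure 2: boundary right after the value text -> trailing space inside the value.
value_deltas = ['{"readiness": "not_started', '"}']
assert json.loads(" ".join(value_deltas)) == {"readiness": "not_started "}
assert json.loads("".join(value_deltas)) == {"readiness": "not_started"}
```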
Happy to break the change up if a reviewer prefers the historical sequence (e.g. land `Message.text` first, then a follow-up that drops the now-redundant raw-text walk if `_parse_structured_response_value` is later refactored), but the substantive position is that fixing `Message.text` addresses the root cause and has the smaller surface area in the long run.
Summary
-------
`Message.text` aggregates the message's text contents with `" ".join(...)`. For natural-language responses returned as a single content that's mostly fine, but a streamed response is delivered as multiple `TextContent` deltas — joining them with spaces injects spurious whitespace mid-token.

This PR changes `Message.text` to use `"".join(...)`. Any spacing the model wanted is already inside the chunks, so concatenating without a separator preserves what the model actually emitted.

Why fix `Message.text` rather than just `ChatResponse.value` / `AgentResponse.value`
---
An earlier revision of this PR patched only the two `.value` properties so structured-output parsing would bypass `Message.text`. That helps the parsing call site but leaves every other consumer broken — anything that reads `message.text` or `response.text` directly (logging, telemetry, downstream serializers, provider implementations) still sees the corrupted string.

It also leaves a real inconsistency in place: the streaming path (`from_updates` / `from_update_generator`) coalesces multiple `TextContent` blocks into one and so emits no extra spaces, while the non-streaming path was running through `" ".join` and therefore did. Two test fixtures in this repo previously baked in that disagreement (see "Pre-existing assertions updated" below).

Fixing `Message.text` itself is the root-cause fix and removes the inconsistency.

Real-world failures
---
Observed in production with the OpenAI-compatible chat client (OpenRouter → Gemini) using `response_format` for structured output. Failures were intermittent — retrying the same request usually succeeded.

- Failure 1 — space injected into a JSON key: the LLM returned `"action": "request_evidence"` but Pydantic received `"action ": "request_evidence"`.
- Failure 2 — space injected into a JSON value: the LLM returned `"readiness": "not_started"` but Pydantic received `"readiness": "not_started "`.

In both cases the raw LLM response was valid JSON — the corruption was introduced by `" ".join()` in `Message.text`.

Pre-existing assertions updated
---
Three pre-existing tests were asserting the buggy behaviour and have been updated:

- `test_chat_message_contents` — directly asserted that `Message.text` joined two text contents with an inserted space.
- `test_chat_response_updates_to_chat_response_multiple` — asserted `"I'm doing well,  thank you!"` (two spaces: the trailing space inside the chunk plus the one inserted by `" ".join`). Now asserts the single space the model actually produced.
- `test_workflow_as_agent_yield_output_with_list_of_chat_messages` — asserted the streaming path produced `"thirdfourth"` while the non-streaming path produced `"third fourth"`. Both now produce `"thirdfourth"`.

Changes
-------
- `python/packages/core/agent_framework/_types.py` — `Message.text` now uses `"".join(...)`. Updated docstring explains the rationale.
- `test_chat_message_text_concatenates_without_injecting_spaces` — builds a JSON value split across three deltas and asserts both `message.text` and `ChatResponse.value` round-trip correctly.

Test plan
---------
- `uv run pytest packages/core/tests --ignore=packages/core/tests/core/test_observability.py` → 2622 passed.
- `uv run pytest packages/core/tests/core/test_observability.py packages/anthropic/tests` → 269 passed.