
Python: Fix structured-output corruption by joining Message.text without spaces #4897

Open
t-anjan wants to merge 3 commits into microsoft:main from t-anjan:fix/structured-output-space-join

Conversation

Contributor

@t-anjan t-anjan commented Mar 25, 2026

Summary

Message.text aggregates the message's text contents with " ".join(...). For natural-language responses returned as a single content, that's mostly fine — but a streamed response is delivered as multiple TextContent deltas, and joining them with spaces injects spurious whitespace mid-token.

This PR changes Message.text to use "".join(...). Any spacing the model wanted is already inside the chunks, so concatenating without a separator preserves what the model actually emitted.
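A minimal standalone sketch of the difference (the delta split below is illustrative, not taken from an actual failure — any split that lands mid-token triggers the bug):

```python
# Standalone sketch of the Message.text change; the real property lives in
# python/packages/core/agent_framework/_types.py.
deltas = ['{"action"', ': "request', '_evidence"}']  # one JSON payload, streamed as three TextContent deltas

old_text = " ".join(deltas)  # pre-fix behaviour: a separator is injected between deltas
new_text = "".join(deltas)   # post-fix behaviour: deltas concatenated verbatim

print(old_text)  # {"action" : "request _evidence"}  <- space injected mid-token
print(new_text)  # {"action": "request_evidence"}    <- exactly what the model emitted
```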

Why fix Message.text rather than just ChatResponse.value / AgentResponse.value

An earlier revision of this PR patched only the two .value properties so structured-output parsing would bypass Message.text. That helps the parsing call site but leaves every other consumer broken — anything that reads message.text or response.text directly (logging, telemetry, downstream serializers, provider implementations) still sees the corrupted string.

It also leaves a real inconsistency in place: the streaming path (from_updates / from_update_generator) coalesces multiple TextContent blocks into one and so emits no extra spaces, while the non-streaming path was running through " ".join and therefore did. Two test fixtures in this repo previously baked in that disagreement (see "Pre-existing assertions" below).

Fixing Message.text itself is the root-cause fix and removes the inconsistency.

Real-world failures

Observed in production with the OpenAI-compatible chat client (OpenRouter → Gemini) using response_format for structured output. Failures were intermittent — retrying the same request usually succeeded.

Failure 1 — space injected into a JSON key:
LLM returned "action": "request_evidence" but Pydantic received "action ": "request_evidence":

Field required [type=missing, input_value={'action ': 'request_evid...}]

Failure 2 — space injected into a JSON value:
LLM returned "readiness": "not_started" but Pydantic received "readiness": "not_started ":

Input should be 'not_started', ... [input_value='not_started ']

In both cases the raw LLM response was valid JSON — the corruption was introduced by " ".join() in Message.text.
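Both failure modes reproduce with plain json and a hypothetical delta split (the exact chunk boundaries the provider produced are not known; any split inside a key or value behaves the same way):

```python
import json

# Hypothetical delta split around a structured-output value. Note that the
# corrupted string is still *valid JSON* — the damage only surfaces when a
# validator checks the value (e.g. a Pydantic Literal/enum field).
deltas = ['{"readiness": "not_start', 'ed"}']

corrupted = json.loads(" ".join(deltas))  # pre-fix: parses fine, wrong value
intact = json.loads("".join(deltas))      # post-fix

print(repr(corrupted["readiness"]))  # 'not_start ed' -> rejected downstream
print(repr(intact["readiness"]))     # 'not_started'
```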

Pre-existing assertions updated

Three pre-existing tests were asserting the buggy behaviour and have been updated:

  • test_chat_message_contents — directly asserted that Message.text joined two text contents with an inserted space.
  • test_chat_response_updates_to_chat_response_multiple — asserted "I'm doing well, thank you!" (two spaces: the trailing space inside the chunk plus the one inserted by " ".join). Now asserts the single space the model actually produced.
  • test_workflow_as_agent_yield_output_with_list_of_chat_messages — asserted the streaming path produced "thirdfourth" while the non-streaming path produced "third fourth". Both now produce "thirdfourth".

Changes

  • python/packages/core/agent_framework/_types.py — Message.text now uses "".join(...). Updated docstring explains the rationale.
  • New regression test test_chat_message_text_concatenates_without_injecting_spaces — builds a JSON value split across three deltas and asserts both message.text and ChatResponse.value round-trip correctly.

Test plan

  • uv run pytest packages/core/tests --ignore=packages/core/tests/core/test_observability.py → 2622 passed.
  • uv run pytest packages/core/tests/core/test_observability.py packages/anthropic/tests → 269 passed.

…out spaces

Bug
---
`Message.text` aggregates the message's text contents with `" ".join(...)`.
For natural language that's mostly fine, but a streamed response is delivered
as multiple `TextContent` deltas — joining them with spaces injects spurious
whitespace mid-token. Downstream this manifests as:

- structured-output JSON corruption (a path like `"report_lines/sofp.json"`
  becomes `"re port_ lines/sof p.jso n"` and Pydantic rejects it),
- intermittent field-validation failures observed in production with the
  OpenAI-compatible chat client (OpenRouter → Gemini) — `{"action ": ...}`
  with a trailing space in a key, `{"readiness": "not_started "}` with a
  trailing space in a value,
- inconsistent text between the streaming and non-streaming code paths
  (the streaming path coalesces multiple text contents into one and so
  emits no extra spaces; the non-streaming path went through `" ".join` and
  therefore did).

Why fix `Message.text` rather than `ChatResponse.value` / `AgentResponse.value`
---
A previous attempt patched only the `.value` properties so that structured
parsing would bypass `Message.text`. That helps the parsing call site but
leaves every other consumer broken — anything that reads `message.text` or
`response.text` directly (logging, telemetry, downstream serializers,
provider implementations) still sees the corrupted string. Fixing
`Message.text` itself is the root-cause fix and removes the inconsistency
between the streaming and non-streaming paths.

Multiple `TextContent` blocks are almost always a streaming-coalescing
artefact — one logical token stream that happened to be split across deltas.
Any spacing the model wanted is already inside the chunks, so concatenating
without an extra separator preserves what the model emitted.

Changes
-------
- `Message.text` now uses `"".join(...)` and the docstring explains why.
- `test_chat_message_contents` and the two `test_chat_response_updates_to_chat_response_*`
  tests are updated to reflect the new behaviour (the assertions previously
  baked in the spurious extra space).
- New regression test `test_chat_message_text_concatenates_without_injecting_spaces`
  builds a JSON value split across three deltas and asserts both
  `message.text` and `ChatResponse.value` round-trip correctly.
- `test_workflow_as_agent_yield_output_with_list_of_chat_messages` is updated
  so the non-streaming and streaming assertions agree (both now expect
  `"thirdfourth"` — previously they disagreed).

Test plan
---------
- `uv run pytest packages/core/tests --ignore=packages/core/tests/core/test_observability.py`
  → 2622 passed.
- `uv run pytest packages/core/tests/core/test_observability.py packages/anthropic/tests`
  → 269 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@t-anjan t-anjan force-pushed the fix/structured-output-space-join branch from 75581f5 to 6576cd7 Compare May 15, 2026 03:18
Copilot AI review requested due to automatic review settings May 15, 2026 03:18
@t-anjan t-anjan changed the title Python: Fix structured output parsing when text contents are not coalesced Python: Fix structured-output corruption by joining Message.text without spaces May 15, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses structured-output parsing failures caused by inserted whitespace between split text chunks, but the implementation changes the global Message.text concatenation behavior rather than only the structured parsing path.

Changes:

  • Updates Message.text to concatenate TextContent blocks without separators.
  • Adjusts existing tests to expect direct chunk concatenation.
  • Adds a structured-output parsing regression test using split JSON chunks.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File / Description:

  • python/packages/core/agent_framework/_types.py — Changes Message.text concatenation behavior and documentation.
  • python/packages/core/tests/core/test_types.py — Updates text concatenation expectations and adds split-JSON structured-output coverage.
  • python/packages/core/tests/workflow/test_workflow_agent.py — Updates workflow response text expectations for separate text contents.

The snippet under review (tail of the Message.text docstring plus the changed line in _types.py):

```diff
         shows up as field-validation failures downstream.
         """
-        return " ".join(content.text for content in self.contents if content.type == "text")  # type: ignore[misc]
+        return "".join(content.text for content in self.contents if content.type == "text")  # type: ignore[misc]
```
Contributor Author


Thanks @copilot — your scope concern is the right thing to flag on first pass. Pushing back on the substance, with line references so a reviewer can verify quickly:

The "natural-language segments get merged" risk doesn't survive contact with the coalescing pipeline. _finalize_response (_types.py:1976-1980) calls _coalesce_text_content (_types.py:1944) on every message after both streaming and non-streaming construction, merging adjacent same-type TextContent blocks into one via Content.__add__ (_types.py:1429) — which is itself already separator-free. So in any caller that reads .text from a finalized response message, the join separator is irrelevant because there's only one element to join. I audited the in-repo consumers (foundry/_memory_provider.py, foundry/_agent.py, gemini/_chat_client.py, redis/_context_provider.py, core/_compaction.py, core/_harness/_memory.py, github_copilot/_agent.py, claude/_agent.py, orchestrations/_concurrent.py, azurefunctions/_workflow.py, durabletask/_shim.py, ag-ui/_client.py, lab/tau2/_message_utils.py) and every one of them reads .text post-finalization.

The separator-free behaviour is what the rest of the codebase already does for streamed content. Any change to make Message.text agree with _coalesce_text_content on multi-TextContent messages is necessarily a switch to "".join — they're the same operation, expressed twice.
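As a toy illustration of that equivalence (names and shapes are assumed here for brevity — this is not the repo's actual Content / _coalesce_text_content code):

```python
from dataclasses import dataclass

@dataclass
class Text:
    text: str

def coalesce(blocks: list[Text]) -> list[Text]:
    # Merge adjacent text blocks with no separator, as the streaming path does.
    merged: list[Text] = []
    for b in blocks:
        if merged:
            merged[-1] = Text(merged[-1].text + b.text)  # separator-free, like Content.__add__
        else:
            merged.append(b)
    return merged

blocks = [Text("third"), Text("fourth")]

# The two operations agree: one coalesced block vs. a separator-free join.
print(coalesce(blocks)[0].text)              # thirdfourth
print("".join(b.text for b in blocks))       # thirdfourth
```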

Why the narrower .value-only fix is genuinely insufficient, not just stylistically different:

  1. Streaming and non-streaming paths currently disagree on .text. _coalesce_text_content only fires from from_updates / from_update_generator, so the streaming path produces a single coalesced TextContent and .text returns it intact. Non-streaming providers (e.g. openai/_chat_client.py:1921-1928) build a Message directly from N output_text items with no coalescing, so .text was hitting " ".join on a multi-element list. The pre-fix test_workflow_as_agent_yield_output_with_list_of_chat_messages literally asserted ["thirdfourth"] (streaming, line 536) and ["third fourth"] (non-streaming, line 544) on the same logical input — the test itself was a screenshot of the bug. Fixing only .value leaves that disagreement in .text untouched.

  2. .value-only doesn't help anyone reading .text directly. Logging, telemetry, downstream serializers, the ChatResponse.text / __str__ paths (_types.py:2289-2292, 2312), and providers that pre-build messages would all keep producing whitespace-corrupted strings.

  3. The existing .value fallback already routes through Message.text (_parse_structured_response_value(self.text, ...) at _types.py:2308, 2572). A .value-only patch that bypasses self.text introduces a second concatenation pathway that can drift from Message.text's behaviour over time. Fixing the source is structurally simpler.

Updated tests are not regressions, they were the bug specification.

  • test_chat_message_contents previously asserted Message.text == "Hello, how are you? I'm fine, thank you!" for ["Hello, how are you?", "I'm fine, thank you!"] — i.e. it asserted that Message.text would inject a space between two complete sentences. The model never produced that space.
  • test_chat_response_updates_to_chat_response_multiple asserted "I'm doing well, thank you!" (two spaces) for chunks ["I'm doing well, ", "thank you!"] — i.e. the trailing space inside the chunk plus an injected space from " ".join.
  • test_workflow_as_agent_yield_output_with_list_of_chat_messages is the streaming/non-streaming disagreement above.

In each case the new assertion is what the model actually emits.

Real-world impact (already in the PR body): Message.text corruption surfaces in production via OpenRouter→Gemini structured output, where the OpenAI-compatible chat client builds TextContent per output_text item on the non-streaming path and OpenRouter forwards Gemini text in multiple items. Failures show as Field required ... input_value={'action ': ...} (space injected into a JSON key) or Input should be 'not_started', ... input_value='not_started ' (space injected into a value). The .value-only fix would address the parsing call site for these but leave the same corrupted string reaching telemetry, logs, and any downstream .text consumer.

Happy to break the change up if a reviewer prefers the historical sequence (e.g. land Message.text first, then a follow-up that drops the now-redundant raw-text walk if _parse_structured_response_value is later refactored), but the substantive position is that fixing Message.text is the root cause and the smaller surface area in the long run.

t-anjan and others added 2 commits May 15, 2026 09:17
Cite the existing separator-free coalescing path (Content.__add__ via
_finalize_response) as the precedent, and use the production failure
example for concreteness instead of the abstract "streaming-coalescing
artefact" framing.

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>