Python: Fix structured-output corruption by joining Message.text without spaces (#4897)

t-anjan wants to merge 3 commits.

Conversation
------------
Bug
---
`Message.text` aggregates the message's text contents with `" ".join(...)`.
For natural language that's mostly fine, but a streamed response is delivered
as multiple `TextContent` deltas — joining them with spaces injects spurious
whitespace mid-token. Downstream this manifests as:
- structured-output JSON corruption (a path like `"report_lines/sofp.json"`
becomes `"re port_ lines/sof p.jso n"` and Pydantic rejects it),
- intermittent field-validation failures observed in production with the
OpenAI-compatible chat client (OpenRouter → Gemini) — `{"action ": ...}`
with a trailing space in a key, `{"readiness": "not_started "}` with a
trailing space in a value,
- inconsistent text between the streaming and non-streaming code paths
(the streaming path coalesces multiple text contents into one and so
emits no extra spaces; the non-streaming path went through `" ".join` and
therefore did).
Why fix `Message.text` rather than `ChatResponse.value` / `AgentResponse.value`
---
A previous attempt patched only the `.value` properties so that structured
parsing would bypass `Message.text`. That helps the parsing call site but
leaves every other consumer broken — anything that reads `message.text` or
`response.text` directly (logging, telemetry, downstream serializers,
provider implementations) still sees the corrupted string. Fixing
`Message.text` itself is the root-cause fix and removes the inconsistency
between the streaming and non-streaming paths.
Multiple `TextContent` blocks are almost always a streaming-coalescing
artefact — one logical token stream that happened to be split across deltas.
Any spacing the model wanted is already inside the chunks, so concatenating
without an extra separator preserves what the model emitted.
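The same holds for natural language: a delta boundary can split a word, and the chunks already carry whatever spacing the model produced (boundaries below are illustrative):

```python
# A streaming boundary can fall mid-word; the model's own spacing is
# already inside the chunks, so no separator is needed on reassembly.
chunks = ["The quar", "terly report ", "is ready."]

assert "".join(chunks) == "The quarterly report is ready."     # what the model said
assert " ".join(chunks) == "The quar terly report  is ready."  # word split + double space
```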
Changes
-------
- `Message.text` now uses `"".join(...)` and the docstring explains why.
- `test_chat_message_contents` and the two `test_chat_response_updates_to_chat_response_*`
tests are updated to reflect the new behaviour (the assertions previously
baked in the spurious extra space).
- New regression test `test_chat_message_text_concatenates_without_injecting_spaces`
builds a JSON value split across three deltas and asserts both
`message.text` and `ChatResponse.value` round-trip correctly.
- `test_workflow_as_agent_yield_output_with_list_of_chat_messages` is updated
so the non-streaming and streaming assertions agree (both now expect
`"thirdfourth"` — previously they disagreed).
Test plan
---------
- `uv run pytest packages/core/tests --ignore=packages/core/tests/core/test_observability.py`
→ 2622 passed.
- `uv run pytest packages/core/tests/core/test_observability.py packages/anthropic/tests`
→ 269 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR addresses structured-output parsing failures caused by inserted whitespace between split text chunks, but the implementation changes the global Message.text concatenation behavior rather than only the structured parsing path.
Changes:
- Updates `Message.text` to concatenate `TextContent` blocks without separators.
- Adjusts existing tests to expect direct chunk concatenation.
- Adds a structured-output parsing regression test using split JSON chunks.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `python/packages/core/agent_framework/_types.py` | Changes `Message.text` concatenation behavior and documentation. |
| `python/packages/core/tests/core/test_types.py` | Updates text concatenation expectations and adds split-JSON structured output coverage. |
| `python/packages/core/tests/workflow/test_workflow_agent.py` | Updates workflow response text expectations for separate text contents. |
```diff
         shows up as field-validation failures downstream.
         """
-        return " ".join(content.text for content in self.contents if content.type == "text")  # type: ignore[misc]
+        return "".join(content.text for content in self.contents if content.type == "text")  # type: ignore[misc]
```
Thanks @copilot — your scope concern is the right thing to flag on first pass. Pushing back on the substance, with line references so a reviewer can verify quickly:
The "natural-language segments get merged" risk doesn't survive contact with the coalescing pipeline. `_finalize_response` (`_types.py:1976-1980`) calls `_coalesce_text_content` (`_types.py:1944`) on every message after both streaming and non-streaming construction, merging adjacent same-type `TextContent` blocks into one via `Content.__add__` (`_types.py:1429`) — which is itself already separator-free. So in any caller that reads `.text` from a finalized response message, the join separator is irrelevant because there's only one element to join. I audited the in-repo consumers (`foundry/_memory_provider.py`, `foundry/_agent.py`, `gemini/_chat_client.py`, `redis/_context_provider.py`, `core/_compaction.py`, `core/_harness/_memory.py`, `github_copilot/_agent.py`, `claude/_agent.py`, `orchestrations/_concurrent.py`, `azurefunctions/_workflow.py`, `durabletask/_shim.py`, `ag-ui/_client.py`, `lab/tau2/_message_utils.py`) and every one of them reads `.text` post-finalization.
The separator-free behaviour is what the rest of the codebase already does for streamed content. Any change to make `Message.text` agree with `_coalesce_text_content` on multi-`TextContent` messages is necessarily a switch to `"".join` — they're the same operation, expressed twice.
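As an illustrative sketch of the coalescing precedent (hand-rolled stand-in types, not the library's actual classes), the separator-free merge looks like:

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    """Stand-in for a TextContent-style block (illustrative only)."""
    text: str

    def __add__(self, other: "TextBlock") -> "TextBlock":
        # Separator-free merge, mirroring the Content.__add__ behaviour
        # described above.
        return TextBlock(self.text + other.text)

def coalesce(blocks: list[TextBlock]) -> list[TextBlock]:
    """Merge adjacent text blocks into one, as a finalize step would."""
    merged: list[TextBlock] = []
    for block in blocks:
        if merged:
            merged[-1] = merged[-1] + block
        else:
            merged.append(block)
    return merged

parts = [TextBlock('{"readi'), TextBlock('ness": "not_started"}')]
(single,) = coalesce(parts)
assert single.text == '{"readiness": "not_started"}'
```

Once coalescing has run, `.text` only ever joins a single element, so the separator choice is moot on that path — the fix matters for messages built without coalescing.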
Why the narrower .value-only fix is genuinely insufficient, not just stylistically different:
- Streaming and non-streaming paths currently disagree on `.text`. `_coalesce_text_content` only fires from `from_updates` / `from_update_generator`, so the streaming path produces a single coalesced `TextContent` and `.text` returns it intact. Non-streaming providers (e.g. `openai/_chat_client.py:1921-1928`) build a `Message` directly from N `output_text` items with no coalescing, so `.text` was hitting `" ".join` on a multi-element list. The pre-fix `test_workflow_as_agent_yield_output_with_list_of_chat_messages` literally asserted `["thirdfourth"]` (streaming, line 536) and `["third fourth"]` (non-streaming, line 544) on the same logical input — the test itself was a screenshot of the bug. Fixing only `.value` leaves that disagreement in `.text` untouched.
- `.value`-only doesn't help anyone reading `.text` directly. Logging, telemetry, downstream serializers, the `ChatResponse.text` / `__str__` paths (`_types.py:2289-2292`, `2312`), and providers that pre-build messages would all keep producing whitespace-corrupted strings.
- The existing `.value` fallback already routes through `Message.text` (`_parse_structured_response_value(self.text, ...)` at `_types.py:2308`, `2572`). A `.value`-only patch that bypasses `self.text` introduces a second concatenation pathway that can drift from `Message.text`'s behaviour over time. Fixing the source is structurally simpler.
Updated tests are not regressions; they were the bug specification.
- `test_chat_message_contents` previously asserted `Message.text == "Hello, how are you? I'm fine, thank you!"` for `["Hello, how are you?", "I'm fine, thank you!"]` — i.e. it asserted that `Message.text` would inject a space between two complete sentences. The model never produced that space.
- `test_chat_response_updates_to_chat_response_multiple` asserted `"I'm doing well,  thank you!"` (two spaces) for chunks `["I'm doing well, ", "thank you!"]` — i.e. the trailing space inside the chunk plus an injected space from `" ".join`.
- `test_workflow_as_agent_yield_output_with_list_of_chat_messages` is the streaming/non-streaming disagreement above.
In each case the new assertion is what the model actually emits.
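Concretely, assuming the fixtures use those strings verbatim, the before/after assertions reduce to plain joins:

```python
# test_chat_message_contents fixtures: two complete sentences.
contents = ["Hello, how are you?", "I'm fine, thank you!"]
assert " ".join(contents) == "Hello, how are you? I'm fine, thank you!"  # old: space injected
assert "".join(contents) == "Hello, how are you?I'm fine, thank you!"    # new: raw model output

# test_chat_response_updates_to_chat_response_multiple fixtures.
chunks = ["I'm doing well, ", "thank you!"]
assert " ".join(chunks) == "I'm doing well,  thank you!"  # old: two spaces
assert "".join(chunks) == "I'm doing well, thank you!"    # new: the single space the model emitted
```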
Real-world impact (already in the PR body): `Message.text` corruption surfaces in production via OpenRouter → Gemini structured output, where the OpenAI-compatible chat client builds a `TextContent` per `output_text` item on the non-streaming path and OpenRouter forwards Gemini text in multiple items. Failures show as `Field required ... input_value={'action ': ...}` (space injected into a JSON key) or `Input should be 'not_started', ... input_value='not_started '` (space injected into a value). The `.value`-only fix would address the parsing call site for these but leave the same corrupted string reaching telemetry, logs, and any downstream `.text` consumer.
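Both production failures reproduce with plain `json` once a delta boundary lands next to the affected token (the boundaries here are assumed for illustration):

```python
import json

# Failure 1: boundary right after the key text -> trailing space inside the key.
key_deltas = ['{"action', '": "request_evidence"}']
assert json.loads(" ".join(key_deltas)) == {"action ": "request_evidence"}
assert json.loads("".join(key_deltas)) == {"action": "request_evidence"}

# Failure 2: boundary right after the value text -> trailing space inside the value.
value_deltas = ['{"readiness": "not_started', '"}']
assert json.loads(" ".join(value_deltas)) == {"readiness": "not_started "}
assert json.loads("".join(value_deltas)) == {"readiness": "not_started"}
```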
Happy to break the change up if a reviewer prefers the historical sequence (e.g. land `Message.text` first, then a follow-up that drops the now-redundant raw-text walk if `_parse_structured_response_value` is later refactored), but the substantive position is that fixing `Message.text` addresses the root cause and has the smaller surface area in the long run.
Summary
-------
`Message.text` aggregates the message's text contents with `" ".join(...)`. For natural-language responses returned as a single content that's mostly fine, but a streamed response is delivered as multiple `TextContent` deltas — joining them with spaces injects spurious whitespace mid-token.

This PR changes `Message.text` to use `"".join(...)`. Any spacing the model wanted is already inside the chunks, so concatenating without a separator preserves what the model actually emitted.

Why fix `Message.text` rather than just `ChatResponse.value` / `AgentResponse.value`
---
An earlier revision of this PR patched only the two `.value` properties so structured-output parsing would bypass `Message.text`. That helps the parsing call site but leaves every other consumer broken — anything that reads `message.text` or `response.text` directly (logging, telemetry, downstream serializers, provider implementations) still sees the corrupted string.

It also leaves a real inconsistency in place: the streaming path (`from_updates` / `from_update_generator`) coalesces multiple `TextContent` blocks into one and so emits no extra spaces, while the non-streaming path was running through `" ".join` and therefore did. Two test fixtures in this repo previously baked in that disagreement (see "Pre-existing assertions updated" below).

Fixing `Message.text` itself is the root-cause fix and removes the inconsistency.

Real-world failures
---
Observed in production with the OpenAI-compatible chat client (OpenRouter → Gemini) using `response_format` for structured output. Failures were intermittent — retrying the same request usually succeeded.

- Failure 1 — space injected into a JSON key: the LLM returned `"action": "request_evidence"` but Pydantic received `"action ": "request_evidence"`.
- Failure 2 — space injected into a JSON value: the LLM returned `"readiness": "not_started"` but Pydantic received `"readiness": "not_started "`.

In both cases the raw LLM response was valid JSON — the corruption was introduced by `" ".join()` in `Message.text`.

Pre-existing assertions updated
---
Three pre-existing tests were asserting the buggy behaviour and have been updated:

- `test_chat_message_contents` — directly asserted that `Message.text` joined two text contents with an inserted space.
- `test_chat_response_updates_to_chat_response_multiple` — asserted `"I'm doing well,  thank you!"` (two spaces: the trailing space inside the chunk plus the one inserted by `" ".join`). Now asserts the single space the model actually produced.
- `test_workflow_as_agent_yield_output_with_list_of_chat_messages` — asserted the streaming path produced `"thirdfourth"` while the non-streaming path produced `"third fourth"`. Both now produce `"thirdfourth"`.

Changes
-------
- `python/packages/core/agent_framework/_types.py` — `Message.text` now uses `"".join(...)`. Updated docstring explains the rationale.
- `test_chat_message_text_concatenates_without_injecting_spaces` — builds a JSON value split across three deltas and asserts both `message.text` and `ChatResponse.value` round-trip correctly.

Test plan
---------
- `uv run pytest packages/core/tests --ignore=packages/core/tests/core/test_observability.py` → 2622 passed.
- `uv run pytest packages/core/tests/core/test_observability.py packages/anthropic/tests` → 269 passed.