-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: Fix structured-output corruption by joining Message.text without spaces #4897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
t-anjan
wants to merge
3
commits into
microsoft:main
Choose a base branch
from
t-anjan:fix/structured-output-space-join
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+52
−7
Open
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @copilot — your scope concern is the right thing to flag on first pass. Pushing back on the substance, with line references so a reviewer can verify quickly:
The "natural-language segments get merged" risk doesn't survive contact with the coalescing pipeline.
_finalize_response(_types.py:1976-1980) calls_coalesce_text_content(_types.py:1944) on every message after both streaming and non-streaming construction, merging adjacent same-typeTextContentblocks into one viaContent.__add__(_types.py:1429) — which is itself already separator-free. So in any caller that reads.textfrom a finalized response message, the join separator is irrelevant because there's only one element to join. I audited the in-repo consumers (foundry/_memory_provider.py,foundry/_agent.py,gemini/_chat_client.py,redis/_context_provider.py,core/_compaction.py,core/_harness/_memory.py,github_copilot/_agent.py,claude/_agent.py,orchestrations/_concurrent.py,azurefunctions/_workflow.py,durabletask/_shim.py,ag-ui/_client.py,lab/tau2/_message_utils.py) and every one of them reads.textpost-finalization.The separator-free behaviour is what the rest of the codebase already does for streamed content. Any change to make
Message.textagree with_coalesce_text_contenton multi-TextContentmessages is necessarily a switch to"".join— they're the same operation, expressed twice.Why the narrower
.value-only fix is genuinely insufficient, not just stylistically different:Streaming and non-streaming paths currently disagree on
.text._coalesce_text_contentonly fires fromfrom_updates/from_update_generator, so the streaming path produces a single coalescedTextContentand.textreturns it intact. Non-streaming providers (e.g.openai/_chat_client.py:1921-1928) build aMessagedirectly from Noutput_textitems with no coalescing, so.textwas hitting" ".joinon a multi-element list. The pre-fixtest_workflow_as_agent_yield_output_with_list_of_chat_messagesliterally asserted["thirdfourth"](streaming, line 536) and["third fourth"](non-streaming, line 544) on the same logical input — the test itself was a screenshot of the bug. Fixing only.valueleaves that disagreement in.textuntouched..value-only doesn't help anyone reading.textdirectly. Logging, telemetry, downstream serializers, theChatResponse.text/__str__paths (_types.py:2289-2292,2312), and providers that pre-build messages would all keep producing whitespace-corrupted strings.The existing
.valuefallback already routes throughMessage.text(_parse_structured_response_value(self.text, ...)at_types.py:2308,2572). A.value-only patch that bypassesself.textintroduces a second concatenation pathway that can drift fromMessage.text's behaviour over time. Fixing the source is structurally simpler.Updated tests are not regressions, they were the bug specification.
test_chat_message_contentspreviously assertedMessage.text == "Hello, how are you? I'm fine, thank you!"for["Hello, how are you?", "I'm fine, thank you!"]— i.e. it asserted thatMessage.textwould inject a space between two complete sentences. The model never produced that space.test_chat_response_updates_to_chat_response_multipleasserted"I'm doing well, thank you!"(two spaces) for chunks["I'm doing well, ", "thank you!"]— i.e. the trailing space inside the chunk plus an injected space from" ".join.test_workflow_as_agent_yield_output_with_list_of_chat_messagesis the streaming/non-streaming disagreement above.In each case the new assertion is what the model actually emits.
Real-world impact (already in the PR body):
Message.textcorruption surfaces in production via OpenRouter→Gemini structured output, where the OpenAI-compatible chat client buildsTextContentperoutput_textitem on the non-streaming path and OpenRouter forwards Gemini text in multiple items. Failures show asField required ... input_value={'action ': ...}(space injected into a JSON key) orInput should be 'not_started', ... input_value='not_started '(space injected into a value). The.value-only fix would address the parsing call site for these but leave the same corrupted string reaching telemetry, logs, and any downstream.textconsumer.Happy to break the change up if a reviewer prefers the historical sequence (e.g. land
Message.textfirst, then a follow-up that drops the now-redundant raw-text walk if_parse_structured_response_valueis later refactored), but the substantive position is that fixingMessage.textis the root cause and the smaller surface area in the long run.