fix(api): forward extra_body.chat_template_kwargs on /v1/messages#1276
Open
sufubao wants to merge 1 commit intoModelTC:mainfrom
Open
fix(api): forward extra_body.chat_template_kwargs on /v1/messages#1276sufubao wants to merge 1 commit intoModelTC:mainfrom
sufubao wants to merge 1 commit intoModelTC:mainfrom
Conversation
The Anthropic Messages request translator in _anthropic_to_chat_request unconditionally strips extra_body, so clients had no way to pass LightLLM-specific ChatCompletionRequest options through the /v1/messages endpoint. Models that default thinking on (Qwen3, DeepSeek) could not be told to skip thinking, and with short max_tokens this yielded empty content: [] responses because all output tokens went to the <think> block. Merge any dict-valued extra_body into the translated openai_dict before the _UNKNOWN_FIELDS cleanup, with setdefault so fields produced by the Anthropic->OpenAI translation keep precedence. Unknown keys are dropped by Pydantic (extra='ignore' default), so this does not risk validation errors for arbitrary SDK pass-through. Unit test locks in the four shapes: single field, multiple fields, top-level override wins, missing extra_body is a no-op, and non-dict extra_body is ignored.
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces support for the extra_body field in the Anthropic-to-OpenAI request translation, enabling the forwarding of LightLLM-specific parameters such as chat_template_kwargs. The logic correctly prioritizes existing translated fields and ignores non-dictionary extra_body inputs. Comprehensive unit tests have been included to validate field forwarding, precedence rules, and edge cases. I have no feedback to provide.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
/v1/messagesrequest translator stripsextra_bodybefore building theChatCompletionRequest, so clients cannot reach LightLLM fields that Anthropic's schema does not expose — most notablychat_template_kwargs. On engines where thinking defaults to on (Qwen3, DeepSeek), this means Anthropic-protocol callers have no way to disable thinking; with shortmax_tokensthe response degenerates tocontent: []because every output token went into the<think>block.extra_bodyinto the translatedopenai_dictbefore the_UNKNOWN_FIELDScleanup, withsetdefaultso fields produced by the Anthropic→OpenAI translation keep precedence. Unknown keys fall off naturally inChatCompletionRequest(**chat_dict)since the Pydantic default isextra='ignore'.{ \"model\": \"...\", \"max_tokens\": 512, \"messages\": [...], \"extra_body\": { \"chat_template_kwargs\": { \"enable_thinking\": false } } }Test plan
pytest test/test_api/test_anthropic_extra_body.py— 5 new unit tests covering: single field, multi-field, top-level field beatsextra_bodyduplicate, missingextra_bodyis no-op, non-dictextra_bodyis ignored.black+flake8pre-commit hooks pass.extra_body.chat_template_kwargs.enable_thinking=falseto a Qwen3 deployment and verifycontentis non-empty.