Skip to content

fix(api): forward extra_body.chat_template_kwargs on /v1/messages#1276

Open
sufubao wants to merge 1 commit intoModelTC:mainfrom
sufubao:fix_anthropic_extra_body
Open

fix(api): forward extra_body.chat_template_kwargs on /v1/messages#1276
sufubao wants to merge 1 commit intoModelTC:mainfrom
sufubao:fix_anthropic_extra_body

Conversation

@sufubao
Copy link
Copy Markdown
Collaborator

@sufubao sufubao commented Apr 18, 2026

Summary

  • The /v1/messages request translator strips extra_body before building the ChatCompletionRequest, so clients cannot reach LightLLM fields that Anthropic's schema does not expose — most notably chat_template_kwargs. On engines where thinking defaults to on (Qwen3, DeepSeek), this means Anthropic-protocol callers have no way to disable thinking; with short max_tokens the response degenerates to content: [] because every output token went into the <think> block.
  • Merge any dict-valued extra_body into the translated openai_dict before the _UNKNOWN_FIELDS cleanup, with setdefault so fields produced by the Anthropic→OpenAI translation keep precedence. Unknown keys fall off naturally in ChatCompletionRequest(**chat_dict) since the Pydantic default is extra='ignore'.
  • Usage example for clients:
    {
      \"model\": \"...\",
      \"max_tokens\": 512,
      \"messages\": [...],
      \"extra_body\": { \"chat_template_kwargs\": { \"enable_thinking\": false } }
    }

Test plan

  • pytest test/test_api/test_anthropic_extra_body.py — 5 new unit tests covering: single field, multi-field, top-level field beats extra_body duplicate, missing extra_body is no-op, non-dict extra_body is ignored.
  • black + flake8 pre-commit hooks pass.
  • Manual: send an Anthropic request with extra_body.chat_template_kwargs.enable_thinking=false to a Qwen3 deployment and verify content is non-empty.

The Anthropic Messages request translator in _anthropic_to_chat_request
unconditionally strips extra_body, so clients had no way to pass
LightLLM-specific ChatCompletionRequest options through the
/v1/messages endpoint. Models that default thinking on (Qwen3,
DeepSeek) could not be told to skip thinking, and with short
max_tokens this yielded empty content: [] responses because all
output tokens went to the <think> block.

Merge any dict-valued extra_body into the translated openai_dict
before the _UNKNOWN_FIELDS cleanup, with setdefault so fields produced
by the Anthropic->OpenAI translation keep precedence. Unknown keys are
dropped by Pydantic (extra='ignore' default), so this does not risk
validation errors for arbitrary SDK pass-through.

Unit test locks in the four shapes: single field, multiple fields,
top-level override wins, missing extra_body is a no-op, and non-dict
extra_body is ignored.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for the extra_body field in the Anthropic-to-OpenAI request translation, enabling the forwarding of LightLLM-specific parameters such as chat_template_kwargs. The logic correctly prioritizes existing translated fields and ignores non-dictionary extra_body inputs. Comprehensive unit tests have been included to validate field forwarding, precedence rules, and edge cases. I have no feedback to provide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant