fix(provider/openai): normalize oversized tool_call IDs to avoid 400 from strict upstreams #8155
Conversation
Hey - I've found 1 issue, and left some high level feedback:
- The new `_normalize_tool_call_ids` staticmethod appears to have an extra leading space before the `@staticmethod` decorator and some misaligned indentation inside the method body (e.g., after the `+` markers and in the extra_content block), which will cause style issues or even an `IndentationError` in Python; please align the decorator and body with the surrounding class methods.
- The logic for shortening oversized IDs (`'call_' + hashlib.md5(...).hexdigest()`) is duplicated in both `_normalize_tool_call_ids` and `_parse_openai_completion`; consider extracting this into a small private helper (e.g., `_shorten_tool_call_id(raw_id: str) -> str`) to keep the behavior consistent and easier to maintain.
- The warning log in `_parse_openai_completion` includes the first 80 characters of the original tool_call ID, which might leak opaque or sensitive identifiers into logs; consider logging only the length and hash (or a redacted prefix) instead of part of the raw ID.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `_normalize_tool_call_ids` staticmethod appears to have an extra leading space before the `@staticmethod` decorator and some misaligned indentation inside the method body (e.g., after the `+` markers and in the extra_content block), which will cause style issues or even an `IndentationError` in Python—please align the decorator and body with the surrounding class methods.
- The logic for shortening oversized IDs (`'call_' + hashlib.md5(...).hexdigest()`) is duplicated in both `_normalize_tool_call_ids` and `_parse_openai_completion`; consider extracting this into a small private helper (e.g., `_shorten_tool_call_id(raw_id: str) -> str`) to keep the behavior consistent and easier to maintain.
- The warning log in `_parse_openai_completion` includes the first 80 characters of the original tool_call ID, which might leak opaque or sensitive identifiers into logs; consider logging only the length and hash (or a redacted prefix) instead of part of the raw ID.
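The redaction the last point asks for can be sketched as follows: a minimal, hypothetical helper (`describe_tool_call_id` is not part of the PR) that exposes only the length and a hash prefix, never the raw ID itself.

```python
import hashlib
import logging

logger = logging.getLogger(__name__)


def describe_tool_call_id(raw_id: str) -> str:
    """Build a log-safe description of a tool_call ID.

    Only the length and an MD5 prefix are exposed; no fragment of the
    raw (possibly sensitive) identifier reaches the log output.
    """
    digest = hashlib.md5(raw_id.encode("utf-8")).hexdigest()
    return f"<tool_call_id len={len(raw_id)} md5={digest[:8]}>"


# Example: an oversized ID is logged without leaking its content.
oversized = "x" * 660
logger.warning("normalizing oversized tool_call ID %s", describe_tool_call_id(oversized))
```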
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="984-985" />
<code_context>

```diff
-                tool_call_ids.append(tool_call.id)
+
+                raw_id = tool_call.id
+                if raw_id and len(raw_id) > 64:
+                    safe_id = "call_" + hashlib.md5(
+                        raw_id.encode("utf-8")
+                    ).hexdigest()
```

</code_context>
<issue_to_address>
**suggestion:** Consider explicitly marking md5 usage as non-security-sensitive or centralizing the ID-shortening logic.
Since this md5 use is only for deterministic ID shortening, it can easily be misread as security-related later. Either move the `call_<md5>` generation into a shared helper with `_normalize_tool_call_ids`, or wrap this in a small function that clearly documents it as a non-cryptographic helper and keeps all callsites consistent.
Suggested implementation:
```python
elif role == "tool":
tid = msg.get("tool_call_id")
if tid and tid not in id_map:
id_map[tid] = self._shorten_tool_call_id(tid)
```
```python
self._normalize_tool_call_ids(payloads)
stream = await self.client.chat.completions.create(
    **payloads,
```

```python
args = tool_call.function.arguments
args_ls.append(args)
func_name_ls.append(tool_call.function.name)
raw_id = tool_call.id
safe_id = self._shorten_tool_call_id(raw_id) if raw_id else None
```
```python
def _shorten_tool_call_id(self, raw_id: str) -> str:
"""
Deterministically shorten a potentially long tool call id.
This helper is explicitly non-cryptographic and MUST NOT be used for any
security-sensitive purpose. It exists only to normalize/shorten IDs whose
length may exceed storage or protocol limits.
"""
if not raw_id:
return raw_id
# Keep already-short IDs as-is to avoid unnecessary transformations.
if len(raw_id) <= 64:
return raw_id
# Using md5 here is acceptable because we only need a deterministic,
# compact representation and are not relying on any cryptographic property.
return "call_" + hashlib.md5(raw_id.encode("utf-8")).hexdigest()
def _normalize_tool_call_ids(
```
1. Ensure `hashlib` is imported at the top of `openai_source.py` if it is not already:
- `import hashlib`
2. The new `safe_id` computed from `raw_id` should be wired into the subsequent logic in the streaming handler (e.g., where IDs are appended or mapped). For example, if you currently do `tool_call_ids.append(tool_call.id)`, you likely want to change that to use `safe_id or raw_id` so both code paths share the same normalization behavior:
- `tool_call_ids.append(safe_id or raw_id)`
3. If `_normalize_tool_call_ids` is a `@staticmethod` or exists outside the class, adjust `_shorten_tool_call_id` accordingly (e.g., make it a `@staticmethod` or a module-level function and call it without `self.`).
</issue_to_address>
Code Review
This pull request introduces a mechanism to normalize oversized tool_call IDs that exceed the 64-character limit enforced by the OpenAI API, preventing HTTP 400 errors. It adds a _normalize_tool_call_ids method to hash long IDs into a deterministic 37-character format and applies this logic during request preparation and response parsing. Feedback suggests refactoring the ID hashing logic into a shared helper function to eliminate code duplication across different roles and methods.
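The "deterministic 37-character format" is easy to verify: `call_` is 5 characters and an MD5 hex digest is always 32. A self-contained sketch (not the project's actual code) of the hashing scheme the summary describes:

```python
import hashlib


def shorten(raw_id: str) -> str:
    # MD5 is used here only as a deterministic fingerprint,
    # not for any security purpose.
    return "call_" + hashlib.md5(raw_id.encode("utf-8")).hexdigest()


long_id = "a" * 1650  # one of the oversized lengths observed in the PR
short_id = shorten(long_id)

assert shorten(long_id) == short_id  # deterministic across retries
assert len(short_id) == 37           # "call_" (5) + md5 hex digest (32)
```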
```python
        if tid and len(tid) > 64 and tid not in id_map:
            id_map[tid] = "call_" + hashlib.md5(
                tid.encode("utf-8")
            ).hexdigest()

    elif role == "tool":
        tid = msg.get("tool_call_id")
        if tid and len(tid) > 64 and tid not in id_map:
            id_map[tid] = "call_" + hashlib.md5(
                tid.encode("utf-8")
            ).hexdigest()
```
The logic for generating a deterministic short ID from an oversized tool call ID is repeated in multiple places (here, for the 'tool' role, and in _parse_openai_completion). Following the repository's general rules, this logic should be refactored into a shared helper function to improve maintainability and avoid code duplication.
References
- When implementing similar functionality for different cases, refactor the logic into a shared helper function to avoid code duplication.
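Under that rule, the duplicated hashing would collapse into one module-level helper that every call site shares. A sketch of the idea (names are illustrative, not the merged code):

```python
import hashlib

ID_LIMIT = 64  # maximum tool_call ID length accepted by the OpenAI API


def _shorten_tool_call_id(raw_id: str) -> str:
    """Deterministic, non-cryptographic shortening of oversized IDs."""
    if not raw_id or len(raw_id) <= ID_LIMIT:
        return raw_id
    return "call_" + hashlib.md5(raw_id.encode("utf-8")).hexdigest()


def register(id_map: dict, tid: str) -> None:
    """Single collection path shared by the assistant and tool branches."""
    if tid and len(tid) > ID_LIMIT and tid not in id_map:
        id_map[tid] = _shorten_tool_call_id(tid)


id_map: dict = {}
register(id_map, "x" * 660)   # oversized: mapped to a call_<md5> form
register(id_map, "call_abc")  # already short: not mapped at all
```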
…ging raw IDs

- Extract the shortening logic into a dedicated `_shorten_tool_call_id` helper, shared by `_parse_openai_completion` and `_normalize_tool_call_ids`, and document it as explicitly non-cryptographic. Addresses sourcery-ai and gemini-code-assist review comments about duplicated hashing logic.
- Log only the original ID length and the normalized short form when a long tool_call ID is encountered; avoid including the raw ID prefix, which may be provider-specific and opaque. Addresses sourcery-ai review comment about potentially leaking raw IDs.
- Apply `_normalize_tool_call_ids` through a small internal registration closure so each role's collection path is a single line. No behavior change when all IDs are already ≤64 chars.
Thanks for the review. Addressed in the follow-up commit:
Normalize oversized `tool_call` IDs to avoid 400 from strict upstreams

Problem

Some OpenAI-compatible relay services (e.g. "溏心", oneapi-like proxies) return `tool_call.id` strings that far exceed the 64-character limit enforced by the OpenAI API spec. Observed lengths in production traffic: 660 and 1650+ characters.

Because AstrBot round-trips those IDs back into the next request's `messages[].tool_calls[].id` / `tool_call_id` fields, the upstream rejects the request with an HTTP 400.

Some relays internally translate Chat Completions payloads into the Responses API format, which renames `tool_call_id` to `call_id`, but the root cause is the same: the ID AstrBot emits is too long to begin with.
Fix

Two layers of defense in `core/provider/sources/openai_source.py`:

1. Response-parsing layer

In `_parse_openai_completion`, any freshly-received `tool_call.id` longer than 64 chars is deterministically shortened to `call_<md5>` (37 chars) before it enters the conversation history. The associated `extra_content` (used by Gemini-2.5/3 pass-through) is re-keyed with the normalized ID.
2. Pre-send fallback layer

A new static method `_normalize_tool_call_ids` scans `payloads["messages"]` before every Chat Completions request (both `_query` and `_query_stream`). It collects every oversized ID across assistant `tool_calls[].id` and tool `tool_call_id` into a shared map, then rewrites them in a second pass. This guarantees assistant-call / tool-result pairing stays intact, and also rescues long IDs persisted in the history before the fix was deployed.
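The two-pass behaviour described above can be sketched like this; it illustrates the approach under assumed Chat Completions message shapes, not the actual method body:

```python
import hashlib


def normalize_tool_call_ids(messages: list) -> None:
    """Pass 1 collects every oversized ID; pass 2 rewrites both the
    assistant side and the tool side from the same map, so call/result
    pairing survives the rewrite."""
    id_map = {}
    for msg in messages:
        if msg.get("role") == "assistant":
            for tc in msg.get("tool_calls") or []:
                tid = tc.get("id")
                if tid and len(tid) > 64:
                    id_map[tid] = "call_" + hashlib.md5(tid.encode("utf-8")).hexdigest()
        elif msg.get("role") == "tool":
            tid = msg.get("tool_call_id")
            if tid and len(tid) > 64:
                id_map[tid] = "call_" + hashlib.md5(tid.encode("utf-8")).hexdigest()
    if not id_map:
        return  # nothing oversized: the request is untouched
    for msg in messages:
        for tc in msg.get("tool_calls") or []:
            if tc.get("id") in id_map:
                tc["id"] = id_map[tc["id"]]
        if msg.get("tool_call_id") in id_map:
            msg["tool_call_id"] = id_map[msg["tool_call_id"]]
```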
Shortening is deterministic (MD5 of the original ID), so a single relay session stays internally consistent across retries. No new dependencies are introduced; `hashlib` is already imported at the top of the file.

Verification
Expect the log line `[tool_call_id 规范化] 发送前兜底:共 N 个超长 ID 被短化` ("[tool_call_id normalization] pre-send fallback: N oversized IDs were shortened") when oversized IDs are encountered, and nothing when they are not.
Risk assessment

- Confined to `openai_source.py`; only executes on oversized IDs.
- Requests with no oversized IDs are no-ops (the fallback method returns early on an empty map).
- Gemini native sources are untouched; they have their own ID schemes.
Summary by Sourcery
Normalize and sanitize oversized OpenAI tool_call IDs to prevent upstream 400 errors and keep assistant/tool message IDs consistent.
Bug Fixes:
Enhancements: