
fix(provider/openai): normalize oversized tool_call IDs to avoid 400 from strict upstreams#8155

Open

x1051445024 wants to merge 2 commits into AstrBotDevs:master from x1051445024:fix/normalize-oversized-tool-call-ids

Conversation


@x1051445024 x1051445024 commented May 11, 2026

Normalize oversized tool_call IDs to avoid 400 from strict upstreams

Problem

Some OpenAI-compatible relay services (e.g. "溏心", oneapi-like proxies) return
tool_call.id strings that far exceed the 64-character limit enforced by the
OpenAI API spec. Observed lengths in production traffic: 660 and 1650+ characters.

Because AstrBot round-trips those IDs back into the next request's
messages[].tool_calls[].id / tool_call_id fields, the upstream responds with:

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': "Invalid 'input[16].call_id': string too long. Expected a string
                with maximum length 64, but got a string with length 1650 instead.",
    'type': 'invalid_request_error',
    'param': 'input[16].call_id',
    'code': 'string_above_max_length'
  }
}

Some relays internally translate Chat Completions payloads into the Responses API
format, which renames tool_call_id to call_id — but the root cause is the
same: the ID AstrBot emits is too long to begin with.

Fix

Two layers of defense in core/provider/sources/openai_source.py:

1. Response-parsing layer

In _parse_openai_completion, any freshly-received tool_call.id longer than
64 chars is deterministically shortened to call_<md5> (37 chars) before it
enters the conversation history. The associated extra_content (used by
Gemini-2.5/3 pass-through) is re-keyed with the normalized ID.
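
The response-layer shortening can be sketched roughly as follows. This is an illustrative standalone version, not the PR's actual code: the function names and the `extra_content` re-keying helper here are assumptions based on the description above.

```python
import hashlib

def shorten_tool_call_id(raw_id: str, max_len: int = 64) -> str:
    """Deterministically shorten an oversized tool_call ID.

    MD5 is used purely as a stable, non-cryptographic fingerprint;
    the result is "call_" plus 32 hex chars (37 characters total),
    well under the 64-char limit in the OpenAI API spec.
    """
    if not raw_id or len(raw_id) <= max_len:
        return raw_id
    return "call_" + hashlib.md5(raw_id.encode("utf-8")).hexdigest()

def rekey_extra_content(extra_content: dict, raw_id: str, safe_id: str) -> dict:
    """Move per-call metadata under the normalized ID so the
    Gemini pass-through data stays attached to the right call."""
    if raw_id in extra_content and safe_id != raw_id:
        extra_content[safe_id] = extra_content.pop(raw_id)
    return extra_content
```

Because short IDs are returned unchanged, the transformation is a no-op for compliant providers.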

2. Pre-send fallback layer

A new static method _normalize_tool_call_ids scans
payloads["messages"] before every Chat Completions request (both _query and
_query_stream). It collects every oversized ID across assistant
tool_calls[].id and tool tool_call_id into a shared map, then rewrites them
in a second pass. This guarantees assistant-call / tool-result pairing stays
intact, and also rescues long IDs persisted in the history before the fix was
deployed.
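
A minimal standalone sketch of the two-pass approach (collect, then rewrite), assuming the Chat Completions message shapes described above; names and exact structure are illustrative, not copied from the PR:

```python
import hashlib

def normalize_tool_call_ids(messages: list[dict]) -> int:
    """Two-pass, in-place normalization of oversized tool_call IDs.

    Pass 1 collects every oversized ID into one shared map, so an
    assistant tool_calls[].id and its matching tool tool_call_id are
    always rewritten to the same short form. Returns the number of
    distinct IDs shortened.
    """
    id_map: dict[str, str] = {}

    def register(tid):
        if tid and len(tid) > 64 and tid not in id_map:
            id_map[tid] = "call_" + hashlib.md5(tid.encode("utf-8")).hexdigest()

    for msg in messages:
        if msg.get("role") == "assistant":
            for tc in msg.get("tool_calls") or []:
                register(tc.get("id"))
        elif msg.get("role") == "tool":
            register(msg.get("tool_call_id"))

    if not id_map:  # fast path: all IDs already within limits
        return 0

    # Pass 2: rewrite both sides from the shared map, keeping the
    # assistant-call / tool-result pairing intact.
    for msg in messages:
        if msg.get("role") == "assistant":
            for tc in msg.get("tool_calls") or []:
                if tc.get("id") in id_map:
                    tc["id"] = id_map[tc["id"]]
        elif msg.get("role") == "tool":
            tid = msg.get("tool_call_id")
            if tid in id_map:
                msg["tool_call_id"] = id_map[tid]
    return len(id_map)
```

The shared map is what makes this safe for history written before the fix: any persisted long ID and its paired tool result map to the same short form.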

Shortening is deterministic (MD5 of the original ID), so a single relay session
stays internally consistent across retries. No new dependencies are introduced;
hashlib is already imported at the top of the file.
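
The determinism and length claims are easy to check directly: the same oversized ID always maps to the same 37-character form, since `"call_"` (5 chars) plus an MD5 hex digest (32 chars) is constant-length:

```python
import hashlib

raw = "call-" + "A" * 1650  # simulated oversized relay ID

short_1 = "call_" + hashlib.md5(raw.encode("utf-8")).hexdigest()
short_2 = "call_" + hashlib.md5(raw.encode("utf-8")).hexdigest()

assert short_1 == short_2  # stable across retries within a session
assert len(short_1) == 37  # "call_" (5) + md5 hexdigest (32)
```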

Verification

  • Before: every tool-using request against the affected relay returned HTTP 400.
  • After: the request succeeds. When oversized IDs are encountered, the log shows
    [tool_call_id 规范化] 发送前兜底:共 N 个超长 ID 被短化
    (roughly: "[tool_call_id normalization] pre-send fallback: N oversized IDs
    shortened"); nothing is logged when none are found.

Risk assessment

  • Scope: limited to openai_source.py; only executes on oversized IDs.
  • Backwards compatibility: when all IDs are already ≤64 chars, both layers
    are no-ops (the fallback method returns early on an empty map).
  • Provider coverage: only Chat Completions path (OpenAI-compatible). Anthropic
    and Gemini native sources are untouched — they have their own ID schemes.

Summary by Sourcery

Normalize and sanitize oversized OpenAI tool_call IDs to prevent upstream 400 errors and keep assistant/tool message IDs consistent.

Bug Fixes:

  • Shorten overlong tool_call IDs in parsed OpenAI chat completions to stay within the 64-character API limit.
  • Normalize oversized tool_call IDs in outgoing chat completion payloads so assistant tool_calls and tool responses remain correctly paired without triggering validation errors.

Enhancements:

  • Add warning logs when tool_call IDs are normalized to aid in diagnosing relay or provider issues.

@auto-assign auto-assign Bot requested review from Fridemn and LIghtJUNction May 11, 2026 17:21
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 11, 2026

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The new _normalize_tool_call_ids staticmethod appears to have an extra leading space before the @staticmethod decorator and some misaligned indentation inside the method body (e.g., after the + markers and in the extra_content block), which will cause style issues or even an IndentationError in Python—please align the decorator and body with the surrounding class methods.
  • The logic for shortening oversized IDs ('call_' + hashlib.md5(...).hexdigest()) is duplicated in both _normalize_tool_call_ids and _parse_openai_completion; consider extracting this into a small private helper (e.g., _shorten_tool_call_id(raw_id: str) -> str) to keep the behavior consistent and easier to maintain.
  • The warning log in _parse_openai_completion includes the first 80 characters of the original tool_call ID, which might leak opaque or sensitive identifiers into logs; consider logging only the length and hash (or a redacted prefix) instead of part of the raw ID.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="984-985" />
<code_context>
-                    tool_call_ids.append(tool_call.id)
+                    
+                    raw_id = tool_call.id
+                    if raw_id and len(raw_id) > 64:
+                        safe_id = "call_" + hashlib.md5(
+                            raw_id.encode("utf-8")
+                        ).hexdigest()
</code_context>
<issue_to_address>
**suggestion:** Consider explicitly marking md5 usage as non-security-sensitive or centralizing the ID-shortening logic.

Since this md5 use is only for deterministic ID shortening, it can easily be misread as security-related later. Either move the `call_<md5>` generation into a shared helper with `_normalize_tool_call_ids`, or wrap this in a small function that clearly documents it as a non-cryptographic helper and keeps all callsites consistent.

Suggested implementation:

```python
            elif role == "tool":
                tid = msg.get("tool_call_id")
                if tid and tid not in id_map:
                    id_map[tid] = self._shorten_tool_call_id(tid)

```

```python
        self._normalize_tool_call_ids(payloads)

        stream = await self.client.chat.completions.create(
            **payloads,
```

```python
                    args = tool_call.function.arguments
                    args_ls.append(args)
                    func_name_ls.append(tool_call.function.name)

                    raw_id = tool_call.id
                    safe_id = self._shorten_tool_call_id(raw_id) if raw_id else None
```

```python
    def _shorten_tool_call_id(self, raw_id: str) -> str:
        """
        Deterministically shorten a potentially long tool call id.

        This helper is explicitly non-cryptographic and MUST NOT be used for any
        security-sensitive purpose. It exists only to normalize/shorten IDs whose
        length may exceed storage or protocol limits.
        """
        if not raw_id:
            return raw_id
        # Keep already-short IDs as-is to avoid unnecessary transformations.
        if len(raw_id) <= 64:
            return raw_id

        # Using md5 here is acceptable because we only need a deterministic,
        # compact representation and are not relying on any cryptographic property.
        return "call_" + hashlib.md5(raw_id.encode("utf-8")).hexdigest()

    def _normalize_tool_call_ids(

```

1. Ensure `hashlib` is imported at the top of `openai_source.py` if it is not already:
   - `import hashlib`
2. The new `safe_id` computed from `raw_id` should be wired into the subsequent logic in the streaming handler (e.g., where IDs are appended or mapped). For example, if you currently do `tool_call_ids.append(tool_call.id)`, you likely want to change that to use `safe_id or raw_id` so both code paths share the same normalization behavior:
   - `tool_call_ids.append(safe_id or raw_id)`
3. If `_normalize_tool_call_ids` is a `@staticmethod` or exists outside the class, adjust `_shorten_tool_call_id` accordingly (e.g., make it a `@staticmethod` or a module-level function and call it without `self.`).
</issue_to_address>


Comment thread astrbot/core/provider/sources/openai_source.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a mechanism to normalize oversized tool_call IDs that exceed the 64-character limit enforced by the OpenAI API, preventing HTTP 400 errors. It adds a _normalize_tool_call_ids method to hash long IDs into a deterministic 37-character format and applies this logic during request preparation and response parsing. Feedback suggests refactoring the ID hashing logic into a shared helper function to eliminate code duplication across different roles and methods.

Comment on lines +599 to +609

```python
                if tid and len(tid) > 64 and tid not in id_map:
                    id_map[tid] = "call_" + hashlib.md5(
                        tid.encode("utf-8")
                    ).hexdigest()

            elif role == "tool":
                tid = msg.get("tool_call_id")
                if tid and len(tid) > 64 and tid not in id_map:
                    id_map[tid] = "call_" + hashlib.md5(
                        tid.encode("utf-8")
                    ).hexdigest()
```

Severity: medium

The logic for generating a deterministic short ID from an oversized tool call ID is repeated in multiple places (here, for the 'tool' role, and in _parse_openai_completion). Following the repository's general rules, this logic should be refactored into a shared helper function to improve maintainability and avoid code duplication.

References
  1. When implementing similar functionality for different cases, refactor the logic into a shared helper function to avoid code duplication.

…ging raw IDs

- Extract the shortening logic into a dedicated `_shorten_tool_call_id`
  helper, shared by `_parse_openai_completion` and
  `_normalize_tool_call_ids`, and document it as explicitly
  non-cryptographic. Addresses sourcery-ai and gemini-code-assist
  review comments about duplicated hashing logic.

- Log only the original ID length and the normalized short form when
  a long tool_call ID is encountered; avoid including the raw ID
  prefix, which may be provider-specific and opaque. Addresses
  sourcery-ai review comment about potentially leaking raw IDs.

- Apply `_normalize_tool_call_ids` through a small internal registration
  closure so each role's collection path is a single line. No behavior
  change when all IDs are already ≤64 chars.
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 11, 2026
@x1051445024
Author

Thanks for the review. Addressed in the follow-up commit:

  1. Indentation / @staticmethod decorator — verified runtime-correct
    locally (AstrBot boots and the warning log fires on real traffic). The
    apparent misalignment was an artifact of the diff view with leading
    + markers.

  2. Duplicated shortening logic — extracted into
    _shorten_tool_call_id, now shared by _normalize_tool_call_ids and
    _parse_openai_completion. The helper is documented as explicitly
    non-cryptographic.

  3. Log possibly leaking raw IDs — log now only includes the original
    ID length and the normalized short form, not any prefix of the raw ID.


Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:L This PR changes 100-499 lines, ignoring generated files.
