[bot] OpenRouter streaming drops reasoning content from reasoning-capable models #1883

@braintrust-bot

Summary

The OpenRouter chat API returns reasoning, reasoning_content, and reasoning_details fields in chat completion responses when using reasoning-capable models (DeepSeek R1, Claude with extended thinking, etc.), but the OpenRouter plugin's streaming aggregation silently drops all reasoning content. The aggregateOpenRouterChatChunks function only accumulates delta.content (text) and delta.tool_calls; reasoning fields fall through and are lost. The vendor types also omit reasoning fields entirely. This is a direct parity gap with Anthropic, Google GenAI, OpenAI, Cohere, and AI SDK reasoning instrumentation in this repo.

What instrumentation is missing

1. Streaming aggregation: reasoning content silently dropped

In js/src/instrumentation/plugins/openrouter-plugin.ts, the aggregateOpenRouterChatChunks function (lines 810–933) only accumulates two fields from streaming deltas:

if (typeof delta.content === "string") {
  content += delta.content;  // only text content captured
}

// ... tool_calls handling ...

When OpenRouter sends delta.reasoning or delta.reasoning_details in streaming chunks, these fields are completely ignored. The final aggregated output (lines 918–931) only includes role, content, and tool_calls:

return {
  output: [{
    index: 0,
    message: {
      role,
      content: content || undefined,
      ...(toolCalls ? { tool_calls: toolCalls } : {}),
      // no reasoning field
    },
    // ...
  }],
  metrics,
};
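
One possible shape for the fix, as a minimal sketch rather than the plugin's actual code: accumulate reasoning text alongside content in the same loop. The names SketchChunk, SketchMessage, and aggregateWithReasoning are hypothetical, and the sketch assumes reasoning_content is a pure alias of reasoning, so it reads one or the other per chunk and never both.

```typescript
// Hypothetical chunk shape: only the fields this sketch reads.
type SketchChunk = {
  choices?: Array<{
    delta?: {
      role?: string;
      content?: string | null;
      reasoning?: string | null;
      reasoning_content?: string | null;
    };
  }>;
};

type SketchMessage = {
  role: string;
  content?: string;
  reasoning?: string;
};

// Accumulates text content and reasoning text across streaming chunks.
function aggregateWithReasoning(chunks: SketchChunk[]): SketchMessage {
  let role = "assistant";
  let content = "";
  let reasoning = "";

  for (const chunk of chunks) {
    const delta = chunk.choices?.[0]?.delta;
    if (!delta) continue;

    if (typeof delta.role === "string") role = delta.role;
    if (typeof delta.content === "string") content += delta.content;

    // Assumption: reasoning_content mirrors reasoning, so prefer reasoning
    // and fall back to the alias to avoid appending the same text twice.
    const piece =
      typeof delta.reasoning === "string"
        ? delta.reasoning
        : typeof delta.reasoning_content === "string"
          ? delta.reasoning_content
          : undefined;
    if (piece !== undefined) reasoning += piece;
  }

  // Omit empty fields, matching the "content || undefined" style above.
  return {
    role,
    ...(content ? { content } : {}),
    ...(reasoning ? { reasoning } : {}),
  };
}
```

Preferring one field per chunk is a design choice: if a provider populated both aliases in the same delta, summing them would double the captured reasoning.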

2. Vendor types: no reasoning fields defined

The OpenRouterChatCompletionChunk type (js/src/vendor-sdk-types/openrouter.ts, lines 45–60) does not include reasoning, reasoning_content, or reasoning_details in the delta object:

export type OpenRouterChatCompletionChunk = {
  choices?: Array<{
    delta?: {
      role?: string;
      content?: string;
      tool_calls?: OpenRouterChatToolCallDelta[];
      toolCalls?: OpenRouterChatToolCallDelta[];
      finish_reason?: string | null;
      finishReason?: string | null;
      // missing: reasoning, reasoning_content, reasoning_details
    };
    // ...
  }>;
  // ...
};

Similarly, OpenRouterChatChoice.message (lines 24–33) only declares role, content, and tool_calls — no reasoning field for non-streaming responses.
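
A rough sketch of how the vendor types could declare these fields. Every type name below is hypothetical and not taken from openrouter.ts, and the entry shape for reasoning_details is left loose on purpose because only its type values are named in this issue.

```typescript
// Hypothetical entry shape: only `type` is constrained; any other fields
// are left as unknown and should be checked against the OpenRouter docs.
type SketchReasoningDetail = {
  type: "reasoning.summary" | "reasoning.encrypted" | "reasoning.text";
  [key: string]: unknown;
};

// Fields shared by streaming deltas and non-streaming messages.
type SketchReasoningFields = {
  reasoning?: string | null;
  reasoning_content?: string | null;
  reasoning_details?: SketchReasoningDetail[];
};

// Streaming delta with reasoning fields added (tool call fields omitted).
type SketchDelta = SketchReasoningFields & {
  role?: string;
  content?: string;
};

// Non-streaming message with the same reasoning fields.
type SketchChoiceMessage = SketchReasoningFields & {
  role?: string;
  content?: string | null;
};

// Example values that type-check against the sketch.
const exampleDelta: SketchDelta = {
  reasoning: "step 1",
  reasoning_details: [{ type: "reasoning.text", text: "step 1" }],
};

const exampleMessage: SketchChoiceMessage = {
  role: "assistant",
  content: "done",
  reasoning_content: "step 1, step 2",
};
```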

3. Non-streaming: reasoning present but not explicitly extracted

For non-streaming responses, extractOutput returns result.choices directly (line 59), so any reasoning field on the message object passes through unchanged in the raw output. It is not explicitly extracted or normalized, however, so its representation may differ from providers whose reasoning content is handled explicitly.
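
A minimal sketch of what explicit normalization could look like for a non-streaming message, assuming the desired output key is reasoning (as in the AI SDK row of the comparison table). The function and type names are hypothetical, and reasoning_details is deliberately left out of this sketch.

```typescript
type SketchRawMessage = {
  role?: string;
  content?: string | null;
  reasoning?: string | null;
  reasoning_content?: string | null;
};

type SketchNormalizedMessage = {
  role?: string;
  content?: string | null;
  reasoning?: string;
};

// Collapses the two plaintext aliases into a single `reasoning` key.
function normalizeMessageReasoning(
  message: SketchRawMessage,
): SketchNormalizedMessage {
  const { reasoning, reasoning_content, ...rest } = message;

  // Take the first non-empty string, preferring `reasoning` over its alias.
  const text = [reasoning, reasoning_content].find(
    (value): value is string => typeof value === "string" && value.length > 0,
  );

  return text === undefined ? rest : { ...rest, reasoning: text };
}
```

With this shape, a message carrying only reasoning_content and a message carrying only reasoning produce the same output, which is the consistency the paragraph above asks for.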

Upstream API format

OpenRouter surfaces reasoning content through documented fields:

  • reasoning — plaintext reasoning string on choices[].message.reasoning (non-streaming) and choices[].delta.reasoning (streaming)
  • reasoning_content — alias that works identically to reasoning
  • reasoning_details — structured array with reasoning.summary, reasoning.encrypted, and reasoning.text types, available at choices[].delta.reasoning_details in streaming

These fields are populated when using reasoning-capable models through OpenRouter (DeepSeek R1, Claude with extended thinking, QwQ, etc.).
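
For the structured reasoning_details array, a sketch of one way to collect entries across streaming chunks. The three type values come from the list above; the payload field names (text, summary, data) are assumptions to verify against the OpenRouter documentation, and collectReasoningDetails is a hypothetical helper.

```typescript
// Assumed payload fields per entry type: text, summary, data.
type SketchDetail =
  | { type: "reasoning.text"; text?: string | null }
  | { type: "reasoning.summary"; summary?: string | null }
  | { type: "reasoning.encrypted"; data?: string | null };

type SketchDetailTotals = {
  text: string;
  summary: string;
  encryptedBlocks: string[];
};

// Each element of `batches` is the reasoning_details array of one chunk,
// or undefined when the chunk carried none.
function collectReasoningDetails(
  batches: Array<SketchDetail[] | undefined>,
): SketchDetailTotals {
  const totals: SketchDetailTotals = {
    text: "",
    summary: "",
    encryptedBlocks: [],
  };

  for (const batch of batches) {
    for (const detail of batch ?? []) {
      switch (detail.type) {
        case "reasoning.text":
          totals.text += detail.text ?? "";
          break;
        case "reasoning.summary":
          totals.summary += detail.summary ?? "";
          break;
        case "reasoning.encrypted":
          // Encrypted payloads are kept as opaque blocks, not concatenated.
          if (detail.data) totals.encryptedBlocks.push(detail.data);
          break;
      }
    }
  }

  return totals;
}
```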

Comparison with other providers in this repo

| Provider | Reasoning content captured | Reasoning token metrics |
| --- | --- | --- |
| Anthropic | thinking_delta aggregated in streaming | Cache tokens tracked |
| Google GenAI | thought parts handled in content | thoughtsTokenCount metric |
| OpenAI | Reasoning tokens tracked via completion_tokens_details | completion_reasoning_tokens metric |
| Cohere | thinking content blocks aggregated | reasoning_tokens metric |
| AI SDK | reasoning-delta chunks aggregated into output.reasoning | completion_reasoning_tokens metric |
| Mistral | Silently dropped (open issue #1857) | Token metrics captured |
| OpenRouter | Silently dropped | Token metrics captured via completion_tokens_details |

Note: Reasoning token metrics ARE correctly captured by parseOpenRouterMetricsFromUsage through its generic TOKEN_DETAIL_PREFIX_MAP handling of completionTokensDetails.reasoning_tokens. The gap is specifically in reasoning content capture during streaming.
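
To illustrate why token metrics survive while content does not, here is a hypothetical stand-in for prefix-map handling of usage details. It is not the plugin's parseOpenRouterMetricsFromUsage: the map contents, the snake_case keys, and the resulting metric name completion_reasoning_tokens are assumptions made for the sketch.

```typescript
// Hypothetical stand-in for the plugin's prefix map; real contents may differ.
const SKETCH_DETAIL_PREFIX_MAP: Record<string, string> = {
  completion_tokens_details: "completion",
  prompt_tokens_details: "prompt",
};

// Flattens nested usage details into prefixed numeric metrics.
function flattenTokenDetails(
  usage: Record<string, unknown>,
): Record<string, number> {
  const metrics: Record<string, number> = {};

  for (const [detailKey, prefix] of Object.entries(SKETCH_DETAIL_PREFIX_MAP)) {
    const details = usage[detailKey];
    if (typeof details !== "object" || details === null) continue;

    // Every numeric entry is copied generically, so reasoning_tokens is
    // picked up without any reasoning-specific code.
    for (const [name, value] of Object.entries(details)) {
      if (typeof value === "number") metrics[`${prefix}_${name}`] = value;
    }
  }

  return metrics;
}
```

Generic handling like this covers numeric usage fields only; reasoning text arrives on the message or delta, not on usage, so it needs its own aggregation path.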

Braintrust docs status

not_found — The Braintrust docs list OpenRouter as a supported provider at https://www.braintrust.dev/docs/instrument/wrap-providers but do not mention reasoning content handling.

Upstream references

Local files inspected

  • js/src/instrumentation/plugins/openrouter-plugin.ts (lines 810–933: aggregateOpenRouterChatChunks only accumulates content and tool_calls; lines 343–367: TOKEN_NAME_MAP/TOKEN_DETAIL_PREFIX_MAP handle token metrics but not content)
  • js/src/vendor-sdk-types/openrouter.ts (lines 45–60: OpenRouterChatCompletionChunk delta missing reasoning fields; lines 24–33: OpenRouterChatChoice message missing reasoning field)
  • js/src/wrappers/openrouter.ts (wrapper proxies pass through correctly — issue is in the plugin's aggregation)
  • e2e/scenarios/openrouter-instrumentation/ (no reasoning test scenarios)
