
BraintrustMiddleware drops token / cost metrics for AI SDK v3 providers that return nested usage (e.g. @ai-sdk/openai 3.x Responses API) #1909


Summary

BraintrustMiddleware (model-level, from wrapLanguageModel) emits a span with empty metrics when the AI SDK provider returns a nested usage shape, which is the format @ai-sdk/openai@3.x uses on its Responses API path (gpt-4o, gpt-4.1, gpt-5, etc.).

Result: every span produced this way for OpenAI models has none of prompt_tokens, completion_tokens, or tokens populated, so cost is not computed in the dashboard. Anthropic spans capture fine because @ai-sdk/anthropic returns the flat shape.
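
For reference, the two shapes look roughly like this (the flat field names are the ones the middleware reads; the nested fields come from convertOpenAIResponsesUsage, shown under Root cause below):

```ts
// Flat usage shape (e.g. @ai-sdk/anthropic): what normalizeUsageMetrics expects.
type FlatUsage = {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  reasoningTokens?: number;
  cachedInputTokens?: number;
};

// Nested usage shape (@ai-sdk/openai@3.x Responses path): what it actually receives.
type NestedUsage = {
  inputTokens: { total: number; noCache: number; cacheRead: number; cacheWrite: number };
  outputTokens: { total: number; text: number; reasoning: number };
  raw?: unknown;
};
```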

Versions

| Package | Version |
| --- | --- |
| braintrust | 3.9.0 |
| ai | 6.0.85 |
| @ai-sdk/openai | 3.0.52 |
| @ai-sdk/anthropic | 3.0.68 |

Reproduction

```ts
import { wrapLanguageModel, generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { BraintrustMiddleware } from "braintrust";

const model = wrapLanguageModel({
  model: openai("gpt-4.1-mini"),
  middleware: BraintrustMiddleware({ spanInfo: { name: "demo" } }),
});

await generateText({ model, prompt: "Say hi" });
// → A "demo" span lands in Braintrust with metrics = { start, end } only.
//   No prompt_tokens / completion_tokens / tokens / cost.
```

Swap openai("gpt-4.1-mini") for anthropic("claude-sonnet-4-6") and metrics populate correctly.
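
For completeness, the working control case (same wrapper, only the provider changes):

```ts
import { anthropic } from "@ai-sdk/anthropic";

const control = wrapLanguageModel({
  model: anthropic("claude-sonnet-4-6"),
  middleware: BraintrustMiddleware({ spanInfo: { name: "demo" } }),
});

await generateText({ model: control, prompt: "Say hi" });
// → span arrives with prompt_tokens / completion_tokens / tokens populated.
```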

Expected

prompt_tokens, completion_tokens, tokens (and where applicable prompt_cached_tokens, completion_reasoning_tokens) should be populated for every provider whose AI SDK adapter reports usage, regardless of whether the adapter returns the flat or nested shape.

Actual

For any @ai-sdk/openai@3.x call (chat-completions or Responses API path), the resulting span looks like:

```json
{
  "span_attributes": { "name": "demo", "type": "llm" },
  "metadata": {
    "model": "gpt-4.1-mini-2025-04-14",
    "provider": "openai",
    "finish_reason": { "unified": "stop" }
  },
  "metrics": { "start": 1777226818.85, "end": 1777226822.226 }
}
```

No tokens, no cost.

Root cause

@ai-sdk/openai@3.x (Responses API path, which the provider uses for ALL chat-completions models, including gpt-4o-mini, gpt-4.1-mini, gpt-5-nano, etc.) normalizes usage into a nested shape; see convertOpenAIResponsesUsage at node_modules/@ai-sdk/openai/dist/index.js:2602:

```js
return {
  inputTokens:  { total, noCache, cacheRead, cacheWrite },
  outputTokens: { total, text, reasoning },
  raw,
};
```

BraintrustMiddleware's wrapGenerate extracts metrics via normalizeUsageMetrics (braintrust/dist/index.js:21622), which reads usage.inputTokens / usage.outputTokens as numbers:

```js
function normalizeUsageMetrics(usage, provider, providerMetadata) {
  const metrics = {};
  const inputTokens = getNumberProperty2(usage, "inputTokens");   // <- gets {total,...}, returns undefined
  if (inputTokens !== void 0) metrics.prompt_tokens = inputTokens;
  // …same for outputTokens, totalTokens, reasoningTokens, cachedInputTokens
  return metrics;
}
```

getNumberProperty2 returns undefined when the property is an object, so every metric is silently skipped.
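
For reference, the helper's observed behavior amounts to this (a sketch; the real implementation lives in the braintrust bundle):

```ts
function getNumberProperty(obj: unknown, key: string): number | undefined {
  if (typeof obj !== "object" || obj === null) return undefined;
  const value = (obj as Record<string, unknown>)[key];
  // In the nested case `value` is an object like { total, ... }, so this yields undefined.
  return typeof value === "number" ? value : undefined;
}
```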

Note that extractTokenMetrics in the same file (line 13670) — used by the higher-level wrapAISDK / wrapGenerateText path — does handle the nested shape via _optionalChain([usage, 'access', _ => _.inputTokens, 'optionalAccess', _ => _.total]), i.e. usage?.inputTokens?.total. Only the model-level normalizeUsageMetrics is missing this case.

Proposed fix

normalizeUsageMetrics should fall back to the nested-shape read when the flat read returns undefined. Patch:

```diff
 function normalizeUsageMetrics(usage, provider, providerMetadata) {
   const metrics = {};
-  const inputTokens = getNumberProperty2(usage, "inputTokens");
+  const inputTokensFlat = getNumberProperty2(usage, "inputTokens");
+  const inputTokens = inputTokensFlat !== undefined
+    ? inputTokensFlat
+    : getNumberProperty2(usage?.inputTokens, "total");
   if (inputTokens !== undefined) metrics.prompt_tokens = inputTokens;

-  const outputTokens = getNumberProperty2(usage, "outputTokens");
+  const outputTokensFlat = getNumberProperty2(usage, "outputTokens");
+  const outputTokens = outputTokensFlat !== undefined
+    ? outputTokensFlat
+    : getNumberProperty2(usage?.outputTokens, "total");
   if (outputTokens !== undefined) metrics.completion_tokens = outputTokens;

-  const totalTokens = getNumberProperty2(usage, "totalTokens");
+  const totalTokensFlat = getNumberProperty2(usage, "totalTokens");
+  const totalTokens = totalTokensFlat !== undefined
+    ? totalTokensFlat
+    : (typeof inputTokens === "number" && typeof outputTokens === "number"
+        ? inputTokens + outputTokens : undefined);
   if (totalTokens !== undefined) metrics.tokens = totalTokens;

-  const reasoningTokens = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningFlat = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningTokens = reasoningFlat !== undefined
+    ? reasoningFlat
+    : getNumberProperty2(usage?.outputTokens, "reasoning");
   if (reasoningTokens !== undefined) metrics.completion_reasoning_tokens = reasoningTokens;

-  const cachedInputTokens = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputFlat = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputTokens = cachedInputFlat !== undefined
+    ? cachedInputFlat
+    : getNumberProperty2(usage?.inputTokens, "cacheRead");
   if (cachedInputTokens !== undefined) metrics.prompt_cached_tokens = cachedInputTokens;

+  const cacheWriteTokens = getNumberProperty2(usage?.inputTokens, "cacheWrite");
+  if (cacheWriteTokens !== undefined) metrics.prompt_cache_creation_tokens = cacheWriteTokens;
+
   if (provider === "anthropic") { /* …unchanged… */ }
   return metrics;
 }
```

This is the same dual-shape strategy already used by extractTokenMetrics higher in the file, so the two extractors agree.
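
A quick sanity check for the patch (a sketch; normalizeUsageMetrics is private to the bundle, so this assumes it is exported or otherwise reachable from a test):

```ts
// Both shapes should now yield the same core metrics.
const flat = { inputTokens: 10, outputTokens: 5, totalTokens: 15 };
const nested = {
  inputTokens: { total: 10, noCache: 10, cacheRead: 0, cacheWrite: 0 },
  outputTokens: { total: 5, text: 5, reasoning: 0 },
};

for (const usage of [flat, nested]) {
  const m = normalizeUsageMetrics(usage, "openai", undefined);
  console.assert(m.prompt_tokens === 10, "prompt_tokens");
  console.assert(m.completion_tokens === 5, "completion_tokens");
  console.assert(m.tokens === 15, "tokens");
}
```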

Workaround (for anyone hitting this before a fix lands)

Run bun patch braintrust and apply the diff above to both dist/index.js and dist/index.mjs. Confirmed working locally — OpenAI spans now show prompt_tokens, completion_tokens, tokens, and prompt_cached_tokens, and the dashboard computes cost.

Scope

Affects every consumer of BraintrustMiddleware (model-level wrapping, the path the docs recommend for AI SDK integrations) when paired with @ai-sdk/openai@3.x. Likely also affects future providers that adopt the nested shape. Not specific to reasoning models — observed across gpt-4o-mini, gpt-4.1-mini, gpt-5*-nano.
