Summary
BraintrustMiddleware (model-level, from wrapLanguageModel) emits a span with empty metrics when the AI SDK provider returns a nested usage shape, which is the format @ai-sdk/openai@3.x uses on its Responses API path (gpt-4o, gpt-4.1, gpt-5, etc.).
Result: every span produced this way for OpenAI models has none of prompt_tokens, completion_tokens, or tokens populated, so the dashboard cannot compute cost. Anthropic spans are captured correctly because @ai-sdk/anthropic returns the flat shape.
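For context, the two usage shapes differ like this (a sketch; values illustrative):

// Flat shape (@ai-sdk/anthropic — handled today):
{ inputTokens: 12, outputTokens: 34, totalTokens: 46 }

// Nested shape (@ai-sdk/openai@3.x — currently dropped):
{
  inputTokens: { total: 12, noCache: 12, cacheRead: 0, cacheWrite: 0 },
  outputTokens: { total: 34, text: 30, reasoning: 4 },
  raw: { /* provider-reported usage */ }
}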
Versions
| Package | Version |
| --- | --- |
| braintrust | 3.9.0 |
| ai | 6.0.85 |
| @ai-sdk/openai | 3.0.52 |
| @ai-sdk/anthropic | 3.0.68 |
Reproduction
import { wrapLanguageModel, generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { BraintrustMiddleware } from "braintrust";
const model = wrapLanguageModel({
  model: openai("gpt-4.1-mini"),
  middleware: BraintrustMiddleware({ spanInfo: { name: "demo" } }),
});
await generateText({ model, prompt: "Say hi" });
// → A "demo" span lands in Braintrust with metrics = { start, end } only.
// No prompt_tokens / completion_tokens / tokens / cost.
Swap openai("gpt-4.1-mini") for anthropic("claude-sonnet-4-6") and metrics populate correctly.
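To see the shape yourself, log the usage the call reports (a quick probe; the exact shape can vary by ai version):

const result = await generateText({ model, prompt: "Say hi" });
console.log(JSON.stringify(result.usage, null, 2));
// With @ai-sdk/openai@3.x you should see the nested inputTokens/outputTokens
// objects described under "Root cause".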
Expected
prompt_tokens, completion_tokens, tokens (and where applicable prompt_cached_tokens, completion_reasoning_tokens) should be populated for every provider whose AI SDK adapter reports usage, regardless of whether the adapter returns the flat or nested shape.
Actual
For any @ai-sdk/openai@3.x call (chat-completions or Responses API path) the resulting span looks like:
{
  "span_attributes": { "name": "demo", "type": "llm" },
  "metadata": {
    "model": "gpt-4.1-mini-2025-04-14",
    "provider": "openai",
    "finish_reason": { "unified": "stop" }
  },
  "metrics": { "start": 1777226818.85, "end": 1777226822.226 }
}
No tokens, no cost.
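For comparison, the same call through anthropic("claude-sonnet-4-6") yields metrics along these lines (token counts illustrative):

"metrics": {
  "start": 1777226818.85,
  "end": 1777226822.226,
  "prompt_tokens": 9,
  "completion_tokens": 12,
  "tokens": 21
}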
Root cause
@ai-sdk/openai@3.x (Responses API path — used for ALL chat-completions models, including gpt-4o-mini, gpt-4.1-mini, gpt-5-nano, etc.) normalizes usage into a nested shape; see convertOpenAIResponsesUsage at node_modules/@ai-sdk/openai/dist/index.js:2602:
return {
  inputTokens: { total, noCache, cacheRead, cacheWrite },
  outputTokens: { total, text, reasoning },
  raw,
};
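For reference, this is derived from the raw Responses API usage, which OpenAI reports roughly as follows (the *_details objects can be absent):

{
  "input_tokens": 12,
  "input_tokens_details": { "cached_tokens": 0 },
  "output_tokens": 34,
  "output_tokens_details": { "reasoning_tokens": 4 },
  "total_tokens": 46
}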
BraintrustMiddleware's wrapGenerate extracts metrics via normalizeUsageMetrics (braintrust/dist/index.js:21622), which reads usage.inputTokens / usage.outputTokens as numbers:
function normalizeUsageMetrics(usage, provider, providerMetadata) {
  const metrics = {};
  const inputTokens = getNumberProperty2(usage, "inputTokens"); // <- gets {total,...}, returns undefined
  if (inputTokens !== void 0) metrics.prompt_tokens = inputTokens;
  // …same for outputTokens, totalTokens, reasoningTokens, cachedInputTokens
  return metrics;
}
getNumberProperty2 returns undefined when the property is an object, so every metric is silently skipped.
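That behavior is consistent with a type guard along these lines (a sketch, not the actual getNumberProperty2 source):

function getNumberProperty(obj, key) {
  const value = obj?.[key];
  // Only plain numbers pass; an object like { total: 12 } yields undefined.
  return typeof value === "number" ? value : undefined;
}

getNumberProperty({ inputTokens: { total: 12 } }, "inputTokens"); // → undefined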
Note that extractTokenMetrics in the same file (line 13670) — used by the higher-level wrapAISDK / wrapGenerateText path — does handle the nested shape via _optionalChain([usage, 'access', _ => _.inputTokens, 'optionalAccess', _ => _.total]). Only the model-level normalizeUsageMetrics is missing this case.
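(For readability: that _optionalChain expression is just the transpiled form of usage?.inputTokens?.total.)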
Proposed fix
normalizeUsageMetrics should fall back to the nested-shape read when the flat read returns undefined. Patch:
 function normalizeUsageMetrics(usage, provider, providerMetadata) {
   const metrics = {};
-  const inputTokens = getNumberProperty2(usage, "inputTokens");
+  const inputTokensFlat = getNumberProperty2(usage, "inputTokens");
+  const inputTokens = inputTokensFlat !== undefined
+    ? inputTokensFlat
+    : getNumberProperty2(usage?.inputTokens, "total");
   if (inputTokens !== undefined) metrics.prompt_tokens = inputTokens;
-  const outputTokens = getNumberProperty2(usage, "outputTokens");
+  const outputTokensFlat = getNumberProperty2(usage, "outputTokens");
+  const outputTokens = outputTokensFlat !== undefined
+    ? outputTokensFlat
+    : getNumberProperty2(usage?.outputTokens, "total");
   if (outputTokens !== undefined) metrics.completion_tokens = outputTokens;
-  const totalTokens = getNumberProperty2(usage, "totalTokens");
+  const totalTokensFlat = getNumberProperty2(usage, "totalTokens");
+  const totalTokens = totalTokensFlat !== undefined
+    ? totalTokensFlat
+    : (typeof inputTokens === "number" && typeof outputTokens === "number"
+      ? inputTokens + outputTokens : undefined);
   if (totalTokens !== undefined) metrics.tokens = totalTokens;
-  const reasoningTokens = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningFlat = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningTokens = reasoningFlat !== undefined
+    ? reasoningFlat
+    : getNumberProperty2(usage?.outputTokens, "reasoning");
   if (reasoningTokens !== undefined) metrics.completion_reasoning_tokens = reasoningTokens;
-  const cachedInputTokens = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputFlat = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputTokens = cachedInputFlat !== undefined
+    ? cachedInputFlat
+    : getNumberProperty2(usage?.inputTokens, "cacheRead");
   if (cachedInputTokens !== undefined) metrics.prompt_cached_tokens = cachedInputTokens;
+  const cacheWriteTokens = getNumberProperty2(usage?.inputTokens, "cacheWrite");
+  if (cacheWriteTokens !== undefined) metrics.prompt_cache_creation_tokens = cacheWriteTokens;
+
   if (provider === "anthropic") { /* …unchanged… */ }
   return metrics;
 }
This is the same dual-shape strategy already used by extractTokenMetrics higher in the file, so the two extractors agree.
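As a sanity check, both shapes should then normalize to the same metrics (expected results; values illustrative):

normalizeUsageMetrics({ inputTokens: 12, outputTokens: 34, totalTokens: 46 }, "openai");
// → { prompt_tokens: 12, completion_tokens: 34, tokens: 46 }

normalizeUsageMetrics({
  inputTokens: { total: 12, cacheRead: 3, cacheWrite: 0 },
  outputTokens: { total: 34, reasoning: 4 },
}, "openai");
// → { prompt_tokens: 12, completion_tokens: 34, tokens: 46,
//     completion_reasoning_tokens: 4, prompt_cached_tokens: 3,
//     prompt_cache_creation_tokens: 0 }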
Workaround (for anyone hitting this before a fix lands)
Apply the diff above to dist/index.js and dist/index.mjs via bun patch braintrust. Confirmed working locally: OpenAI spans now show prompt_tokens, completion_tokens, tokens, and prompt_cached_tokens, and the dashboard computes cost.
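If you haven't used Bun's patch flow before, it looks roughly like this (assumes a reasonably recent Bun; the edit step is manual):

bun patch braintrust
# …apply the diff above to node_modules/braintrust/dist/index.js and dist/index.mjs…
bun patch --commit node_modules/braintrust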
Scope
Affects every consumer of BraintrustMiddleware (model-level wrapping, the path the docs recommend for AI SDK integrations) when paired with @ai-sdk/openai@3.x. Likely also affects future providers that adopt the nested shape. Not specific to reasoning models — observed across gpt-4o-mini, gpt-4.1-mini, gpt-5*-nano.