Expose per-turn token usage and cost in the after_llm_call hook payload

## Overview

Add per-turn token usage and cost to the `after_llm_call` hook payload. Today the payload carries only:

```go
&hooks.Input{
    SessionID:       sess.ID,
    AgentName:       a.Name(),
    StopResponse:    responseContent,
    LastUserMessage: sess.GetLastUserMessageContent(),
}
```

This proposal adds the following fields without changing or removing any existing ones:
- `usage` — the per-call `chat.Usage` (input / cached input / cache write / output tokens), nested to avoid colliding with existing flat token fields
- `cost` — the per-call cost in USD
- ~~`model_id` — the model actually used for this call~~ *(edit: already landed in #2911 — remaining scope is `usage` + `cost` only)*

## Motivation

There is currently no supported way to observe per-LLM-call usage or cost from outside the core. The only place spend is materialized is the session store, and the session store only records a sub-session **when it completes successfully**.

When a sub-session (delegated agent / skill / background agent) **fails**, its turns are dropped from persistence:
- In `pkg/runtime/agent_delegation.go`, `runForwarding` returns early on the first `ErrorEvent` (~line 289), so `parent.AddSubSession(s)` and `SubSessionCompletedEvent` are never reached.
- `runCollecting` has the same shape (returns on `errMsg != ""` before `AddSubSession`).
- The persistence observer only writes a sub-session in response to `SubSessionCompletedEvent`, so a failed sub-session's turns never reach the DB.

This is reasonable for the **session DB** (a session is resumable conversation state, and a failed sub-session isn't part of it). But the API calls were **already billed**. The cost the session reports can be far below the actual provider invoice.

Concretely: in a real run against Vertex AI with several failing sub-sessions, the actual invoice was **~9× the cost shown by the session**. Each failed sub-session made multiple billed model calls that never appeared in any total. This makes spend impossible to reconcile and is a real safety concern when running against a metered provider.

The root cause is that cost recording is coupled to session lifecycle (`SubSessionCompletedEvent`). The cleanest place to expose spend is the layer where it actually happens — once per LLM call. `executeAfterLLMCallHooks` already fires inside the shared turn loop (`pkg/runtime/loop.go:572`) on every successful model call, for sub-sessions too, and **before** the sub-session failure handling in `agent_delegation.go`. A per-turn payload with usage and cost therefore captures failed sub-session spend automatically, with no coupling to session completion.
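
The gap can be reproduced with a toy model. The sketch below is not the runtime's code: `subSession`, `runSubSession`, and `reconcile` are hypothetical names, and the loop only mimics the ordering described above (the per-call hook fires first, and a failure then discards the sub-session total).

```go
package main

import (
	"errors"
	"fmt"
)

// subSession is a toy model: the cost of each successful model call, plus
// whether the sub-session errors out after those calls were billed.
type subSession struct {
	costs []float64
	fails bool
}

// runSubSession mimics the turn loop. afterCall plays the role of the
// after_llm_call hook and fires once per billed call.
func runSubSession(s subSession, afterCall func(cost float64)) (float64, error) {
	total := 0.0
	for _, c := range s.costs {
		total += c
		afterCall(c)
	}
	if s.fails {
		// Mirrors the early return: the caller never sees the total.
		return 0, errors.New("sub-session failed")
	}
	return total, nil
}

// reconcile returns what a completion-based store would record and what a
// per-call hook would record for the same set of sub-sessions.
func reconcile(subs []subSession) (sessionTotal, hookTotal float64) {
	for _, s := range subs {
		cost, err := runSubSession(s, func(c float64) { hookTotal += c })
		if err != nil {
			continue // failed sub-session: nothing is persisted
		}
		sessionTotal += cost
	}
	return sessionTotal, hookTotal
}

func main() {
	sessionTotal, hookTotal := reconcile([]subSession{
		{costs: []float64{0.5, 0.25}},
		{costs: []float64{1, 1, 0.5}, fails: true},
	})
	fmt.Printf("session store: $%.2f, per-call hook: $%.2f\n", sessionTotal, hookTotal)
}
```

In this model the two totals diverge by exactly the spend of the failed sub-session, which is the discrepancy the proposal is meant to close.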

## Use Cases

1. **Spend reconciliation / cost ledger** — a sidecar (agent YAML + a shell / Node / Python script) appends every billed turn to its own store, so the total matches the provider invoice even when sub-sessions fail.
2. **Budget guard** — a handler that warns or stops a run once cumulative cost crosses a threshold.
3. **Alerting / telemetry** — forward per-turn usage and cost to an external monitoring system, independent of session persistence.
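
A minimal sketch of the first two use cases, assuming the handler receives each payload as a JSON object with `session_id` and `cost` fields (the field names and the delivery mechanism are assumptions, and `Ledger` is a hypothetical type). It keeps totals in memory; a real sidecar would persist them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// turn holds the only two payload fields this sketch reads.
type turn struct {
	SessionID string  `json:"session_id"`
	Cost      float64 `json:"cost"`
}

// Ledger accumulates per-call cost, overall and per session.
type Ledger struct {
	bySession map[string]float64
	total     float64
}

func NewLedger() *Ledger {
	return &Ledger{bySession: make(map[string]float64)}
}

// Record decodes one hook payload and adds its cost to the ledger.
func (l *Ledger) Record(payload []byte) error {
	var t turn
	if err := json.Unmarshal(payload, &t); err != nil {
		return fmt.Errorf("decode payload: %w", err)
	}
	l.bySession[t.SessionID] += t.Cost
	l.total += t.Cost
	return nil
}

// Total returns the cumulative cost across all sessions.
func (l *Ledger) Total() float64 { return l.total }

// OverBudget reports whether cumulative cost has crossed the limit.
func (l *Ledger) OverBudget(limitUSD float64) bool { return l.total > limitUSD }

func main() {
	ledger := NewLedger()
	payloads := []string{
		`{"session_id":"parent","cost":0.5}`,
		`{"session_id":"sub-1","cost":1.25}`,
	}
	for _, p := range payloads {
		if err := ledger.Record([]byte(p)); err != nil {
			fmt.Println("skipping payload:", err)
			continue
		}
		if ledger.OverBudget(1.0) {
			fmt.Printf("budget exceeded at $%.2f\n", ledger.Total())
		}
	}
}
```

Because every billed call is recorded as it happens, the ledger does not depend on whether the sub-session later completes.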

## Proposed Solution

Widen the `after_llm_call` payload at `executeAfterLLMCallHooks` to include `usage` and `cost`. The data already exists locally at the call site in the turn loop; this is plumbing, not new computation. With this primitive, a cost ledger lives entirely outside the core — no new core subsystem and no new storage owned by the core.

Because this widens a **public, irreversible** payload contract, the schema shape should be settled first. Open questions:

1. **Unpriced vs free.** A flat `cost: 0` can't distinguish a free call from a model with no pricing. Prefer a structural signal (e.g. nullable cost = unpriced) over a documented "check `usage != nil`" convention.
2. **Token field naming.** Nesting under `usage` avoids collision with the existing compaction-related token fields — is this the preferred convention?
3. **Coverage.** Main runtime and harness paths are straightforward; compaction sub-runtimes and the chatserver / a2a / acp paths likely need follow-up. Should the initial change scope to the main loop?
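
To make question 1 concrete, here is one way a consumer could tell the cases apart if `cost` were nullable. This is a sketch of one option, not a settled schema: the `payload` type and the `describe` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload models only the cost field. A nil pointer means "no pricing known".
type payload struct {
	Cost *float64 `json:"cost"`
}

// describe classifies a raw hook payload by its cost field.
func describe(raw string) (string, error) {
	var p payload
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return "", err
	}
	switch {
	case p.Cost == nil:
		return "unpriced", nil // null or absent: the model has no pricing
	case *p.Cost == 0:
		return "free", nil // priced, and this call cost nothing
	default:
		return "billed", nil
	}
}

func main() {
	for _, raw := range []string{`{"cost":null}`, `{"cost":0}`, `{"cost":0.0031}`} {
		kind, err := describe(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(raw, "is", kind)
	}
}
```

With a flat non-nullable `cost`, the first two inputs would be indistinguishable to a handler, which is the ambiguity the question raises.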

I have a working implementation of the payload widening plus an example sidecar ledger, and am happy to open a PR once the schema is agreed.

## Alternatives

- **Internal Observer + built-in cost ledger** (a new `pkg/costlog` package with its own SQLite store). This works and needs no public API change, but it makes the core own a new subsystem, storage format, and config surface, and is Go-only. The hook approach keeps the core to a primitive and lets the ledger be any language. (I have prototyped both.)
- **Persist failed sub-sessions into the session DB.** This conflicts with the intended meaning of the session store as resumable state, so it's a poorer fit.

## Related Issues

- [#504](https://github.com/docker/docker-agent/issues/504) Token tracking when using subagents (closed)
- [#90](https://github.com/docker/docker-agent/issues/90) Cost tracking for multi-agent runs sometimes resets to zero (closed)
- [#1345](https://github.com/docker/docker-agent/issues/1345) Stats on usage + interaction summary (open)

## Additional Context

Line references are against `main` at the time of writing and may drift. ~~The proposal also reconciles the existing doc that says `after_llm_call` populates the model id, which today it does not.~~ *(edit: that was fixed in #2911; `model_id` is already populated. The remaining scope of this issue is `usage` + `cost`.)*




