[bot] Cohere: Chat tool-use responses lack child TOOL spans #377

@braintrust-bot

Summary

The Cohere integration captures tool_calls in the LLM span's output dictionary but does not create child SpanTypeAttribute.TOOL spans for individual tool calls. This is a tracing depth gap compared to the OpenAI, Anthropic, and Google GenAI integrations, which all decompose tool-use responses into dedicated child tool spans.

Cohere's Chat API (both v1 and v2) supports tool use with parallel calling, multi-step tool reasoning, and citations. When a model response includes tool calls, users currently see them only as opaque entries in the LLM span's output.tool_calls array — they cannot drill into individual tool invocations as separate spans in the Braintrust UI.

What is missing

The Cohere tracing module (py/src/braintrust/integrations/cohere/tracing.py) accumulates tool calls into the output dictionary but never creates child spans:

Non-streaming path (line ~270):

tool_calls = _get(result, "tool_calls")
if tool_calls:
    output = {
        "role": role,
        "content": text,
        "tool_calls": tool_calls,  # Stored flat in LLM span output
    }

Streaming path (lines ~395-537): _upsert_tool_calls() merges streaming tool call deltas into a tool_calls_by_index dict, then stores them as output["tool_calls"] — again, flat in the LLM span.

Span creation (line ~550): Only SpanTypeAttribute.LLM spans are ever created. No SpanTypeAttribute.TOOL child spans exist anywhere in the file.
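
To illustrate why the streaming path already has everything needed for child spans, here is a minimal sketch of index-keyed delta merging. It is not the module's actual _upsert_tool_calls(); the function names and the delta shape (optional id, type, and function.name / function.arguments fragments) are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def upsert_tool_call(
    tool_calls_by_index: Dict[int, Dict[str, Any]],
    index: int,
    delta: Dict[str, Any],
) -> None:
    """Merge one streamed tool-call delta into the accumulator (hypothetical helper)."""
    entry = tool_calls_by_index.setdefault(
        index,
        {"id": None, "type": None, "function": {"name": "", "arguments": ""}},
    )
    if delta.get("id"):
        entry["id"] = delta["id"]
    if delta.get("type"):
        entry["type"] = delta["type"]
    function = delta.get("function") or {}
    if function.get("name"):
        entry["function"]["name"] = function["name"]
    # Argument text is assumed to arrive in fragments; concatenate in arrival order.
    entry["function"]["arguments"] += function.get("arguments") or ""


def merged_tool_calls(tool_calls_by_index: Dict[int, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the accumulated tool calls ordered by their stream index."""
    return [tool_calls_by_index[i] for i in sorted(tool_calls_by_index)]


acc: Dict[int, Dict[str, Any]] = {}
upsert_tool_call(acc, 0, {"id": "call_1", "type": "function",
                          "function": {"name": "search_documents", "arguments": ""}})
upsert_tool_call(acc, 0, {"function": {"arguments": '{"query": '}})
upsert_tool_call(acc, 0, {"function": {"arguments": '"cohere"}'}})
calls = merged_tool_calls(acc)
```

Under these assumptions, each entry in the merged list is a complete tool call once the stream ends, so that is a natural point at which one child span per entry could be created instead of only storing the list in output["tool_calls"].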

Comparison with other integrations in this repo

| Integration | Tool call handling | Child TOOL spans? |
| --- | --- | --- |
| OpenAI (Chat Completions) | _log_response_tool_spans() in tracing.py | Yes |
| OpenAI (Responses API) | _log_response_tool_spans() in tracing.py | Yes |
| Anthropic | _log_server_tool_spans() in tracing.py | Yes |
| Google GenAI | _finalize_interaction_tool_spans() in tracing.py | Yes |
| Cohere | tool_calls stored in LLM span output dict | No |
| Mistral | tool_calls stored in LLM span output dict | No (tracked separately) |

What child TOOL spans should capture

For each tool call in the response:

  • Span name: Tool function name (e.g. tool: search_documents)
  • Span type: SpanTypeAttribute.TOOL
  • Input: Tool call arguments / parameters
  • Output: (empty for client-side tool calls, populated for server-side)
  • Metadata: tool_call_id, tool type

This applies to both v1 (client.chat() / client.chat_stream()) and v2 (client.v2.chat() / client.v2.chat_stream()) tool-use responses.

Braintrust docs status

not_found: the Braintrust integrations directory does not mention tool span decomposition for Cohere.

Upstream sources

  • Cohere Tool Use overview: https://docs.cohere.com/docs/tool-use-overview
  • Cohere supports parallel tool calling, multi-step tool use, and citation generation from tool results
  • Tool calls in responses include id, type, function.name, function.arguments

Local files inspected

  • py/src/braintrust/integrations/cohere/tracing.py: no SpanTypeAttribute.TOOL usage; tool calls stored flat in output dict
  • py/src/braintrust/integrations/cohere/patchers.py: Chat, ChatStream, Embed, Rerank patchers defined; no tool-span-related patchers
  • py/src/braintrust/integrations/openai/tracing.py: _log_response_tool_spans() creates child TOOL spans (for comparison)
  • py/src/braintrust/integrations/anthropic/tracing.py: _log_server_tool_spans() creates child TOOL spans (for comparison)
  • py/src/braintrust/integrations/google_genai/tracing.py: _finalize_interaction_tool_spans() creates child TOOL spans (for comparison)
