feat: add [ LiteLLM AI Gateway ] for provider independence by RheagalFire · Pull Request #186 · braintrustdata/autoevals

Aarish Alam (RheagalFire) · 2026-04-21T12:48:48Z

Summary

Add LiteLLMClient / AsyncLiteLLMClient in py/autoevals/litellm.py: OpenAI-compatible adapters backed by litellm.completion() / litellm.acompletion() (plus embeddings and moderation).
Export both from py/autoevals/init.py so users can do init(client=LiteLLMClient()).
Add litellm to extras_require: install with pip install 'autoevals[litellm]'.
Add py/autoevals/test_litellm.py with 9 mocked unit tests covering chat, embeddings, moderation, async, end-to-end init() wiring, and the Responses-API shim.
Followup commit adds a Responses-API shim in LiteLLMClient.responses.create / AsyncLiteLLMClient.responses.create. Without it, init(client=LiteLLMClient()) with autoevals' default gpt-5-mini model would crash: oai.py routes gpt-5 models through prepare_responses_params which sends input=... and a flat tool schema, but litellm.completion expects messages=... with nested tool schema. The shim translates back.

Fits cleanly into the existing LLMClient architecture (py/autoevals/oai.py:129) which is duck-typed on the OpenAI v1 protocol. The adapter implements that surface; no changes to core.

Changes

py/autoevals/litellm.py: LiteLLMClient / AsyncLiteLLMClient + _LiteLLMResponses adapter that translates Responses-API params (input=, flat tool schema) back to Chat-Completions params (messages=, nested tool schema) before calling litellm.completion.
py/autoevals/init.py: re-exports the new clients.
setup.py: litellm optional extra.
py/autoevals/test_litellm.py: 9 mocked tests (adds coverage for Responses-API shim input→messages translation and flat→nested tool-schema translation).

Testing & Usage

Unit tests (all pass):

  $ pytest py/autoevals/test_litellm.py -v
  py/autoevals/test_litellm.py::test_litellm_client_exposes_openai_v1_surface PASSED                                                                                                                                                                                                                                                                                      
  py/autoevals/test_litellm.py::test_litellm_chat_completions_forwards_to_litellm PASSED                                                                                                                                                                                                                                                                                  
  py/autoevals/test_litellm.py::test_litellm_client_without_api_key_does_not_forward_key PASSED                                                                                                                                                                                                                                                                           
  py/autoevals/test_litellm.py::test_litellm_embeddings_forwards_to_litellm PASSED                                                                                                                                                                                                                                                                                        
  py/autoevals/test_litellm.py::test_litellm_moderations_forwards_to_litellm PASSED                                                                                                                                                                                                                                                                                       
  py/autoevals/test_litellm.py::test_litellm_responses_create_translates_input_to_messages PASSED                                                                                                                                                                                                                                                                         
  py/autoevals/test_litellm.py::test_litellm_responses_create_translates_responses_api_tool_schema PASSED                                                                                                                                                                                                                                                                 
  py/autoevals/test_litellm.py::test_async_litellm_chat_completions_forwards PASSED                                                                                                                                                                                                                                                                                       
  py/autoevals/test_litellm.py::test_init_accepts_litellm_client PASSED                                                                                                                                                                                                                                                                                                   
  ============================== 9 passed in 0.61s ===============================

Live end-to-end smoke test against Azure OpenAI (azure/gpt-4o):

  [Test 1] LiteLLMClient.chat.completions.create, model=azure/gpt-4o                                                                                                                                                                                                                                                                                                      
    response: '4'

  [Test 2] Factuality scorer with init(client=LiteLLMClient())
    score: 0.6
    metadata: {'choice': 'B', 'rationale': 'Step 1: The expert answer states "George Washington." ... Step 3: Therefore, the submitted answer includes the information found in the expert answer and expresses it in a broader form, but remains fully consistent with the expert answer. Conclusion: The submitted answer is a superset of the expert answer and is
  fully consistent with it.'}
  
  [Test 3] Responses-API shim: client.responses.create(input=..., model=azure/gpt-4o)                                                                                                                                                                                                                                                                                     
           (Path autoevals takes for gpt-5 models. Shim translates input=                                                                                                                                                                                                                                                                                                 
           back to messages= before calling litellm.completion.)                                                                                                                                                                                                                                                                                                          
    response: '10'                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                          
  autoevals LiteLLM live test PASSED (chat + scorer + responses-shim).

This exercised three paths. (1) raw chat.completions.create routed to litellm.completion. (2) full scorer path init(client=LiteLLMClient()) → Factuality.eval() → LLMClient.complete → shim → litellm.completion → parsed score with rationale. (3) Responses-API shim with input=... kwarg, which translates to messages=... before reaching LiteLLM (exercises the fix for the default gpt-5-mini routing).

Example usage

from autoevals import init
from autoevals.litellm import LiteLLMClient
from autoevals.llm import Factuality

init(
client=LiteLLMClient(),
default_model="anthropic/claude-3-5-sonnet-20241022",
)

evaluator = Factuality()
result = evaluator.eval(input="...", output="...", expected="...")

init(client=LiteLLMClient(), default_model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0")
init(client=LiteLLMClient(), default_model="gemini/gemini-1.5-pro")
init(client=LiteLLMClient(), default_model="ollama/llama3")

from autoevals.litellm import AsyncLiteLLMClient
init(client=AsyncLiteLLMClient(), default_model="openai/gpt-4o-mini")

Aarish Alam (RheagalFire) · 2026-04-21T16:39:29Z

cc Ankur Goyal (@ankrgyl) Olmo Maldonado (@ibolmo). would like your review here.

Aarish Alam (RheagalFire) · 2026-06-02T19:45:24Z

ekeith (@evanmkeith) do you have any update on this PR?

github-actions · 2026-06-04T20:38:54Z

Braintrust eval report

Autoevals (main-1780605537)

Score	Average	Improvements	Regressions
NumericDiff	76.3% (-1pp)	8 🟢	10 🔴
Time_to_first_token	11.83tok (-0.12tok)	39 🟢	77 🔴
Llm_calls	1.09 (-0.45)	-	100 🔴
Tool_calls	0 (+0)	-	-
Errors	0 (+0)	-	-
Llm_errors	0 (+0)	-	-
Tool_errors	0 (+0)	-	-
Prompt_tokens	308.38tok (-220.04tok)	103 🟢	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_5m_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_1h_tokens	0tok (+0tok)	-	-
Completion_tokens	257.38tok (-226.07tok)	157 🟢	52 🔴
Completion_reasoning_tokens	0tok (-371.2tok)	219 🟢	-
Total_tokens	565.76tok (-446.11tok)	157 🟢	52 🔴
Estimated_cost	0$ (0$)	52 🟢	51 🔴
Duration	11.57s (-0.38s)	64 🟢	152 🔴
Llm_duration	13.1s (-1.15s)	83 🟢	35 🔴

feat: add LiteLLM adapter for provider independence

7304c1a

Aarish Alam (RheagalFire) added 2 commits April 21, 2026 23:16

fix(litellm): responses-API shim for gpt-5 model routing

0268d00

chore: pin litellm to >=1.60,<1.85

26d367d

ekeith (evanmkeith) requested a review from Stephen Belanger (Qard) June 3, 2026 18:54

Stephen Belanger (Qard) approved these changes Jun 4, 2026

View reviewed changes

ekeith (evanmkeith) merged commit 0278eff into braintrustdata:main Jun 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add [ LiteLLM AI Gateway ] for provider independence#186

feat: add [ LiteLLM AI Gateway ] for provider independence#186
ekeith (evanmkeith) merged 3 commits into
braintrustdata:mainfrom
RheagalFire:feat/add-litellm-provider

Aarish Alam (RheagalFire) commented Apr 21, 2026 •

edited

Loading

Uh oh!

Aarish Alam (RheagalFire) commented Apr 21, 2026

Uh oh!

Aarish Alam (RheagalFire) commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Aarish Alam (RheagalFire) commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing & Usage

Example usage

Uh oh!

Aarish Alam (RheagalFire) commented Apr 21, 2026

Uh oh!

Aarish Alam (RheagalFire) commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Braintrust eval report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Aarish Alam (RheagalFire) commented Apr 21, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading