
Add --matrix-param argument to eval #141

Merged: Nathan Selvidge (nselvidge) merged 4 commits into main from nate/matrix-params on Apr 28, 2026


Nathan Selvidge (nselvidge), Contributor, commented Apr 23, 2026

Adds the --matrix-param flag to bt eval

When the flag is passed more than once, bt eval creates one evaluator run for every combination of the supplied values.
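For illustration, "every combination" is a Cartesian product over the flags. The sketch below assumes each flag has already been parsed into a key and a list of values; `MatrixParam`, `MatrixValue`, and `expandMatrix` are hypothetical names, not the code in this PR.

```typescript
// Hypothetical types: one entry per --matrix-param flag, already parsed.
type MatrixValue = string | number | boolean | null;

interface MatrixParam {
  key: string;
  values: MatrixValue[];
}

// Expands the flags into every combination (the Cartesian product) of their values.
// With no flags, the result is a single empty combination, i.e. one normal run.
function expandMatrix(params: MatrixParam[]): Record<string, MatrixValue>[] {
  let combos: Record<string, MatrixValue>[] = [{}];
  for (const { key, values } of params) {
    const next: Record<string, MatrixValue>[] = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [key]: value });
      }
    }
    combos = next;
  }
  return combos;
}

const combos = expandMatrix([
  { key: "model", values: ["claude-haiku-4-5", "claude-sonnet-4-6"] },
  { key: "includeSkill", values: [true, false] },
]);
console.log(combos.length); // 4
```

Two values for `model` and two for `includeSkill` give 2 × 2 = 4 combinations, consistent with the "Processing 4 evaluators..." line in the first example.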

The --matrix-param key/value pairs are appended to the eval name and the experiment name. This keeps each name unique and makes it easier to tell multiple experiments apart in the CLI output and in the experiment links.
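A minimal sketch of that naming scheme, inferred only from the sample output: string values appear quoted and booleans bare, which is what `JSON.stringify` produces. `withMatrixSuffix` is a hypothetical helper, and the JSON-style rendering is an assumption rather than the confirmed implementation.

```typescript
type MatrixValue = string | number | boolean | null;

// Appends a suffix such as [model="claude-sonnet-4-6", includeSkill=false] to a name.
// Assumption: values are rendered with JSON.stringify, which quotes strings
// but leaves booleans and numbers bare, matching the sample output.
function withMatrixSuffix(baseName: string, combo: Record<string, MatrixValue>): string {
  const entries = Object.entries(combo);
  if (entries.length === 0) {
    return baseName; // no matrix params: the name is left unchanged
  }
  const parts = entries.map(([key, value]) => `${key}=${JSON.stringify(value)}`);
  return `${baseName} [${parts.join(", ")}]`;
}

const experimentName = withMatrixSuffix("Rewrite test", {
  model: "claude-sonnet-4-6",
  includeSkill: false,
});
console.log(experimentName);
// Rewrite test [model="claude-sonnet-4-6", includeSkill=false]
console.log(encodeURIComponent(experimentName));
// Rewrite%20test%20%5Bmodel%3D%22claude-sonnet-4-6%22%2C%20includeSkill%3Dfalse%5D
```

The percent-encoded form matches the experiment URLs in the example, which suggests (but does not confirm) that the links are built by URL-encoding the suffixed name.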

```
framework-evals git:(main) ✗ bt eval rewrite.eval.ts --matrix-param model=claude-haiku-4-5,claude-sonnet-4-6 --matrix-param includeSkill=true,false --sample 1
Processing 4 evaluators...
▶ Experiment Rewrite test [model="claude-sonnet-4-6", includeSkill=false] is running at https://www.braintrust.dev/app/braintrustdata.com/p/Rewrite%20Evals/experiments/Rewrite%20test%20%5Bmodel%3D%22claude-sonnet-4-6%22%2C%20includeSkill%3Dfalse%5D
▶ Experiment Rewrite test [model="claude-haiku-4-5", includeSkill=true] is running at https://www.braintrust.dev/app/braintrustdata.com/p/Rewrite%20Evals/experiments/Rewrite%20test%20%5Bmodel%3D%22claude-haiku-4-5%22%2C%20includeSkill%3Dtrue%5D
▶ Experiment Rewrite test [model="claude-haiku-4-5", includeSkill=false] is running at https://www.braintrust.dev/app/braintrustdata.com/p/Rewrite%20Evals/experiments/Rewrite%20test%20%5Bmodel%3D%22claude-haiku-4-5%22%2C%20includeSkill%3Dfalse%5D
▶ Experiment Rewrite test [model="claude-sonnet-4-6", includeSkill=true] is running at https://www.braintrust.dev/app/braintrustdata.com/p/Rewrite%20Evals/experiments/Rewrite%20test%20%5Bmodel%3D%22claude-sonnet-4-6%22%2C%20includeSkill%3Dtrue%5D
⠁ Rewrite Evals [exp...includeSkill=false]
⠁ Rewrite Evals [exp... includeSkill=true]
⠁ Rewrite Evals [exp...includeSkill=false]
⠁ Rewrite Evals [exp... includeSkill=true]
```

Values can also be passed as a JSON array string. This form is required if any value itself contains a comma:

```
framework-evals git:(main) ✗ bt eval rewrite.eval.ts --matrix-param model="[\"claude-haiku-4-5\",\"claude-sonnet-4-6\"]"  --sample 1
Processing 2 evaluators...
▶ Experiment Rewrite test [model="claude-sonnet-4-6"] is running at https://www.braintrust.dev/app/braintrustdata.com/p/Rewrite%20Evals/experiments/Rewrite%20test%20%5Bmodel%3D%22claude-sonnet-4-6%22%5D
▶ Experiment Rewrite test [model="claude-haiku-4-5"] is running at https://www.braintrust.dev/app/braintrustdata.com/p/Rewrite%20Evals/experiments/Rewrite%20test%20%5Bmodel%3D%22claude-haiku-4-5%22%5D
⠁ Rewrite Evals [exp...claude-sonnet-4-6"]
⠁ Rewrite Evals [exp..."claude-haiku-4-5"]
```
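One way to parse a single --matrix-param argument that accepts both forms is sketched below. `parseMatrixParam` is hypothetical, and reading comma-separated tokens as JSON scalars is an assumption suggested by `includeSkill=false` appearing unquoted in the experiment names of the first example; the PR's actual parser may differ.

```typescript
type MatrixValue = string | number | boolean | null;

interface MatrixParam {
  key: string;
  values: MatrixValue[];
}

// Hypothetical parser for one --matrix-param argument, accepting either
//   includeSkill=true,false                          (comma-separated)
//   model=["claude-haiku-4-5","claude-sonnet-4-6"]   (JSON array)
function parseMatrixParam(arg: string): MatrixParam {
  const eq = arg.indexOf("=");
  if (eq <= 0) {
    throw new Error(`expected key=value, got: ${arg}`);
  }
  const key = arg.slice(0, eq);
  const raw = arg.slice(eq + 1);

  // JSON array form: the only way to pass a value that itself contains a comma.
  if (raw.trimStart().startsWith("[")) {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) {
      throw new Error(`expected a JSON array for ${key}`);
    }
    return { key, values: parsed as MatrixValue[] };
  }

  // Comma-separated form. Assumption: each token is read as a JSON scalar when
  // possible (true, false, 42, null) and otherwise kept as a plain string.
  const values = raw.split(",").map((token): MatrixValue => {
    try {
      const v: unknown = JSON.parse(token);
      return typeof v === "object" && v !== null ? token : (v as MatrixValue);
    } catch {
      return token;
    }
  });
  return { key, values };
}

console.log(JSON.stringify(parseMatrixParam("includeSkill=true,false").values)); // [true,false]
```

Note that the shell strips the outer quotes and backslashes in the example above, so the program receives `model=["claude-haiku-4-5","claude-sonnet-4-6"]`.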


github-actions Bot commented Apr 23, 2026

Latest downloadable build artifacts for this PR commit a95ddd60198a:

Available artifact names:
  • `artifacts-build-global`
  • `artifacts-build-local-x86_64-apple-darwin`
  • `artifacts-build-local-x86_64-pc-windows-msvc`
  • `artifacts-build-local-x86_64-unknown-linux-musl`
  • `artifacts-build-local-x86_64-unknown-linux-gnu`
  • `artifacts-build-local-aarch64-apple-darwin`
  • `artifacts-build-local-aarch64-unknown-linux-gnu`
  • `artifacts-plan-dist-manifest`
  • `cargo-dist-cache`

Member

I think tests/evals/js/eval-matrix-param-terminate/ needs a fixture.json.

Contributor Author

Oops! Added.

Comment thread on scripts/eval-runner.ts (outdated):

```typescript
});
}

if (btEvalMains.length > 0) {
```
Member

Don't we need to honour the matrix on this path too? We should at least print a warning if someone tries to use a matrix and falls back into this path.

Contributor Author

Thanks, this is a good callout; I didn't even realize this path existed. As I thought through how we would support it, it felt like a lot of added complexity. Since nobody is using this feature yet, I think we can keep things simple: ship without support for params on btEvalMain and add it if we get feedback from users. I've updated the PR to fail and log an error message when params are used with btEvalMain.
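The fail-fast behaviour described in this reply can be sketched as a small guard. `checkMatrixSupport` and its error text are hypothetical; only the `btEvalMains` name comes from the diff context above, and the real check in scripts/eval-runner.ts may be structured differently.

```typescript
// Hypothetical guard for the fallback path discussed above: when the eval file
// is handled through btEvalMain and matrix params were supplied, return an
// error message instead of silently ignoring the matrix. The wording is
// illustrative, not the PR's actual message.
function checkMatrixSupport(
  btEvalMains: readonly unknown[],
  matrixParams: readonly unknown[],
): string | null {
  if (btEvalMains.length > 0 && matrixParams.length > 0) {
    return "--matrix-param is not supported for evals that use btEvalMain";
  }
  return null;
}

const problem = checkMatrixSupport(["rewrite.eval.ts"], [{ key: "model" }]);
if (problem !== null) {
  // A real runner would log this and exit with a non-zero status.
  console.error(problem);
}
```

Returning the message rather than exiting inside the helper keeps the check easy to test and leaves the exit policy to the caller.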

Nathan Selvidge (nselvidge) merged commit 66d95b8 into main on Apr 28, 2026
31 checks passed
2 participants