feat(bench): archive release performance reports#150
Conversation
- Add an archive-performance utility that promotes curated benchmark reports into docs/PERFORMANCE.md while archiving prior release comparisons - Generate release comparisons in isolated temporary worktrees, including legacy command fallback for published tags - Wire release and historical archive recipes into just, Python packaging, and release documentation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #150 +/- ##
=======================================
Coverage 98.23% 98.23%
=======================================
Files 7 7
Lines 3228 3228
=======================================
Hits 3171 3171
Misses 57 57
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a performance comparison release workflow: docs and release steps, a packaged Python CLI ( ChangesPerformance Comparison Release Workflow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/tests/test_archive_performance.py (1)
81-562:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd explicit type annotations to pytest fixtures and local test doubles.
New test functions and nested fake runners leave parameters untyped (
tmp_path,monkeypatch,capsys,args,cwd, etc.), which conflicts with the strict typed-new-code requirement.Suggested direction
-def test_main_generates_report_in_temp_worktree(tmp_path, monkeypatch, capsys) -> None: +def test_main_generates_report_in_temp_worktree( + tmp_path: Path, + monkeypatch: pytest.MonkeyPatch, + capsys: pytest.CaptureFixture[str], +) -> None:- def fake_run_git(args, cwd=None, **kwargs): + def fake_run_git(args: list[str], cwd: Path | None = None, **kwargs: object) -> object:As per coding guidelines, "
**/*.py: ... Add type hints to new Python code and include type checking in validation (blocking - all code must pass type checks)."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/tests/test_archive_performance.py` around lines 81 - 562, Tests and nested fake runner functions lack explicit type annotations for pytest fixtures and local doubles (e.g., tmp_path, monkeypatch, capsys and fake_run_git/fake_run_safe signatures), violating the project's typed-new-code rule; add precise type hints: annotate fixtures as tmp_path: Path, current/paths: Path, monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str]; annotate fake runners like fake_run_git(args: Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object (or subprocess.CompletedProcess | types.SimpleNamespace) and fake_run_safe(command: str, args: Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object; also annotate local variables like calls: list[tuple[str, tuple[str, ...], Path | None]] and any input_data parameters as str | bytes where appropriate in functions such as fake_run_git_with_input; update imports (from pathlib import Path, from typing import Sequence, Any) and adjust return type hints to match the lightweight result objects used in tests.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/RELEASING.md`:
- Around line 104-122: The insertion of the new "Update the release performance
comparison" step in docs/RELEASING.md shifts numbered steps and breaks external
references; update any docs that link to a specific step number to instead link
to a stable section anchor or heading (e.g., the "Update the release performance
comparison" heading) or change references to call out the unique commands shown
(performance-release, performance-archive-published) and docs/PERFORMANCE.md,
and/or update those links in this PR so they point to the new step numbering
consistently.
---
Outside diff comments:
In `@scripts/tests/test_archive_performance.py`:
- Around line 81-562: Tests and nested fake runner functions lack explicit type
annotations for pytest fixtures and local doubles (e.g., tmp_path, monkeypatch,
capsys and fake_run_git/fake_run_safe signatures), violating the project's
typed-new-code rule; add precise type hints: annotate fixtures as tmp_path:
Path, current/paths: Path, monkeypatch: pytest.MonkeyPatch, capsys:
pytest.CaptureFixture[str]; annotate fake runners like fake_run_git(args:
Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object (or
subprocess.CompletedProcess | types.SimpleNamespace) and fake_run_safe(command:
str, args: Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object;
also annotate local variables like calls: list[tuple[str, tuple[str, ...], Path
| None]] and any input_data parameters as str | bytes where appropriate in
functions such as fake_run_git_with_input; update imports (from pathlib import
Path, from typing import Sequence, Any) and adjust return type hints to match
the lightweight result objects used in tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b271a5fc-99ff-40bf-9ae3-bbd06e08edfc
📒 Files selected for processing (8)
CHANGELOG.mddocs/BENCHMARKING.mddocs/RELEASING.mdjustfilepyproject.tomlscripts/README.mdscripts/archive_performance.pyscripts/tests/test_archive_performance.py
- Track the latest curated release comparison in docs/PERFORMANCE.md and archive older comparisons under docs/archive/performance/ - Let performance-archive-published discover the latest stable GitHub release and previous stable baseline automatically - Generate release comparisons in isolated temporary worktrees, with release-asset restore and local baseline fallback paths - Update benchmark and release docs to use the scripted workflow instead of manual checkout steps
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/archive_performance.py`:
- Around line 329-337: The code currently derives RUSTUP_TOOLCHAIN from
config.repo_root; change the logic so _benchmark_env uses the actual checkout
directory being benchmarked (pass the checkout Path into _benchmark_env or call
_current_rust_toolchain with the checkout path) so it resolves
rust-toolchain.toml from that detached worktree, not the repo root. Update all
call sites that build the environment for runs (the temp worktree run and the
local baseline fallback) to pass the checkout Path into
_benchmark_env/_current_rust_toolchain (locations that currently call
_benchmark_env or _current_rust_toolchain) so the correct toolchain is set for
historical tags and temporary checkouts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 29346d00-ff3a-4e14-a20b-633bb006e5cb
📒 Files selected for processing (12)
.gitignoredocs/BENCHMARKING.mddocs/PERFORMANCE.mddocs/RELEASING.mddocs/archive/performance/README.mddocs/archive/performance/v0.4.1-vs-v0.4.0.mdjustfilescripts/README.mdscripts/archive_performance.pyscripts/bench_compare.pyscripts/tests/test_archive_performance.pyscripts/tests/test_bench_compare.py
💤 Files with no reviewable changes (1)
- .gitignore
✅ Files skipped from review due to trivial changes (3)
- docs/archive/performance/README.md
- docs/RELEASING.md
- scripts/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/BENCHMARKING.md
- justfile
- Add default performance-local and performance-release workflows that infer the relevant release tags and run in temporary worktrees. - Add a performance-github-assets workflow for comparing stored GitHub Actions release benchmark assets without local cargo runs. - Normalize release tags before fetching, downloading assets, or checking out detached worktrees. - Update performance docs, release guidance, and generated report instructions to use the new benchmark workflows.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
scripts/archive_performance.py (2)
568-574:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve
rust-toolchain.tomlfrom the worktree being benchmarked.Similar to the baseline worktree issue, this passes
config.repo_rootinstead ofworktreewhen the benchmarks run in the temp worktree. This could use the wrong Rust toolchain if the worktree'srust-toolchain.tomldiffers.def _run_benchmarks_and_render_report(*, worktree: Path, report: Path, config: GenerationConfig) -> None: - benchmark_env = _benchmark_env(config.repo_root) + benchmark_env = _benchmark_env(worktree) if _has_current_release_signal_tooling(worktree):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/archive_performance.py` around lines 568 - 574, The call to _benchmark_env in _run_benchmarks_and_render_report incorrectly uses config.repo_root, which can pick up the wrong rust-toolchain.toml for a temporary worktree; change the argument so _benchmark_env resolves the toolchain from the provided worktree (i.e., pass worktree instead of config.repo_root) so benchmarks run with the worktree's rust-toolchain.toml before invoking _run_tool and _render_report.
470-486:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve
rust-toolchain.tomlfrom the baseline worktree being benchmarked.This call site still passes
repo_rootto_benchmark_env, but the benchmark runs insidebaseline_worktree. If a historical tag pins a different toolchain, this will use the wrong compiler for the baseline.- _run_tool("just", ["bench-save-baseline", baseline_tag], cwd=baseline_worktree, timeout=_BENCH_TIMEOUT_SECONDS, env=_benchmark_env(repo_root)) + _run_tool("just", ["bench-save-baseline", baseline_tag], cwd=baseline_worktree, timeout=_BENCH_TIMEOUT_SECONDS, env=_benchmark_env(baseline_worktree))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/archive_performance.py` around lines 470 - 486, The benchmark environment is being created for the current repo_root instead of the detached baseline worktree, so _generate_release_baseline should resolve rust-toolchain.toml from the baseline worktree; change the call site that invokes _benchmark_env (currently _benchmark_env(repo_root)) when calling _run_tool("just", ["bench-save-baseline", ...], cwd=baseline_worktree) to pass baseline_worktree (i.e. _benchmark_env(baseline_worktree)) so the historical tag’s toolchain is used when running the baseline benchmark.
🧹 Nitpick comments (1)
scripts/archive_performance.py (1)
962-964: 💤 Low valueConsider narrowing the exception type or re-raising unexpected errors.
The broad
Exceptioncatch provides user-friendly error messages but could mask unexpected programming errors (e.g.,TypeError,AttributeError). Since this is a CLI, this trade-off may be acceptable, but you could consider catching specific expected exceptions (e.g.,ValueError,RuntimeError,FileNotFoundError,subprocess.CalledProcessError) to surface unexpected bugs during development.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/archive_performance.py` around lines 962 - 964, The current broad except Exception block in scripts/archive_performance.py (the "except Exception as exc:" that prints "archive-performance: {exc}" and returns 1) should be narrowed: replace it with specific exception handlers for expected error types (e.g., ValueError, RuntimeError, FileNotFoundError, subprocess.CalledProcessError) and keep the same stderr print/return behavior for those; for any other unexpected exceptions, re-raise them so they surface during development (or optionally log and raise). Locate the except block around the main/entry logic where "return 1" is used and implement per-exception handlers and a final "except Exception:" that raises the exception instead of swallowing it.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@scripts/archive_performance.py`:
- Around line 568-574: The call to _benchmark_env in
_run_benchmarks_and_render_report incorrectly uses config.repo_root, which can
pick up the wrong rust-toolchain.toml for a temporary worktree; change the
argument so _benchmark_env resolves the toolchain from the provided worktree
(i.e., pass worktree instead of config.repo_root) so benchmarks run with the
worktree's rust-toolchain.toml before invoking _run_tool and _render_report.
- Around line 470-486: The benchmark environment is being created for the
current repo_root instead of the detached baseline worktree, so
_generate_release_baseline should resolve rust-toolchain.toml from the baseline
worktree; change the call site that invokes _benchmark_env (currently
_benchmark_env(repo_root)) when calling _run_tool("just",
["bench-save-baseline", ...], cwd=baseline_worktree) to pass baseline_worktree
(i.e. _benchmark_env(baseline_worktree)) so the historical tag’s toolchain is
used when running the baseline benchmark.
---
Nitpick comments:
In `@scripts/archive_performance.py`:
- Around line 962-964: The current broad except Exception block in
scripts/archive_performance.py (the "except Exception as exc:" that prints
"archive-performance: {exc}" and returns 1) should be narrowed: replace it with
specific exception handlers for expected error types (e.g., ValueError,
RuntimeError, FileNotFoundError, subprocess.CalledProcessError) and keep the
same stderr print/return behavior for those; for any other unexpected
exceptions, re-raise them so they surface during development (or optionally log
and raise). Locate the except block around the main/entry logic where "return 1"
is used and implement per-exception handlers and a final "except Exception:"
that raises the exception instead of swallowing it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1a3ef3d6-0ab7-40d4-9542-71ee465d7cd2
📒 Files selected for processing (11)
CONTRIBUTING.mddocs/BENCHMARKING.mddocs/PERFORMANCE.mddocs/RELEASING.mddocs/archive/performance/v0.4.1-vs-v0.4.0.mdjustfilescripts/README.mdscripts/archive_performance.pyscripts/bench_compare.pyscripts/tests/test_archive_performance.pyscripts/tests/test_bench_compare.py
✅ Files skipped from review due to trivial changes (5)
- CONTRIBUTING.md
- docs/RELEASING.md
- scripts/bench_compare.py
- docs/PERFORMANCE.md
- docs/BENCHMARKING.md
🚧 Files skipped from review as they are similar to previous changes (4)
- docs/archive/performance/v0.4.1-vs-v0.4.0.md
- scripts/README.md
- scripts/tests/test_bench_compare.py
- scripts/tests/test_archive_performance.py
- Limit archive-performance CLI error handling to expected validation, filesystem, subprocess, and runtime failures. - Let unexpected exceptions propagate so benchmark archiving bugs surface during development.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores