Skip to content

feat(bench): archive release performance reports#150

Merged
acgetchell merged 4 commits into
mainfrom
feat/release-performance-archive
Jun 8, 2026
Merged

feat(bench): archive release performance reports#150
acgetchell merged 4 commits into
mainfrom
feat/release-performance-archive

Conversation

@acgetchell

@acgetchell acgetchell commented Jun 8, 2026

Copy link
Copy Markdown
Owner
  • Add an archive-performance utility that promotes curated benchmark reports into docs/PERFORMANCE.md while archiving prior release comparisons
  • Generate release comparisons in isolated temporary worktrees, including legacy command fallback for published tags
  • Wire release and historical archive recipes into just, Python packaging, and release documentation

Summary by CodeRabbit

  • New Features

    • Integrated performance comparison workflows to generate, promote, and archive benchmark reports for releases, with temporary-worktree and GitHub-asset options.
  • Documentation

    • Updated benchmarking and release docs with step-by-step performance workflows, promotion/archival guidance, and an updated current performance report.
  • Tests

    • Added tests covering report generation, promotion/archival, safety checks, and release-pair resolution.
  • Chores

    • Added a CLI to archive/promote performance reports and removed the ignore for the committed performance report.

- Add an archive-performance utility that promotes curated benchmark reports into docs/PERFORMANCE.md while archiving prior release comparisons
- Generate release comparisons in isolated temporary worktrees, including legacy command fallback for published tags
- Wire release and historical archive recipes into just, Python packaging, and release documentation
@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.23%. Comparing base (aca8a37) to head (7938386).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #150   +/-   ##
=======================================
  Coverage   98.23%   98.23%           
=======================================
  Files           7        7           
  Lines        3228     3228           
=======================================
  Hits         3171     3171           
  Misses         57       57           
Flag Coverage Δ
unittests 98.23% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@acgetchell

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fb98efb-a50d-4273-9388-46ab50777184

📥 Commits

Reviewing files that changed from the base of the PR and between 7258525 and 7938386.

📒 Files selected for processing (2)
  • scripts/archive_performance.py
  • scripts/tests/test_archive_performance.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • scripts/tests/test_archive_performance.py
  • scripts/archive_performance.py

📝 Walkthrough

Walkthrough

Adds a performance comparison release workflow: docs and release steps, a packaged Python CLI (archive-performance), Just recipes for temp-worktree comparisons, a CLI to generate/promote/archive reports safely, and comprehensive pytest coverage.

Changes

Performance Comparison Release Workflow

Layer / File(s) Summary
Documentation and released report updates
CHANGELOG.md, docs/BENCHMARKING.md, docs/RELEASING.md, scripts/README.md, docs/PERFORMANCE.md, docs/archive/performance/*, .gitignore, scripts/bench_compare.py, CONTRIBUTING.md
Updates release and benchmarking docs, adds/commits docs/PERFORMANCE.md and archived reports, removes docs/PERFORMANCE.md from .gitignore, and rewrites generated report “How to Update” guidance to just performance-* commands.
Packaging and Just recipes
pyproject.toml, justfile
Adds archive-performance CLI entry point and archive_performance py-module metadata, and adds performance-github-assets, performance-local, performance-release, plus help text entries.
Archive performance CLI implementation
scripts/archive_performance.py
New CLI implementing data contracts, tag normalization/parsing, GitHub release discovery, safe baseline download/extraction, temp worktree orchestration, tooling-aware benchmark generation, optional diff application, atomic promotion/archival writes, strict promotion validation, CLI parser, and main dispatch.
Archive performance tests and snapshots
scripts/tests/test_archive_performance.py, scripts/tests/test_bench_compare.py
Extensive pytest coverage validating tag parsing/normalization, report-id parsing, GitHub release selection, baseline safety checks, temp-worktree generation/cleanup, promotion/archival semantics and idempotency, atomic replace error handling, legacy vs new tooling paths, and updated generated report snapshot expectations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • acgetchell/la-stack#136: LU/LDLT solve internals and overflow-coordinate changes related to this PR’s preserved-coordinate and LU/LDLT solve adjustments.
  • acgetchell/la-stack#70: Earlier bench_compare baseline/report generation flow that the new just performance-* guidance replaces.
  • acgetchell/la-stack#144: Related release workflow that uploads Criterion baseline assets consumed by --github-assets mode.

Suggested labels

documentation, enhancement, testing

Poem

🐰 I hop through worktrees, tarballs, and tags,
I stitch old reports into tidy archive bags,
I normalize tags and render the run,
I promote the new and tuck the old one.
A cheerful hop — performance is done!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.32% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat(bench): archive release performance reports' directly and clearly summarizes the main change: adding functionality to archive release performance reports in the benchmark system.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/release-performance-archive

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/tests/test_archive_performance.py (1)

81-562: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add explicit type annotations to pytest fixtures and local test doubles.

New test functions and nested fake runners leave parameters untyped (tmp_path, monkeypatch, capsys, args, cwd, etc.), which conflicts with the strict typed-new-code requirement.

Suggested direction
-def test_main_generates_report_in_temp_worktree(tmp_path, monkeypatch, capsys) -> None:
+def test_main_generates_report_in_temp_worktree(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+    capsys: pytest.CaptureFixture[str],
+) -> None:
-    def fake_run_git(args, cwd=None, **kwargs):
+    def fake_run_git(args: list[str], cwd: Path | None = None, **kwargs: object) -> object:

As per coding guidelines, "**/*.py: ... Add type hints to new Python code and include type checking in validation (blocking - all code must pass type checks)."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/tests/test_archive_performance.py` around lines 81 - 562, Tests and
nested fake runner functions lack explicit type annotations for pytest fixtures
and local doubles (e.g., tmp_path, monkeypatch, capsys and
fake_run_git/fake_run_safe signatures), violating the project's typed-new-code
rule; add precise type hints: annotate fixtures as tmp_path: Path,
current/paths: Path, monkeypatch: pytest.MonkeyPatch, capsys:
pytest.CaptureFixture[str]; annotate fake runners like fake_run_git(args:
Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object (or
subprocess.CompletedProcess | types.SimpleNamespace) and fake_run_safe(command:
str, args: Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object;
also annotate local variables like calls: list[tuple[str, tuple[str, ...], Path
| None]] and any input_data parameters as str | bytes where appropriate in
functions such as fake_run_git_with_input; update imports (from pathlib import
Path, from typing import Sequence, Any) and adjust return type hints to match
the lightweight result objects used in tests.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/RELEASING.md`:
- Around line 104-122: The insertion of the new "Update the release performance
comparison" step in docs/RELEASING.md shifts numbered steps and breaks external
references; update any docs that link to a specific step number to instead link
to a stable section anchor or heading (e.g., the "Update the release performance
comparison" heading) or change references to call out the unique commands shown
(performance-release, performance-archive-published) and docs/PERFORMANCE.md,
and/or update those links in this PR so they point to the new step numbering
consistently.

---

Outside diff comments:
In `@scripts/tests/test_archive_performance.py`:
- Around line 81-562: Tests and nested fake runner functions lack explicit type
annotations for pytest fixtures and local doubles (e.g., tmp_path, monkeypatch,
capsys and fake_run_git/fake_run_safe signatures), violating the project's
typed-new-code rule; add precise type hints: annotate fixtures as tmp_path:
Path, current/paths: Path, monkeypatch: pytest.MonkeyPatch, capsys:
pytest.CaptureFixture[str]; annotate fake runners like fake_run_git(args:
Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object (or
subprocess.CompletedProcess | types.SimpleNamespace) and fake_run_safe(command:
str, args: Sequence[str], cwd: Path | None = None, **kwargs: Any) -> object;
also annotate local variables like calls: list[tuple[str, tuple[str, ...], Path
| None]] and any input_data parameters as str | bytes where appropriate in
functions such as fake_run_git_with_input; update imports (from pathlib import
Path, from typing import Sequence, Any) and adjust return type hints to match
the lightweight result objects used in tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b271a5fc-99ff-40bf-9ae3-bbd06e08edfc

📥 Commits

Reviewing files that changed from the base of the PR and between aca8a37 and 2817d01.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • docs/BENCHMARKING.md
  • docs/RELEASING.md
  • justfile
  • pyproject.toml
  • scripts/README.md
  • scripts/archive_performance.py
  • scripts/tests/test_archive_performance.py

Comment thread docs/RELEASING.md
- Track the latest curated release comparison in docs/PERFORMANCE.md and archive older comparisons under docs/archive/performance/
- Let performance-archive-published discover the latest stable GitHub release and previous stable baseline automatically
- Generate release comparisons in isolated temporary worktrees, with release-asset restore and local baseline fallback paths
- Update benchmark and release docs to use the scripted workflow instead of manual checkout steps

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/archive_performance.py`:
- Around line 329-337: The code currently derives RUSTUP_TOOLCHAIN from
config.repo_root; change the logic so _benchmark_env uses the actual checkout
directory being benchmarked (pass the checkout Path into _benchmark_env or call
_current_rust_toolchain with the checkout path) so it resolves
rust-toolchain.toml from that detached worktree, not the repo root. Update all
call sites that build the environment for runs (the temp worktree run and the
local baseline fallback) to pass the checkout Path into
_benchmark_env/_current_rust_toolchain (locations that currently call
_benchmark_env or _current_rust_toolchain) so the correct toolchain is set for
historical tags and temporary checkouts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 29346d00-ff3a-4e14-a20b-633bb006e5cb

📥 Commits

Reviewing files that changed from the base of the PR and between 2817d01 and d31e26a.

📒 Files selected for processing (12)
  • .gitignore
  • docs/BENCHMARKING.md
  • docs/PERFORMANCE.md
  • docs/RELEASING.md
  • docs/archive/performance/README.md
  • docs/archive/performance/v0.4.1-vs-v0.4.0.md
  • justfile
  • scripts/README.md
  • scripts/archive_performance.py
  • scripts/bench_compare.py
  • scripts/tests/test_archive_performance.py
  • scripts/tests/test_bench_compare.py
💤 Files with no reviewable changes (1)
  • .gitignore
✅ Files skipped from review due to trivial changes (3)
  • docs/archive/performance/README.md
  • docs/RELEASING.md
  • scripts/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/BENCHMARKING.md
  • justfile

Comment thread scripts/archive_performance.py Outdated
- Add default performance-local and performance-release workflows that infer the relevant release tags and run in temporary worktrees.
- Add a performance-github-assets workflow for comparing stored GitHub Actions release benchmark assets without local cargo runs.
- Normalize release tags before fetching, downloading assets, or checking out detached worktrees.
- Update performance docs, release guidance, and generated report instructions to use the new benchmark workflows.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
scripts/archive_performance.py (2)

568-574: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve rust-toolchain.toml from the worktree being benchmarked.

Similar to the baseline worktree issue, this passes config.repo_root instead of worktree when the benchmarks run in the temp worktree. This could use the wrong Rust toolchain if the worktree's rust-toolchain.toml differs.

 def _run_benchmarks_and_render_report(*, worktree: Path, report: Path, config: GenerationConfig) -> None:
-    benchmark_env = _benchmark_env(config.repo_root)
+    benchmark_env = _benchmark_env(worktree)
     if _has_current_release_signal_tooling(worktree):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/archive_performance.py` around lines 568 - 574, The call to
_benchmark_env in _run_benchmarks_and_render_report incorrectly uses
config.repo_root, which can pick up the wrong rust-toolchain.toml for a
temporary worktree; change the argument so _benchmark_env resolves the toolchain
from the provided worktree (i.e., pass worktree instead of config.repo_root) so
benchmarks run with the worktree's rust-toolchain.toml before invoking _run_tool
and _render_report.

470-486: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve rust-toolchain.toml from the baseline worktree being benchmarked.

This call site still passes repo_root to _benchmark_env, but the benchmark runs inside baseline_worktree. If a historical tag pins a different toolchain, this will use the wrong compiler for the baseline.

-        _run_tool("just", ["bench-save-baseline", baseline_tag], cwd=baseline_worktree, timeout=_BENCH_TIMEOUT_SECONDS, env=_benchmark_env(repo_root))
+        _run_tool("just", ["bench-save-baseline", baseline_tag], cwd=baseline_worktree, timeout=_BENCH_TIMEOUT_SECONDS, env=_benchmark_env(baseline_worktree))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/archive_performance.py` around lines 470 - 486, The benchmark
environment is being created for the current repo_root instead of the detached
baseline worktree, so _generate_release_baseline should resolve
rust-toolchain.toml from the baseline worktree; change the call site that
invokes _benchmark_env (currently _benchmark_env(repo_root)) when calling
_run_tool("just", ["bench-save-baseline", ...], cwd=baseline_worktree) to pass
baseline_worktree (i.e. _benchmark_env(baseline_worktree)) so the historical
tag’s toolchain is used when running the baseline benchmark.
🧹 Nitpick comments (1)
scripts/archive_performance.py (1)

962-964: 💤 Low value

Consider narrowing the exception type or re-raising unexpected errors.

The broad Exception catch provides user-friendly error messages but could mask unexpected programming errors (e.g., TypeError, AttributeError). Since this is a CLI, this trade-off may be acceptable, but you could consider catching specific expected exceptions (e.g., ValueError, RuntimeError, FileNotFoundError, subprocess.CalledProcessError) to surface unexpected bugs during development.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/archive_performance.py` around lines 962 - 964, The current broad
except Exception block in scripts/archive_performance.py (the "except Exception
as exc:" that prints "archive-performance: {exc}" and returns 1) should be
narrowed: replace it with specific exception handlers for expected error types
(e.g., ValueError, RuntimeError, FileNotFoundError,
subprocess.CalledProcessError) and keep the same stderr print/return behavior
for those; for any other unexpected exceptions, re-raise them so they surface
during development (or optionally log and raise). Locate the except block around
the main/entry logic where "return 1" is used and implement per-exception
handlers and a final "except Exception:" that raises the exception instead of
swallowing it.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@scripts/archive_performance.py`:
- Around line 568-574: The call to _benchmark_env in
_run_benchmarks_and_render_report incorrectly uses config.repo_root, which can
pick up the wrong rust-toolchain.toml for a temporary worktree; change the
argument so _benchmark_env resolves the toolchain from the provided worktree
(i.e., pass worktree instead of config.repo_root) so benchmarks run with the
worktree's rust-toolchain.toml before invoking _run_tool and _render_report.
- Around line 470-486: The benchmark environment is being created for the
current repo_root instead of the detached baseline worktree, so
_generate_release_baseline should resolve rust-toolchain.toml from the baseline
worktree; change the call site that invokes _benchmark_env (currently
_benchmark_env(repo_root)) when calling _run_tool("just",
["bench-save-baseline", ...], cwd=baseline_worktree) to pass baseline_worktree
(i.e. _benchmark_env(baseline_worktree)) so the historical tag’s toolchain is
used when running the baseline benchmark.

---

Nitpick comments:
In `@scripts/archive_performance.py`:
- Around line 962-964: The current broad except Exception block in
scripts/archive_performance.py (the "except Exception as exc:" that prints
"archive-performance: {exc}" and returns 1) should be narrowed: replace it with
specific exception handlers for expected error types (e.g., ValueError,
RuntimeError, FileNotFoundError, subprocess.CalledProcessError) and keep the
same stderr print/return behavior for those; for any other unexpected
exceptions, re-raise them so they surface during development (or optionally log
and raise). Locate the except block around the main/entry logic where "return 1"
is used and implement per-exception handlers and a final "except Exception:"
that raises the exception instead of swallowing it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a3ef3d6-0ab7-40d4-9542-71ee465d7cd2

📥 Commits

Reviewing files that changed from the base of the PR and between d31e26a and 7258525.

📒 Files selected for processing (11)
  • CONTRIBUTING.md
  • docs/BENCHMARKING.md
  • docs/PERFORMANCE.md
  • docs/RELEASING.md
  • docs/archive/performance/v0.4.1-vs-v0.4.0.md
  • justfile
  • scripts/README.md
  • scripts/archive_performance.py
  • scripts/bench_compare.py
  • scripts/tests/test_archive_performance.py
  • scripts/tests/test_bench_compare.py
✅ Files skipped from review due to trivial changes (5)
  • CONTRIBUTING.md
  • docs/RELEASING.md
  • scripts/bench_compare.py
  • docs/PERFORMANCE.md
  • docs/BENCHMARKING.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • docs/archive/performance/v0.4.1-vs-v0.4.0.md
  • scripts/README.md
  • scripts/tests/test_bench_compare.py
  • scripts/tests/test_archive_performance.py

- Limit archive-performance CLI error handling to expected validation, filesystem, subprocess, and runtime failures.
- Let unexpected exceptions propagate so benchmark archiving bugs surface during development.
@acgetchell acgetchell enabled auto-merge June 8, 2026 23:31
@acgetchell acgetchell merged commit 9a869fc into main Jun 8, 2026
17 checks passed
@acgetchell acgetchell deleted the feat/release-performance-archive branch June 8, 2026 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant