feat: add Spark commit audit process #4206

Open
andygrove wants to merge 9 commits into apache:main from andygrove:skill-audit-spark-commits

andygrove commented May 4, 2026

Which issue does this PR close?

Closes #4188

Rationale for this change

Comet emulates Spark behavior across many subsystems: expressions, the optimizer, Parquet read and write, shuffle, joins, aggregates, and more. When Spark changes behavior on master, Comet may need to follow. Today there is no documented, repeatable process for the community to notice those changes commit by commit. This PR introduces that process so the project can track upstream Spark changes made since branch-4.2 was cut, rather than silently diverging.

The work was scaffolded with the project superpowers:brainstorming skill, with the spec and plan kept on disk only.

What changes are included in this PR?

  • docs/source/contributor-guide/spark_commit_audit.md: human-facing process page with rubric, scope, states, and workflow. Linked from the contributor guide index.
  • dev/spark-commit-audit.md: the audit log itself, populated with the 2 in-scope sql/ commits on apache/spark master since branch-4.2 was cut. Each line carries a short hash, date, state, and subject.
  • dev/regenerate-spark-audit.py: bootstrap and incremental update script. Idempotent; preserves existing verdicts and prose notes by short hash. Reuses the existing dev/release/venv (PyGithub).
  • dev/test_regenerate_spark_audit.py: 15 unit tests over the script's pure helpers (parse_existing_block, format_new_line, is_in_scope, merge_lines, replace_block).
  • .claude/skills/audit-spark-commit/SKILL.md: thin Claude skill that audits one commit per invocation, reads the contributor guide for the rubric, proposes a verdict, and updates the audit log line in place after user review.
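To illustrate how "preserves existing verdicts and prose notes by short hash" could work, here is a minimal sketch. It is not the code in dev/regenerate-spark-audit.py: the helper names are borrowed from the list above, but their signatures, the `Commit` type, the `sql/` prefix rule, the `pending` default state, and the line layout are all assumptions made for illustration, and the sample commits are made up.

```python
from typing import Dict, List, NamedTuple


class Commit(NamedTuple):
    """Hypothetical container for the fields an audit line needs."""
    short_hash: str
    date: str
    subject: str
    paths: List[str]


def is_in_scope(paths: List[str], prefix: str = "sql/") -> bool:
    """Assumed scope rule: in scope if any touched path is under `prefix`."""
    return any(p.startswith(prefix) for p in paths)


def format_new_line(c: Commit, state: str = "pending") -> str:
    """Assumed line layout: short hash, date, state, subject."""
    return f"- {c.short_hash} {c.date} [{state}] {c.subject}"


def merge_lines(existing: Dict[str, str], commits: List[Commit]) -> List[str]:
    """Reuse an existing line verbatim when its short hash is already logged;
    otherwise emit a fresh line in the default state."""
    merged = []
    for c in commits:
        if not is_in_scope(c.paths):
            continue  # out-of-scope commits never reach the log
        merged.append(existing.get(c.short_hash, format_new_line(c)))
    return merged


# Made-up sample data: one line already reviewed by hand, two new commits.
existing = {
    "abc1234": "- abc1234 2026-05-01 [no-action] example subject A (reviewed by hand)"
}
commits = [
    Commit("abc1234", "2026-05-01", "example subject A", ["sql/core/A.scala"]),
    Commit("def5678", "2026-05-02", "example subject B", ["python/docs/b.rst"]),
    Commit("9abcdef", "2026-05-03", "example subject C", ["sql/catalyst/C.scala"]),
]
merged = merge_lines(existing, commits)
for line in merged:
    print(line)
```

Keying on the short hash means a re-run can regenerate every line from upstream metadata while leaving any line a human has already touched exactly as written.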

How are these changes tested?

  • python3 dev/test_regenerate_spark_audit.py: 15 unit tests over the script's pure functions, all pass.
  • Smoke test via python dev/regenerate-spark-audit.py --dry-run --limit 5, then full bootstrap.
  • Manual idempotency check: edited a populated line, re-ran the script, confirmed the manual edit was preserved by short hash, then reverted.
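The manual idempotency check could also be expressed as an automated test. The sketch below assumes the generated region of the audit log is delimited by marker comments and that each entry starts with `- <short hash>`; the markers, the line layout, and the function bodies are hypothetical and are not taken from the PR's script.

```python
import re
from typing import Dict, List

# Hypothetical markers delimiting the generated region of the audit log.
BEGIN = "<!-- audit:begin -->"
END = "<!-- audit:end -->"


def parse_existing_block(text: str) -> Dict[str, str]:
    """Map short hash -> full line for every entry between the markers."""
    start = text.index(BEGIN) + len(BEGIN)
    end = text.index(END, start)
    entries = {}
    for line in text[start:end].splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "-":
            entries[parts[1]] = line
    return entries


def replace_block(text: str, new_lines: List[str]) -> str:
    """Swap the lines between the markers, leaving the rest of the file alone."""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    body = "\n".join([BEGIN, *new_lines, END])
    # A function replacement avoids backslash-escape processing in `body`.
    new_text, count = pattern.subn(lambda _m: body, text, count=1)
    if count != 1:
        raise ValueError("audit block markers not found")
    return new_text


# Simulate the manual check: edit a populated line, re-run, compare.
log = (
    "# Audit log\n\n"
    + BEGIN
    + "\n- abc1234 2026-05-01 [pending] example subject\n"
    + END
    + "\n\nTrailing notes.\n"
)
edited = log.replace("[pending]", "[no-action]")
rerun = replace_block(edited, list(parse_existing_block(edited).values()))
assert rerun == edited  # the manual edit survives a re-run unchanged
```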
