diff --git a/rfcs/0061-ci-2-0.md b/rfcs/0061-ci-2-0.md
new file mode 100644
index 0000000..8fa665f
--- /dev/null
+++ b/rfcs/0061-ci-2-0.md
@@ -0,0 +1,82 @@
+- Start Date: 2026-06-08
+- Authors: @AdamGS
+- RFC PR: [vortex-data/rfcs#61](https://github.com/vortex-data/rfcs/pull/61)
+
+# CI 2.0
+
+## Summary
+
+This RFC proposes a cohesive architecture for the project's CI and workflow system. There have been many conversations (both public and private) on this topic,
+and I would like to propose a holistic vision that can be discussed and iterated on in a public forum.
+
+I've tried to include the current state of the project to the best of my understanding, and the issues I see that motivate my proposed changes.
+
+## Current state
+
+The current solution includes (but is not limited to) CI, publishing and benchmarking. It is fully implemented using GitHub Actions workflows and apps, and relies on
+a mix of [runs-on](https://runs-on.com/) powered self-hosted runners and GitHub-hosted runners.
+
+### Permission Model
+
+There are four different scopes of permissions:
+
+1. Admins - a very limited group of users who can manage the organization-wide settings and the core repo settings.
+2. Maintainers - all TSC members. They can push branches and open pull requests directly from within the core repo, and approve and merge pull requests.
+   CI runs automatically for them, and they can trigger CI for external contributions.
+3. Members - non-TSC members who are explicitly trusted. They have the same development experience a maintainer has, aside from the ability to approve and merge pull requests.
+4. Everyone else - has to open PRs from a fork in the classic GitHub flow, and needs a maintainer to run any Actions-based workflow, label their PR, etc.
+
+Everyone in the maintainers group is automatically subscribed to every repo event, which creates a lot of noise in both emails and notifications. The distinction between
+maintainers and members is hard to communicate externally.
+
+### Testing and CI
+
+Our current test suite includes many workflows. I think they can be grouped into the following categories:
+
+1. Linters - these include rustfmt, clippy, cargo-deny and other language-specific tooling.
+2. Core tests - the full test suite and coverage tests.
+3. Integration-focused tests - Python, Java/JNI, SLT, C/C++.
+4. Environment tests - making sure Vortex works in various environments like wasm and Windows.
+
+### Benchmarking
+
+Benchmarks run on dedicated runs-on runners, orchestrated by GitHub. They run on a nightly schedule and on every commit, and can be triggered as required by a maintainer
+labeling the PR.
+
+Any user who can run CI potentially has unbounded access to the AWS account, allowing them (or a third party that took control of their account) to hijack resources.
+
+### Versioning and publishing
+
+Every pull request must include a label indicating the sort of change it includes (feature, break, fix, chore, performance, ci or skip).
+Those changes are accumulated into a draft release. When that draft is published, a workflow is triggered that gradually publishes all Rust crates
+and all language or framework bindings in order. Those include:
+
+1. Rust crates (including integrations like `vortex-datafusion`).
+2. Python package
+3. JNI bindings
+4. Spark bindings
+
+Our versioning is tightly coupled between the Rust implementation and external integrations. Fixing any issue in the Java bindings or DataFusion integration
+requires a full release, which might include many unrelated changes. It also means that changes in the periphery of the project get included in both the changelog and
+the versioning scheme for the core of the project.
+
+This also makes the meaning of the version itself murky. We currently vaguely follow semantic versioning, but we effectively bump our minor version on every release, which means that
+a breaking change is always acceptable.
+
+We also don't have clear and consistent definitions of what each kind of change means, making the changelog inconsistent and somewhat confusing.
+
+## Suggested changes
+
+### Permissions
+
+I suggest we get rid of the two-tier member status and move to GitHub's built-in permission model, which allows users who have contributed to the project in the past to run CI by default. This will make the committers group redundant.
+
+Their CI flow will not run on the project's runs-on infrastructure, which we can limit by making CI check `github.event.pull_request.head.repo.full_name` instead of `github.repository`. We can further reduce the risk posed by external contributors by adding more restrictions on attempted changes to the `.github` directory.
+
+### Benchmarks Bot
+
+For benchmarks, we'll create a new dedicated bot with a limited list of allowed users who can trigger a benchmark run using a comment (or any other preferred control flow). The bot will be hosted in a different repo, so the surface area for runs-on for external contributors is reduced. Hosting the bot externally will also make it easier to have it run on PRs coming from forks, and it'll be easier to give it a more flexible scheduling system and more powerful permissions.
+
+### Versioning
+
+I suggest we move some of the bindings into their own