vortex-data · AdamGS · Jun 8, 2026
diff --git a/rfcs/0061-ci-2-0.md b/rfcs/0061-ci-2-0.md
@@ -0,0 +1,82 @@
+- Start Date: 2026-06-08
+- Authors: @AdamGS
+- RFC PR: [vortex-data/rfcs#61](https://github.com/vortex-data/rfcs/pull/61)
+
+# CI 2.0
+
+## Summary
+
+This RFC proposes a cohesive architecture for the project's CI and workflow system. There have been many conversations (both public and private) on this topic,
+and I would like to propose a holistic vision that can be discussed and iterated on in a public forum.
+
+I've tried to include the current state of the project to the best of my understanding, and the issues I see that motivate my proposed changes.
+
+## Current state
+
+The current solution includes (but is not limited to) CI, publishing and benchmarking. It is fully implemented using GitHub Actions workflows and apps, and relies on
+a mix of [runs-on](https://runs-on.com/)-powered self-hosted runners and GitHub-hosted runners.
+
+### Permission Model
+
+There are four different scopes of permissions:
+
+1. Admins - A very limited group of users that can manage the organization-wide settings and the core repo settings.
+2. Maintainers - All TSC members. They can push branches and open pull requests directly from within the core repo, and approve and merge pull requests.
+   CI runs automatically for them, and they can trigger CI for external contributions.
+3. Members - Non-TSC members that are explicitly trusted. They have the same development experience as a maintainer, aside from the ability to approve and merge pull requests.
+4. Everyone else - Must open PRs from a fork in the classic GitHub flow. A maintainer is required to run any Actions-based workflow, label their PR, etc.
+
+Everyone in the maintainers group is automatically subscribed to every repo event, which creates a lot of noise in both emails and notifications. The distinction between the two
+groups is hard to communicate externally.
+
+### Testing and CI
+
+Our current test suite includes many workflows, which I think can be grouped into the following categories:
+
+1. Linters - these include rustfmt, clippy, cargo-deny and other language-specific tooling.
+2. Core tests - the full test suite and coverage tests.
+3. Integration-focused tests - Python, Java/JNI, SLT, C/C++.
+4. Environment tests - making sure Vortex works in various environments like wasm and Windows.
+
+### Benchmarking
+
+Benchmarks run on dedicated runs-on runners orchestrated by GitHub. They run on a nightly schedule and on every commit, and a maintainer can trigger them on demand
+by labeling a PR.
+
+Any user who can run CI potentially has unbounded access to the AWS account, allowing them (or a third party that has taken control of their account) to hijack resources.
+
+### Versioning and publishing
+
+Every pull request must include a label indicating the sort of change it includes (feature, break, fix, chore, performance, ci or skip).
+Those changes are accumulated into a draft release. When that draft is published, a workflow is triggered that gradually publishes all Rust crates
+and all language or framework bindings in order. Those include:
+
+1. Rust crates (including integrations like `vortex-datafusion`).
+2. Python package
+3. JNI bindings
+4. Spark bindings
+
+Our versioning is tightly coupled between the Rust implementation and external integrations. Fixing any issue in the Java bindings or DataFusion integration
+requires a full release, which might include many unrelated changes. It also means that changes in the periphery of the project get included in both the changelog and
+the versioning scheme for the core of the project.
+
+This also makes the meaning of the version itself murky. We currently vaguely follow semantic versioning, but we effectively bump our minor version on every release, which means that
+a breaking change is always acceptable.
+
+We also don't have clear and consistent definitions of what each kind of change means, making the changelog inconsistent and somewhat confusing.
+
+## Suggested changes
+
+### Permissions
+
+I suggest we get rid of the two-tier member status and move to GitHub's built-in permission model, which allows users who have contributed to the project in the past to run CI by default. This will make the committers group redundant.
+
+Their CI flow will not run on the project's runs-on infrastructure, which we can enforce by making CI check `github.event.pull_request.head.repo.full_name` instead of `github.repository`. We can further reduce the risk from external contributors by adding more restrictions on attempted changes to the `.github` directory.
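+
+As a rough illustration of the runner check described above, a workflow could pick its runner based on whether the PR's head branch lives in the core repo or in a fork. This is only a sketch under assumptions: the job name `test`, the `self-hosted` and `ubuntu-latest` labels, and the `cargo test` step are placeholders, not the project's actual configuration or runs-on label syntax.
+
+```yaml
+# Sketch only: job name, runner labels and the test command are placeholders.
+on: pull_request
+
+jobs:
+  test:
+    # PRs whose head branch is in the core repo use the self-hosted pool;
+    # PRs from forks fall back to a GitHub-hosted runner.
+    runs-on: ${{ github.event.pull_request.head.repo.full_name == github.repository && 'self-hosted' || 'ubuntu-latest' }}
+    steps:
+      - uses: actions/checkout@v4
+      - run: cargo test --workspace
+```
+
+The comparison only makes sense for `pull_request` events, since other triggers carry no `pull_request` payload; workflows with additional triggers would need their own handling.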
+
+### Benchmarks Bot
+
+For benchmarks, we'll create a new dedicated bot with a limited list of allowed users who can trigger a benchmark run using a comment (or any other preferred control flow). The bot will be hosted in a different repo, which reduces the runs-on surface area exposed to external contributors. Hosting the bot externally will also make it easier to run it on PRs coming from forks, and to give it a more flexible scheduling system and more powerful permissions.
+
+### Versioning
+
+I suggest we move some of the binding into their own