commit-reach: terminate merge-base walk when one paint side is exhausted #2149

Draft
spkrka wants to merge 6 commits into gitgitgadget:master from spkrka:side-exhaust-pr

spkrka commented Jun 13, 2026

paint_down_to_common() computes merge bases by walking the commit
graph backwards from two sets of tips, painting commits with PARENT1
and PARENT2 flags. The walk currently terminates when no non-stale
commits remain in the queue.

This termination condition forces the walk to visit commits that
cannot contribute new merge-base candidates. A new merge base can
only appear when a path carrying only PARENT1 paint meets a path
carrying only PARENT2 paint. Once one side has no exclusive commits
remaining in the queue, no new merge base can form, and the walk
can stop.

This is particularly visible in repositories that use merge
commits. Each merge in the first-parent history introduces a side
branch whose ancestry is painted with only one side. The existing
termination condition keeps the walk alive until that entire
ancestry is drained, even though the merge base was found long ago.

The same effect is triggered by repository imports. In a large
monorepo, importing a project brings in a separate history with its
own root at generation zero. Any merge-base query that straddles
an import boundary must then drain all the way to that root.
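
The cost described above can be reproduced on a toy history. The following is a minimal sketch, not Git's implementation: the ten-commit graph, the walk() helper, the flag values, and the linear scan standing in for a priority queue are all assumptions made for illustration, and every commit in it has a finite generation number.

```c
#include <assert.h>
#include <stdbool.h>

#define PARENT1 1u /* illustrative flag values */
#define PARENT2 2u
#define STALE   4u
#define NCOMMITS 10

/*
 * Toy history (hypothetical), parents by index, -1 = none:
 *   main line:      0 <- 1 <- 2 <- 3 (B)
 *   imported chain: 4 <- 5 <- 6 <- 7   (own root, commit 4)
 *   8 (M) merges 3 and 7; 9 (A) sits on top of 8.
 */
static const int parents[NCOMMITS][2] = {
	{-1, -1}, {0, -1}, {1, -1}, {2, -1}, {-1, -1},
	{4, -1}, {5, -1}, {6, -1}, {3, 7}, {8, -1},
};
static const int generation[NCOMMITS] = { 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };

/* Returns how many commits were popped; *base gets the first merge base. */
int walk(int one, int two, bool early_exit, int *base)
{
	unsigned flags[NCOMMITS] = { 0 };
	bool queued[NCOMMITS] = { false };
	int visited = 0;

	*base = -1;
	flags[one] |= PARENT1; queued[one] = true;
	flags[two] |= PARENT2; queued[two] = true;

	for (;;) {
		int p1 = 0, p2 = 0, pending = 0, next = -1;

		/* Recount the queue each round; real code would keep counters. */
		for (int i = 0; i < NCOMMITS; i++) {
			if (!queued[i])
				continue;
			if (next < 0 || generation[i] > generation[next])
				next = i; /* highest generation is popped first */
			if (flags[i] & STALE)
				continue;
			if (flags[i] == PARENT1)
				p1++;
			else if (flags[i] == PARENT2)
				p2++;
			else
				pending++;
		}
		if (p1 + p2 + pending == 0)
			break; /* existing rule: only stale commits remain */
		if (early_exit && pending == 0 && (p1 == 0 || p2 == 0))
			break; /* one side has no exclusive commits left */

		assert(next >= 0);
		queued[next] = false;
		visited++;

		unsigned f = flags[next];
		if (f == (PARENT1 | PARENT2)) {
			if (*base < 0)
				*base = next;
			f |= STALE; /* ancestors of a merge base are stale */
		}
		for (int k = 0; k < 2; k++) {
			int p = parents[next][k];
			if (p < 0 || (flags[p] & f) == f)
				continue;
			flags[p] |= f;
			queued[p] = true;
		}
	}
	return visited;
}
```

In this sketch, walk(9, 3, false, &base) pops all 10 commits because the imported chain stays PARENT1-only until its root is drained, while walk(9, 3, true, &base) stops after 3 pops. Both calls report commit 3 as the merge base.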

This was discussed in an earlier RFC thread [1], where Derrick
Stolee confirmed the correctness of the per-side approach and
provided the "release branch" shape as another motivating example.
Elijah Newren independently discovered the same optimization and
shared his implementation in PR #2150.

How it works

This series replaces the single max_nonstale pointer with a
paint_queue struct containing per-side integer counters (p1_count,
p2_count, pending_merge_bases). A centralized
paint_count_transition() function maintains the counters on every
flag change.
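
As a rough illustration of the counter bookkeeping, the sketch below shows one way such a struct and transition helper could look. Only the names paint_queue, p1_count, p2_count, pending_merge_bases and paint_count_transition() come from this description; the field types, the function signature, the paint_bucket() helper and the flag values are assumptions, and the actual patch may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names as used in the description; the bit values are illustrative. */
#define PARENT1 (1u << 0)
#define PARENT2 (1u << 1)
#define STALE   (1u << 2)

/* Hypothetical layout: only the three counter names are from the series. */
struct paint_queue {
	int p1_count;            /* queued, non-stale, PARENT1 only */
	int p2_count;            /* queued, non-stale, PARENT2 only */
	int pending_merge_bases; /* queued, non-stale, painted by both sides */
};

/* Hypothetical helper: map a flag word to its counter, or NULL if none. */
static int *paint_bucket(struct paint_queue *q, unsigned flags)
{
	if (flags & STALE)
		return NULL;
	switch (flags & (PARENT1 | PARENT2)) {
	case PARENT1:
		return &q->p1_count;
	case PARENT2:
		return &q->p2_count;
	case PARENT1 | PARENT2:
		return &q->pending_merge_bases;
	default:
		return NULL; /* unpainted */
	}
}

/* Treat a flag change as a transfer: leave the old bucket, enter the new. */
static void paint_count_transition(struct paint_queue *q,
				   unsigned old_flags, unsigned new_flags)
{
	int *from = paint_bucket(q, old_flags);
	int *to = paint_bucket(q, new_flags);

	if (from) {
		assert(*from > 0);
		(*from)--;
	}
	if (to)
		(*to)++;
}
```

Routing every flag change through a single function keeps each non-stale queued commit in exactly one bucket, so the three counters cannot drift apart.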

The walk terminates early when all three conditions hold:

(a) the walk has entered the finite-generation region
(generation < GENERATION_NUMBER_INFINITY),
(b) no pending merge-base candidates remain in the queue, and
(c) one side's exclusive count has reached zero.

Condition (a) is essential: in the finite-generation region,
generation ordering guarantees topological traversal, so paint on
already-visited commits is final. In the INFINITY region (commits
not in the commit-graph), commit-date ordering can violate this, so
the optimization is disabled and the walk falls back to the existing
termination condition.
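
Conditions (a) to (c) can be read as a single predicate. The sketch below is an assumption about how the check might be expressed: side_exhausted() is a hypothetical name, the struct layout is guessed from the counter names above, and the GENERATION_NUMBER_INFINITY value is a stand-in rather than Git's definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in value for illustration; Git has its own definition. */
#define GENERATION_NUMBER_INFINITY UINT64_MAX

/* Hypothetical layout, using the counter names from the description. */
struct paint_queue {
	int p1_count;
	int p2_count;
	int pending_merge_bases;
};

/*
 * Return true when the walk may stop early. 'generation' is the generation
 * number of the commit about to be processed.
 */
static bool side_exhausted(const struct paint_queue *q, uint64_t generation)
{
	assert(q->p1_count >= 0 && q->p2_count >= 0 &&
	       q->pending_merge_bases >= 0);

	if (generation >= GENERATION_NUMBER_INFINITY)
		return false; /* (a) still in the INFINITY region */
	if (q->pending_merge_bases > 0)
		return false; /* (b) candidates not yet popped */
	return q->p1_count == 0 || q->p2_count == 0; /* (c) */
}
```

Checking (a) first means the counters are never consulted while the queue is still ordered by commit date.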

Patch organization

1/6 commit-reach: introduce struct paint_queue with per-side counters
2/6 commit-reach: terminate merge-base walk when one paint side is exhausted
3/6 t6099: test merge-base with ancestor among candidates
4/6 t6600: add test cases for side-exhaustion edge cases
5/6 p6012: add perf test for merge-base with deep side branch
6/6 Documentation/technical: add paint-down-to-common.adoc

Patch 4 includes test cases from Elijah Newren's branch, with
his authorship and Signed-off-by preserved.

Benchmarks

Measured on a 2.6M-commit monorepo with a commit-graph. The baseline
is v2.55-rc1.

Import boundary scenario: merge-base across a repository import
where the imported history has its own root at generation zero.

merge-base --all ABOVE BELOW_IMPORT     4.293s ->    8ms  (537x)
merge-base --all HEAD  BELOW_IMPORT     5.590s ->   85ms   (66x)
merge-tree       ABOVE BELOW_IMPORT     5.345s ->   13ms  (411x)

Merge-heavy history: merge-base --all between commits on the
same first-parent line, where merge commits create many one-sided
subtrees.

merge-base --all HEAD HEAD~1000         5.404s ->    7ms  (772x)
merge-base --all HEAD HEAD~5000         5.310s ->   14ms  (379x)

No regression for single merge-base and ancestry checks:

merge-base       HEAD HEAD~100           7ms ->    9ms   (~1x)
merge-base       HEAD HEAD~1000          7ms ->   11ms   (~1x)
merge-base --is-ancestor ~1000 HEAD      9ms ->    8ms   (~1x)

[1] https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u

spkrka force-pushed the side-exhaust-pr branch 2 times, most recently from c6e85dd to bf8f525 on June 14, 2026 11:14
spkrka and others added 6 commits June 18, 2026 14:35

commit-reach: introduce struct paint_queue with per-side counters

Replace the nonstale_queue abstraction in paint_down_to_common() with
a new paint_queue struct that tracks per-side commit counts. Each
non-stale queued commit occupies exactly one counter bucket based on
its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
merge-base candidate).

The counters are maintained by paint_count_transition() which handles
all flag changes as bucket transfers: remove from the old bucket, add
to the new one. Either step is a no-op when the respective state has
no bucket (the commit is stale or carries no paint).

The loop condition changes from pointer-based (max_nonstale) to
counter-based (while any counter is positive), which is equivalent.

ahead_behind() is decoupled to use its own local tracking, since it
does not need per-side counters. No behavior change.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>

commit-reach: terminate merge-base walk when one paint side is exhausted

Add an early termination check to paint_down_to_common() using the
per-side counters introduced in the previous commit. Once the walk
enters the finite-generation region, terminate early when one side's
exclusive count drops to zero -- no new merge-base can form without
both paint sides meeting.

The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been popped and recorded before
exiting. This is necessary for FIND_ALL to return all merge bases
in criss-cross merge topologies.

The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable. The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.

On large repositories with commit-graph, this yields 100-1000x
speedups for merge-base queries where one side (e.g. a PR branch) is
much smaller than the other.

Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>

t6099: test merge-base with ancestor among candidates

Add tests for the case where multiple merge-base candidates exist
and one is an ancestor of another. This exercises the side-exhaustion
optimization in paint_down_to_common together with the
remove_redundant safety net in get_merge_bases_many_0.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>

t6600: add test cases for side-exhaustion edge cases

Add test cases to t6600-test-reach.sh that exercise edge cases in the
side-exhaustion optimization for paint_down_to_common():

 - in_merge_bases_many:self: commit is both A and one of the X inputs
 - get_merge_bases_many:duplicate-twos: duplicate entries in X list
 - get_merge_bases_many:pending-stale: STALE transition on an
   already-painted commit (ps-* diamond topology)
 - get_merge_bases_many:infinity-both-sides: both tips outside the
   commit-graph with non-monotonic dates (pi-* topology)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>

p6012: add perf test for merge-base with deep side branch

Add a perf test that benchmarks merge-base on a synthetic 500k-commit
repository with a deep side branch.

Includes variants with and without a commit-graph to cover the case
where both tips are outside the commit-graph (INFINITY region).

Signed-off-by: Kristofer Karlsson <krka@spotify.com>

Documentation/technical: add paint-down-to-common.adoc

Add a technical document describing merge-base computation and,
specifically, the paint_down_to_common() implementation.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>

spkrka force-pushed the side-exhaust-pr branch from bf8f525 to edd6ee5 on June 18, 2026 12:51

2 participants