Add group join physical optimizer#21995
Conversation
73f4713 to
5fe7219
Compare
|
run benchmark tpch tpcds tpch10 |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing groupjoin-eliminate-extra-hash-build (5fe7219) to 2f2fe8f (merge-base) diff using: tpch10 File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing groupjoin-eliminate-extra-hash-build (5fe7219) to 2f2fe8f (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing groupjoin-eliminate-extra-hash-build (5fe7219) to 2f2fe8f (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch10 — base (merge-base)
tpch10 — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
|
@Dandandan Thanks for running the benchmarks. There are additional equivalencies/optimizations of hashjoin + groupby that can be turned into groupjoin from the paper. I wanted to make this PR just the initial optimization + create the groupjoin rule. If its okay, I will ping you in another draft PR which will contain all optimizations in the paper so you can review + benchmark. I listed the optimizations im talking about at the bottom of this PRs description. Ignore the cost based one I think this is not really currently applicable |
Sounds great! |
|
This looks awesome! I have a question, why physical optimizer rule? It looks simpler to implement a logical optimizer rule instead. |
|
@2010YOUY01 So introduce the groupjoin logically by detecting the opportunity there then edit the hash join exec so that it does the goupby within? |
On second thought, I figured it’s better to keep this in the physical optimizer, please ignore that. If we perform this transformation in the logical optimizer (by detecting Aggregate + equi-join), we remove the flexibility for the physical optimizer to choose among multiple applicable rules (that are only possible in physical optimizer) and search for a globally optimal plan. |
|
sounds good, thanks. Yep, physical rule preferred because there is some complexity on additional groupjoins optimizations possible which im not sure is possible just in logical layer When groupjoin opportunity is found we always do the Memoizing GroupJoin so simply build the hash table on the left side with accumulators embedded, then update them inplace during probe. One big one from the papers is "Eager Right Aggregation" which is just pre-aggregate the probe side before the join, reducing its cardinality from |S| to |distinct(S.join_key)|. Ideal when most right-side groups have a corresponding value (one way to verify this is foreign key constraint which can be added with eager aggregation) |
Rationale for this change
Some queries combining a join and a group-by on the same key can be executed as a single groupjoin operator. This optimization targets a common analytical pattern — dimension-fact joins where a smaller dimension table is joined with a larger fact table and aggregated by the join key.
This is based on research: Moerkotte & Neumann PVLDB 2011. The paper introduces the groupjoin algebraic equivalence and proves its correctness for both inner and outer joins, provided the join key is a key of the build side.
This PR implements the groupjoin operator and optimizer rule, using the memoizing groupjoin strategy from the paper: a single hash table serves as both the join lookup and the aggregation group table, with probe-side rows updating accumulators in-place. This eliminates the redundant hash table construction and intermediate result materialization that occur when the join and aggregate run as separate operators. This addresses #13243.
locally I saw: TPC-H Q13 (SF10): 299ms → 254ms (~15% faster), not zero regressions (with just groupjoin avoiding materialization so not including additional optimizations below)
What changes are included in this PR?
New physical operator —
GroupJoinExec(physical-plan/src/joins/group_join.rs):GroupValueshash table from the left (build) sideGroupsAccumulators in-place for matching rowsNew physical optimizer rule —
GroupJoinOptimizer(physical-optimizer/src/group_join.rs):AggregateExecaboveHashJoinExec(looking through intermediateProjectionExec)GroupsAccumulatorCombinePartialFinalAggregatein the optimizer pipelineHow can this be extended?
The paper describes three additional strategies and optimizations we did not implement:
Eager Right Aggregation (Strategy 1) — Pre-aggregate the probe side before the join, reducing its cardinality. For Q13, this would reduce the 15M order rows to ~1.5M pre-aggregated groups before joining with 1.5M customers. The paper reports >2x improvement on Q13 with this strategy.
Superset GROUP BY (Theorem 3 in the paper) — Handle cases where GROUP BY keys are a superset of the join keys (extra keys from the build side). This would enable queries like Q3 (
GROUP BY l_orderkey, o_orderdate, o_shipprioritywith join ono_orderkey = l_orderkey). Requires the probe side to look up by the join key subset while the hash table is keyed by the full GROUP BY.Cost-model strategy selection (Section 4 of Fent et al.) — Choose between the four strategies at optimization time based on input cardinalities and selectivities, rather than always using Strategy 2.