HIVE-29503: Prevent Join cardinality overestimation of joins with NDV(0) columns by konstantinb · Pull Request #6356 · apache/hive

konstantinb · 2026-03-12T07:18:12Z

What changes were proposed in this pull request?

HIVE-29503: Use existing conservative heuristic for estimating join product for queries with unknown (0) NDV(s)

Why are the changes needed?

The result file generated with the original code shows a massive cardinality explosion for a fairly small query. Overestimates of this kind frequently lead to suboptimal query plans.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New code, new test files, and a private Hive implementation

konstantinb · 2026-03-23T23:26:41Z

                        input vertices:
                          1 Map 2
-                        Statistics: Num rows: 4 Data size: 1184 Basic stats: COMPLETE Column stats: COMPLETE
+                        Statistics: Num rows: 2 Data size: 325 Basic stats: COMPLETE Column stats: NONE


At the moment, the NDV of datetime/timestamp columns is not assigned to colstats even when it is available. Changing that would improve this estimate; however, doing so impacts over 100 .out files, so perhaps it belongs in a separate story?

konstantinb · 2026-03-24T15:13:25Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

These new issues were introduced by #6361.

sonarqubecloud · 2026-04-09T21:59:23Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

konstantinb · 2026-05-11T23:44:06Z

Hi @zabetak — this is a companion to #6418, addressing the same NDV=0 "unknown stats" problem but in the join cardinality estimator rather than GROUP BY.

The bug: when join keys have NDV=0 on both sides (common for binary columns, columns without populated NDV, or giant tables with incomplete stats), getDenominator returns 0, which computeRowCountAssumingInnerJoin replaces with 1. The join formula then becomes:

  result = otherSideRows × (maxRows / 1)

For two 100M-row tables, that's 100M × 100M = 10^16 — a full cartesian product estimate for an equi-join. This cascades into downstream operators (aggregations, subsequent joins) and typically forces suboptimal plans by making the join output appear astronomically larger than it actually is.

The fix intercepts after PK-FK inference fails but before the NDV-based denominator path, and applies the existing hive.stats.join.factor heuristic (default 1.1× the largest input). This is the same conservative estimate already used in the "no column statistics at all" branch — just triggered earlier when we can detect that NDV=0 makes the denominator meaningless.
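To make the before/after difference concrete, here is a minimal sketch of the arithmetic described above. It is not the Hive code: the class and method names are hypothetical, the single-key denominator is assumed to be the larger of the two NDVs, and PK-FK inference and the guard for empty or very small tables are left out. The 1.1 factor is the hive.stats.join.factor default quoted above.

```java
/** Illustrative sketch only; names are hypothetical and this is not the Hive implementation. */
final class JoinEstimateSketch {

    // Default of hive.stats.join.factor as quoted in the comment above.
    static final double JOIN_FACTOR = 1.1;

    /** Assumption: for a single join key the denominator is the larger NDV (0 if both are unknown). */
    static long denominator(long leftNdv, long rightNdv) {
        return Math.max(leftNdv, rightNdv);
    }

    /** Behaviour described for the original code: a zero denominator is replaced with 1. */
    static double estimateBeforeFix(long leftRows, long rightRows, long leftNdv, long rightNdv) {
        long denom = denominator(leftNdv, rightNdv);
        if (denom == 0) {
            denom = 1; // turns the equi-join estimate into a cartesian product
        }
        double maxRows = Math.max(leftRows, rightRows);
        double otherSideRows = Math.min(leftRows, rightRows);
        return otherSideRows * (maxRows / denom);
    }

    /** Proposed behaviour: fall back to the join-factor heuristic when the denominator is unusable. */
    static double estimateAfterFix(long leftRows, long rightRows, long leftNdv, long rightNdv) {
        if (denominator(leftNdv, rightNdv) == 0) {
            return JOIN_FACTOR * Math.max(leftRows, rightRows);
        }
        return estimateBeforeFix(leftRows, rightRows, leftNdv, rightNdv);
    }

    public static void main(String[] args) {
        long rows = 100_000_000L;
        // Two 100M-row tables with NDV=0 on both join keys.
        System.out.println("before fix: " + estimateBeforeFix(rows, rows, 0, 0));
        System.out.println("after fix:  " + estimateAfterFix(rows, rows, 0, 0));
    }
}
```

With NDV=0 on both sides, the first estimate is the 10^16 cartesian product and the second is roughly 1.1 × 100M rows. When either NDV is known, both methods return the same NDV-based estimate.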

Would you be willing to take a look when you have time? Happy to provide additional context or adjust the approach. Thanks!

asf-ci-hive added tests pending tests unstable and removed tests pending tests unstable labels Mar 12, 2026

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels Mar 23, 2026

konstantinb changed the title ~~HIVE-29503: Use the fallback of half the number of rows when estimating the join product row count with an NDV of 0~~ HIVE-29503: Prevent Join cardinality overestimation of joins with NDV(0) columns Mar 23, 2026

asf-ci-hive added tests pending and removed tests passed labels Mar 23, 2026

konstantinb marked this pull request as ready for review March 23, 2026 23:19

asf-ci-hive added tests passed and removed tests pending labels Mar 24, 2026

asf-ci-hive added tests pending tests passed and removed tests passed tests pending labels Mar 24, 2026

asf-ci-hive added tests pending and removed tests passed tests pending labels Apr 1, 2026

asf-ci-hive added the tests passed label Apr 1, 2026

konstantinb added 3 commits April 9, 2026 08:01

HIVE-29503: Use the fallback of half the number of rows when estimating the join product row count with an NDV of 0

9a1bd8d

HIVE-29503: trying to trigger the fallback logic differently

cac7932

HIVE-29503: don't trigger "NDV==0 fallback" on empty or super small tables

ac63731

konstantinb force-pushed the HIVE-29503 branch from 7664757 to ead56ae Compare April 9, 2026 15:47

asf-ci-hive added tests pending and removed tests passed labels Apr 9, 2026

konstantinb added 5 commits April 9, 2026 08:52

HIVE-29503: .out files impacted after rebasing onto latest master

c0788bd

HIVE-29503: .q file comments' cleanup

b431bdb

HIVE-29503: refactored ndv join estimation logic to preserve statistics

784084a

HIVE-29503; SonarQube feedback

01b2998

HIVE-29503; .out files impacted after rebasing onto latest master

c34c234

konstantinb force-pushed the HIVE-29503 branch from ead56ae to c34c234 Compare April 9, 2026 15:53

asf-ci-hive added tests failed tests pending and removed tests pending tests failed labels Apr 9, 2026

asf-ci-hive added tests passed and removed tests pending labels Apr 9, 2026

konstantinb wants to merge 8 commits into apache:master from konstantinb:HIVE-29503

2 participants
