[SPARK-57437][SQL] Infer additional constraints by substituting attribute-to-literal bindings#56499

Open
xumingming wants to merge 1 commit into apache:master from xumingming:catalyst-infer-constraints-from-literal-bindings

@xumingming

What changes were proposed in this pull request?

When a predicate binds an attribute to a literal (e.g. a.pt = '20260610') and another predicate references that attribute (e.g. b.pt >= f(a.pt)), Catalyst previously did not exploit the literal binding to derive a simpler, pushable predicate.

   SELECT *
   FROM a
   LEFT JOIN b
     ON a.key = b.key
    AND b.pt >= f(a.pt)
   WHERE a.pt = '20260610';

Here Catalyst does not derive a partition filter on b.pt, so table b is fully scanned and the query performs very poorly.

This change extends ConstraintHelper.inferAdditionalConstraints with a second pass that:

  1. Collects Attribute = Literal bindings from the constraint set.
  2. Substitutes the literal into non-equality predicates that reference those attributes.
  3. Adds the resulting deterministic expressions as new inferred constraints.

After constant folding, the inferred predicates can be pushed into scans as partition filters, avoiding full-table scans in cases where only a small subset of partitions can match.
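To make the second pass and the follow-up constant folding concrete, here is a minimal, self-contained sketch on a toy expression tree. It is not the code in this PR and does not use Catalyst's classes: `Attr`, `Lit`, `Func`, `LiteralBindingInference`, and the stand-in implementation of `f` (truncate to the first day of the month) are all hypothetical names chosen for illustration. The toy model has no notion of non-deterministic expressions, so the determinism check from step 3 is omitted.

```scala
// Toy expression tree. These case classes are illustrative only, not Catalyst's.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: String) extends Expr
case class Func(name: String, arg: Expr) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class GreaterThanOrEqual(left: Expr, right: Expr) extends Expr

object LiteralBindingInference {
  // Step 1: collect Attribute = Literal bindings, in either operand order.
  def bindings(constraints: Set[Expr]): Map[Attr, Lit] = constraints.collect {
    case EqualTo(a: Attr, l: Lit) => a -> l
    case EqualTo(l: Lit, a: Attr) => a -> l
  }.toMap

  // Replace every bound attribute inside an expression with its literal.
  def substitute(e: Expr, b: Map[Attr, Lit]): Expr = e match {
    case a: Attr => b.getOrElse(a, a)
    case l: Lit => l
    case Func(n, arg) => Func(n, substitute(arg, b))
    case EqualTo(l, r) => EqualTo(substitute(l, b), substitute(r, b))
    case GreaterThanOrEqual(l, r) =>
      GreaterThanOrEqual(substitute(l, b), substitute(r, b))
  }

  // Steps 2 and 3: substitute into non-equality predicates and keep only
  // results that are not already in the constraint set.
  def infer(constraints: Set[Expr]): Set[Expr] = {
    val b = bindings(constraints)
    val nonEquality = constraints.filter {
      case _: EqualTo => false
      case _ => true
    }
    nonEquality.map(substitute(_, b)) -- constraints
  }

  // Stand-in for constant folding: evaluate a function call once its
  // argument is a literal, using a caller-supplied implementation.
  def fold(e: Expr, impl: String => String): Expr = e match {
    case Func(n, arg) => fold(arg, impl) match {
      case Lit(v) => Lit(impl(v))
      case other => Func(n, other)
    }
    case EqualTo(l, r) => EqualTo(fold(l, impl), fold(r, impl))
    case GreaterThanOrEqual(l, r) =>
      GreaterThanOrEqual(fold(l, impl), fold(r, impl))
    case other => other
  }
}

object Demo extends App {
  import LiteralBindingInference._

  // Constraints taken from the example query: a.pt = '20260610' and b.pt >= f(a.pt).
  val constraints: Set[Expr] = Set(
    EqualTo(Attr("a.pt"), Lit("20260610")),
    GreaterThanOrEqual(Attr("b.pt"), Func("f", Attr("a.pt")))
  )

  val inferred = infer(constraints)
  assert(inferred ==
    Set(GreaterThanOrEqual(Attr("b.pt"), Func("f", Lit("20260610")))))

  // Assumed behaviour of f, for illustration only: first day of the month.
  val folded = inferred.map(fold(_, v => v.take(6) + "01"))
  assert(folded == Set(GreaterThanOrEqual(Attr("b.pt"), Lit("20260601"))))
}
```

The folded predicate references only b.pt and a literal, which is the shape a scan can use as a partition filter. Equality predicates are skipped in `infer` because the description limits the second pass to non-equality predicates.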

Why are the changes needed?

Currently, a query of the following pattern causes a full table scan of table b:

   SELECT *
   FROM a
   LEFT JOIN b
     ON a.key = b.key
    AND b.pt >= f(a.pt)
   WHERE a.pt = '20260610';

With this optimization, table b benefits from partition pruning, because the inferred predicate on b.pt can be pushed into the scan as a partition filter.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@xumingming

Author

@cloud-fan Could you help take a look at this PR?
