Skip to content

Add MERGE INTO types to datafusion-expr#20763

Open
wirybeaver wants to merge 1 commit intoapache:mainfrom
wirybeaver:feature/mergeinto-type
Open

Add MERGE INTO types to datafusion-expr#20763
wirybeaver wants to merge 1 commit intoapache:mainfrom
wirybeaver:feature/mergeinto-type

Conversation

@wirybeaver
Copy link
Copy Markdown

@wirybeaver wirybeaver commented Mar 6, 2026

Which issue does this PR close?

The task 1 of #20746

Rationale for this change

support MERGE INTO DML statements in the datafusion-expr

What changes are included in this PR?

Add new types to support MERGE INTO DML statements

  • MergeIntoOp: Carries ON condition and WHEN clauses
  • MergeIntoClause: Single WHEN clause (kind, predicate, action)
  • MergeIntoClauseKind: Matched/NotMatched/NotMatchedByTarget/NotMatchedBySource
  • MergeIntoAction: Update/Insert/Delete actions

Add WriteOp::MergeInto variant to WriteOp enum.

Are these changes tested?

Unit Test

Copy link
Copy Markdown
Contributor

@ethan-tyler ethan-tyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for kicking this off @wirybeaver. Type definitions look reasonable as a first step for #20746. Left a couple of comments, and this can also use some tests for the new public types.

Comment on lines +242 to +243
/// `MERGE INTO` operation
MergeInto(MergeIntoOp),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is breaking datafusion-proto. The proto conversion for WriteOp is non-exhaustive and the proto schema can't carry MergeIntoOp.on or clauses. I would either land proto changes together or add an explicit serialization error path.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the amended commit. Extended DmlNode with a MERGE_INTO type tag and an optional MergeIntoOpNode payload (carrying on, clauses, per-clause predicates, and Update/Insert/Delete actions with their embedded Exprs). Wired through to_proto/from_proto and added a pub fn parse_write_op(&DmlNode, registry, codec) helper that the LogicalPlanType::Dml deserializer uses. The tag-only From<dml_node::Type> for WriteOp impl is kept for back-compat with a documented unreachable! arm for MergeInto — callers must use parse_write_op. Round-trip test added in roundtrip_logical_plan_dml_merge_into.

pub kind: MergeIntoClauseKind,
/// Optional additional predicate (`AND <expr>`).
pub predicate: Option<Expr>,
/// The action to take.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the planner lands, apply_expressions / with_new_exprs will need to be wired through DML to pick up the Expr payloads stored here. Not a concern for this PR since these are just type definitions.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack — will wire the on Expr and per-clause predicate / action Exprs through apply_expressions and with_new_exprs in the planner PR (task 2 of #20746). Noting it here so it doesn't get lost.

Comment on lines +325 to +333
pub enum MergeIntoClauseKind {
/// WHEN MATCHED
Matched,
/// WHEN NOT MATCHED (synonymous with NOT MATCHED BY TARGET)
NotMatched,
/// WHEN NOT MATCHED BY TARGET
NotMatchedByTarget,
/// WHEN NOT MATCHED BY SOURCE
NotMatchedBySource,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotMatched and NotMatchedByTarget look synonymous. If both stay, document how downstream consumers should handle them

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept both variants and added a type-level doc on MergeIntoClauseKind calling out that they are semantically identical (source row with no matching target row) — NotMatched is the SQL standard short form (Snowflake/Postgres/SQL Server), NotMatchedByTarget is BigQuery's explicit form. The doc states downstream consumers (planners, table providers, optimizers) MUST treat them identically. Mirroring sqlparser keeps the parsed SQL spelling round-tripping losslessly. Per-variant docs cross-link to the type-level note. Happy to revisit and collapse to a single canonical variant if you'd prefer that — just wanted to keep this PR scoped to type defs + proto wiring.

@wirybeaver wirybeaver force-pushed the feature/mergeinto-type branch from 4112116 to 68e8c58 Compare April 24, 2026 06:06
@github-actions github-actions Bot added the proto Related to proto crate label Apr 24, 2026
@wirybeaver
Copy link
Copy Markdown
Author

Thanks for the review @ethan-tyler. Addressed the three inline comments in the amended commit and added tests for the new public types: write_op_merge_into_name_and_display in datafusion-expr and roundtrip_logical_plan_dml_merge_into in datafusion-proto (full LogicalPlan::Dml → bytes → LogicalPlan::Dml round-trip exercising the new proto wiring), plus two narrower tests for the MergeInto proto tag and the missing-payload error path.

@wirybeaver wirybeaver requested a review from ethan-tyler April 25, 2026 02:39
@wirybeaver wirybeaver force-pushed the feature/mergeinto-type branch from 68e8c58 to ff1fbd4 Compare May 5, 2026 05:03
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-expr v53.1.0 (current)
       Built [  29.444s] (current)
     Parsing datafusion-expr v53.1.0 (current)
      Parsed [   0.073s] (current)
    Building datafusion-expr v53.1.0 (baseline)
       Built [  26.965s] (baseline)
     Parsing datafusion-expr v53.1.0 (baseline)
      Parsed [   0.071s] (baseline)
    Checking datafusion-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.324s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip

--- failure auto_trait_impl_removed: auto trait no longer implemented ---

Description:
A public type has stopped implementing one or more auto traits. This can break downstream code that depends on the traits being implemented.
        ref: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/auto_trait_impl_removed.ron

Failed in:
  type WriteOp is no longer UnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231
  type WriteOp is no longer RefUnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231
  type WriteOp is no longer UnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231
  type WriteOp is no longer RefUnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231
  type WriteOp is no longer UnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231
  type WriteOp is no longer RefUnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231
  type WriteOp is no longer UnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231
  type WriteOp is no longer RefUnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:231

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/enum_variant_added.ron

Failed in:
  variant WriteOp:MergeInto in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:243
  variant WriteOp:MergeInto in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:243
  variant WriteOp:MergeInto in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:243
  variant WriteOp:MergeInto in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/dml.rs:243

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  59.254s] datafusion-expr
    Building datafusion-proto v53.1.0 (current)
       Built [  55.402s] (current)
     Parsing datafusion-proto v53.1.0 (current)
      Parsed [   0.135s] (current)
    Building datafusion-proto v53.1.0 (baseline)
       Built [  54.763s] (baseline)
     Parsing datafusion-proto v53.1.0 (baseline)
      Parsed [   0.135s] (baseline)
    Checking datafusion-proto v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.897s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field DmlNode.merge_into in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:438
  field DmlNode.merge_into in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:438

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/enum_variant_added.ron

Failed in:
  variant Type:MergeInto in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:462
  variant Type:MergeInto in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:462

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [ 115.182s] datafusion-proto

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 5, 2026
Add new types to support MERGE INTO DML statements:
- MergeIntoOp: Carries ON condition and WHEN clauses
- MergeIntoClause: Single WHEN clause (kind, predicate, action)
- MergeIntoClauseKind: Matched/NotMatched/NotMatchedByTarget/NotMatchedBySource
- MergeIntoAction: Update/Insert/Delete actions

Add WriteOp::MergeInto variant to WriteOp enum.

Wire MergeInto through datafusion-proto: extend the DmlNode message with
a MERGE_INTO type tag and a MergeIntoOpNode payload field, plus matching
serializers in to_proto.rs and from_proto.rs. The tag-only conversion
From<dml_node::Type> for WriteOp cannot reconstruct the MergeInto payload,
so callers must use the new fallible parse_write_op helper.

Document that MergeIntoClauseKind::NotMatched and NotMatchedByTarget are
semantically equivalent and must be treated identically downstream.

Add unit tests in datafusion-expr and proto round-trip tests in
datafusion-proto.
@wirybeaver wirybeaver force-pushed the feature/mergeinto-type branch from ff1fbd4 to 5cba49f Compare May 6, 2026 05:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change logical-expr Logical plan and expressions proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants