Add MERGE INTO types to datafusion-expr#20763
Conversation
ethan-tyler
left a comment
There was a problem hiding this comment.
Thanks for kicking this off @wirybeaver. Type definitions look reasonable as a first step for #20746. Left a couple of comments, and this can also use some tests for the new public types.
| /// `MERGE INTO` operation | ||
| MergeInto(MergeIntoOp), |
There was a problem hiding this comment.
This is breaking datafusion-proto. The proto conversion for WriteOp is non-exhaustive and the proto schema can't carry MergeIntoOp.on or clauses. I would either land proto changes together or add an explicit serialization error path.
There was a problem hiding this comment.
Done in the amended commit. Extended DmlNode with a MERGE_INTO type tag and an optional MergeIntoOpNode payload (carrying on, clauses, per-clause predicates, and Update/Insert/Delete actions with their embedded Exprs). Wired through to_proto/from_proto and added a pub fn parse_write_op(&DmlNode, registry, codec) helper that the LogicalPlanType::Dml deserializer uses. The tag-only From<dml_node::Type> for WriteOp impl is kept for back-compat with a documented unreachable! arm for MergeInto — callers must use parse_write_op. Round-trip test added in roundtrip_logical_plan_dml_merge_into.
| pub kind: MergeIntoClauseKind, | ||
| /// Optional additional predicate (`AND <expr>`). | ||
| pub predicate: Option<Expr>, | ||
| /// The action to take. |
There was a problem hiding this comment.
Once the planner lands, apply_expressions / with_new_exprs will need to be wired through DML to pick up the Expr payloads stored here. Not a concern for this PR since these are just type definitions.
There was a problem hiding this comment.
Ack — will wire the on Expr and per-clause predicate / action Exprs through apply_expressions and with_new_exprs in the planner PR (task 2 of #20746). Noting it here so it doesn't get lost.
| pub enum MergeIntoClauseKind { | ||
| /// WHEN MATCHED | ||
| Matched, | ||
| /// WHEN NOT MATCHED (synonymous with NOT MATCHED BY TARGET) | ||
| NotMatched, | ||
| /// WHEN NOT MATCHED BY TARGET | ||
| NotMatchedByTarget, | ||
| /// WHEN NOT MATCHED BY SOURCE | ||
| NotMatchedBySource, |
There was a problem hiding this comment.
NotMatched and NotMatchedByTarget look synonymous. If both stay, document how downstream consumers should handle them
There was a problem hiding this comment.
Kept both variants and added a type-level doc on MergeIntoClauseKind calling out that they are semantically identical (source row with no matching target row) — NotMatched is the SQL standard short form (Snowflake/Postgres/SQL Server), NotMatchedByTarget is BigQuery's explicit form. The doc states downstream consumers (planners, table providers, optimizers) MUST treat them identically. Mirroring sqlparser keeps the parsed SQL spelling round-tripping losslessly. Per-variant docs cross-link to the type-level note. Happy to revisit and collapse to a single canonical variant if you'd prefer that — just wanted to keep this PR scoped to type defs + proto wiring.
4112116 to
68e8c58
Compare
|
Thanks for the review @ethan-tyler. Addressed the three inline comments in the amended commit and added tests for the new public types: |
68e8c58 to
ff1fbd4
Compare
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
Add new types to support MERGE INTO DML statements: - MergeIntoOp: Carries ON condition and WHEN clauses - MergeIntoClause: Single WHEN clause (kind, predicate, action) - MergeIntoClauseKind: Matched/NotMatched/NotMatchedByTarget/NotMatchedBySource - MergeIntoAction: Update/Insert/Delete actions Add WriteOp::MergeInto variant to WriteOp enum. Wire MergeInto through datafusion-proto: extend the DmlNode message with a MERGE_INTO type tag and a MergeIntoOpNode payload field, plus matching serializers in to_proto.rs and from_proto.rs. The tag-only conversion From<dml_node::Type> for WriteOp cannot reconstruct the MergeInto payload, so callers must use the new fallible parse_write_op helper. Document that MergeIntoClauseKind::NotMatched and NotMatchedByTarget are semantically equivalent and must be treated identically downstream. Add unit tests in datafusion-expr and proto round-trip tests in datafusion-proto.
ff1fbd4 to
5cba49f
Compare
Which issue does this PR close?
The task 1 of #20746
Rationale for this change
support MERGE INTO DML statements in the datafusion-expr
What changes are included in this PR?
Add new types to support MERGE INTO DML statements
Add WriteOp::MergeInto variant to WriteOp enum.
Are these changes tested?
Unit Test