[SPARK-56054][SQL] Undo handling aliased assignments in MERGE INTO schema evolution and add tests by johanl-db · Pull Request #55239 · apache/spark

johanl-db · 2026-04-07T16:58:50Z

What changes were proposed in this pull request?

Follow-up to #54891, in particular this comment.

Adds more tests covering schema evolution in MERGE INTO when the assignment contains an alias.
This change also undoes the fix from #54891. On closer inspection, it isn't needed in Spark: Spark already removes trivial aliases on nested field accessors in MERGE in resolveExprInAssignment, which is what that change aimed to allow. See this test added here covering trivial aliases implicitly added on nested field access.

Delta doesn't strip aliases during resolution in that way and will require special handling, but it's up to Delta to implement that in its custom resolution logic, replicating what Spark already does.
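The resolution step described above can be pictured with a minimal Scala sketch over a hypothetical, heavily simplified expression model. The names `Expr`, `Attr`, `GetField`, `Alias`, `resolveNested` and `trimTrivialAlias` are illustrative stand-ins, not Spark's Catalyst classes or the actual `resolveExprInAssignment` code. The sketch assumes that resolving a nested access wraps it in an implicit alias, and that the outer alias is then dropped only when its child is a field extraction. Under that assumption, an explicit alias over a top-level column survives.

```scala
object TrivialAliasSketch {
  // Hypothetical, simplified expression model for illustration only;
  // these are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class GetField(child: Expr, field: String) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  // Models resolving a nested access such as `source.info.b`: the result
  // comes back wrapped in an implicit alias named after the accessed field.
  def resolveNested(column: String, field: String): Expr =
    Alias(GetField(Attr(column), field), field)

  // Models the stripping step: an outer alias directly over a field
  // extraction is dropped; any other expression is returned unchanged.
  def trimTrivialAlias(e: Expr): Expr = e match {
    case Alias(child: GetField, _) => child
    case other                     => other
  }

  def main(args: Array[String]): Unit = {
    // Nested assignment value, e.g. SET info.b = source.info.b
    println(trimTrivialAlias(resolveNested("info", "b"))) // GetField(Attr(info),b)
    // Explicit user alias on a top-level column, e.g. col("source.info").as("info")
    println(trimTrivialAlias(Alias(Attr("info"), "info"))) // Alias(Attr(info),info)
  }
}
```

In this model, the nested case reaches later schema-evolution logic as a plain field access, so no alias handling is needed there. The explicit top-level alias is left in place.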

How was this patch tested?

Adds tests covering MERGE INTO schema evolution with aliases in assignments

szehon-ho

These are all DataFrame API tests; is there any equivalent in SQL? It would be good to abstract it out if possible.

If not, maybe we can make a new file MergeIntoSchemaEvolutionExtraDataFrameTests so they only run in DataFrame mode?

It is a bit of a mess now due to the inheritance patterns

johanl-db · 2026-04-09T15:00:48Z

I moved the tests to a dedicated trait for dataframe tests.

There's no SQL equivalent: afaict it's not possible to specify an alias in an assignment expression using SQL, e.g. SET col = source.col AS other_col.

aokolnychyi · 2026-04-09T17:14:46Z

+        spark.table("source")
+          .mergeInto(tableNameAsString,
+            col(s"$tableNameAsString.pk") === col("source.pk"))
+          .whenMatched().update(Map("info" -> col("source.info").as("info")))


Okay, this seems like a behavior change compared to the initial PR then? The code in master would evolve the schema in this case because we had that alias support?

Correct. The main motivation for the initial PR adding alias support was to support nested assignments SET col.x = s.col.x, which translates to Assignment("col.x", Alias(GetStructField("col", "x"), "x")).
I've noticed in the meantime that Spark actually strips the alias on struct field access after resolution (Delta doesn't).

We don't have a strong argument to support the remaining cases where the user provides an explicit alias using the DataFrame API, so I'd rather not allow these.

aokolnychyi · 2026-04-09T17:15:40Z

+          .mergeInto(tableNameAsString,
+            col(s"$tableNameAsString.pk") === col("source.pk"))
+          .whenMatched().update(Map(
+            "info" -> col("source.info").as("something_else")))


How does Delta behave in this case?

Delta allows schema evolution here, but only because it allows arbitrary expressions on the right-hand side of the assignment: col("source.info").as("something_else") qualifies for schema evolution the same way col("source.other_column").plus(lit(1)) does.
Spark rejects both (with this change).
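The contrast above can be sketched in Scala under stated assumptions. In this hypothetical model, Spark (after this change) only lets an assignment value drive schema evolution when it is a plain column or a chain of struct-field accesses. The Delta-like rule instead accepts any right-hand-side expression. The names `fieldPath`, `sparkQualifies` and `deltaLikeQualifies` are illustrative and only loosely mirror the role of `MergeIntoTable.extractFieldPath`; they are not copies of Spark or Delta code. The sketch assumes trivial aliases were already stripped during resolution. It does not model any further checks Spark may apply, for example against the assignment key.

```scala
object EvolutionEligibilitySketch {
  // Hypothetical, simplified expression model for illustration only.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class GetField(child: Expr, field: String) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr
  final case class Plus(left: Expr, right: Expr) extends Expr
  final case class Lit(value: Int) extends Expr

  // Returns the column/struct-field path when the value is a plain reference;
  // aliases and computed expressions yield None (no generic Alias branch).
  def fieldPath(e: Expr): Option[List[String]] = e match {
    case Attr(n)            => Some(List(n))
    case GetField(child, f) => fieldPath(child).map(_ :+ f)
    case _                  => None
  }

  // Spark-side rule in this model: only plain references can drive evolution.
  def sparkQualifies(value: Expr): Boolean = fieldPath(value).isDefined

  // Delta-like rule in this model: any right-hand-side expression is accepted.
  def deltaLikeQualifies(value: Expr): Boolean = true

  def main(args: Array[String]): Unit = {
    val values = Seq(
      GetField(Attr("info"), "b"),           // source.info.b (alias already stripped)
      Alias(Attr("info"), "something_else"), // col("source.info").as("something_else")
      Plus(Attr("other_column"), Lit(1))     // col("source.other_column").plus(lit(1))
    )
    values.foreach { v =>
      println(s"$v spark=${sparkQualifies(v)} deltaLike=${deltaLikeQualifies(v)}")
    }
  }
}
```

Dropping the generic alias case is what makes the explicitly aliased value fall into the same "not a plain reference" bucket as the arithmetic expression.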

aokolnychyi · 2026-04-09T17:17:01Z

@johanl-db, what about cases when the source query has an alias?

MERGE INTO t
USING (SELECT 'blah' AS col1, 1 AS col2 FROM source) AS s
ON false
WHEN NOT MATCHED THEN INSERT *

johanl-db · 2026-04-14T06:32:59Z


Discussed directly with @aokolnychyi and @szehon-ho:
The shape of the source doesn't matter at all; the only parts that matter when performing schema evolution are:

The schema of the source
The assignment expressions in the match clauses

MERGE doesn't look inside the source query to see what it contains, so the aliases there are irrelevant.
This case will trigger schema evolution for all new columns in the source schema, since there's an INSERT * assignment.
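The point that only the source schema and the assignments matter can be sketched as a pure function of two schemas. The sketch models the INSERT * case, where every source column is assigned, so every source column missing from the target becomes a candidate new column. It is a minimal illustration under assumptions: schemas are modeled as hypothetical `(name, type)` pairs rather than Spark's `StructType`, names are compared exactly, and nested fields and case-insensitive matching are not modeled. The target table `t` having only `col1` is also an assumption for the example.

```scala
object InsertStarEvolutionSketch {
  // A field is modeled as (name, SQL type name); nested structs are not modeled.
  type Schema = Seq[(String, String)]

  // With INSERT *, every source column is assigned, so every source column
  // missing from the target is a candidate new column. Names are compared
  // exactly here; case-insensitive matching is not modeled.
  def newColumnsForInsertStar(source: Schema, target: Schema): Schema = {
    val targetNames = target.map(_._1).toSet
    source.filterNot { case (name, _) => targetNames.contains(name) }
  }

  def main(args: Array[String]): Unit = {
    // Schema of `SELECT 'blah' AS col1, 1 AS col2 FROM source`; how the
    // columns were produced (literals, aliases) is invisible at this point.
    val sourceSchema: Schema = Seq("col1" -> "STRING", "col2" -> "INT")
    // Hypothetical target table `t` that only has col1.
    val targetSchema: Schema = Seq("col1" -> "STRING")
    println(newColumnsForInsertStar(sourceSchema, targetSchema)) // List((col2,INT))
  }
}
```

The aliases inside the source subquery never appear as inputs, which matches the conclusion that they are irrelevant to MERGE.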

johanl-db · 2026-04-14T10:28:57Z

CI is failing due to formatting issues in files that are not touched by this PR, likely from #55281.

szehon-ho

I think this one is ready to go @cloud-fan

I also added the test requested by @aokolnychyi in #55304 (I didn't see the comment). We can add it separately, or optionally as part of this PR; I guess it doesn't matter. As we looked into it together, it's a bit orthogonal to this PR (the aliases within the source don't matter; to MERGE it just comes as a source table).

szehon-ho · 2026-04-15T21:12:21Z

Also, apparently this CI error is fixed by #55334.

johanl-db · 2026-04-16T08:00:02Z


@szehon-ho I picked up your test from #55304 in this PR

cloud-fan

Summary

This is a follow-up to #54891 that reverts the generic case Alias(child, _) branch in MergeIntoTable.extractFieldPath and expands the test surface. The rationale: the implicit trivial Alias(ExtractValue, _) that AttributeSeq.resolve adds around nested field accesses is already stripped by resolveExprInAssignment (ColumnResolutionHelper.scala:489) before values reach extractFieldPath. So nested assignments like SET target.info.b = source.info.b keep working (and are already guarded by existing tests in MergeIntoSchemaEvolutionBasicTests.scala:629/691). The generic Alias case in #54891 additionally picked up explicit user aliases on top-level struct columns (col("src.info").as("info")) — an unintended side-effect that isn't a pattern Spark core needs to support. Delta can (and does) handle that via its own custom resolution.

Net effect: alias handling for schema evolution is consolidated in the resolution layer; top-level explicit aliased struct columns revert to the pre-#54891 behavior (no evolution, incompatible-schema error). #54891 is on master only and unreleased, so no shipped behavior regresses.

The four new negative tests in MergeIntoSchemaEvolutionExtraScalaTests pin down the now-unsupported patterns (top-level aliased struct with matching/mismatched name, nested field through an aliased struct, complex expression value). The new SQL test covers the source-subquery-alias case raised by @aokolnychyi.

LGTM.

cloud-fan · 2026-04-20T15:05:36Z

thanks, merging to master!

Add test for MERGE INTO schema evolution with aliased assignments

c68d16d

szehon-ho reviewed Apr 7, 2026


Comment thread .../org/apache/spark/sql/connector/MergeIntoSchemaEvolutionTypeWideningAndExtraFieldTests.scala Outdated

Move tests to dedicated trait

8bd48ed

johanl-db requested a review from szehon-ho April 9, 2026 15:00

aokolnychyi reviewed Apr 9, 2026


szehon-ho mentioned this pull request Apr 11, 2026

[SPARK-56450][SQL] Add merge into schema evolution test with source subquery alias #55304

Closed

johanl-db changed the title from "[SPARK-56054][SQL] Add test for MERGE INTO schema evolution with aliased assignments" to "[SPARK-56054][SQL] Undo handling aliased assignments in MERGE INTO schema evolution and add tests" Apr 14, 2026

Merge branch 'master' into SPARK-56054-merge-into-schema-evolution-alias

87bcfa0

johanl-db requested a review from aokolnychyi April 14, 2026 06:33

szehon-ho approved these changes Apr 15, 2026


Add test from apache#55304

1f51e18

cloud-fan approved these changes Apr 20, 2026


cloud-fan closed this in 0cf7c2a Apr 20, 2026
