[SPARK-37019][SQL][FOLLOWUP] Resolve nested higher-order function arguments first #56507

Draft
sunchao wants to merge 1 commit into apache:master from sunchao:dev/chao/codex/hof-gate-analyzer-behavior-oss

sunchao (Member) commented Jun 14, 2026

What changes were proposed in this pull request?

Resolve each higher-order function's argument expressions before checking their data types and binding its lambda functions.

The analyzer now follows this order:

  1. Resolve the argument expressions using the current outer lambda scope.
  2. Rebuild the higher-order function with those resolved arguments.
  3. If the arguments are resolved and pass their data type checks, bind the lambda functions immediately.
  4. Otherwise, resolve only the function expressions and defer binding.

This is intentionally narrow. It does not change ArrayAggregate accumulator types, casts, code generation, or runtime execution.

The PR also adds a focused regression test for the nested transform / filter / aggregate expression that exposed the bug.

Why are the changes needed?

ResolveLambdaVariables previously bound a higher-order function's lambda functions only when its arguments were already resolved at the start of the visit. If nested argument expressions became resolved during that visit, the rule still walked and rebuilt the rest of the expression tree without binding the current lambda functions, leaving them for a later analyzer pass.

For complex nested types, that ordering could inspect a field extraction whose lambda variable was still unresolved and fail analysis with:

Invalid call to dataType on unresolved object

In short:

Before: check readiness -> resolve nested arguments -> wait for another analyzer pass
After:  resolve nested arguments -> check readiness -> bind lambdas in the same pass
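The difference between the two orderings can be illustrated with a minimal sketch. It uses a hypothetical toy expression tree rather than Catalyst's real classes: `Expr`, `Attr`, `LambdaVar`, `Lambda`, `Hof`, `before`, `after`, and `passes` are all invented for illustration and are not Spark APIs. The model ignores data types entirely, so it shows only the resolution-order half of the change, not the dataType failure itself. It is written as a Scala script (for example, a `.sc` file run with scala-cli) and counts how many top-down passes each ordering needs to bind every lambda in a nested expression shaped like `transform(filter(arr, x -> x), y -> y)`.

```scala
// Hypothetical toy model; none of these types are Catalyst classes.
sealed trait Expr { def resolved: Boolean }

// A column reference, resolved once the toy analyzer visits it.
final case class Attr(name: String, isResolved: Boolean = false) extends Expr {
  def resolved: Boolean = isResolved
}

// A lambda parameter reference, resolved only inside a scope that binds it.
final case class LambdaVar(name: String, bound: Boolean = false) extends Expr {
  def resolved: Boolean = bound
}

// A lambda function; `bound` means its parameters have been bound.
final case class Lambda(params: Seq[String], body: Expr, bound: Boolean = false) extends Expr {
  def resolved: Boolean = bound && body.resolved
}

// A higher-order function such as transform, filter, or aggregate.
final case class Hof(name: String, args: Seq[Expr], fn: Lambda) extends Expr {
  def resolved: Boolean = args.forall(_.resolved) && fn.resolved
}

// Walks the tree; higher-order functions are delegated to the `hof` strategy.
def resolveWith(e: Expr, scope: Set[String], hof: (Hof, Set[String]) => Expr): Expr = e match {
  case a: Attr      => a.copy(isResolved = true)
  case v: LambdaVar => v.copy(bound = v.bound || scope.contains(v.name))
  case l: Lambda    => l.copy(body = resolveWith(l.body, scope, hof))
  case h: Hof       => hof(h, scope)
}

// Binds a lambda's parameters and resolves its body in the extended scope.
def bind(l: Lambda, scope: Set[String], hof: (Hof, Set[String]) => Expr): Lambda =
  Lambda(l.params, resolveWith(l.body, scope ++ l.params, hof), bound = true)

// Old order: readiness is checked before the arguments are resolved.
def before(h: Hof, scope: Set[String]): Expr = {
  val ready = h.args.forall(_.resolved)
  val args = h.args.map(resolveWith(_, scope, before))
  if (ready) h.copy(args = args, fn = bind(h.fn, scope, before))
  else h.copy(args = args) // lambda stays unbound until a later pass
}

// New order, mirroring steps 1 to 4 in the PR description.
def after(h: Hof, scope: Set[String]): Expr = {
  val args = h.args.map(resolveWith(_, scope, after)) // 1. resolve args in the outer scope
  val rebuilt = h.copy(args = args)                   // 2. rebuild with resolved args
  if (args.forall(_.resolved))                        // 3. ready: bind lambdas now
    rebuilt.copy(fn = bind(rebuilt.fn, scope, after))
  else                                                // 4. resolve fn only, defer binding
    rebuilt.copy(fn = rebuilt.fn.copy(body = resolveWith(rebuilt.fn.body, scope, after)))
}

// Repeats whole-tree passes until the expression is resolved (or maxPasses).
def passes(e: Expr, hof: (Hof, Set[String]) => Expr, maxPasses: Int = 10): Int = {
  var cur = e
  var n = 0
  while (!cur.resolved && n < maxPasses) {
    cur = resolveWith(cur, Set.empty, hof)
    n += 1
  }
  n
}

// A higher-order function whose argument is itself a higher-order function.
val nested = Hof("transform",
  Seq(Hof("filter", Seq(Attr("arr")), Lambda(Seq("x"), LambdaVar("x")))),
  Lambda(Seq("y"), LambdaVar("y")))

println(passes(nested, before)) // 3
println(passes(nested, after))  // 1
```

In this toy model, the old ordering makes each enclosing function wait one extra pass for its nested argument to become resolved, while the new ordering binds both lambdas in a single pass.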

Does this PR introduce any user-facing change?

Yes. Valid queries with nested higher-order functions that previously failed during analysis can now be analyzed and executed.

There is no public API, configuration, or intended runtime behavior change for queries that already worked.

How was this patch tested?

  • Added the test "ArrayAggregate resolves nested lambda arguments before inspecting their types", which reproduces the production-shaped failure and verifies the query result.
  • ResolveLambdaVariablesSuite: 6 tests passed.
  • DataFrameComplexTypeSuite: 18 tests passed.
  • Catalyst and SQL test Scalastyle: 0 errors and 0 warnings.
  • git diff --check passed.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5)

sunchao force-pushed the dev/chao/codex/hof-gate-analyzer-behavior-oss branch from f2fed4d to dea9dea on June 17, 2026 04:12
sunchao changed the title from "[SPARK-37019][SQL][FOLLOWUP] Defer ArrayAggregate accumulator widening" to "[SPARK-37019][SQL][FOLLOWUP] Resolve nested higher-order function arguments first" on Jun 17, 2026