
[SPARK-57496][SQL][BUILD] Keep the Types Framework ops and UDF worker packages out of the published API #56551

Open
cloud-fan wants to merge 1 commit into apache:master from cloud-fan:SPARK-57496

@cloud-fan

Contributor

What changes were proposed in this pull request?

Two related changes that keep internal packages out of the published 4.2.0 API surface:

  1. Move the client-side Types Framework ops — TypeApiOps, TimeTypeApiOps, TimestampNanosTypeApiOps (and the TimestampNTZNanosTypeApiOps / TimestampLTZNanosTypeApiOps impls) — from org.apache.spark.sql.types.ops to org.apache.spark.sql.catalyst.types.ops, co-located with the server-side TypeOps family. Consumer imports are updated; same-package consumers drop the now-redundant import.
  2. Exclude org.apache.spark.udf.worker from the generated API docs in project/SparkBuild.scala's ignoreUndocumentedPackages.
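The import update in item 1 follows a simple rule. Here is a minimal sketch of it as a hypothetical helper, `updateImports`, which is not code from the PR; the consumer package `org.apache.spark.sql.catalyst.expressions` in the usage lines is only illustrative. Imports of the old `org.apache.spark.sql.types.ops` package are redirected to `org.apache.spark.sql.catalyst.types.ops`. They are dropped entirely when the consumer already lives in the new package.

```scala
// Hypothetical helper, not code from the PR: models the consumer-import update.
val OldPkg = "org.apache.spark.sql.types.ops"
val NewPkg = "org.apache.spark.sql.catalyst.types.ops"

// Rewrites imports of the old ops package to the new one. A consumer that
// already lives in NewPkg sees the ops types without an import, so the
// import is dropped instead of rewritten.
def updateImports(filePackage: String, imports: Seq[String]): Seq[String] =
  imports.flatMap { imp =>
    if (imp.startsWith(OldPkg + ".")) {
      if (filePackage == NewPkg) None             // now-redundant import
      else Some(NewPkg + imp.stripPrefix(OldPkg)) // relocated import
    } else Some(imp)                              // unrelated import, unchanged
  }

// Illustrative consumers: one in another package, one in the new ops package.
println(updateImports("org.apache.spark.sql.catalyst.expressions",
  Seq("org.apache.spark.sql.types.ops.TimeTypeApiOps", "org.apache.spark.sql.types.TimeType")))
// List(org.apache.spark.sql.catalyst.types.ops.TimeTypeApiOps, org.apache.spark.sql.types.TimeType)
println(updateImports(NewPkg, Seq("org.apache.spark.sql.types.ops.TypeApiOps")))
// List()
```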

Why are the changes needed?

The *ApiOps types are internal plumbing of the Types Framework (the client-side counterpart to catalyst's TypeOps), but they lived inside the public org.apache.spark.sql.types package, so they leaked into the published PySpark/Scala API of the unreleased 4.2.0 line. org.apache.spark.sql.catalyst.* is already excluded from both the generated docs (ignoreUndocumentedPackages) and MiMa (MimaExcludes), so relocating them there makes them internal with no new build/MiMa entries and mirrors how the server-side TypeOps is already handled.
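The "no new MiMa entries" point depends on the existing catalyst exclusion covering whole packages. The toy matcher below is a hypothetical `isExcluded` and not MiMa's real API or matching logic. It assumes the exclusion is written as the wildcard `org.apache.spark.sql.catalyst.*`. Under that assumption, the relocated fully qualified name falls under the existing wildcard and the old name does not.

```scala
// Toy model of package-wildcard matching; NOT MiMa's actual implementation.
// Assumption: the existing catalyst exclusion is written as a trailing ".*" wildcard.
def isExcluded(fqcn: String, patterns: Seq[String]): Boolean =
  patterns.exists { pattern =>
    if (pattern.endsWith(".*")) fqcn.startsWith(pattern.stripSuffix("*")) // package and subpackages
    else fqcn == pattern                                                   // exact class name
  }

val existingExcludes = Seq("org.apache.spark.sql.catalyst.*")

val oldName = "org.apache.spark.sql.types.ops.TimeTypeApiOps"
val newName = "org.apache.spark.sql.catalyst.types.ops.TimeTypeApiOps"

println(isExcluded(oldName, existingExcludes)) // false: not covered by the catalyst wildcard
println(isExcluded(newName, existingExcludes)) // true: covered without adding an entry
```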

org.apache.spark.udf.worker is UDF-worker infrastructure (mostly protobuf-generated *OrBuilder Java plus worker internals) that surfaced as public API. Its modules aren't MiMa-checked, and the generated Java can't carry a Scala visibility qualifier, so excluding the package from the docs is the appropriate fix.
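Assume the docs exclusion matches on source-file paths rather than on access modifiers. It then applies equally to Scala sources and to generated Java that cannot use a `private[...]` qualifier. The minimal sketch below follows that assumption and is loosely modeled on `ignoreUndocumentedPackages`. The real function in `project/SparkBuild.scala` may be shaped differently. The file paths and `ExampleOrBuilder.java` are hypothetical.

```scala
// Toy model of a path-based docs filter, in the spirit of ignoreUndocumentedPackages.
// Assumption: packages are excluded by a substring match on the source-file path.
val ignoredPaths = Seq(
  "org/apache/spark/sql/catalyst", // already excluded before this PR
  "org/apache/spark/udf/worker"    // the package this PR excludes
)

def documented(sourceFiles: Seq[String]): Seq[String] =
  sourceFiles.filterNot(path => ignoredPaths.exists(fragment => path.contains(fragment)))

// Hypothetical paths; the file extension (.scala vs .java) plays no role.
val sample = Seq(
  "src/main/scala/org/apache/spark/sql/types/TimeType.scala",
  "src/main/java/org/apache/spark/udf/worker/ExampleOrBuilder.java",
  "src/main/scala/org/apache/spark/sql/catalyst/types/ops/TimeTypeApiOps.scala"
)

println(documented(sample)) // List(src/main/scala/org/apache/spark/sql/types/TimeType.scala)
```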

Does this PR introduce any user-facing change?

No. Relative to released Spark there is no change; the affected types are new in the unreleased 4.2.0 line and were never intended to be public. This only removes them from the generated API docs (and, for the ops, the binary-compatibility surface) before release. There is no behavior change.

How was this patch tested?

No new tests — this is a package relocation plus a build-config change with no logic change. The relocated classes are exercised by existing suites (e.g. TimestampNanosTypeOpsSuite) and the cast / Row / HiveResult paths; CI compiles all affected modules and runs scalastyle, which enforces the import-ordering updates made here.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

… packages out of the published API

Move the client-side Types Framework ops (TypeApiOps, TimeTypeApiOps,
TimestampNanosTypeApiOps) from org.apache.spark.sql.types.ops to
org.apache.spark.sql.catalyst.types.ops. They are internal plumbing
(parallel to the server-side TypeOps) but sat inside the public
org.apache.spark.sql.types package, leaking into the published API. The
catalyst package is already excluded from both the generated docs
(ignoreUndocumentedPackages) and MiMa (MimaExcludes), so co-locating the
client ops there with the server-side TypeOps keeps them out of the
public surface with no new build/MiMa entries.

Also exclude org.apache.spark.udf.worker from the generated docs in
SparkBuild.scala: it is UDF-worker infrastructure (mostly protobuf-
generated *OrBuilder Java plus worker internals) that surfaced as public
API.

Co-authored-by: Isaac
@cloud-fan

Contributor Author

cc @dongjoon-hyun @huaxingao

@dongjoon-hyun dongjoon-hyun left a comment

Member


+1 because the relocation from sql to catalyst looks inevitable.

According to the PR description, do you mean that this is a blocker for Apache Spark 4.2.0? If so, I'd recommend explicitly casting a -1 on the RC3 vote.

the affected types are new in the unreleased 4.2.0 line and were never intended to be public.

@cloud-fan

Contributor Author

@dongjoon-hyun thanks for the reminder! I just did so :)

@dongjoon-hyun

dongjoon-hyun commented Jun 16, 2026

Member

Thank you, @cloud-fan .

To @huaxingao, if you agree to prepare RC4, could you conclude the RC3 vote and open the branch to us for one day? AFAIK, there are multiple PRs that we want to land on branch-4.2 and lower, and we would like to proceed if there is a chance. To be clear, I don't claim that they are release blockers; they are more like bug fixes. For me, the following ones are already in branch-4.x.

In addition, branch-4.x and the older branches (including branch-4.2) are unfortunately broken at the moment, so we need to fix them before RC4.


5 participants