[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs #55487

Open
chenwang-databricks wants to merge 9 commits into apache:master from chenwang-databricks:metric-view-on-51419

Conversation


@chenwang-databricks chenwang-databricks commented Apr 22, 2026

What changes were proposed in this pull request?

This PR extends metric-view support to DS v2 catalogs by routing CREATE VIEW ... WITH METRICS through the ViewCatalog / TableViewCatalog APIs introduced by SPARK-52729 and finalized by SPARK-56655. Third-party v2 catalogs that implement ViewCatalog can now host metric views with the same metadata fidelity as session-catalog metric views.

1. V2 metric-view CREATE path -- shared with CreateV2ViewExec. A new CreateV2MetricViewExec and CreateV2ViewExec both extend a new V2CreateViewPreparation trait (which itself extends V2ViewPreparation). The trait owns the shared CREATE-side run(): viewExists short-circuit on IF NOT EXISTS, createOrReplaceView for OR REPLACE, and cross-type collision decoding (ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-mode UNSUPPORTED, typed viewDependencies, PROP_TABLE_TYPE = METRIC_VIEW, retainColumnMetadata = true) via optional hooks on V2ViewPreparation. DataSourceV2Strategy intercepts CreateMetricViewCommand on a non-session catalog and routes to the new exec; the v1 session-catalog path stays in CreateMetricViewCommand.run.
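The shared CREATE-side flow can be sketched as a small template-method hierarchy. This is a self-contained illustration, not Spark code: the class names echo the PR's traits, but the bodies are stand-ins (a HashMap plays the catalog, and a plain IllegalStateException stands in for ViewAlreadyExistsException and the collision decoding):

```java
import java.util.HashMap;
import java.util.Map;

// Base "preparation" with the hooks the metric-view subclass overrides;
// plain views keep the defaults.
abstract class V2CreateViewPreparationSketch {
    final Map<String, String> views = new HashMap<>(); // stand-in catalog

    String tableType() { return "VIEW"; }
    boolean retainColumnMetadata() { return false; }

    boolean viewExists(String ident) { return views.containsKey(ident); }

    // Shared run(): IF NOT EXISTS short-circuit, OR REPLACE, then plain create.
    void run(String ident, boolean ifNotExists, boolean orReplace) {
        if (ifNotExists && viewExists(ident)) return;             // silent no-op
        if (orReplace) { views.put(ident, tableType()); return; } // createOrReplaceView
        if (viewExists(ident)) {
            throw new IllegalStateException("VIEW_ALREADY_EXISTS: " + ident);
        }
        views.put(ident, tableType());
    }
}

// The metric-view subclass only supplies the metric-view-specific hooks.
class CreateV2MetricViewExecSketch extends V2CreateViewPreparationSketch {
    @Override String tableType() { return "METRIC_VIEW"; }
    @Override boolean retainColumnMetadata() { return true; }
}
```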

2. First-class METRIC_VIEW table type.

  • CatalogTableType.METRIC_VIEW is added alongside EXTERNAL / MANAGED / VIEW.
  • TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW" constant for the V2 surface.
  • The previous view.viewWithMetrics property hack is removed; CatalogTable.isMetricView checks tableType == METRIC_VIEW directly.
  • V1Table.summarizeTableType and V1Table.toCatalogTable(catalog, ident, ViewInfo) translate between the V2 property form and the V1 enum.
  • HMS round-trip support: HiveTableType has no METRIC_VIEW variant (both regular views and metric views serialize as VIRTUAL_VIEW). HiveExternalCatalog now persists a view.subType = METRIC_VIEW property on write and lifts tableType back to METRIC_VIEW on read, so HMS-backed metric views survive the round trip.
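The HMS round-trip marker described above amounts to writing a property on persist and lifting the table type back on read. A minimal standalone sketch, where `view.subType` is the key named in the PR and the maps and the `hive.tableType` key are illustrative stand-ins for the real Hive metastore structures:

```java
import java.util.HashMap;
import java.util.Map;

final class HmsRoundTripSketch {
    static final String SUB_TYPE_KEY = "view.subType";

    // On write: HiveTableType has no METRIC_VIEW variant, so both view kinds
    // serialize as VIRTUAL_VIEW; metric views additionally get a marker property.
    static Map<String, String> toHiveProperties(String catalogTableType) {
        Map<String, String> props = new HashMap<>();
        props.put("hive.tableType", "VIRTUAL_VIEW"); // illustrative key
        if ("METRIC_VIEW".equals(catalogTableType)) {
            props.put(SUB_TYPE_KEY, "METRIC_VIEW");
        }
        return props;
    }

    // On read: lift the table type back to METRIC_VIEW when the marker is present.
    static String fromHiveProperties(Map<String, String> props) {
        return "METRIC_VIEW".equals(props.get(SUB_TYPE_KEY)) ? "METRIC_VIEW" : "VIEW";
    }
}
```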

3. Repo-wide tableType == VIEW audit. Promoting metric views to a distinct CatalogTableType risks silent regressions wherever existing code branches on VIEW. 15 sites were updated to accept METRIC_VIEW alongside VIEW:

  • DDL audit: CatalogTable.toJsonLinkedHashMap (DESCRIBE EXTENDED rows), HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata} (4 sites for HMS-backed metric views), verifyAlterTableType in ddl.scala (so ALTER TABLE <metric_view> RENAME TO ... raises the right error category instead of silently renaming).
  • Repo grep: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.scala (saveDataIntoView), DataStreamWriter (streaming write), DescribeRelationJsonCommand, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, and 3 sites in tables.scala (CREATE TABLE LIKE provider lookup, ALTER TABLE ADD COLUMNS, DESCRIBE PARTITION).
  • Explicit rejection: SHOW CREATE TABLE on a metric view has no round-trippable CREATE VIEW ... WITH METRICS form, so it's rejected explicitly: the v1 session-catalog path raises the dedicated UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW; the v2 path is gated in DataSourceV2Strategy with unsupportedTableOperationError(...) (UNSUPPORTED_FEATURE.TABLE_OPERATION) before ShowCreateV2ViewExec would emit a misleading plain CREATE VIEW.

4. Drop-command parity.

  • DropTableCommand (v1 path) treats both VIEW and METRIC_VIEW as views: DROP TABLE rejects either with wrongCommandForObjectTypeError, and DROP VIEW accepts either.
  • V2SessionCatalog.dropTableInternal extends the existing "view rejected from DROP TABLE" guard to cover METRIC_VIEW.
  • For non-session v2: DropTableExec (post-SPARK-56655) actively rejects with WRONG_COMMAND_FOR_OBJECT_TYPE ("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views since TableViewCatalog's default viewExists derives from loadTableOrView and recognizes MetadataTable + ViewInfo.
  • ResolveSessionCatalog's DropView routing comment is clarified: v2 metric views fall through to DataSourceV2Strategy and ViewCatalog.dropView.
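The parity rule above boils down to one predicate shared by both drop paths: metric views are view-like, so DROP TABLE rejects them and DROP VIEW accepts them. A minimal standalone sketch (error text illustrative, not the exact Spark message):

```java
final class DropParitySketch {
    // Both VIEW and METRIC_VIEW count as views for drop-command routing.
    static boolean isViewLike(String tableType) {
        return "VIEW".equals(tableType) || "METRIC_VIEW".equals(tableType);
    }

    // DROP TABLE on a view-like object fails loudly; IF EXISTS does not
    // silence the wrong-type error.
    static void dropTable(String tableType) {
        if (isViewLike(tableType)) {
            throw new IllegalArgumentException(
                "WRONG_COMMAND_FOR_OBJECT_TYPE: Use DROP VIEW instead.");
        }
        // ... drop the table ...
    }
}
```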

5. Typed view dependencies (ViewInfo.viewDependencies).

  • New public DTOs in org.apache.spark.sql.connector.catalog: Dependency (sealed interface with Dependency.table(String...) / Dependency.function(String...) factories), TableDependency, FunctionDependency, DependencyList(Dependency[]).
  • TableDependency and FunctionDependency carry the dependency identifier as structural multi-part name parts (record TableDependency(String[] nameParts)), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Iceberg cat.db1.db2.tbl -> 4 parts) and V1 sources without a catalog component (db.tbl -> 2 parts) round-trip without ambiguity against quoted identifiers containing a literal `.`.
  • DependencyList's canonical constructor and accessor defensively clone() the array so consumers can't mutate stored ViewInfo dependency state.
  • ViewInfo gains a viewDependencies field and a ViewInfo.Builder.withViewDependencies(...) setter.
  • MetricViewHelper.collectTableDependencies walks the analyzed plan and emits structural Seq[Seq[String]] parts (V1 sources via TableIdentifier.nameParts; V2 sources via (catalog.name +: namespace) :+ name).
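The shape of these DTOs can be mirrored in a self-contained sketch (the real classes live in org.apache.spark.sql.connector.catalog; this standalone version only illustrates the sealed hierarchy, the varargs factories, and the defensive copies):

```java
// Sealed "is one of" hierarchy with varargs factories, as described above.
sealed interface Dependency permits TableDependency, FunctionDependency {
    static Dependency table(String... nameParts) {
        return new TableDependency(nameParts.clone());
    }
    static Dependency function(String... nameParts) {
        return new FunctionDependency(nameParts.clone());
    }
}

// Structural multi-part names: arity is preserved, so cat.db1.db2.tbl (4 parts)
// and db.tbl (2 parts) stay distinguishable, and names containing a literal '.'
// never collide with multi-part names.
record TableDependency(String[] nameParts) implements Dependency {}
record FunctionDependency(String[] nameParts) implements Dependency {}

record DependencyList(Dependency[] dependencies) {
    // Defensive copies so consumers can't mutate stored dependency state.
    DependencyList { dependencies = dependencies.clone(); }
    @Override public Dependency[] dependencies() { return dependencies.clone(); }
}
```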

6. Multi-level-namespace targets for v2 metric views. MetricViewHelper.analyzeMetricViewText previously required a TableIdentifier, capping the metric-view target at 3 name parts. v2 metric views with multi-level-namespace targets (e.g. cat.db1.db2.mv) failed at ident.asTableIdentifier with requiresSinglePartNamespaceError. The helper now takes nameParts: Seq[String] directly; call sites in both the v1 path (CreateMetricViewCommand) and the v2 path (DataSourceV2Strategy) updated.

7. metric_view.* descriptor properties. MetricView.getProperties produces canonical descriptive properties (metric_view.from.type, metric_view.from.name / metric_view.from.sql, metric_view.where) that the create path merges into the view's properties bag, so catalog browsers and tooling can inspect the source/filter without re-parsing the YAML body. Long values are truncated to Constants.MAXIMUM_PROPERTY_SIZE; the Scaladoc on getProperties calls out that metric_view.from.sql is therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL.
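As a rough illustration of how such descriptive properties might be assembled (the property keys match the ones listed above; the truncation limit here is an assumed placeholder, not the real Constants.MAXIMUM_PROPERTY_SIZE):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MetricViewPropertiesSketch {
    static final int MAXIMUM_PROPERTY_SIZE = 64; // assumed limit for this sketch

    static String truncate(String v) {
        return v.length() <= MAXIMUM_PROPERTY_SIZE ? v : v.substring(0, MAXIMUM_PROPERTY_SIZE);
    }

    // Descriptive properties only: metric_view.from.sql may be truncated, so
    // consumers must re-read the YAML body for the full SQL.
    static Map<String, String> getProperties(String fromType, String fromSql, String where) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("metric_view.from.type", fromType);
        props.put("metric_view.from.sql", truncate(fromSql));
        if (where != null) {
            props.put("metric_view.where", truncate(where));
        }
        return props;
    }
}
```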

8. ViewInfo constructor cleanup. The metric-view-specific PROP_TABLE_TYPE = METRIC_VIEW special case is dropped from the generic ViewInfo constructor in favor of properties().putIfAbsent(...). Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation.tableType hook.

9. ViewHelper.aliasPlan(retainMetadata). The user-specified-column-with-comment branch in aliasPlan previously dropped existing column metadata. A new retainMetadata: Boolean = false parameter merges the analyzed attribute's metadata into the new comment metadata. ViewHelper.prepareTable passes retainMetadata = isMetricView (v1 path); V2ViewPreparation exposes a retainColumnMetadata hook that CreateV2MetricViewExec overrides to true (v2 path). Both preserve the per-column metric_view.type / metric_view.expr keys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments.
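The metadata-retention fix reduces to a map merge. A hedged sketch with illustrative key names (Spark actually stores column metadata in a Metadata object, not a plain map):

```java
import java.util.HashMap;
import java.util.Map;

final class AliasMetadataSketch {
    // When a user-specified column carries a comment, merge the analyzed
    // attribute's existing metadata into the new comment metadata instead of
    // dropping it.
    static Map<String, String> aliasColumn(
            Map<String, String> existing, String comment, boolean retainMetadata) {
        Map<String, String> merged = new HashMap<>();
        // retainMetadata = false reproduces the pre-fix behavior: existing
        // keys such as metric_view.type / metric_view.expr are lost.
        if (retainMetadata) {
            merged.putAll(existing);
        }
        merged.put("comment", comment);
        return merged;
    }
}
```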

10. Error classes.

  • New INVALID_METRIC_VIEW_YAML (sqlState 42K0L). MetricViewPlanner.parseYAML's catch blocks now route through QueryCompilationErrors.invalidMetricViewYamlError instead of SparkException.internalError, so a typo in the user's YAML body surfaces as a user-correctable AnalysisException rather than "please contact support".
  • New UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (sqlState 0A000) for the v1 session-catalog path.

11. Misc.

  • MetricViewCanonical.parseSource accepts multipart identifiers (parseMultipartIdentifier) so 3-part catalog.schema.table source references work as AssetSource.

Why are the changes needed?

Before this PR, metric-view DDL only worked against the session catalog: the create path called SessionCatalog.createTable directly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shipped ViewCatalog and TableViewCatalog as the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface.

Once metric views can live on a v2 catalog, two more constraints surface:

  1. Type discriminator. A consumer reading a row through ViewCatalog.loadView needs to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this in PROP_TABLE_TYPE = METRIC_VIEW keeps the distinction wire-compatible and lets V1Table.toCatalogTable reconstruct CatalogTableType.METRIC_VIEW on the read path.
  2. Structured dependency lineage. Metric views always reference at least one source table; cataloging that lineage as flat string properties or single dot-joined strings loses arity for multi-level namespaces and is ambiguous against quoted identifiers. A typed DependencyList of TableDependency / FunctionDependency with structural String[] nameParts lets catalogs persist the lineage as a first-class field with full fidelity.
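The ambiguity claim in point 2 can be demonstrated directly: a two-part name whose last part contains a literal dot and a genuine three-part name collapse to the same dot-joined string, while the structural part arrays stay distinct. A standalone illustration:

```java
final class JoinAmbiguitySketch {
    // Dot-joining loses arity: the consumer can no longer tell where one
    // name part ends and the next begins.
    static String dotJoin(String... parts) {
        return String.join(".", parts);
    }
}
```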

The remaining changes (drop-command parity, aliasPlan metadata retention, metric_view.* properties, parseMultipartIdentifier, tableType == VIEW audit, multi-level-namespace lift, HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a real CatalogTableType and as a v2 catalog citizen -- without them, basic operations like DROP VIEW, DESCRIBE TABLE EXTENDED, CREATE VIEW (a COMMENT 'c') WITH METRICS ..., or CREATE VIEW cat.db1.db2.mv WITH METRICS ... would silently degrade.

Does this PR introduce any user-facing change?

Yes, both for end users and for catalog plugin developers:

End users:

  • CREATE VIEW <ident> WITH METRICS ... now works against any v2 catalog that implements ViewCatalog, including catalogs with multi-level namespace targets. Previously it was rejected with MISSING_CATALOG_ABILITY.VIEWS for non-session catalogs, and capped at single-level namespaces. IF NOT EXISTS and OR REPLACE are honored on the v2 path (regression vs. v1 fixed).
  • A v2 metric view can be queried with SELECT region, measure(count_sum) FROM <mv> ..., dropped with DROP VIEW, listed via SHOW VIEWS (and via SHOW TABLES on a TableViewCatalog, matching v1 SHOW TABLES output), and described with DESCRIBE TABLE / DESCRIBE TABLE EXTENDED. DROP TABLE on a metric view throws WRONG_COMMAND_FOR_OBJECT_TYPE ("Use DROP VIEW instead").
  • SHOW CREATE TABLE on a metric view throws an explicit unsupported-feature error (no round-trippable form yet).
  • Session-catalog metric views are now stored as CatalogTableType.METRIC_VIEW instead of CatalogTableType.VIEW + view.viewWithMetrics=true. Observable in DESCRIBE TABLE EXTENDED's Type row and the tableType column of the tables system table. SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via a view.subType = METRIC_VIEW property marker.
  • Error messages from DROP TABLE / DROP VIEW mismatch now mention METRIC_VIEW alongside VIEW.
  • Malformed metric-view YAML now surfaces as INVALID_METRIC_VIEW_YAML (user-correctable) instead of "Spark internal error, please contact support".

Catalog plugin developers:

  • New public API surface in org.apache.spark.sql.connector.catalog: sealed interface Dependency with permits TableDependency, FunctionDependency, both records carrying String[] nameParts. DependencyList(Dependency[]) with defensive copy. All @Evolving, @since 4.2.0.
  • ViewInfo gains a typed viewDependencies() accessor and ViewInfo.Builder.withViewDependencies(...) setter.
  • TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW" constant.
  • CatalogTableType.METRIC_VIEW enum value (v1 surface).
  • V2ViewPreparation (private to org.apache.spark.sql.execution.datasources.v2) gains optional viewDependencies / tableType / retainColumnMetadata hooks.

How was this patch tested?

MetricViewV2CatalogSuite -- 30 tests across 5 sections, all against an in-memory ViewCatalog test fixture (MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog):

Section 1 -- CREATE-related (11 tests):

  • V2 catalog receives METRIC_VIEW table type and view text via ViewInfo.
  • V2 catalog path populates metric_view.* descriptor properties + view context (currentCatalog / currentNamespace) + captured SQL configs.
  • V2 catalog path captures SQLSource and comment.
  • Metric view columns carry metric_view.type / metric_view.expr in column metadata.
  • User-specified column names with comments preserve metric_view.* metadata (pins the aliasPlan(retainMetadata = true) fix).
  • CREATE OR REPLACE VIEW ... WITH METRICS replaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText, metric_view.where, dependencies).
  • CREATE VIEW IF NOT EXISTS ... WITH METRICS is a no-op when the view exists (catalog never sees the second createView call).
  • CREATE VIEW ... WITH METRICS over a v2 table at the ident throws TABLE_OR_VIEW_ALREADY_EXISTS (analyzer-time pre-check).
  • CREATE VIEW IF NOT EXISTS ... WITH METRICS is a no-op when a v2 table sits at the ident (v1 parity).
  • CREATE VIEW ... WITH METRICS on a non-ViewCatalog catalog fails with MISSING_CATALOG_ABILITY.VIEWS.
  • CREATE VIEW ... WITH METRICS at a multi-level-namespace v2 target (testcat.ns_a.ns_b.mv_deep) succeeds (pins the analyzeMetricViewText lift to Seq[String]).

Section 2 -- Dependency extraction (5 tests):

  • SQL source JOIN captures both tables as 3-part nameParts.
  • SQL source subquery deduplicates same-table references.
  • SQL source self-join deduplicates same-table references.
  • V1 session-catalog source emits 2-or-3-part nameParts.
  • Multi-level V2 namespace source (testcat.ns_a.ns_b.events_deep) emits 4-part nameParts.

Section 3 -- SELECT cases (5 tests, modeled on MetricViewSuite patterns):

  • SELECT measure(count_sum) FROM <mv> GROUP BY region returns aggregated rows (exercises the full loadTableOrView -> MetadataTable(ViewInfo) -> V1Table.toCatalogTable(ViewInfo) -> ResolveMetricView round-trip).
  • SELECT measure(...) WHERE region = ... -- query-layer filter on top of the view.
  • View's pre-defined where clause is applied (where = Some("count > 1") filters at view-resolution time).
  • Multiple measures with different aggregations (sum / sum / max).
  • ORDER BY measure(...) DESC LIMIT 1 over the metric view.

Section 4 -- DESCRIBE cases (2 tests):

  • DESCRIBE TABLE EXTENDED round-trips through loadTableOrView and emits the View Text / Type rows (gated through the CatalogTable.toJsonLinkedHashMap VIEW gate, which now accepts METRIC_VIEW).
  • DESCRIBE TABLE (non-EXTENDED) returns the aliased columns.

Section 5 -- DROP / SHOW cases (7 tests):

  • DROP VIEW succeeds on a v2 metric view.
  • DROP VIEW IF EXISTS on a non-existent v2 metric view is a no-op.
  • DROP TABLE on a v2 metric view throws WRONG_COMMAND_FOR_OBJECT_TYPE ("Use DROP VIEW instead", per SPARK-56655's DropTableExec) and asserts the metric view is not deleted.
  • DROP TABLE IF EXISTS on a v2 metric view also throws (IF EXISTS doesn't silence the wrong-type error, v1 parity).
  • SHOW CREATE TABLE on a v2 metric view throws UNSUPPORTED_FEATURE.TABLE_OPERATION with operation "SHOW CREATE TABLE".
  • SHOW TABLES on a TableViewCatalog lists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655).
  • SHOW VIEWS lists v2 metric views.

Existing session-catalog metric-view tests (MetricViewSuite, SimpleMetricViewSuite, HiveMetricViewSuite) and v1 path tests pass unchanged. DDLSuite and HiveDDLSuite had their tableTypes enumerations updated to include METRIC_VIEW in two assertion lists.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Sonnet 4.7)

@chenwang-databricks chenwang-databricks force-pushed the metric-view-on-51419 branch 3 times, most recently from 245dc93 to 9190bf0 on April 28, 2026 at 14:47
@chenwang-databricks chenwang-databricks changed the title [Draft] Support METRIC_VIEW creation on V2 catalogs (stacks on #51419) [SPARK-54119] Support METRIC_VIEW creation on V2 catalogs Apr 28, 2026
@chenwang-databricks chenwang-databricks marked this pull request as ready for review April 28, 2026 15:19
Contributor

@cloud-fan cloud-fan left a comment


Thanks for extending metric views to v2 catalogs. The shape of the change is right — route through ViewCatalog.createView with PROP_TABLE_TYPE = METRIC_VIEW and a typed viewDependencies, and promote METRIC_VIEW to a first-class CatalogTableType. A few significant concerns:

1. v2 path forks instead of reusing V2ViewPreparation and changes behavior on IF NOT EXISTS / OR REPLACE. CreateMetricViewCommand.createMetricViewInV2Catalog is open-coded inside a v1 RunnableCommand and duplicates most of V2ViewPreparation.buildViewInfo, but with materially different behavior: allowExisting and replace are silently dropped on the v2 path (regression vs. v1, which honored ignoreIfExists), there's no viewExists short-circuit, no createOrReplaceView, no cross-type collision decoding (so a table sitting at the ident leaks ViewAlreadyExistsException instead of unsupportedCreateOrReplaceViewOnTableError), and the CharVarcharUtils / SchemaUtils.checkColumnNameDuplication / checkIndeterminateCollationInSchema / owner-stamping work that V2ViewPreparation does is missing. Refactoring this into a CreateV2MetricViewExec (or extending V2ViewPreparation with a viewDependencies / extraProperties hook) plus a DataSourceV2Strategy rule would close all of these gaps in one move.

2. Audit other tableType == VIEW gates for METRIC_VIEW. Promoting metric views to a distinct CatalogTableType opens up silent regressions wherever existing code branches on VIEW. Two I'm fairly sure about: CatalogTable.toJsonLinkedHashMap (interface.scala:670) gates View Text / View Original Text / View Schema Mode / View Catalog and Namespace / SQL Path on tableType == VIEW, so DESCRIBE TABLE EXTENDED on a metric view will lose all of those rows — visible regression vs. the pre-PR viewWithMetrics storage. And HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata} (lines 277, 297, 598, 838) only special-case VIEW, so a Hive-metastore-backed session-catalog metric view will go through the wrong branches on persist and read-back. Tests don't cover Hive-backed metric views, so this would only surface in production. Worth grepping tableType == CatalogTableType.VIEW and tableType == VIEW repo-wide and deciding which sites also need to accept METRIC_VIEW.

3. TableDependency.tableFullName is a single dot-joined string but the Javadoc claims a fixed three-part shape. This breaks down two ways: (a) MetricViewHelper.collectTableDependencies uses unquotedString for V1 sources, which yields a 2-part db.tbl whenever TableIdentifier.catalog is unset, and mkString(".") for V2 sources, which yields N-part names for catalogs with multi-level namespaces (e.g., Iceberg cat.db1.db2.tbl is 4 parts) — so the producer doesn't honor the contract; (b) any consumer parsing the string back via split/join loses arity and is ambiguous against quoted identifiers containing a literal `.`. Since this is brand-new public API, it's the right time to make the storage structural — e.g., TableDependency(String[] nameParts) — rather than committing to a string the producer can't always honor.

4. MetricViewPlanner.parseYAML raises SparkException.internalError for malformed user YAML. Replacing invalidLiteralForWindowDurationError is right, but internalError is the wrong category — a typo in the user's YAML body shouldn't surface as "Spark internal error, please contact support." Should be a QueryCompilationErrors / AnalysisException with a real error class.

Minor stuff (typed DTO sealing/mutability, the ViewInfo constructor's metric-view special case, missing tests for OR REPLACE / IF NOT EXISTS / V1-source dependency / multi-level-namespace V2 source / read-back through loadRelation) is in the inline comments below.

chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 29, 2026
…eedback

Address all 4 substantive concerns and 8 inline comments from cloud-fan's
review on PR apache#55487.

V2 path refactor (concern 1, inline 87/111/142):
- New CreateV2MetricViewExec extends V2ViewPreparation (mirrors
  CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE
  via createOrReplaceView, and cross-type collision decoding via
  ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE.
- DataSourceV2Strategy rule routes CreateMetricViewCommand on a
  non-session catalog to the new exec; CreateMetricViewCommand keeps only
  the v1 session-catalog path in run().
- V2ViewPreparation gains optional viewDependencies / tableType hooks the
  metric-view subclass populates; plain views leave them None.
- Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn-
  NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and
  PROP_OWNER stamping for free via the shared trait.

Structural TableDependency / FunctionDependency (concern 3, inline 30/169):
- Replace single-string `tableFullName` with `String[] nameParts`; arity
  is preserved per source, unambiguous against quoted identifiers
  containing literal `.`. Apply same shape to FunctionDependency.
- Dependency.table / .function factories take varargs.
- MetricViewHelper.collectTableDependencies returns Seq[Seq[String]];
  each match arm emits structural parts (V1 sources via
  TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name).

Sealed Dependency + defensive DependencyList (inline 30 / 40):
- `sealed interface Dependency permits TableDependency, FunctionDependency`
  enforces the documented "is one of" structurally.
- DependencyList canonical constructor and accessor defensively clone the
  array so consumers can't mutate stored ViewInfo dependency state.

ViewInfo constructor cleanup (inline 68):
- Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic
  constructor in favor of `properties().putIfAbsent(...)`. Callers that
  want a more specific kind (e.g. METRIC_VIEW) call
  BaseBuilder.withTableType(...) before build() -- exercised by
  CreateV2MetricViewExec via the new V2ViewPreparation tableType hook.

VIEW-gate audit (concern 2, inline 1088):
- CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts
  METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" /
  "View Original Text" / "View Schema Mode" / "View Catalog and
  Namespace" / "SQL Path" rows for metric views.
- HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2,
  alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW
  so a Hive-metastore-backed session-catalog metric view round-trips.
- Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews,
  RelationResolution.createDataSourceV1Scan (streaming-on-view rejection),
  Analyzer.lookupTableOrView (ResolvedPersistentView wrapping),
  rules.saveDataIntoView, DataStreamWriter.writeToV1Table,
  DescribeRelationJsonCommand.describePartitionInfoJson,
  AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable,
  tables.CreateTableLike provider lookup,
  tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo,
  tables.ShowCreateTableCommand (3 sites) -- all extended to accept
  METRIC_VIEW.

Real error class for malformed YAML (concern 4, inline 70):
- New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) +
  QueryCompilationErrors.invalidMetricViewYamlError. Replaces
  SparkException.internalError so a typo in the user's YAML body surfaces
  as a user-correctable AnalysisException.

Tests (inline 39):
- CREATE OR REPLACE VIEW WITH METRICS replaces the existing view.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists.
- CREATE VIEW WITH METRICS over a v2 table at the ident throws
  EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits
  at the ident.
- V1 session-catalog source dependency: nameParts arity 2-3.
- Multi-level V2 namespace source dependency: nameParts arity N+2.
- DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through
  loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo).
- Existing dep tests updated to assert nameParts instead of tableFullName.

Misc (inline 178):
- Scaladoc on MetricView.getProperties calls out that
  metric_view.from.sql / metric_view.where are truncated to
  Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable)
  values; consumers should re-read the YAML body for full SQL.
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 29, 2026
…e#55487

CI fix:
- Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in
  `MetricViewV2CatalogSuite.scala` whose absence broke the test file's
  parse (compileIncremental "four errors found" -- ')' expected, unmatched
  '}', unclosed multi-line string, '}' expected at EOF).

Self-review feedback:
- Shorten `viewDependencies` and `tableType` Scaladocs on
  `V2ViewPreparation` to one sentence each; drop the plain-view /
  metric-view distinctions.
- Consolidate the duplicated `run()` body from `CreateV2ViewExec` and
  `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait
  (extends `V2ViewPreparation`) that owns the shared CREATE-side
  flow: `viewExists` short-circuit, `OR REPLACE` via
  `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for
  cross-type collisions. Both subclasses now extend
  `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate
  trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of
  CREATE-only fields.
- Drop the defensive `case _: ResolvedIdentifier =>` branch in
  `CreateMetricViewCommand.run` -- the catch-all internal-error case is
  sufficient now that `DataSourceV2Strategy` reliably intercepts the
  non-session path.
- `SHOW CREATE TABLE` on a metric view now throws the dedicated
  `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead
  of falling through to the VIEW branch. Reverts the three METRIC_VIEW
  additions in `tables.scala`'s VIEW gates from the previous round.

Test reorg + new coverage in `MetricViewV2CatalogSuite`:
- Reorganized into 5 sections: (1) Create-related, (2) Dependency
  extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW.
- Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced
  with positive assertions on the captured ViewInfo's queryText,
  metric_view.where, and viewDependencies (parts shape).
- New `SELECT from a v2 metric view returns aggregated rows` -- exercises
  the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` ->
  `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip
  and asserts on the aggregated output rows.
- New `DESCRIBE TABLE on a v2 metric view returns the aliased columns`
  (non-EXTENDED variant alongside the existing EXTENDED test).
- New `SHOW TABLES does not list v2 metric views` (RelationCatalog
  contract: tables only).
- New `SHOW VIEWS lists v2 metric views`.
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 29, 2026
… TABLE coverage

Add the test cases requested on PR apache#55487:

SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite`
patterns:
- Existing test fixed to use `measure(count_sum)` (the v1 suite shows the
  required syntax for measure columns) and switched to `checkAnswer`
  against an equivalent raw aggregation over the source.
- `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a
  filter at the query layer.
- `SELECT against a v2 metric view honors the view's pre-defined where
  clause` -- creates the metric view with `where = Some("count > 1")`,
  asserts only the matching rows surface.
- `SELECT from a v2 metric view supports multiple measures with different
  aggregations` -- adds price_sum / price_max alongside count_sum.
- `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`.

DROP TABLE on a v2 metric view (section 5) -- 2 new tests:
- `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` --
  `DropTableExec`'s `tableExists` probe returns false (RelationCatalog
  passive filtering), the `viewExists` fallback returns true, so the exec
  emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric
  view is still present in the catalog after the failed DROP.
- `DROP TABLE IF EXISTS on a v2 metric view also throws
  EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type
  error (v1 parity).

SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test:
- `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes
  through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and
  throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation
  "SHOW CREATE TABLE".
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 30, 2026
…eedback

Address all 4 substantive concerns and 8 inline comments from cloud-fan's
review on PR apache#55487.

V2 path refactor (concern 1, inline 87/111/142):
- New CreateV2MetricViewExec extends V2ViewPreparation (mirrors
  CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE
  via createOrReplaceView, and cross-type collision decoding via
  ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE.
- DataSourceV2Strategy rule routes CreateMetricViewCommand on a
  non-session catalog to the new exec; CreateMetricViewCommand keeps only
  the v1 session-catalog path in run().
- V2ViewPreparation gains optional viewDependencies / tableType hooks the
  metric-view subclass populates; plain views leave them None.
- Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn-
  NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and
  PROP_OWNER stamping for free via the shared trait.

Structural TableDependency / FunctionDependency (concern 3, inline 30/169):
- Replace single-string `tableFullName` with `String[] nameParts`; arity
  is preserved per source, unambiguous against quoted identifiers
  containing literal `.`. Apply same shape to FunctionDependency.
- Dependency.table / .function factories take varargs.
- MetricViewHelper.collectTableDependencies returns Seq[Seq[String]];
  each match arm emits structural parts (V1 sources via
  TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name).

Sealed Dependency + defensive DependencyList (inline 30 / 40):
- `sealed interface Dependency permits TableDependency, FunctionDependency`
  enforces the documented "is one of" structurally.
- DependencyList canonical constructor and accessor defensively clone the
  array so consumers can't mutate stored ViewInfo dependency state.
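
A minimal sketch of the sealed hierarchy and the defensive copies; the class shapes are approximated from the description above, not copied from the Spark source:

```java
// Sealed "is one of" hierarchy plus clone-on-read / clone-on-write in
// the list wrapper, so callers cannot mutate stored dependency state.
public class DependencyDemo {
    sealed interface Dependency permits TableDependency, FunctionDependency {}

    record TableDependency(String[] nameParts) implements Dependency {}
    record FunctionDependency(String[] nameParts) implements Dependency {}

    record DependencyList(Dependency[] dependencies) {
        // Canonical constructor clones on the way in ...
        DependencyList {
            dependencies = dependencies.clone();
        }
        // ... and the accessor clones on the way out.
        @Override
        public Dependency[] dependencies() {
            return dependencies.clone();
        }
    }

    public static void main(String[] args) {
        Dependency t = new TableDependency(new String[] {"testcat", "ns", "src"});
        DependencyList list = new DependencyList(new Dependency[] {t});
        // Mutating the returned array leaves the stored copy untouched.
        list.dependencies()[0] = null;
        System.out.println(list.dependencies()[0] != null); // true
    }
}
```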

ViewInfo constructor cleanup (inline 68):
- Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic
  constructor in favor of `properties().putIfAbsent(...)`. Callers that
  want a more specific kind (e.g. METRIC_VIEW) call
  BaseBuilder.withTableType(...) before build() -- exercised by
  CreateV2MetricViewExec via the new V2ViewPreparation tableType hook.
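
The `putIfAbsent` default can be sketched as follows; the property key literal and helper names are stand-ins, only the VIEW / METRIC_VIEW precedence mirrors the PR:

```java
// Default the table type to VIEW only when the caller has not already
// stamped a more specific kind (e.g. METRIC_VIEW via withTableType).
import java.util.HashMap;
import java.util.Map;

public class TableTypeDefaultDemo {
    static final String PROP_TABLE_TYPE = "table_type"; // illustrative key

    static Map<String, String> finalizeProperties(Map<String, String> props) {
        props.putIfAbsent(PROP_TABLE_TYPE, "VIEW");
        return props;
    }

    public static void main(String[] args) {
        // Plain view: no table type set, so the default kicks in.
        Map<String, String> plain = finalizeProperties(new HashMap<>());
        System.out.println(plain.get(PROP_TABLE_TYPE)); // VIEW

        // Metric view: builder stamped the type first; the default is a no-op.
        Map<String, String> metric = new HashMap<>();
        metric.put(PROP_TABLE_TYPE, "METRIC_VIEW");
        System.out.println(
            finalizeProperties(metric).get(PROP_TABLE_TYPE)); // METRIC_VIEW
    }
}
```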

VIEW-gate audit (concern 2, inline 1088):
- CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts
  METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" /
  "View Original Text" / "View Schema Mode" / "View Catalog and
  Namespace" / "SQL Path" rows for metric views.
- HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2,
  alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW
  so a Hive-metastore-backed session-catalog metric view round-trips.
- Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews,
  RelationResolution.createDataSourceV1Scan (streaming-on-view rejection),
  Analyzer.lookupTableOrView (ResolvedPersistentView wrapping),
  rules.saveDataIntoView, DataStreamWriter.writeToV1Table,
  DescribeRelationJsonCommand.describePartitionInfoJson,
  AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable,
  tables.CreateTableLike provider lookup,
  tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo,
  tables.ShowCreateTableCommand (3 sites) -- all extended to accept
  METRIC_VIEW.

Real error class for malformed YAML (concern 4, inline 70):
- New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) +
  QueryCompilationErrors.invalidMetricViewYamlError. Replaces
  SparkException.internalError so a typo in the user's YAML body surfaces
  as a user-correctable AnalysisException.

Tests (inline 39):
- CREATE OR REPLACE VIEW WITH METRICS replaces the existing view.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists.
- CREATE VIEW WITH METRICS over a v2 table at the ident throws
  EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits
  at the ident.
- V1 session-catalog source dependency: nameParts arity 2-3.
- Multi-level V2 namespace source dependency: nameParts arity N+2.
- DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through
  loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo).
- Existing dep tests updated to assert nameParts instead of tableFullName.

Misc (inline 178):
- Scaladoc on MetricView.getProperties calls out that
  metric_view.from.sql / metric_view.where are truncated to
  Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable)
  values; consumers should re-read the YAML body for full SQL.
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 30, 2026
…e#55487

CI fix:
- Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in
  `MetricViewV2CatalogSuite.scala` whose absence broke the test file's
  parse (compileIncremental "four errors found" -- ')' expected, unmatched
  '}', unclosed multi-line string, '}' expected at EOF).

Self-review feedback:
- Shorten `viewDependencies` and `tableType` Scaladocs on
  `V2ViewPreparation` to one sentence each; drop the plain-view /
  metric-view distinctions.
- Consolidate the duplicated `run()` body from `CreateV2ViewExec` and
  `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait
  (extends `V2ViewPreparation`) that owns the shared CREATE-side
  flow: `viewExists` short-circuit, `OR REPLACE` via
  `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for
  cross-type collisions. Both subclasses now extend
  `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate
  trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of
  CREATE-only fields.
- Drop the defensive `case _: ResolvedIdentifier =>` branch in
  `CreateMetricViewCommand.run` -- the catch-all internal-error case is
  sufficient now that `DataSourceV2Strategy` reliably intercepts the
  non-session path.
- `SHOW CREATE TABLE` on a metric view now throws the dedicated
  `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead
  of falling through to the VIEW branch. Reverts the three METRIC_VIEW
  additions in `tables.scala`'s VIEW gates from the previous round.

Test reorg + new coverage in `MetricViewV2CatalogSuite`:
- Reorganized into 5 sections: (1) Create-related, (2) Dependency
  extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW.
- Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced
  with positive assertions on the captured ViewInfo's queryText,
  metric_view.where, and viewDependencies (parts shape).
- New `SELECT from a v2 metric view returns aggregated rows` -- exercises
  the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` ->
  `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip
  and asserts on the aggregated output rows.
- New `DESCRIBE TABLE on a v2 metric view returns the aliased columns`
  (non-EXTENDED variant alongside the existing EXTENDED test).
- New `SHOW TABLES does not list v2 metric views` (RelationCatalog
  contract: tables only).
- New `SHOW VIEWS lists v2 metric views`.

@cloud-fan cloud-fan left a comment


Round 2 review.

Status: 13 prior findings addressed (4 main concerns + 9 inline), 0 remaining, 4 new (3 newly introduced in commits since the prior review at 8f62efc6, 1 late catch). The shared V2ViewPreparation / V2CreateViewPreparation refactor reads cleanly, the typed Dependency / DependencyList API replaces the dot-joined-string form, the INVALID_METRIC_VIEW_YAML error is in place, and the tableType == VIEW audit covers most sites. Four follow-ups inline.

chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request May 1, 2026
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan finding apache#2/apache#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding apache#4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_
    ALREADY_EXISTS` (master's analyzer-time pre-check fires before
    the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>
Add support for CREATE VIEW ... WITH METRICS and the corresponding V2 catalog round-trip on top of the ViewCatalog / RelationCatalog APIs introduced by SPARK-52729.

The V2 metric-view create path now builds a ViewInfo and calls ViewCatalog.createView. Metric views are distinguished from plain views with PROP_TABLE_TYPE=METRIC_VIEW, which ViewInfo now preserves when callers intentionally set it. This avoids mutating ViewInfo properties after construction while keeping plain views defaulted to VIEW.

ViewInfo gains a typed viewDependencies field so catalogs can persist structured table/function dependencies without flattening nested lineage into string properties. The metric-view planner collects direct table dependencies from the analyzed source plan and passes them through this field.

Also add first-class CatalogTableType.METRIC_VIEW / TableSummary.METRIC_VIEW_TABLE_TYPE support, V2-to-V1 conversion for MetricView ViewInfo rows, drop-command parity for metric views, metric_view.* descriptive properties, and focused V1/V2 tests for dependency extraction, ViewInfo payloads, DROP VIEW routing, and user-specified column metadata preservation.
- DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through
  loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo).
- Existing dep tests updated to assert nameParts instead of tableFullName.

Misc (inline 178):
- Scaladoc on MetricView.getProperties calls out that
  metric_view.from.sql / metric_view.where are truncated to
  Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable)
  values; consumers should re-read the YAML body for full SQL.
…e#55487

CI fix:
- Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in
  `MetricViewV2CatalogSuite.scala` whose absence broke the test file's
  parse (compileIncremental "four errors found" -- ')' expected, unmatched
  '}', unclosed multi-line string, '}' expected at EOF).

Self-review feedback:
- Shorten `viewDependencies` and `tableType` Scaladocs on
  `V2ViewPreparation` to one sentence each; drop the plain-view /
  metric-view distinctions.
- Consolidate the duplicated `run()` body from `CreateV2ViewExec` and
  `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait
  (extends `V2ViewPreparation`) that owns the shared CREATE-side
  flow: `viewExists` short-circuit, `OR REPLACE` via
  `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for
  cross-type collisions. Both subclasses now extend
  `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate
  trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of
  CREATE-only fields.
- Drop the defensive `case _: ResolvedIdentifier =>` branch in
  `CreateMetricViewCommand.run` -- the catch-all internal-error case is
  sufficient now that `DataSourceV2Strategy` reliably intercepts the
  non-session path.
- `SHOW CREATE TABLE` on a metric view now throws the dedicated
  `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead
  of falling through to the VIEW branch. Reverts the three METRIC_VIEW
  additions in `tables.scala`'s VIEW gates from the previous round.

Test reorg + new coverage in `MetricViewV2CatalogSuite`:
- Reorganized into 5 sections: (1) Create-related, (2) Dependency
  extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW.
- Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced
  with positive assertions on the captured ViewInfo's queryText,
  metric_view.where, and viewDependencies (parts shape).
- New `SELECT from a v2 metric view returns aggregated rows` -- exercises
  the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` ->
  `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip
  and asserts on the aggregated output rows.
- New `DESCRIBE TABLE on a v2 metric view returns the aliased columns`
  (non-EXTENDED variant alongside the existing EXTENDED test).
- New `SHOW TABLES does not list v2 metric views` (RelationCatalog
  contract: tables only).
- New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage

Add the test cases requested on PR apache#55487:

SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite`
patterns:
- Existing test fixed to use `measure(count_sum)` (the v1 suite shows the
  required syntax for measure columns) and switched to `checkAnswer`
  against an equivalent raw aggregation over the source.
- `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a
  filter at the query layer.
- `SELECT against a v2 metric view honors the view's pre-defined where
  clause` -- creates the metric view with `where = Some("count > 1")`,
  asserts only the matching rows surface.
- `SELECT from a v2 metric view supports multiple measures with different
  aggregations` -- adds price_sum / price_max alongside count_sum.
- `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`.

DROP TABLE on a v2 metric view (section 5) -- 2 new tests:
- `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` --
  `DropTableExec`'s `tableExists` probe returns false (RelationCatalog
  passive filtering), the `viewExists` fallback returns true, so the exec
  emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric
  view is still present in the catalog after the failed DROP.
- `DROP TABLE IF EXISTS on a v2 metric view also throws
  EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type
  error (v1 parity).

SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test:
- `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes
  through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and
  throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation
  "SHOW CREATE TABLE".
…ursion, scalastyle

Address four CI failures introduced by the v2 metric-view changes on this PR:

1. `SparkOperation.tableTypeString` (hive-thriftserver) did not know how to map
   the new `CatalogTableType.METRIC_VIEW` to a Hive JDBC table type, causing
   `SparkMetadataOperationSuite` `GET_TABLE_TYPES` to fail with "Unknown table
   type is found: CatalogTableType(METRIC_VIEW)". Map `METRIC_VIEW` to `"VIEW"`
   on the JDBC wire (Hive only knows `TABLE` / `VIEW`).

2. `HiveClientImpl.toHiveTableType` (hive) had the same gap, breaking 19
   `HiveMetricViewSuite` cases that go through HMS persistence on the V1
   session-catalog path. Map `METRIC_VIEW` to `HiveTableType.VIRTUAL_VIEW`.

3. `MetricViewRecordingCatalog.loadRelation` in `MetricViewV2CatalogSuite`
   recursed infinitely. `class MetricViewRecordingCatalog extends
   InMemoryTableCatalog with RelationCatalog` linearizes `RelationCatalog`
   ahead of `InMemoryTableCatalog`, so `super.loadTable` resolved to
   `RelationCatalog.loadTable`, whose default delegates back to
   `loadRelation`. Qualify the super call to `super[InMemoryTableCatalog]`
   to dispatch directly to the table-catalog implementation.

4. Scalastyle import-order violation in `DataSourceV2Strategy.scala`:
   `metricview.serde.MetricViewFactory` was placed before
   `execution.datasources.*`. Reorder to satisfy the Scalastyle import-order
   rule.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…ble / loadTableOrView

Adapt `MetricViewV2CatalogSuite` to the renames in SPARK-56655, which closed
out the still-`Evolving` v2 view API surface introduced by SPARK-52729:

- `RelationCatalog` -> `TableViewCatalog`
- `MetadataOnlyTable` -> `MetadataTable`
- `loadRelation` -> `loadTableOrView`
- `ViewCatalog` gained a new abstract `renameView(oldIdent, newIdent)`

Updates:
- Imports: drop `MetadataOnlyTable`, `RelationCatalog`; add `MetadataTable`,
  `TableViewCatalog`.
- `MetricViewRecordingCatalog`: mix in `TableViewCatalog` instead of
  `RelationCatalog`; rename the override of `loadRelation` -> `loadTableOrView`,
  and `MetadataOnlyTable` -> `MetadataTable`. The qualified
  `super[InMemoryTableCatalog].loadTable(ident)` recursion-bypass continues
  to work (the default `loadTable` derived from `TableViewCatalog` still
  delegates to `loadTableOrView`, hence the bypass is still required).
- Add a `renameView(oldIdent, newIdent)` implementation: moves the
  `views` map entry, the `MetricViewRecordingCatalog.capturedViews` entry,
  and rejects via `ViewAlreadyExistsException` / `NoSuchViewException`
  per the `TableViewCatalog` contract.
- Update doc comments and a few test names that mentioned the old API.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…-trip

Five fixes to recover the metric-view test suites after the latest rebase
on apache/spark master:

1. `MetricViewV2CatalogSuite.withTestCatalogTables`: a few negative tests
   pre-create a regular table at `mv` to exercise cross-type collisions,
   but the previous cleanup only did `DROP VIEW IF EXISTS mv`, leaving the
   table behind to poison every subsequent test in the suite (cleanup
   raised `WRONG_COMMAND_FOR_OBJECT_TYPE` because mv was a TABLE, not a
   VIEW). Sweep `mv` as both view and table in cleanup, wrapped in `Try`
   so the symmetric "wrong command for type" errors don't escape.

2. Test assertions on `info.queryText() === yaml`: the captured queryText
   includes the `\n` between the SQL `$$` markers and the YAML body, but
   `MetricViewFactory.toYAML(...)` doesn't emit surrounding newlines. Trim
   both sides before the equality check (3 sites).

3. `MISSING_CATALOG_ABILITY` assertion: SPARK-56655 added the `.VIEWS`
   subclass; the bare error condition no longer surfaces directly. Update
   the assertion to expect `MISSING_CATALOG_ABILITY.VIEWS`.

4. `MetricViewRecordingCatalog.{createView, replaceView}`: the
   `TableViewCatalog` active-rejection contract requires `createView` to
   throw `ViewAlreadyExistsException` (and `replaceView` to throw
   `NoSuchViewException`) when a *table* sits at the ident, not just when
   a view does. The previous override only checked `views`, so the
   pre-table-then-create-view test never raised the cross-type collision
   `CreateV2MetricViewExec` is supposed to decode into
   `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`. Add a tables-only existence
   probe (bypassing `TableViewCatalog.tableExists`'s default, which would
   recurse through our own `loadTableOrView`) and reject up front.

5. `HiveExternalCatalog` HMS round-trip for metric views: `HiveTableType`
   has no `METRIC_VIEW` variant -- both regular views and metric views
   serialize as `VIRTUAL_VIEW`, and `HiveClientImpl.getTableOption` always
   maps `VIRTUAL_VIEW` back to `CatalogTableType.VIEW`, dropping the
   metric-view distinction on read. Result: `HiveMetricViewSuite` (which
   reuses the same v1 `MetricViewSuite` against
   `HiveExternalCatalog`) saw the YAML body parsed as SQL, raising
   `PARSE_SYNTAX_ERROR` on every test.

   Persist a `view.subType = METRIC_VIEW` property on write
   (`tableMetaToTableProps`) and lift `tableType` back to `METRIC_VIEW`
   on read (`restoreTableMetadata`) when the marker is present. Property
   name follows the existing `view.*` convention so it round-trips
   through HMS like other view metadata.
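The write/read marker described above can be sketched as follows (illustrative Java; the real code lives in `HiveExternalCatalog.tableMetaToTableProps` / `restoreTableMetadata` in Scala, and only the `view.subType` key name comes from this change):

```java
import java.util.HashMap;
import java.util.Map;

public class SubTypeMarkerDemo {
    static final String MARKER_KEY = "view.subType"; // follows the existing view.* convention
    static final String METRIC_VIEW = "METRIC_VIEW";

    // Write side: HMS only knows VIRTUAL_VIEW, so stamp a marker property
    // when serializing a metric view.
    static Map<String, String> toHiveProps(String tableType) {
        Map<String, String> props = new HashMap<>();
        if (METRIC_VIEW.equals(tableType)) {
            props.put(MARKER_KEY, METRIC_VIEW);
        }
        return props;
    }

    // Read side: lift the table type back to METRIC_VIEW when the marker is present;
    // otherwise a VIRTUAL_VIEW remains a plain VIEW.
    static String restoreTableType(Map<String, String> props) {
        return METRIC_VIEW.equals(props.get(MARKER_KEY)) ? METRIC_VIEW : "VIEW";
    }

    public static void main(String[] args) {
        System.out.println(restoreTableType(toHiveProps("METRIC_VIEW"))); // METRIC_VIEW
        System.out.println(restoreTableType(toHiveProps("VIEW")));        // VIEW
    }
}
```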

Also factored the `metric_view.*` column metadata preservation across
user-specified renames into a hookable `retainColumnMetadata: Boolean`
on `V2ViewPreparation` (defaults to `false`); `CreateV2MetricViewExec`
overrides it to `true`. Mirrors what
`ViewHelper.prepareTable(isMetricView = true)` already does on the v1
path, fixing the `key not found: metric_view.type` failure on the v2
"user-specified column names with comments preserve metric_view.*
metadata" test.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…tructField

Add three small public APIs that let v2 catalog connectors (such as the Unity
Catalog Spark connector built on TableViewCatalog) round-trip a Spark
StructField through external storage without reaching for private[sql] helpers
or the singleton-StructType wrap workaround:

- StructField.json / StructField.prettyJson: public counterparts of
  DataType.json / DataType.prettyJson, exposing the existing private[sql]
  jsonValue.
- StructField.fromJson(String): companion-object parser that mirrors
  DataType.fromJson and is the inverse of StructField.json.
- Column.fromStructField(StructField): static factory in the catalog Column
  interface that maps a Spark StructField (with metadata) into a connector
  Column (with metadataInJSON), symmetric to TableInfo.schema() which already
  goes the other way via CatalogV2Util.v2ColumnsToStructType.

Widens DataType.parseStructField from private to private[sql] so the new
StructField.fromJson companion can call it directly.

Tests:
- DataTypeSuite covers StructField round-trip via .json / .fromJson with
  metric_view-style metadata + comment, and an empty-metadata field.
- CatalogV2UtilSuite covers Column.fromStructField for both the
  metadata-bearing and empty-metadata cases.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a
  view-kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan finding #2/#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding #4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.
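The dot-join collapse of intermediate namespace components can be sketched as a small helper (illustrative Java; the real code builds a Scala `TableIdentifier`, whose three slots correspond to the triple returned here):

```java
import java.util.Arrays;

public class SyntheticIdentDemo {
    // nameParts = [catalog, ns1, ..., nsN, name]. The synthetic v1 TableIdentifier
    // can only carry (catalog, database, table), so the intermediate namespace
    // components are dot-joined into the database slot.
    static String[] collapse(String[] nameParts) {
        int n = nameParts.length;
        String catalog = nameParts[0];
        String database = String.join(".", Arrays.copyOfRange(nameParts, 1, n - 1));
        String table = nameParts[n - 1];
        return new String[]{catalog, database, table};
    }

    public static void main(String[] args) {
        String[] out = collapse(new String[]{"cat", "db1", "db2", "mv"});
        System.out.println(String.join("|", out)); // cat|db1.db2|mv
    }
}
```

Since the synthetic identifier never resolves the view body itself, the lossiness of this collapse does not affect the analyzed plan.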

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` ->
    `TABLE_OR_VIEW_ALREADY_EXISTS` (master's analyzer-time pre-check
    fires before the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>

@cloud-fan cloud-fan left a comment

Round 3 review.

Status: All 4 prior findings addressed (verifyAlterTableType audit miss, StructField/Column public-API split, Column.fromStructField comment-duplication via revert, multi-level namespace lift in analyzeMetricViewText), 0 remaining, 7 new (1 newly introduced, 6 late catches). The round 3 follow-up commit cleanly addresses the round 2 findings; remaining concerns center on v1/v2 inconsistencies (descriptive properties, error class, dependency-name arity) and API correctness of the new Dependency records.

I'll be honest about the 6 late catches: most have been latent since round 1 — the structural-form switch fixed the format dimension I flagged but not the arity dimension; the records I suggested have a value-semantics issue I missed; the metric_view.* merge has been on the v2 path since the original PR. Apologies for not surfacing these earlier.

val analyzed = MetricViewHelper.analyzeMetricViewText(
session, nameParts, originalText)
val metricView = MetricViewFactory.fromYAML(originalText)
val mergedProps = properties ++ metricView.getProperties

The v2 path merges metric_view.from.type / metric_view.from.name / metric_view.from.sql / metric_view.where into the property bag here (properties ++ metricView.getProperties), but the v1 path in CreateMetricViewCommand.createMetricViewInSessionCatalog (metricViewCommands.scala:74-78) passes properties to prepareTable without the merge. Result: DESCRIBE TABLE EXTENDED on a session-catalog metric view doesn't show these descriptor rows; on a v2 metric view, it does.

The PR description states the merge happens "into the view's properties bag" without a v1/v2 carve-out — please confirm this is intentional, or merge in v1 too.

// would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not a round-trippable
// metric-view DDL form (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
// YAML AS $$ <yaml> $$`). Reject up front rather than emit lossy DDL.
throw QueryCompilationErrors.unsupportedTableOperationError(

The v1 path uses the dedicated UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (tables.scala:1219 → showCreateTableNotSupportedOnMetricViewError); this v2 path uses the generic UNSUPPORTED_FEATURE.TABLE_OPERATION. Same user action, two error classes — the dedicated one carries a more actionable message ("not supported on metric view"). showCreateTableNotSupportedOnMetricViewError takes a String; you can format the multipart name the same way the case below does ((catalog.name() +: ident.asMultipartIdentifier).map(quoteIfNeeded).mkString(".")).

case r: HiveTableRelation =>
tables += r.tableMeta.identifier.nameParts
case r: LogicalRelation if r.catalogTable.isDefined =>
tables += r.catalogTable.get.identifier.nameParts

TableIdentifier.nameParts returns 1, 2, or 3 parts depending on which of catalog / database are set on the resolved CatalogTable.identifier. The same v1 source table can yield [db, tbl] or [spark_catalog, db, tbl] across runs — the test in MetricViewV2CatalogSuite (the V1 source dependency-extraction test) admits this with parts.length >= 2 && parts.length <= 3. The TableDependency Javadoc (TableDependency.java:32-33) acknowledges it, but downstream consumers comparing dependency names structurally won't be able to rely on stable arity. Same issue applies to the View arm (line 105) and HiveTableRelation arm (line 112).

Consider normalizing v1 sources to always emit 3-part [spark_catalog, db, tbl] (prepending the session-catalog name when missing) so consumers see a stable shape per source kind. The test should then assert exactly 3 parts.
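The suggested normalization, sketched in Java (illustrative; `spark_catalog` is the session-catalog name, and using the current database as the fallback for a bare 1-part name is an assumption of this sketch):

```java
public class NormalizeV1NameDemo {
    static final String SESSION_CATALOG = "spark_catalog";

    // Always emit 3 parts for v1 sources, prepending the session-catalog name
    // (and the current database) when the resolved identifier omits them.
    static String[] normalize(String[] parts, String currentDb) {
        switch (parts.length) {
            case 3: return parts;
            case 2: return new String[]{SESSION_CATALOG, parts[0], parts[1]};
            case 1: return new String[]{SESSION_CATALOG, currentDb, parts[0]};
            default: throw new IllegalArgumentException("unexpected arity: " + parts.length);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(".", normalize(new String[]{"db", "tbl"}, "default")));
        // spark_catalog.db.tbl
    }
}
```

With this in place, consumers see a stable 3-part shape for every v1 source regardless of which identifier fields happened to be set at resolution time.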

val nameParts = (catalog.name() +: ident.namespace().toIndexedSeq) :+ ident.name()
val analyzed = MetricViewHelper.analyzeMetricViewText(
session, nameParts, originalText)
val metricView = MetricViewFactory.fromYAML(originalText)

MetricViewHelper.analyzeMetricViewText (called on line 345) already parses the YAML internally via MetricViewPlanner.parseYAML / MetricViewFactory.fromYAML. Re-parsing here duplicates the work. analyzeMetricViewText could return (LogicalPlan, MetricView) (or expose the parsed MetricView via the MetricViewPlaceholder it builds) so the strategy doesn't pay the parse twice.

}

case DropTable(r: ResolvedIdentifier, ifExists, purge) =>
val tableCatalog = r.catalog.asTableCatalog

Extracting r.catalog.asTableCatalog into tableCatalog is a no-op refactor unrelated to metric-view support and is not mentioned in the PR description. Please revert or split into a separate cleanup PR — keeps the metric-view PR focused.

* @since 4.2.0
*/
@Evolving
public record DependencyList(Dependency[] dependencies) {

Java records auto-generate equals/hashCode from per-field Objects.equals/Objects.hashCode, which on array fields fall through to Object.equals (reference equality). So DependencyList.of(Dependency.table("a","b")).equals(DependencyList.of(Dependency.table("a","b"))) == false. Same issue on TableDependency.nameParts (TableDependency.java:42) and FunctionDependency.nameParts (FunctionDependency.java:34).

This undermines the "structural multi-part name" intent — consumers can't dedup, compare, or use these as Map keys. Please override equals/hashCode to use Arrays.equals/Arrays.hashCode on each record (and Arrays.deepEquals/Arrays.deepHashCode on DependencyList), or expose List<...> instead of [] to inherit value semantics for free.
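The pitfall and the suggested fix, as a self-contained sketch (record names illustrative, not the real Spark API):

```java
import java.util.Arrays;

public class RecordArrayEqualityDemo {
    // Naive record: the auto-generated equals() compares each component with
    // Objects.equals, which for an array field is reference equality.
    record Naive(String[] nameParts) {}

    // Fixed record: value semantics via Arrays.equals / Arrays.hashCode.
    record Fixed(String[] nameParts) {
        @Override
        public boolean equals(Object o) {
            return o instanceof Fixed f && Arrays.equals(nameParts, f.nameParts);
        }
        @Override
        public int hashCode() {
            return Arrays.hashCode(nameParts);
        }
    }

    static boolean naiveEqual() {
        return new Naive(new String[]{"a", "b"}).equals(new Naive(new String[]{"a", "b"}));
    }

    static boolean fixedEqual() {
        return new Fixed(new String[]{"a", "b"}).equals(new Fixed(new String[]{"a", "b"}));
    }

    public static void main(String[] args) {
        System.out.println(naiveEqual()); // false: reference equality on the array field
        System.out.println(fixedEqual()); // true: structural equality
    }
}
```

Exposing `List<String>` instead of `String[]` sidesteps the override entirely, since `List.equals` is already element-wise.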

where = None,
select = metricViewColumns)
val yaml = MetricViewFactory.toYAML(metricView)
// Give both columns new names, and a comment on each. Without the `retainMetadata`

Two test comments describe behavior in terms of the in-PR change rather than the current invariant — they age poorly post-merge:

  • Lines 253-254: `// ... Without the retainMetadata fix to ViewHelper.aliasPlan, the metric_view.* keys disappear here.` — once merged, there is no "fix" to refer to. Rephrase as a pin: `// Pins aliasPlan(retainMetadata=true): metric_view.* keys must survive a column rename with comments.`
  • Lines 749-752 (DESCRIBE EXTENDED test): // The "Type" row was added alongside this metric-view PR so DescribeV2ViewExec matches v1 parity... — same problem. Rephrase to describe the current invariant being pinned (e.g. DescribeV2ViewExec emits a "Type" row matching v1 parity, so users can distinguish VIEW vs METRIC_VIEW.).
