[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs#55487
[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs#55487chenwang-databricks wants to merge 9 commits intoapache:masterfrom
Conversation
a48673d to
6f2ac0b
Compare
245dc93 to
9190bf0
Compare
12905c0 to
8f62efc
Compare
cloud-fan
left a comment
There was a problem hiding this comment.
Thanks for extending metric views to v2 catalogs. The shape of the change is right — route through ViewCatalog.createView with PROP_TABLE_TYPE = METRIC_VIEW and a typed viewDependencies, and promote METRIC_VIEW to a first-class CatalogTableType. A few significant concerns:
1. v2 path forks instead of reusing V2ViewPreparation and changes behavior on IF NOT EXISTS / OR REPLACE. CreateMetricViewCommand.createMetricViewInV2Catalog is open-coded inside a v1 RunnableCommand and duplicates most of V2ViewPreparation.buildViewInfo, but with materially different behavior: allowExisting and replace are silently dropped on the v2 path (regression vs. v1, which honored ignoreIfExists), there's no viewExists short-circuit, no createOrReplaceView, no cross-type collision decoding (so a table sitting at the ident leaks ViewAlreadyExistsException instead of unsupportedCreateOrReplaceViewOnTableError), and the CharVarcharUtils / SchemaUtils.checkColumnNameDuplication / checkIndeterminateCollationInSchema / owner-stamping work that V2ViewPreparation does is missing. Refactoring this into a CreateV2MetricViewExec (or extending V2ViewPreparation with a viewDependencies / extraProperties hook) plus a DataSourceV2Strategy rule would close all of these gaps in one move.
2. Audit other tableType == VIEW gates for METRIC_VIEW. Promoting metric views to a distinct CatalogTableType opens up silent regressions wherever existing code branches on VIEW. Two I'm fairly sure about: CatalogTable.toJsonLinkedHashMap (interface.scala:670) gates View Text / View Original Text / View Schema Mode / View Catalog and Namespace / SQL Path on tableType == VIEW, so DESCRIBE TABLE EXTENDED on a metric view will lose all of those rows — visible regression vs. the pre-PR viewWithMetrics storage. And HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata} (lines 277, 297, 598, 838) only special-case VIEW, so a Hive-metastore-backed session-catalog metric view will go through the wrong branches on persist and read-back. Tests don't cover Hive-backed metric views, so this would only surface in production. Worth grepping tableType == CatalogTableType.VIEW and tableType == VIEW repo-wide and deciding which sites also need to accept METRIC_VIEW.
3. TableDependency.tableFullName is a single dot-joined string but the Javadoc claims a fixed three-part shape. This breaks down two ways: (a) MetricViewHelper.collectTableDependencies uses unquotedString for V1 sources, which yields a 2-part db.tbl whenever TableIdentifier.catalog is unset, and mkString(".") for V2 sources, which yields N-part names for catalogs with multi-level namespaces (e.g., Iceberg cat.db1.db2.tbl is 4 parts) — so the producer doesn't honor the contract; (b) any consumer parsing the string back via split/join loses arity and is ambiguous against quoted identifiers containing a literal .. Since this is brand-new public API, it's the right time to make the storage structural — e.g., TableDependency(String[] nameParts) — rather than committing to a string the producer can't always honor.
4. MetricViewPlanner.parseYAML raises SparkException.internalError for malformed user YAML. Replacing invalidLiteralForWindowDurationError is right, but internalError is the wrong category — a typo in the user's YAML body shouldn't surface as "Spark internal error, please contact support." Should be a QueryCompilationErrors / AnalysisException with a real error class.
Minor stuff (typed DTO sealing/mutability, the ViewInfo constructor's metric-view special case, missing tests for OR REPLACE / IF NOT EXISTS / V1-source dependency / multi-level-namespace V2 source / read-back through loadRelation) is in the inline comments below.
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
b7d086f to
269056a
Compare
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
cloud-fan
left a comment
There was a problem hiding this comment.
Round 2 review.
Status: 13 prior findings addressed (4 main concerns + 9 inline), 0 remaining, 4 new (3 newly introduced in commits since the prior review at 8f62efc6, 1 late catch). The shared V2ViewPreparation / V2CreateViewPreparation refactor reads cleanly, the typed Dependency / DependencyList API replaces the dot-joined-string form, the INVALID_METRIC_VIEW_YAML error is in place, and the tableType == VIEW audit covers most sites. Four follow-ups inline.
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
fb8da8c to
28543c1
Compare
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
28543c1 to
b8bd9cc
Compare
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
b8bd9cc to
5d89566
Compare
Add support for CREATE VIEW ... WITH METRICS and the corresponding V2 catalog round-trip on top of the ViewCatalog / RelationCatalog APIs introduced by SPARK-52729. The V2 metric-view create path now builds a ViewInfo and calls ViewCatalog.createView. Metric views are distinguished from plain views with PROP_TABLE_TYPE=METRIC_VIEW, which ViewInfo now preserves when callers intentionally set it. This avoids mutating ViewInfo properties after construction while keeping plain views defaulted to VIEW. ViewInfo gains a typed viewDependencies field so catalogs can persist structured table/function dependencies without flattening nested lineage into string properties. The metric-view planner collects direct table dependencies from the analyzed source plan and passes them through this field. Also add first-class CatalogTableType.METRIC_VIEW / TableSummary.METRIC_VIEW_TABLE_TYPE support, V2-to-V1 conversion for MetricView ViewInfo rows, drop-command parity for metric views, metric_view.* descriptive properties, and focused V1/V2 tests for dependency extraction, ViewInfo payloads, DROP VIEW routing, and user-specified column metadata preservation.
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
…ursion, scalastyle Address four CI failures introduced by the v2 metric-view changes on this PR: 1. `SparkOperation.tableTypeString` (hive-thriftserver) did not know how to map the new `CatalogTableType.METRIC_VIEW` to a Hive JDBC table type, causing `SparkMetadataOperationSuite` `GET_TABLE_TYPES` to fail with "Unknown table type is found: CatalogTableType(METRIC_VIEW)". Map `METRIC_VIEW` to `"VIEW"` on the JDBC wire (Hive only knows `TABLE` / `VIEW`). 2. `HiveClientImpl.toHiveTableType` (hive) had the same gap, breaking 19 `HiveMetricViewSuite` cases that go through HMS persistence on the V1 session-catalog path. Map `METRIC_VIEW` to `HiveTableType.VIRTUAL_VIEW`. 3. `MetricViewRecordingCatalog.loadRelation` in `MetricViewV2CatalogSuite` recursed infinitely. `class MetricViewRecordingCatalog extends InMemoryTableCatalog with RelationCatalog` linearizes `RelationCatalog` ahead of `InMemoryTableCatalog`, so `super.loadTable` resolved to `RelationCatalog.loadTable`, whose default delegates back to `loadRelation`. Qualify the super call to `super[InMemoryTableCatalog]` to dispatch directly to the table-catalog implementation. 4. Scalastyle import-order violation in `DataSourceV2Strategy.scala`: `metricview.serde.MetricViewFactory` was placed before `execution.datasources.*`. Reorder to satisfy the Scalastyle import-order rule. Signed-off-by: chen.wang <chen.wang@databricks.com>
…ble / loadTableOrView Adapt `MetricViewV2CatalogSuite` to the renames in SPARK-56655, which closed out the still-`Evolving` v2 view API surface introduced by SPARK-52729: - `RelationCatalog` -> `TableViewCatalog` - `MetadataOnlyTable` -> `MetadataTable` - `loadRelation` -> `loadTableOrView` - `ViewCatalog` gained a new abstract `renameView(oldIdent, newIdent)` Updates: - Imports: drop `MetadataOnlyTable`, `RelationCatalog`; add `MetadataTable`, `TableViewCatalog`. - `MetricViewRecordingCatalog`: mix in `TableViewCatalog` instead of `RelationCatalog`; rename the override of `loadRelation` -> `loadTableOrView`, and `MetadataOnlyTable` -> `MetadataTable`. The qualified `super[InMemoryTableCatalog].loadTable(ident)` recursion-bypass continues to work (the default `loadTable` derived from `TableViewCatalog` still delegates to `loadTableOrView`, hence the bypass is still required). - Add a `renameView(oldIdent, newIdent)` implementation: moves the `views` map entry, the `MetricViewRecordingCatalog.capturedViews` entry, and rejects via `ViewAlreadyExistsException` / `NoSuchViewException` per the `TableViewCatalog` contract. - Update doc comments and a few test names that mentioned the old API. Signed-off-by: chen.wang <chen.wang@databricks.com>
…-trip
Five fixes to recover the metric-view test suites after the latest rebase
on apache/spark master:
1. `MetricViewV2CatalogSuite.withTestCatalogTables`: a few negative tests
pre-create a regular table at `mv` to exercise cross-type collisions,
but the previous cleanup only did `DROP VIEW IF EXISTS mv`, leaving the
table behind to poison every subsequent test in the suite (cleanup
raised `WRONG_COMMAND_FOR_OBJECT_TYPE` because mv was a TABLE, not a
VIEW). Sweep `mv` as both view and table in cleanup, wrapped in `Try`
so the symmetric "wrong command for type" errors don't escape.
2. Test assertions on `info.queryText() === yaml`: the captured queryText
includes the `\n` between the SQL `$$` markers and the YAML body, but
`MetricViewFactory.toYAML(...)` doesn't emit surrounding newlines. Trim
both sides before the equality check (3 sites).
3. `MISSING_CATALOG_ABILITY` assertion: SPARK-56655 added the `.VIEWS`
subclass; the bare error condition no longer surfaces directly. Update
the assertion to expect `MISSING_CATALOG_ABILITY.VIEWS`.
4. `MetricViewRecordingCatalog.{createView, replaceView}`: the
`TableViewCatalog` active-rejection contract requires `createView` to
throw `ViewAlreadyExistsException` (and `replaceView` to throw
`NoSuchViewException`) when a *table* sits at the ident, not just when
a view does. The previous override only checked `views`, so the
pre-table-then-create-view test never raised the cross-type collision
`CreateV2MetricViewExec` is supposed to decode into
`EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`. Add a tables-only existence
probe (bypassing `TableViewCatalog.tableExists`'s default, which would
recurse through our own `loadTableOrView`) and reject up front.
5. `HiveExternalCatalog` HMS round-trip for metric views: `HiveTableType`
has no `METRIC_VIEW` variant -- both regular views and metric views
serialize as `VIRTUAL_VIEW`, and `HiveClientImpl.getTableOption` always
maps `VIRTUAL_VIEW` back to `CatalogTableType.VIEW`, dropping the
metric-view distinction on read. Result: `HiveMetricViewSuite` (which
reuses the same v1 `MetricViewSuite` against
`HiveExternalCatalog`) saw the YAML body parsed as SQL, raising
`PARSE_SYNTAX_ERROR` on every test.
Persist a `view.subType = METRIC_VIEW` property on write
(`tableMetaToTableProps`) and lift `tableType` back to `METRIC_VIEW`
on read (`restoreTableMetadata`) when the marker is present. Property
name follows the existing `view.*` convention so it round-trips
through HMS like other view metadata.
Also factored the `metric_view.*` column metadata preservation across
user-specified renames into a hookable `retainColumnMetadata: Boolean`
on `V2ViewPreparation` (defaults to `false`); `CreateV2MetricViewExec`
overrides it to `true`. Mirrors what
`ViewHelper.prepareTable(isMetricView = true)` already does on the v1
path, fixing the `key not found: metric_view.type` failure on the v2
"user-specified column names with comments preserve metric_view.*
metadata" test.
Signed-off-by: chen.wang <chen.wang@databricks.com>
…tructField Add three small public APIs that let v2 catalog connectors (such as the Unity Catalog Spark connector built on TableViewCatalog) round-trip a Spark StructField through external storage without reaching for private[sql] helpers or the singleton-StructType wrap workaround: - StructField.json / StructField.prettyJson: public counterparts of DataType.json / DataType.prettyJson, exposing the existing private[sql] jsonValue. - StructField.fromJson(String): companion-object parser that mirrors DataType.fromJson and is the inverse of StructField.json. - Column.fromStructField(StructField): static factory in the catalog Column interface that maps a Spark StructField (with metadata) into a connector Column (with metadataInJSON), symmetric to TableInfo.schema() which already goes the other way via CatalogV2Util.v2ColumnsToStructType. Widens DataType.parseStructField from private to private[sql] so the new StructField.fromJson companion can call it directly. Tests: - DataTypeSuite covers StructField round-trip via .json / .fromJson with metric_view-style metadata + comment, and an empty-metadata field. - CatalogV2UtilSuite covers Column.fromStructField for both the metadata-bearing and empty-metadata cases. Signed-off-by: chen.wang <chen.wang@databricks.com>
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
5d89566 to
0b26f95
Compare
cloud-fan
left a comment
There was a problem hiding this comment.
Round 3 review.
Status: All 4 prior findings addressed (verifyAlterTableType audit miss, StructField/Column public-API split, Column.fromStructField comment-duplication via revert, multi-level namespace lift in analyzeMetricViewText), 0 remaining, 7 new (1 newly introduced, 6 late catches). The round 3 follow-up commit cleanly addresses the round 2 findings; remaining concerns center on v1/v2 inconsistencies (descriptive properties, error class, dependency-name arity) and API correctness of the new Dependency records.
I'll be honest about the 6 late catches: most have been latent since round 1 — the structural-form switch fixed the format dimension I flagged but not the arity dimension; the records I suggested have a value-semantics issue I missed; the metric_view.* merge has been on the v2 path since the original PR. Apologies for not surfacing these earlier.
| val analyzed = MetricViewHelper.analyzeMetricViewText( | ||
| session, nameParts, originalText) | ||
| val metricView = MetricViewFactory.fromYAML(originalText) | ||
| val mergedProps = properties ++ metricView.getProperties |
There was a problem hiding this comment.
The v2 path merges metric_view.from.type / metric_view.from.name / metric_view.from.sql / metric_view.where into the property bag here (properties ++ metricView.getProperties), but the v1 path in CreateMetricViewCommand.createMetricViewInSessionCatalog (metricViewCommands.scala:74-78) passes properties to prepareTable without the merge. Result: DESCRIBE TABLE EXTENDED on a session-catalog metric view doesn't show these descriptor rows; on a v2 metric view, it does.
The PR description states the merge happens "into the view's properties bag" without a v1/v2 carve-out — please confirm this is intentional, or merge in v1 too.
| // would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not a round-trippable | ||
| // metric-view DDL form (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE | ||
| // YAML AS $$ <yaml> $$`). Reject up front rather than emit lossy DDL. | ||
| throw QueryCompilationErrors.unsupportedTableOperationError( |
There was a problem hiding this comment.
The v1 path uses the dedicated UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (tables.scala:1219 → showCreateTableNotSupportedOnMetricViewError); this v2 path uses the generic UNSUPPORTED_FEATURE.TABLE_OPERATION. Same user action, two error classes — the dedicated one carries a more actionable message ("not supported on metric view"). showCreateTableNotSupportedOnMetricViewError takes a String; you can format the multipart name the same way the case below does ((catalog.name() +: ident.asMultipartIdentifier).map(quoteIfNeeded).mkString(".")).
| case r: HiveTableRelation => | ||
| tables += r.tableMeta.identifier.nameParts | ||
| case r: LogicalRelation if r.catalogTable.isDefined => | ||
| tables += r.catalogTable.get.identifier.nameParts |
There was a problem hiding this comment.
TableIdentifier.nameParts returns 1, 2, or 3 parts depending on which of catalog / database are set on the resolved CatalogTable.identifier. The same v1 source table can yield [db, tbl] or [spark_catalog, db, tbl] across runs — the test in MetricViewV2CatalogSuite (the V1 source dependency-extraction test) admits this with parts.length >= 2 && parts.length <= 3. The TableDependency Javadoc (TableDependency.java:32-33) acknowledges it, but downstream consumers comparing dependency names structurally won't be able to rely on stable arity. Same issue applies to the View arm (line 105) and HiveTableRelation arm (line 112).
Consider normalizing v1 sources to always emit 3-part [spark_catalog, db, tbl] (prepending the session-catalog name when missing) so consumers see a stable shape per source kind. The test should then assert exactly 3 parts.
| val nameParts = (catalog.name() +: ident.namespace().toIndexedSeq) :+ ident.name() | ||
| val analyzed = MetricViewHelper.analyzeMetricViewText( | ||
| session, nameParts, originalText) | ||
| val metricView = MetricViewFactory.fromYAML(originalText) |
There was a problem hiding this comment.
MetricViewHelper.analyzeMetricViewText (called on line 345) already parses the YAML internally via MetricViewPlanner.parseYAML → MetricViewFactory.fromYAML. Re-parsing here duplicates the work. analyzeMetricViewText could return (LogicalPlan, MetricView) (or expose the parsed MetricView via the MetricViewPlaceholder it builds) so the strategy doesn't pay the parse twice.
| } | ||
|
|
||
| case DropTable(r: ResolvedIdentifier, ifExists, purge) => | ||
| val tableCatalog = r.catalog.asTableCatalog |
There was a problem hiding this comment.
Extracting r.catalog.asTableCatalog into tableCatalog is a no-op refactor unrelated to metric-view support and is not mentioned in the PR description. Please revert or split into a separate cleanup PR — keeps the metric-view PR focused.
| * @since 4.2.0 | ||
| */ | ||
| @Evolving | ||
| public record DependencyList(Dependency[] dependencies) { |
There was a problem hiding this comment.
Java records auto-generate equals/hashCode from per-field Objects.equals/Objects.hashCode, which on array fields fall through to Object.equals (reference equality). So DependencyList.of(Dependency.table("a","b")).equals(DependencyList.of(Dependency.table("a","b"))) == false. Same issue on TableDependency.nameParts (TableDependency.java:42) and FunctionDependency.nameParts (FunctionDependency.java:34).
This undermines the "structural multi-part name" intent — consumers can't dedup, compare, or use these as Map keys. Please override equals/hashCode to use Arrays.equals/Arrays.hashCode on each record (and Arrays.deepEquals/Arrays.deepHashCode on DependencyList), or expose List<...> instead of [] to inherit value semantics for free.
| where = None, | ||
| select = metricViewColumns) | ||
| val yaml = MetricViewFactory.toYAML(metricView) | ||
| // Give both columns new names, and a comment on each. Without the `retainMetadata` |
There was a problem hiding this comment.
Two test comments describe behavior in terms of the in-PR change rather than the current invariant — they age poorly post-merge:
- Lines 253-254:
// ... Without theretainMetadatafix toViewHelper.aliasPlan, the metric_view.* keys disappear here.— once merged, there is no "fix" to refer to. Rephrase as a pin:// Pins aliasPlan(retainMetadata=true): metric_view.* keys must survive a column rename with comments. - Lines 749-752 (DESCRIBE EXTENDED test):
// The "Type" row was added alongside this metric-view PR so DescribeV2ViewExec matches v1 parity...— same problem. Rephrase to describe the current invariant being pinned (e.g.DescribeV2ViewExec emits a "Type" row matching v1 parity, so users can distinguish VIEW vs METRIC_VIEW.).
What changes were proposed in this pull request?
This PR extends metric-view support to DS v2 catalogs by routing
CREATE VIEW ... WITH METRICSthrough theViewCatalog/TableViewCatalogAPIs introduced by SPARK-52729 and finalized by SPARK-56655. Third-party v2 catalogs that implementViewCatalogcan now host metric views with the same metadata fidelity as session-catalog metric views.1. V2 metric-view CREATE path -- shared with
CreateV2ViewExec. A newCreateV2MetricViewExecandCreateV2ViewExecboth extend a newV2CreateViewPreparationtrait (which itself extendsV2ViewPreparation). The trait owns the shared CREATE-siderun():viewExistsshort-circuit onIF NOT EXISTS,createOrReplaceViewforOR REPLACE, and cross-type collision decoding (ViewAlreadyExistsException->tableExists->EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-modeUNSUPPORTED, typedviewDependencies,PROP_TABLE_TYPE = METRIC_VIEW,retainColumnMetadata = true) via optional hooks onV2ViewPreparation.DataSourceV2StrategyinterceptsCreateMetricViewCommandon a non-session catalog and routes to the new exec; the v1 session-catalog path stays inCreateMetricViewCommand.run.2. First-class
METRIC_VIEWtable type.CatalogTableType.METRIC_VIEWis added alongsideEXTERNAL/MANAGED/VIEW.TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"constant for the V2 surface.view.viewWithMetricsproperty hack is removed;CatalogTable.isMetricViewcheckstableType == METRIC_VIEWdirectly.V1Table.summarizeTableTypeandV1Table.toCatalogTable(catalog, ident, ViewInfo)translate between the V2 property form and the V1 enum.HiveTableTypehas noMETRIC_VIEWvariant (both regular views and metric views serialize asVIRTUAL_VIEW).HiveExternalCatalognow persists aview.subType = METRIC_VIEWproperty on write and liftstableTypeback toMETRIC_VIEWon read, so HMS-backed metric views survive the round trip.3. Repo-wide
tableType == VIEWaudit. Promoting metric views to a distinctCatalogTableTypeopens silent regressions wherever existing code branches onVIEW. 15 sites updated to acceptMETRIC_VIEWalongsideVIEW:CatalogTable.toJsonLinkedHashMap(DESCRIBE EXTENDED rows),HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}(4 sites for HMS-backed metric views),verifyAlterTableTypeinddl.scala(soALTER TABLE <metric_view> RENAME TO ...raises the right error category instead of silently renaming).SessionCatalog.isView,InMemoryCatalog.listViews,RelationResolution(streaming-on-view rejection),Analyzer.lookupTableOrView(ResolvedPersistentViewwrapping),rules.scala(saveDataIntoView),DataStreamWriter(streaming write),DescribeRelationJsonCommand,AnalyzeColumnCommand,AnalyzePartitionCommand,CommandUtils.analyzeTable, and 3 sites intables.scala(CREATE TABLE LIKEprovider lookup,ALTER TABLE ADD COLUMNS,DESCRIBE PARTITION).SHOW CREATE TABLEon a metric view has no round-trippableCREATE VIEW ... WITH METRICSform, so it's rejected explicitly: the v1 session-catalog path raises the dedicatedUNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW; the v2 path is gated inDataSourceV2StrategywithunsupportedTableOperationError(...)(UNSUPPORTED_FEATURE.TABLE_OPERATION) beforeShowCreateV2ViewExecwould emit a misleading plainCREATE VIEW.4. Drop-command parity.
DropTableCommand(v1 path) treats bothVIEWandMETRIC_VIEWas views:DROP TABLErejects either withwrongCommandForObjectTypeError, andDROP VIEWaccepts either.V2SessionCatalog.dropTableInternalextends the existing "view rejected fromDROP TABLE" guard to coverMETRIC_VIEW.DropTableExec(post-SPARK-56655) actively rejects withWRONG_COMMAND_FOR_OBJECT_TYPE("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views sinceTableViewCatalog's defaultviewExistsderives fromloadTableOrViewand recognizesMetadataTable + ViewInfo.ResolveSessionCatalog'sDropViewrouting comment is clarified: v2 metric views fall through toDataSourceV2StrategyandViewCatalog.dropView.5. Typed view dependencies (
ViewInfo.viewDependencies).org.apache.spark.sql.connector.catalog:Dependency(sealed interface withDependency.table(String...)/Dependency.function(String...)factories),TableDependency,FunctionDependency,DependencyList(Dependency[]).TableDependencyandFunctionDependencycarry the dependency identifier as structural multi-part name parts (record TableDependency(String[] nameParts)), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Icebergcat.db1.db2.tbl-> 4 parts) and V1 sources without a catalog component (db.tbl-> 2 parts) round-trip without ambiguity against quoted identifiers containing literal..DependencyList's canonical constructor and accessor defensivelyclone()the array so consumers can't mutate storedViewInfodependency state.ViewInfogains aviewDependenciesfield and aViewInfo.Builder.withViewDependencies(...)setter.MetricViewHelper.collectTableDependencieswalks the analyzed plan and emits structuralSeq[Seq[String]]parts (V1 sources viaTableIdentifier.nameParts; V2 sources via(catalog.name +: namespace) :+ name).6. Multi-level-namespace targets for v2 metric views.
MetricViewHelper.analyzeMetricViewTextpreviously required aTableIdentifier, capping the metric-view target at 3 name parts. v2 metric views with multi-level-namespace targets (e.g.cat.db1.db2.mv) failed atident.asTableIdentifierwithrequiresSinglePartNamespaceError. The helper now takesnameParts: Seq[String]directly; call sites in both the v1 path (CreateMetricViewCommand) and the v2 path (DataSourceV2Strategy) updated.7.
metric_view.*descriptor properties.MetricView.getPropertiesproduces canonical descriptive properties (metric_view.from.type,metric_view.from.name/metric_view.from.sql,metric_view.where) that the create path merges into the view's properties bag, so catalog browsers and tooling can inspect the source/filter without re-parsing the YAML body. Long values are truncated toConstants.MAXIMUM_PROPERTY_SIZE; the Scaladoc ongetPropertiescalls out thatmetric_view.from.sqlis therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL.8.
ViewInfoconstructor cleanup. The metric-view-specificPROP_TABLE_TYPE = METRIC_VIEWspecial case is dropped from the genericViewInfoconstructor in favor ofproperties().putIfAbsent(...). Callers that want a more specific kind (e.g.METRIC_VIEW) callBaseBuilder.withTableType(...)beforebuild()-- exercised byCreateV2MetricViewExecvia the newV2ViewPreparation.tableTypehook.9.
ViewHelper.aliasPlan(retainMetadata). The user-specified-column-with-comment branch inaliasPlanpreviously dropped existing column metadata. A newretainMetadata: Boolean = falseparameter merges the analyzed attribute's metadata into the new comment metadata.ViewHelper.prepareTablepassesretainMetadata = isMetricView(v1 path);V2ViewPreparationexposes aretainColumnMetadatahook thatCreateV2MetricViewExecoverrides totrue(v2 path). Both preserve the per-columnmetric_view.type/metric_view.exprkeys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments.10. Error classes.
INVALID_METRIC_VIEW_YAML(sqlState 42K0L).MetricViewPlanner.parseYAML's catch blocks now route throughQueryCompilationErrors.invalidMetricViewYamlErrorinstead ofSparkException.internalError, so a typo in the user's YAML body surfaces as a user-correctableAnalysisExceptionrather than "please contact support".UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW(sqlState 0A000) for the v1 session-catalog path.11. Misc.
MetricViewCanonical.parseSourceaccepts multipart identifiers (parseMultipartIdentifier) so 3-partcatalog.schema.tablesource references work asAssetSource.Why are the changes needed?
Before this PR, metric-view DDL only worked against the session catalog: the create path called
SessionCatalog.createTabledirectly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shippedViewCatalogandTableViewCatalogas the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface.Once metric views can live on a v2 catalog, two more constraints surface:
ViewCatalog.loadViewneeds to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this inPROP_TABLE_TYPE = METRIC_VIEWkeeps the distinction wire-compatible and letsV1Table.toCatalogTablereconstructCatalogTableType.METRIC_VIEWon the read path.DependencyListofTableDependency/FunctionDependencywith structuralString[] namePartslets catalogs persist the lineage as a first-class field with full fidelity.The remaining changes (drop-command parity,
aliasPlanmetadata retention,metric_view.*properties,parseMultipartIdentifier,tableType == VIEWaudit, multi-level-namespace lift, HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a realCatalogTableTypeand as a v2 catalog citizen -- without them, basic operations likeDROP VIEW,DESCRIBE TABLE EXTENDED,CREATE VIEW (a COMMENT 'c') WITH METRICS ..., orCREATE VIEW cat.db1.db2.mv WITH METRICS ...would silently degrade.Does this PR introduce any user-facing change?
Yes, both for end users and for catalog plugin developers:
End users:
CREATE VIEW <ident> WITH METRICS ...now works against any v2 catalog that implementsViewCatalog, including catalogs with multi-level namespace targets. Previously it was rejected withMISSING_CATALOG_ABILITY.VIEWSfor non-session catalogs, and capped at single-level namespaces.IF NOT EXISTSandOR REPLACEare honored on the v2 path (regression vs. v1 fixed).SELECT region, measure(count_sum) FROM <mv> ..., dropped withDROP VIEW, listed viaSHOW VIEWS(and viaSHOW TABLESon aTableViewCatalog, matching v1 SHOW TABLES output), and described withDESCRIBE TABLE/DESCRIBE TABLE EXTENDED.DROP TABLEon a metric view throwsWRONG_COMMAND_FOR_OBJECT_TYPE("Use DROP VIEW instead").SHOW CREATE TABLEon a metric view throws an explicit unsupported-feature error (no round-trippable form yet).CatalogTableType.METRIC_VIEWinstead ofCatalogTableType.VIEW + view.viewWithMetrics=true. Observable inDESCRIBE TABLE EXTENDED'sTyperow and thetableTypecolumn of thetablessystem table. SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via aview.subType = METRIC_VIEWproperty marker.DROP TABLE/DROP VIEWmismatch now mentionMETRIC_VIEWalongsideVIEW.INVALID_METRIC_VIEW_YAML(user-correctable) instead of "Spark internal error, please contact support".Catalog plugin developers:
org.apache.spark.sql.connector.catalog: sealed interfaceDependencywithpermits TableDependency, FunctionDependency, both records carryingString[] nameParts.DependencyList(Dependency[])with defensive copy. All@Evolving,@since 4.2.0.ViewInfogains a typedviewDependencies()accessor andViewInfo.Builder.withViewDependencies(...)setter.TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"constant.CatalogTableType.METRIC_VIEWenum value (v1 surface).V2ViewPreparation(private toorg.apache.spark.sql.execution.datasources.v2) gains optionalviewDependencies/tableType/retainColumnMetadatahooks.How was this patch tested?
MetricViewV2CatalogSuite-- 30 tests across 5 sections, all against an in-memoryViewCatalogtest fixture (MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog):Section 1 -- CREATE-related (11 tests):
METRIC_VIEWtable type and view text viaViewInfo.metric_view.*descriptor properties + view context (currentCatalog/currentNamespace) + captured SQL configs.SQLSourceand comment.metric_view.type/metric_view.exprin column metadata.metric_view.*metadata (pins thealiasPlan(retainMetadata = true)fix).CREATE OR REPLACE VIEW ... WITH METRICSreplaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText,metric_view.where, dependencies).CREATE VIEW IF NOT EXISTS ... WITH METRICSis a no-op when the view exists (catalog never sees the secondcreateViewcall).CREATE VIEW ... WITH METRICSover a v2 table at the ident throwsTABLE_OR_VIEW_ALREADY_EXISTS(analyzer-time pre-check).CREATE VIEW IF NOT EXISTS ... WITH METRICSis a no-op when a v2 table sits at the ident (v1 parity).CREATE VIEW ... WITH METRICSon a non-ViewCatalogcatalog fails withMISSING_CATALOG_ABILITY.VIEWS.CREATE VIEW ... WITH METRICSat a multi-level-namespace v2 target (testcat.ns_a.ns_b.mv_deep) succeeds (pins theanalyzeMetricViewTextlift toSeq[String]).Section 2 -- Dependency extraction (5 tests):
JOINcaptures both tables as 3-partnameParts.nameParts.testcat.ns_a.ns_b.events_deep) emits 4-partnameParts.Section 3 -- SELECT cases (5 tests, modeled on
MetricViewSuitepatterns):SELECT measure(count_sum) FROM <mv> GROUP BY regionreturns aggregated rows (exercises the fullloadTableOrView->MetadataTable(ViewInfo)->V1Table.toCatalogTable(ViewInfo)->ResolveMetricViewround-trip).SELECT measure(...) WHERE region = ...-- query-layer filter on top of the view.whereclause is applied (where = Some("count > 1")filters at view-resolution time).ORDER BY measure(...) DESC LIMIT 1over the metric view.Section 4 -- DESCRIBE cases (2 tests):
DESCRIBE TABLE EXTENDEDround-trips throughloadTableOrViewand emits theView Text/Typerows (gated through theCatalogTable.toJsonLinkedHashMapVIEW gate, which now accepts METRIC_VIEW).DESCRIBE TABLE(non-EXTENDED) returns the aliased columns.Section 5 -- DROP / SHOW cases (7 tests):
DROP VIEWsucceeds on a v2 metric view.DROP VIEW IF EXISTSon a non-existent v2 metric view is a no-op.DROP TABLEon a v2 metric view throwsWRONG_COMMAND_FOR_OBJECT_TYPE("Use DROP VIEW instead", per SPARK-56655'sDropTableExec) and asserts the metric view is not deleted.DROP TABLE IF EXISTSon a v2 metric view also throws (IF EXISTSdoesn't silence the wrong-type error, v1 parity).SHOW CREATE TABLEon a v2 metric view throwsUNSUPPORTED_FEATURE.TABLE_OPERATIONwith operation "SHOW CREATE TABLE".SHOW TABLESon aTableViewCataloglists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655).SHOW VIEWSlists v2 metric views.Existing session-catalog metric-view tests (
MetricViewSuite,SimpleMetricViewSuite,HiveMetricViewSuite) and v1 path tests pass unchanged.DDLSuiteandHiveDDLSuitehad theirtableTypesenumerations updated to includeMETRIC_VIEWin two assertion lists.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Sonnet 4.7)