
feat: place snapshot meta-columns first (#1481) #1486

Open
aarushisingh04 wants to merge 6 commits into databricks:main from aarushisingh04:feat/snapshot-meta-columns-first

Conversation

@aarushisingh04 (Contributor)

Fixes #1481.

Description

By default, Databricks collects statistics, and enables automatic liquid clustering, only on the first 32 columns of a Delta table. Previously, snapshot meta-columns were appended after the user columns, which pushed them out of that window on wide tables.

This PR:

  • Overrides databricks__build_snapshot_table in snapshot_helpers.sql to place the meta-columns first in the SELECT list, before the user columns expanded by *.
  • Guarantees that the meta-columns always fall within Databricks' default 32-column statistics window, regardless of how wide the source table is.
  • Lets dbt_valid_to and dbt_scd_id benefit from automatic liquid clustering out of the box.

The hard_deletes = 'new_record' path is handled as well: dbt_is_deleted is placed alongside the other meta-columns at the front. Custom column names configured via snapshot_meta_column_names are respected through the existing get_snapshot_table_column_names() dispatch, consistent with the rest of snapshot_helpers.sql.
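To make the ordering argument concrete, here is a minimal Python sketch (not the Jinja macro itself) that models the snapshot column list before and after this change. The helper names snapshot_columns and columns_with_stats are hypothetical, the meta-column order shown is an assumption, and the 32-column window assumes Delta's default delta.dataSkippingNumIndexedCols setting.

```python
# Hypothetical model of snapshot column order; not the dbt-databricks macro.
# Assumes Delta's default of 32 indexed columns (delta.dataSkippingNumIndexedCols).
STATS_WINDOW = 32

# dbt's default snapshot meta-column names; the order here is an assumption.
DEFAULT_META = ["dbt_scd_id", "dbt_updated_at", "dbt_valid_from", "dbt_valid_to"]


def snapshot_columns(user_columns, meta_first, hard_deletes_new_record=False, renames=None):
    """Return the column list with meta-columns appended (old) or prepended (new)."""
    renames = renames or {}
    meta = list(DEFAULT_META)
    if hard_deletes_new_record:
        # hard_deletes = 'new_record' adds dbt_is_deleted next to the other meta-columns.
        meta.append("dbt_is_deleted")
    # Custom names (in the spirit of snapshot_meta_column_names) replace the defaults.
    meta = [renames.get(name, name) for name in meta]
    return meta + list(user_columns) if meta_first else list(user_columns) + meta


def columns_with_stats(columns, window=STATS_WINDOW):
    """Columns that fall inside the positional statistics window."""
    return columns[:window]


user = [f"col_{i}" for i in range(40)]  # a wide, 40-column source table
old = snapshot_columns(user, meta_first=False)
new = snapshot_columns(user, meta_first=True)
print([c for c in DEFAULT_META if c in columns_with_stats(old)])  # []
print([c for c in DEFAULT_META if c in columns_with_stats(new)])  # all four meta-columns
```

On a 40-column source, the appended meta-columns land at positions 41 to 44 and get no statistics, while the meta-first order keeps all four inside the window.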

Note

This change only affects newly created snapshot tables. Existing snapshots continue to work unchanged with the old column order. If you want the new column order on an existing snapshot for the statistics/clustering benefit, you can recreate it with dbt snapshot --full-refresh, but be aware that this permanently drops all historical SCD2 data. Back up the table first.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

sd-db (Collaborator) commented Jun 2, 2026

Since this changes the default column ordering for snapshot tables, it is best to put it behind a flag (refer to behaviour flags; you can keep the default as False). This would also require a lot more testing to verify the new behaviour, how it affects heterogeneous setups (snapshots created before this change have the old ordering, newer ones have the new ordering), and how it all maps together without regressions.

sd-db added the "needs more info" label Jun 2, 2026
sd-db (Collaborator) commented Jun 2, 2026

I have commented on the original issue that there might be a better (less intrusive) way of accomplishing the desired behavior.

sd-db added the Stale label Jun 10, 2026
sd-db (Collaborator) commented Jun 10, 2026

Hi @aarushisingh04, I have verified an alternative approach using dataSkippingStatsColumns, which we prefer because it is less intrusive. All details are in #1481. I have marked the PR as stale and will close it in the next few days. Thanks again for the contribution. If you have any questions, or if I have missed anything, happy to discuss here or on the issue.

@aarushisingh04 (Contributor, Author)

@sd-db Understood, and it's definitely better for me to have learnt of a cleaner approach! All good 👍

github-actions (bot) removed the Stale label Jun 11, 2026

Labels

needs more info (Waiting on response from user to gather more info)

Development

Successfully merging this pull request may close these issues.

Snapshots technical columns should be in front of the table to get statistics automatically

2 participants