Skip to content

columnar: Fix partition table reading and columnar OOB crash on generated columns#10858

Merged
ti-chi-bot[bot] merged 7 commits into
pingcap:masterfrom
JaySon-Huang:fix/partition-table-and-columnar-oob
May 25, 2026
Merged

columnar: Fix partition table reading and columnar OOB crash on generated columns#10858
ti-chi-bot[bot] merged 7 commits into
pingcap:masterfrom
JaySon-Huang:fix/partition-table-and-columnar-oob

Conversation

@JaySon-Huang
Copy link
Copy Markdown
Contributor

@JaySon-Huang JaySon-Huang commented May 22, 2026

What problem does this PR solve?

Issue Number: close #10856, close #10862

Problem Summary:

  • Partition table scan could fail to read correctly due to incorrect column handling in the disaggregated columnar read path.
  • When reading tables with generated columns through columnar, a redundant second-pass filter loop indexed column_defines with table_scan indices. Since genColumnDefinesForDisaggregatedRead already excludes generated columns, this misalignment could cause out-of-bounds access and std::bad_alloc.
  • On next-gen columnar reads with late materialization, select * on a partitioned table includes virtual column _tidb_tid (-3). createProxyReader() passed this virtual column to kvengine in table_info, but kvengine tries to decode it from memtable/row data and fails with Schema out of date: col:-3 read null for not null column.

What is changed and how it works?

columnar: fix partition table reading, generated-column OOB, and _tidb_tid proxy read

- Fix partition table reading in PhysicalTableScan and related disaggregated columnar column define handling.
- Remove redundant generated-column filter loop in genColumnDefinesForDisaggregatedReadThroughColumnar; generated columns are already excluded by genColumnDefinesForDisaggregatedRead and filled later by executeGeneratedColumnPlaceholder.
- Skip `_tidb_tid` (`MutSup::extra_table_id_col_id`) when building `table_info` in createProxyReader(), consistent with genColumnDefinesForDisaggregatedRead(); TiFlash fills it locally via AddExtraTableIDColumnTransformAction.
- Skip kvstore restore on columnar compute nodes.
- Make placement_in_sql.test reentrant.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual test:

# #10862: partition table late materialization with _tidb_tid
cd tests/fullstack-test-next-gen
./compose.sh exec -T tiflash-cn0 bash -c \
  'cd /tests && ENABLE_NEXT_GEN=true verbose=true ./run-test.sh fullstack-test/mpp/late_materialization_extra_table_id_column.test'

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix TiFlash next-gen columnar read failures on partition tables with late materialization and generated columns, including crashes caused by misaligned column defines and errors when reading virtual column `_tidb_tid`.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error messages for table schema validation to provide clearer diagnostic information when schema mismatches occur between storage layers.
    • Enhanced placement policy test setup to prevent conflicts from prior test runs.
  • Refactor

    • Internal optimization of column handling and storage initialization logic for better consistency and maintainability.

Review Change Stack

@ti-chi-bot ti-chi-bot Bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 22, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 22, 2026

@JaySon-Huang I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 22, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9d859273-0ebf-4038-9531-6a7d7725d798

📥 Commits

Reviewing files that changed from the base of the PR and between 233be75 and 4166d32.

📒 Files selected for processing (1)
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp

📝 Walkthrough

Walkthrough

Removes a redundant generated-column filter and reorders placeholder execution, aligns proxy operator headers with table-scan header, updates planner diagnostic messages/includes, refactors RNProxy reader/table-info construction and casts, conditionally skips KV/Region restore for columnar mode, and adds a test pre-drop.

Changes

Generated Columns Disaggregated Read Fix

Layer / File(s) Summary
Header initialization alignment in proxy operators
dbms/src/Storages/StorageDisaggregatedColumnar.h
RNProxyInputStream and RNProxySourceOp set header from AddExtraTableIDColumnTransformAction::getHeader() to align with genNamesAndTypesForTableScan for _tidb_tid; createProxyReader parameter renamed to physical_table_ranges.
Generated-column filtering removal and sequencing
dbms/src/Storages/StorageDisaggregatedColumnar.cpp
genColumnDefinesForDisaggregatedReadThroughColumnar() returns column_defines/extra_table_id_index from genColumnDefinesForDisaggregatedRead() directly; generated_column_infos computed after proxy wiring and passed into executeGeneratedColumnPlaceholder(...).
RNProxy reader/table-info construction and minor refactors
dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Use should_skip_column_for_columnar_table_info to omit _tidb_tid and generated columns when building proxy table_info; adjust BaseBuffView bindings/constness; rename lock-guard vars; reformat error messages; fix casts and const bindings across RNProxy code paths.
Diagnostics, includes, and restore guard
dbms/src/Core/NamesAndTypes.h, dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp, dbms/src/Storages/KVStore/TMTContext.cpp
Removed unused <map>/<set> includes; added Core/NamesAndTypes.h include to PhysicalTableScan.cpp; updated schema-size and column-type mismatch exception messages to placeholder-based formats with table id and JSON dumps; TMTContext::restore skips kvstore/region_table restore when use_columnar is true.
Test precondition cleanup
tests/fullstack-test-next-gen/placement/placement_in_sql.test
Adds DROP PLACEMENT POLICY IF EXISTS evict_sata_dw; before creating the placement policy.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant PipelineExecutorContext
  participant StorageDisaggregated
  participant RNProxySourceOp
  participant executeGeneratedColumnPlaceholder
  Client->>PipelineExecutorContext: request readThroughColumnar(...)
  PipelineExecutorContext->>StorageDisaggregated: invoke readThroughColumnar
  StorageDisaggregated->>RNProxySourceOp: add proxy source operators (group_builder.addConcurrency)
  StorageDisaggregated->>executeGeneratedColumnPlaceholder: compute generated_column_infos
  StorageDisaggregated->>executeGeneratedColumnPlaceholder: executeGeneratedColumnPlaceholder(...)
  executeGeneratedColumnPlaceholder-->>PipelineExecutorContext: return filled virtual/generated columns
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • pingcap/tiflash#10850: Modifies genColumnDefinesForDisaggregatedReadThroughColumnar and related pipeline wiring in the disaggregated read-through columnar path.
  • pingcap/tiflash#10842: Refactors the disaggregated columnar RNProxy read path and proxy wiring; closely related integration work.

Suggested labels

approved, lgtm

Suggested reviewers

  • windtalker
  • yongman
  • CalvinNeo
  • JinheLin

🐇 A hop to mend the column flow,
Headers aligned so reads can go,
Filters trimmed to stop the crash,
Placeholders fill in one smooth pass,
Queries hop back—data on show!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: fixing partition table reading, columnar OOB crash on generated columns, and virtual column handling—all key issues addressed in the PR.
Description check ✅ Passed The PR description follows the template with all required sections filled: problem summary with issue numbers, detailed explanation of changes, check list with manual test provided, and a comprehensive release note.
Linked Issues check ✅ Passed The PR successfully addresses all objectives from linked issues #10856 and #10862: removes redundant generated-column filter loop causing OOB crashes, skips _tidb_tid in createProxyReader() for partition tables, and includes related fixes for kvstore restore and test reentrancy.
Out of Scope Changes check ✅ Passed All code changes are within scope: redundant generated-column filter removal (#10856), _tidb_tid skipping in createProxyReader() (#10862), kvstore restore skip for columnar compute nodes, and placement test reentrancy fix are all directly addressing the linked issues.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@JaySon-Huang JaySon-Huang requested a review from Copilot May 22, 2026 06:49
@JaySon-Huang JaySon-Huang changed the title Fix partition table reading and columnar OOB crash on generated columns [WIP] Fix partition table reading and columnar OOB crash on generated columns May 22, 2026
@ti-chi-bot ti-chi-bot Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes correctness and stability issues in the disaggregated next-gen columnar read path: (1) partition table scans needing _tidb_tid now keep stream headers consistent with the expected table-scan schema, and (2) a generated-column handling bug that could cause out-of-bounds access (manifesting as std::bad_alloc) is removed.

Changes:

  • Align RN proxy stream/source headers with AddExtraTableIDColumnTransformAction so _tidb_tid is present in headers for partition scans.
  • Remove the redundant generated-column filtering pass in genColumnDefinesForDisaggregatedReadThroughColumnar() (generated columns are already excluded upstream and filled later).
  • Improve schema/type mismatch diagnostics in PhysicalTableScan by including structured schema/header dumps.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
dbms/src/Storages/StorageDisaggregatedColumnar.h Use action.getHeader() to keep headers consistent when injecting _tidb_tid.
dbms/src/Storages/StorageDisaggregatedColumnar.cpp Remove buggy redundant generated-column filtering; rely on existing skip + placeholder fill.
dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp Enhance exception details for schema/type mismatches (adds JSON dumps).
dbms/src/Core/NamesAndTypes.h Remove unused <map>/<set> includes (but see review comment: causes downstream compile break).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dbms/src/Core/NamesAndTypes.h
@JaySon-Huang JaySon-Huang force-pushed the fix/partition-table-and-columnar-oob branch from 3f8a1f0 to 2a2027f Compare May 22, 2026 07:03
Signed-off-by: JaySon-Huang <tshent@qq.com>
…loop.

genColumnDefinesForDisaggregatedRead already excludes generated columns from
column_defines. The second-pass loop incorrectly indexed column_defines with
table_scan indices, causing misaligned columns and std::bad_alloc.

Fixes pingcap#10856
Signed-off-by: JaySon-Huang <tshent@qq.com>
@JaySon-Huang JaySon-Huang force-pushed the fix/partition-table-and-columnar-oob branch from 2a2027f to e468a83 Compare May 22, 2026 10:23
Signed-off-by: JaySon-Huang <tshent@qq.com>
@JaySon-Huang JaySon-Huang changed the title [WIP] Fix partition table reading and columnar OOB crash on generated columns columnar: Fix partition table reading and columnar OOB crash on generated columns May 22, 2026
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 22, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)

533-536: ⚡ Quick win

Use error-code-first DB::Exception constructor in these new throw paths.

Please switch to the fmt-style Exception(ErrorCodes::..., "msg {}", arg) form instead of building a string via fmt::format first.

Proposed fix
-        throw Exception(fmt::format("pd client error: {}", error_msg), ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error: {}", error_msg);
...
-        throw Exception(
-            fmt::format("columnar reader error_type={}, error={}", uint8_t(columnar_reader.error_type), error_msg),
-            ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(
+            ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+            "columnar reader error_type={}, error={}",
+            uint8_t(columnar_reader.error_type),
+            error_msg);

As per coding guidelines: "Prefer the fmt-style constructor for DB::Exception with error code first: throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);".

Also applies to: 545-547

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 533 - 536,
Replace the ad-hoc fmt::format-based Exception throws with the error-code-first
DB::Exception constructor: instead of building error_msg and calling throw
Exception(fmt::format(...), ErrorCodes::COLUMNAR_SNAPSHOT_ERROR), call throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error: {}",
String(columnar_reader.error.buff.data, columnar_reader.error.buff.len)); do the
same for the other occurrence around the columnar reader failure (also update
the associated LOG_WARNING usage to keep the same formatted message but use the
raw buffer string as the argument).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 533-536: Replace the ad-hoc fmt::format-based Exception throws
with the error-code-first DB::Exception constructor: instead of building
error_msg and calling throw Exception(fmt::format(...),
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR), call throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error: {}",
String(columnar_reader.error.buff.data, columnar_reader.error.buff.len)); do the
same for the other occurrence around the columnar reader failure (also update
the associated LOG_WARNING usage to keep the same formatted message but use the
raw buffer string as the argument).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 94cf982c-c896-45b1-8cce-f4ee4933785c

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2027f and 78293ce.

📒 Files selected for processing (6)
  • dbms/src/Core/NamesAndTypes.h
  • dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp
  • dbms/src/Storages/KVStore/TMTContext.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.h
  • tests/fullstack-test-next-gen/placement/placement_in_sql.test
💤 Files with no reviewable changes (1)
  • dbms/src/Core/NamesAndTypes.h
✅ Files skipped from review due to trivial changes (2)
  • tests/fullstack-test-next-gen/placement/placement_in_sql.test
  • dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp

Signed-off-by: JaySon-Huang <tshent@qq.com>
Do not pass virtual column -3 to kvengine when creating columnar readers.
TiFlash fills _tidb_tid locally, consistent with genColumnDefinesForDisaggregatedRead().

Fixes pingcap#10862
@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 25, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)

537-542: ⚡ Quick win

Use DB::Exception error-code-first fmt-style constructor.

This throw still builds a temporary string first; please switch to the project’s preferred constructor form.

Proposed fix
-        throw Exception(
-            fmt::format(
-                "columnar reader error_type={}, error={}",
-                static_cast<uint8_t>(columnar_reader.error_type),
-                error_msg),
-            ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(
+            ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+            "columnar reader error_type={}, error={}",
+            static_cast<uint8_t>(columnar_reader.error_type),
+            error_msg);

As per coding guidelines, prefer the fmt-style constructor for DB::Exception with error code first: throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 537 - 542,
Replace the current throw that builds a temporary formatted string with the
DB::Exception fmt-style constructor by passing the error code first;
specifically change the throw using ErrorCodes::COLUMNAR_SNAPSHOT_ERROR so it
calls Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "columnar reader
error_type={}, error={}", static_cast<uint8_t>(columnar_reader.error_type),
error_msg) instead of constructing fmt::format(...) beforehand.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 537-542: Replace the current throw that builds a temporary
formatted string with the DB::Exception fmt-style constructor by passing the
error code first; specifically change the throw using
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR so it calls
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "columnar reader error_type={},
error={}", static_cast<uint8_t>(columnar_reader.error_type), error_msg) instead
of constructing fmt::format(...) beforehand.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e31bd1ea-4b6a-4238-a08a-f73fc8105e8a

📥 Commits

Reviewing files that changed from the base of the PR and between 78293ce and 233be75.

📒 Files selected for processing (2)
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.h

Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 25, 2026
@ti-chi-bot ti-chi-bot Bot added the approved label May 25, 2026
@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 25, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 25, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-25 06:15:16.383818284 +0000 UTC m=+246386.353983338: ☑️ agreed by CalvinNeo.
  • 2026-05-25 06:15:26.43048504 +0000 UTC m=+246396.400650106: ☑️ agreed by JinheLin.

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 25, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, JinheLin, yongman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [CalvinNeo,JinheLin,yongman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 25, 2026

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-sanitizer-tsan 4166d32 link false /test pull-sanitizer-tsan
pull-sanitizer-asan 4166d32 link false /test pull-sanitizer-asan

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

/retest-required

@ti-chi-bot ti-chi-bot Bot merged commit 2e721b3 into pingcap:master May 25, 2026
8 of 10 checks passed
@JaySon-Huang JaySon-Huang deleted the fix/partition-table-and-columnar-oob branch May 25, 2026 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

5 participants