columnar: Fix partition table reading and columnar OOB crash on generated columns#10858
Conversation
|
@JaySon-Huang I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughRemoves a redundant generated-column filter and reorders placeholder execution, aligns proxy operator headers with table-scan header, updates planner diagnostic messages/includes, refactors RNProxy reader/table-info construction and casts, conditionally skips KV/Region restore for columnar mode, and adds a test pre-drop. ChangesGenerated Columns Disaggregated Read Fix
Sequence Diagram(s)sequenceDiagram
participant Client
participant PipelineExecutorContext
participant StorageDisaggregated
participant RNProxySourceOp
participant executeGeneratedColumnPlaceholder
Client->>PipelineExecutorContext: request readThroughColumnar(...)
PipelineExecutorContext->>StorageDisaggregated: invoke readThroughColumnar
StorageDisaggregated->>RNProxySourceOp: add proxy source operators (group_builder.addConcurrency)
StorageDisaggregated->>executeGeneratedColumnPlaceholder: compute generated_column_infos
StorageDisaggregated->>executeGeneratedColumnPlaceholder: executeGeneratedColumnPlaceholder(...)
executeGeneratedColumnPlaceholder-->>PipelineExecutorContext: return filled virtual/generated columns
Estimated code review effort 🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR fixes correctness and stability issues in the disaggregated next-gen columnar read path: (1) partition table scans needing _tidb_tid now keep stream headers consistent with the expected table-scan schema, and (2) a generated-column handling bug that could cause out-of-bounds access (manifesting as std::bad_alloc) is removed.
Changes:
- Align RN proxy stream/source headers with
AddExtraTableIDColumnTransformActionso_tidb_tidis present in headers for partition scans. - Remove the redundant generated-column filtering pass in
genColumnDefinesForDisaggregatedReadThroughColumnar()(generated columns are already excluded upstream and filled later). - Improve schema/type mismatch diagnostics in
PhysicalTableScanby including structured schema/header dumps.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| dbms/src/Storages/StorageDisaggregatedColumnar.h | Use action.getHeader() to keep headers consistent when injecting _tidb_tid. |
| dbms/src/Storages/StorageDisaggregatedColumnar.cpp | Remove buggy redundant generated-column filtering; rely on existing skip + placeholder fill. |
| dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp | Enhance exception details for schema/type mismatches (adds JSON dumps). |
| dbms/src/Core/NamesAndTypes.h | Remove unused <map>/<set> includes (but see review comment: causes downstream compile break). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3f8a1f0 to
2a2027f
Compare
Signed-off-by: JaySon-Huang <tshent@qq.com>
…loop. genColumnDefinesForDisaggregatedRead already excludes generated columns from column_defines. The second-pass loop incorrectly indexed column_defines with table_scan indices, causing misaligned columns and std::bad_alloc. Fixes pingcap#10856
Signed-off-by: JaySon-Huang <tshent@qq.com>
2a2027f to
e468a83
Compare
Signed-off-by: JaySon-Huang <tshent@qq.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)
533-536: ⚡ Quick winUse error-code-first
DB::Exceptionconstructor in these new throw paths.Please switch to the fmt-style
Exception(ErrorCodes::..., "msg {}", arg)form instead of building a string viafmt::formatfirst.Proposed fix
- throw Exception(fmt::format("pd client error: {}", error_msg), ErrorCodes::COLUMNAR_SNAPSHOT_ERROR); + throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error: {}", error_msg); ... - throw Exception( - fmt::format("columnar reader error_type={}, error={}", uint8_t(columnar_reader.error_type), error_msg), - ErrorCodes::COLUMNAR_SNAPSHOT_ERROR); + throw Exception( + ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, + "columnar reader error_type={}, error={}", + uint8_t(columnar_reader.error_type), + error_msg);As per coding guidelines: "Prefer the fmt-style constructor for
DB::Exceptionwith error code first:throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);".Also applies to: 545-547
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 533 - 536, Replace the ad-hoc fmt::format-based Exception throws with the error-code-first DB::Exception constructor: instead of building error_msg and calling throw Exception(fmt::format(...), ErrorCodes::COLUMNAR_SNAPSHOT_ERROR), call throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error: {}", String(columnar_reader.error.buff.data, columnar_reader.error.buff.len)); do the same for the other occurrence around the columnar reader failure (also update the associated LOG_WARNING usage to keep the same formatted message but use the raw buffer string as the argument).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 533-536: Replace the ad-hoc fmt::format-based Exception throws
with the error-code-first DB::Exception constructor: instead of building
error_msg and calling throw Exception(fmt::format(...),
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR), call throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error: {}",
String(columnar_reader.error.buff.data, columnar_reader.error.buff.len)); do the
same for the other occurrence around the columnar reader failure (also update
the associated LOG_WARNING usage to keep the same formatted message but use the
raw buffer string as the argument).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 94cf982c-c896-45b1-8cce-f4ee4933785c
📒 Files selected for processing (6)
dbms/src/Core/NamesAndTypes.hdbms/src/Flash/Planner/Plans/PhysicalTableScan.cppdbms/src/Storages/KVStore/TMTContext.cppdbms/src/Storages/StorageDisaggregatedColumnar.cppdbms/src/Storages/StorageDisaggregatedColumnar.htests/fullstack-test-next-gen/placement/placement_in_sql.test
💤 Files with no reviewable changes (1)
- dbms/src/Core/NamesAndTypes.h
✅ Files skipped from review due to trivial changes (2)
- tests/fullstack-test-next-gen/placement/placement_in_sql.test
- dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp
Do not pass virtual column -3 to kvengine when creating columnar readers. TiFlash fills _tidb_tid locally, consistent with genColumnDefinesForDisaggregatedRead(). Fixes pingcap#10862
There was a problem hiding this comment.
🧹 Nitpick comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)
537-542: ⚡ Quick winUse
DB::Exceptionerror-code-first fmt-style constructor.This throw still builds a temporary string first; please switch to the project’s preferred constructor form.
Proposed fix
- throw Exception( - fmt::format( - "columnar reader error_type={}, error={}", - static_cast<uint8_t>(columnar_reader.error_type), - error_msg), - ErrorCodes::COLUMNAR_SNAPSHOT_ERROR); + throw Exception( + ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, + "columnar reader error_type={}, error={}", + static_cast<uint8_t>(columnar_reader.error_type), + error_msg);As per coding guidelines, prefer the fmt-style constructor for
DB::Exceptionwith error code first:throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 537 - 542, Replace the current throw that builds a temporary formatted string with the DB::Exception fmt-style constructor by passing the error code first; specifically change the throw using ErrorCodes::COLUMNAR_SNAPSHOT_ERROR so it calls Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "columnar reader error_type={}, error={}", static_cast<uint8_t>(columnar_reader.error_type), error_msg) instead of constructing fmt::format(...) beforehand.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 537-542: Replace the current throw that builds a temporary
formatted string with the DB::Exception fmt-style constructor by passing the
error code first; specifically change the throw using
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR so it calls
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "columnar reader error_type={},
error={}", static_cast<uint8_t>(columnar_reader.error_type), error_msg) instead
of constructing fmt::format(...) beforehand.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e31bd1ea-4b6a-4238-a08a-f73fc8105e8a
📒 Files selected for processing (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cppdbms/src/Storages/StorageDisaggregatedColumnar.h
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: CalvinNeo, JinheLin, yongman The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@JaySon-Huang: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
What problem does this PR solve?
Issue Number: close #10856, close #10862
Problem Summary:
column_defineswithtable_scanindices. SincegenColumnDefinesForDisaggregatedReadalready excludes generated columns, this misalignment could cause out-of-bounds access andstd::bad_alloc.select *on a partitioned table includes virtual column_tidb_tid(-3).createProxyReader()passed this virtual column to kvengine intable_info, but kvengine tries to decode it from memtable/row data and fails withSchema out of date: col:-3 read null for not null column.What is changed and how it works?
Check List
Tests
Manual test:
Side effects
Documentation
Release note
Summary by CodeRabbit
Bug Fixes
Refactor