Skip to content

[fix](spill) fix data race in spill dir selection and memory tracking under-report#62441

Open
wenzhenghu wants to merge 13 commits intoapache:masterfrom
HYDCP:fix-spill-disk-small-issues
Open

[fix](spill) fix data race in spill dir selection and memory tracking under-report#62441
wenzhenghu wants to merge 13 commits intoapache:masterfrom
HYDCP:fix-spill-disk-small-issues

Conversation

@wenzhenghu
Copy link
Copy Markdown
Contributor

Problem Summary:
There are two critical bugs in the spill-disk feature implementation:

  1. Data Race in SpillDataDir: In SpillStreamManager::_get_stores_for_spill, reading _disk_capacity_bytes without a lock concurrently with the GC thread updating it can lead to undefined behavior or torn reads.
  2. Memory Tracking Under-report in SpillWriter: In SpillWriter::_write_internal, the memory counter increment and its paired Defer deduction were placed inside a nested inner scope. This caused the memory to be prematurely deducted before the synchronous, time-consuming disk I/O _file_writer->append(buff) occurred, leading to significant under-reporting of memory usage.

Solution:

  1. Added a thread-safe get_disk_usage method guarded by _mutex in SpillDataDir for the pipeline worker threads to fetch capacity safely.
  2. Optimized the memory tracking in SpillWriter::_write_internal by moving the COUNTER_UPDATE after the disk capacity check, and unified the memory deduction and disk metric updates into a single Defer block at the outer scope. This guarantees that memory is accurately tracked during I/O and simplifies the error-handling paths.
  3. Added corresponding BE unit tests SpillDataDirThreadSafety and SpillWriterMemoryTracking to prevent regression.

@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 13, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

jacktengg
jacktengg previously approved these changes Apr 13, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Apr 13, 2026
@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking findings:

  1. SpillDataDir is still not fully synchronized. This patch fixes the unlocked read in get_disk_usage(), but debug_string() still reads _disk_capacity_bytes, _spill_data_limit_bytes, _spill_data_bytes, and _available_bytes without _mutex while the GC thread and worker threads keep mutating them. That leaves the same class of UB in the spill dir state.
  2. SpillWriterMemoryTracking is not a real regression test for the reported bug. The old implementation also restored MemoryUsage back to the initial value before write_block() returned, so the current assertion would pass on both the buggy and fixed code. The test needs to observe the counter while append() is still in progress or otherwise force the old timing window.

Critical checkpoints:

  • Goal of the task: Partially achieved. The writer-side accounting fix in spill_file_writer.cpp looks correct on the success and error paths I traced, but the data-race fix is incomplete because another accessor of the same shared fields remains unlocked.
  • Small, clear, focused change: Yes. The production changes are small and targeted.
  • Concurrency: Not fully correct yet. SpillDataDir state is shared between spill workers and the GC/update thread, and debug_string() still reads that shared state without the protecting mutex.
  • Special lifecycle management / static initialization: No new lifecycle or SIOF issue found in the touched code.
  • Configuration changes: None.
  • Compatibility / incompatible changes: None.
  • Parallel code paths: Needs follow-through. The same synchronization treatment applied to get_disk_usage() should also be applied to debug_string().
  • Special conditional checks: No blocking issue found.
  • Test coverage: Insufficient for the writer-memory regression; the new test does not fail on the pre-fix implementation.
  • Observability: No new observability concern beyond the existing race in GC logging.
  • Transaction / persistence: Not applicable.
  • Data writes / modifications: The SpillFileWriter accounting update itself looks correct for append success/failure, and I did not find a separate write-correctness regression there.
  • FE/BE variable propagation: Not applicable.
  • Performance: The extra mutex in get_disk_usage() is acceptable for this path.
  • Other issues: None beyond the two findings above.

Because of the remaining unsynchronized accessor and the ineffective regression test, I cannot approve this PR in its current form.

Comment thread be/src/exec/spill/spill_file_manager.h
Comment thread be/test/exec/operator/spill_sort_source_operator_test.cpp
@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (30/30) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.04% (20135/37965)
Line Coverage 36.58% (189379/517664)
Region Coverage 32.84% (146994/447646)
Branch Coverage 33.98% (64360/189421)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (30/30) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.88% (26723/37179)
Line Coverage 54.89% (283273/516080)
Region Coverage 51.91% (234509/451778)
Branch Coverage 53.33% (101336/190003)

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

run buildall

@wenzhenghu
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found in the current patch.

Critical checkpoints:

  • Goal and proof: The PR goal is to fix a real data race in spill-dir usage selection and restore accurate MemoryUsage accounting during synchronous spill writes. The code now guards disk-usage reads with _mutex, keeps the serialized buffer accounted until append() returns, and adds focused BE tests for both paths.
  • Scope/minimality: The change is small and focused. It only adds a locked accessor, extends an existing lock to debug_string(), adjusts the Defer scope in SpillFileWriter::_write_internal(), and adds targeted tests.
  • Concurrency: Applicable. SpillDataDir::_disk_capacity_bytes / _available_bytes are now read under the same mutex used by update_capacity(), so the previous unsynchronized read is removed. No new lock-ordering or heavy work-under-lock regression was introduced by the fix itself.
  • Lifecycle/static initialization: No special lifecycle or static initialization risks were added. Writer/file ownership remains the same, and the weak/strong reference pattern is unchanged.
  • Configuration: No new config items.
  • Compatibility: No incompatible API, storage-format, or rolling-upgrade concerns in this patch.
  • Parallel paths: I did not find another equivalent spill-dir selection path that still reads these fields without synchronization. The spill writer accounting fix is applied on the main spill write path used by the existing operators.
  • Special conditional checks: The new status.ok() gate inside the outer Defer is appropriate because spill-dir usage and written-byte metrics should only advance after a successful append.
  • Test coverage: Good for the targeted bug fixes. The new tests cover concurrent capacity refresh vs. disk-usage reads and verify that memory remains accounted while the injected write delay is active. I did not run the tests locally in this review environment.
  • Observability: No additional observability seems required for these fixes; existing counters and warning logs remain sufficient.
  • Transaction/persistence: Not applicable.
  • Data write/modification correctness: Applicable. The patch improves accounting correctness around spill writes and does not change the spill file format or publish semantics.
  • FE/BE variable passing: Not applicable.
  • Performance: The added mutex acquisition around get_disk_usage() is appropriate for correctness and is not on a path that justifies a racy read. The memory-accounting scope change removes under-reporting without adding meaningful extra work.
  • Other issues: None blocking found from the reviewed code paths.

@wenzhenghu wenzhenghu requested a review from jacktengg April 14, 2026 07:15
@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (33/33) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.03% (20135/37969)
Line Coverage 36.58% (189379/517683)
Region Coverage 32.83% (146974/447721)
Branch Coverage 33.97% (64345/189425)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (33/33) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.67% (27393/37183)
Line Coverage 57.27% (295592/516099)
Region Coverage 54.39% (245781/451853)
Branch Coverage 56.09% (106582/190007)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants