[fix](spill) fix data race in spill dir selection and memory tracking under-report#62441
[fix](spill) fix data race in spill dir selection and memory tracking under-report#62441wenzhenghu wants to merge 13 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
run buildall |
|
run buildall |
|
run buildall |
|
run buildall |
|
run buildall |
|
run buildall |
|
run buildall |
|
run buildall |
|
run buildall |
|
/review |
There was a problem hiding this comment.
Blocking findings:
SpillDataDiris still not fully synchronized. This patch fixes the unlocked read inget_disk_usage(), butdebug_string()still reads_disk_capacity_bytes,_spill_data_limit_bytes,_spill_data_bytes, and_available_byteswithout_mutexwhile the GC thread and worker threads keep mutating them. That leaves the same class of UB in the spill dir state.SpillWriterMemoryTrackingis not a real regression test for the reported bug. The old implementation also restoredMemoryUsageback to the initial value beforewrite_block()returned, so the current assertion would pass on both the buggy and fixed code. The test needs to observe the counter whileappend()is still in progress or otherwise force the old timing window.
Critical checkpoints:
- Goal of the task: Partially achieved. The writer-side accounting fix in
spill_file_writer.cpplooks correct on the success and error paths I traced, but the data-race fix is incomplete because another accessor of the same shared fields remains unlocked. - Small, clear, focused change: Yes. The production changes are small and targeted.
- Concurrency: Not fully correct yet.
SpillDataDirstate is shared between spill workers and the GC/update thread, anddebug_string()still reads that shared state without the protecting mutex. - Special lifecycle management / static initialization: No new lifecycle or SIOF issue found in the touched code.
- Configuration changes: None.
- Compatibility / incompatible changes: None.
- Parallel code paths: Needs follow-through. The same synchronization treatment applied to
get_disk_usage()should also be applied todebug_string(). - Special conditional checks: No blocking issue found.
- Test coverage: Insufficient for the writer-memory regression; the new test does not fail on the pre-fix implementation.
- Observability: No new observability concern beyond the existing race in GC logging.
- Transaction / persistence: Not applicable.
- Data writes / modifications: The
SpillFileWriteraccounting update itself looks correct for append success/failure, and I did not find a separate write-correctness regression there. - FE/BE variable propagation: Not applicable.
- Performance: The extra mutex in
get_disk_usage()is acceptable for this path. - Other issues: None beyond the two findings above.
Because of the remaining unsynchronized accessor and the ineffective regression test, I cannot approve this PR in its current form.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
No blocking issues found in the current patch.
Critical checkpoints:
- Goal and proof: The PR goal is to fix a real data race in spill-dir usage selection and restore accurate
MemoryUsageaccounting during synchronous spill writes. The code now guards disk-usage reads with_mutex, keeps the serialized buffer accounted untilappend()returns, and adds focused BE tests for both paths. - Scope/minimality: The change is small and focused. It only adds a locked accessor, extends an existing lock to
debug_string(), adjusts theDeferscope inSpillFileWriter::_write_internal(), and adds targeted tests. - Concurrency: Applicable.
SpillDataDir::_disk_capacity_bytes/_available_bytesare now read under the same mutex used byupdate_capacity(), so the previous unsynchronized read is removed. No new lock-ordering or heavy work-under-lock regression was introduced by the fix itself. - Lifecycle/static initialization: No special lifecycle or static initialization risks were added. Writer/file ownership remains the same, and the weak/strong reference pattern is unchanged.
- Configuration: No new config items.
- Compatibility: No incompatible API, storage-format, or rolling-upgrade concerns in this patch.
- Parallel paths: I did not find another equivalent spill-dir selection path that still reads these fields without synchronization. The spill writer accounting fix is applied on the main spill write path used by the existing operators.
- Special conditional checks: The new
status.ok()gate inside the outerDeferis appropriate because spill-dir usage and written-byte metrics should only advance after a successful append. - Test coverage: Good for the targeted bug fixes. The new tests cover concurrent capacity refresh vs. disk-usage reads and verify that memory remains accounted while the injected write delay is active. I did not run the tests locally in this review environment.
- Observability: No additional observability seems required for these fixes; existing counters and warning logs remain sufficient.
- Transaction/persistence: Not applicable.
- Data write/modification correctness: Applicable. The patch improves accounting correctness around spill writes and does not change the spill file format or publish semantics.
- FE/BE variable passing: Not applicable.
- Performance: The added mutex acquisition around
get_disk_usage()is appropriate for correctness and is not on a path that justifies a racy read. The memory-accounting scope change removes under-reporting without adding meaningful extra work. - Other issues: None blocking found from the reviewed code paths.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Problem Summary:
There are two critical bugs in the spill-disk feature implementation:
SpillStreamManager::_get_stores_for_spill, reading_disk_capacity_byteswithout a lock concurrently with the GC thread updating it can lead to undefined behavior or torn reads.SpillWriter::_write_internal, the memory counter increment and its pairedDeferdeduction were placed inside a nested inner scope. This caused the memory to be prematurely deducted before the synchronous, time-consuming disk I/O_file_writer->append(buff)occurred, leading to significant under-reporting of memory usage.Solution:
get_disk_usagemethod guarded by_mutexinSpillDataDirfor the pipeline worker threads to fetch capacity safely.SpillWriter::_write_internalby moving theCOUNTER_UPDATEafter the disk capacity check, and unified the memory deduction and disk metric updates into a singleDeferblock at the outer scope. This guarantees that memory is accurately tracked during I/O and simplifies the error-handling paths.SpillDataDirThreadSafetyandSpillWriterMemoryTrackingto prevent regression.