
[fix](filecache) avoid crash when late holder cleanup sees removed cache cell#62437

Open
freemandealer wants to merge 3 commits into apache:master from freemandealer:pick-master-pr-8342

@freemandealer
Contributor

@freemandealer freemandealer commented Apr 13, 2026

Problem Summary:
When FileBlocksHolder is destroyed late, the corresponding file block may already be removed or replaced in block file cache metadata. BlockFileCache::remove() can dereference a stale cache cell during duplicate cleanup and crash.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
        • Cherry-pick only. The picked commits already include BE unit-test coverage, and no local build/test was requested for this task.
  • Behavior changed:

    • No.
    • Yes.
      • Avoid crash when late holder cleanup sees a removed or replaced cache cell, add warning logs for skipped duplicate remove, and add BE unit tests for the stale/replaced-cell cleanup paths.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…ache cell

When FileBlocksHolder is destroyed late, the corresponding file block may have already been removed or replaced in block file cache metadata.

BlockFileCache::remove() previously assumed get_cell() always returned the current cell for the file block and dereferenced it unconditionally. That can crash during the duplicate cleanup path after cache metadata has already been cleared.

This change keeps the fix scoped to the stop-bleeding path: if the cache cell is missing or no longer points to the same file block, skip the duplicate remove.

Add a unit test that simulates late holder cleanup after cache metadata has already been manually removed from the cache.
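
The guard described in this commit can be illustrated with a minimal, self-contained sketch. It is not the Doris implementation: `SketchCache`, `Cell`, `drop_metadata()`, and the simplified `FileBlock` are hypothetical stand-ins, and the real `BlockFileCache::remove()` takes different arguments and does considerably more work. The sketch only shows the shape of the check, assuming a cell lookup that can return null.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical stand-ins; the real Doris types carry far more state.
struct FileBlock {
    std::size_t offset;
    std::size_t size;
};

struct Cell {
    std::shared_ptr<FileBlock> file_block;
};

class SketchCache {
public:
    std::shared_ptr<FileBlock> add(std::size_t offset, std::size_t size) {
        auto block = std::make_shared<FileBlock>(FileBlock{offset, size});
        cells_[offset] = Cell{block};
        return block;
    }

    // Simulates metadata being cleared before a late holder runs its cleanup.
    void drop_metadata(std::size_t offset) { cells_.erase(offset); }

    // Returns nullptr when no cell exists for the offset.
    Cell* get_cell(std::size_t offset) {
        auto it = cells_.find(offset);
        return it == cells_.end() ? nullptr : &it->second;
    }

    // Returns true if the entry was removed, false if the remove was skipped.
    bool remove(const std::shared_ptr<FileBlock>& block) {
        Cell* cell = get_cell(block->offset);
        if (cell == nullptr || cell->file_block != block) {
            return false;  // missing or replaced cell: skip the duplicate remove
        }
        cells_.erase(block->offset);
        return true;
    }

    std::size_t size() const { return cells_.size(); }

private:
    std::map<std::size_t, Cell> cells_;
};

// Late cleanup after the metadata was already removed: the remove is skipped
// instead of dereferencing a cell that no longer exists.
inline bool late_cleanup_is_skipped() {
    SketchCache cache;
    auto block = cache.add(0, 4096);
    cache.drop_metadata(0);
    return !cache.remove(block) && cache.size() == 0;
}
```

Without the null check, the lookup result would be dereferenced unconditionally, which is the crash the commit message describes.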

Add warning logs for the stop-bleeding duplicate remove path so we can distinguish between a missing cache cell and a replaced cache cell.

The log includes the current block hash/offset/size/type/state and, for cell mismatch, the block currently attached to the cache cell as well.

Add a unit test for the stop-bleeding duplicate remove path where the old file block loses its cache metadata first and a new file block is then created with the same hash and offset before the old holder is destroyed.

The test verifies that late holder cleanup skips the stale remove without removing the new cache entry.
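
The replaced-cell scenario from this commit, together with the missing/replaced distinction that the warning logs draw, can be sketched roughly as follows. All names here (`RemoveResult`, `remove_if_current`, `replaced_cell_scenario`, the map keyed by hash and offset) are assumptions made for illustration and do not come from the Doris code base; the actual unit test exercises `BlockFileCache` and `FileBlocksHolder` directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified file block: identity is the object itself,
// not its (hash, offset) key.
struct FileBlock {
    std::uint64_t hash;
    std::size_t offset;
};

// The two skip reasons mirror the cases the warning logs distinguish.
enum class RemoveResult { Removed, SkippedMissingCell, SkippedReplacedCell };

using Key = std::pair<std::uint64_t, std::size_t>;
using CellMap = std::map<Key, std::shared_ptr<FileBlock>>;

// Remove the entry only if the cell still points at this exact block.
RemoveResult remove_if_current(CellMap& cells,
                               const std::shared_ptr<FileBlock>& block) {
    auto it = cells.find({block->hash, block->offset});
    if (it == cells.end()) {
        return RemoveResult::SkippedMissingCell;
    }
    if (it->second != block) {
        return RemoveResult::SkippedReplacedCell;
    }
    cells.erase(it);
    return RemoveResult::Removed;
}

struct ScenarioOutcome {
    RemoveResult stale_result;
    bool new_entry_survives;
};

// Old block loses its metadata, a new block takes the same hash and offset,
// and only then does the old holder attempt its cleanup.
ScenarioOutcome replaced_cell_scenario() {
    CellMap cells;
    const Key key{42, 0};

    auto old_block = std::make_shared<FileBlock>(FileBlock{42, 0});
    cells[key] = old_block;
    cells.erase(key);  // metadata for the old block is removed first

    auto new_block = std::make_shared<FileBlock>(FileBlock{42, 0});
    cells[key] = new_block;  // same hash and offset, different object

    RemoveResult result = remove_if_current(cells, old_block);
    bool survives = cells.count(key) == 1 && cells[key] == new_block;
    return {result, survives};
}
```

Comparing the stored pointer against the block being removed, rather than comparing keys, is what keeps the stale cleanup from evicting the newer entry that happens to share the same hash and offset.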
@Thearas
Contributor

Thearas commented Apr 13, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer
Contributor Author

run buildall

@freemandealer freemandealer changed the title from "branch-3.1: [fix](filecache) avoid crash when late holder cleanup sees removed cache cell" to "[fix](filecache) avoid crash when late holder cleanup sees removed cache cell" on Apr 13, 2026