[hotfix] disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache#27969

Open
Manishnemade12 wants to merge 2 commits into apache:master from Manishnemade12:Leaks

Manishnemade12 commented Apr 19, 2026

What is the purpose of the change

This pull request fixes severe disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache that occur on filesystem or network errors. It is marked as a hotfix because the leaks can crash TaskManagers by exhausting disk space or file descriptors; it should be applied to affected stable branches as appropriate.

Previously, if a network or cluster error interrupted the DFS stream copy in downloadToCacheFile, the partially written temporary cache file was left on disk permanently, and repeated failures could eventually fill the disk on TaskManagers. Furthermore, if openAndSeek threw an IOException while positioning the channel, the newly opened FileInputStream was never closed, leaking an OS file handle each time.

Brief change log

  • Explicitly invoke file.delete() on the temporary cache file when an IOException is encountered during DFS buffer copying in downloadToCacheFile, ensuring partial temporary files are removed on failure.
  • Wrap the fin.getChannel().position(offset) call in openAndSeek with a try-catch that guarantees IOUtils.closeQuietly(fin) is called if positioning fails, preventing leaked FileInputStream instances.
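
For reviewers, the two changes above can be sketched roughly as follows. This is a minimal, JDK-only illustration and not the actual Flink code: the class name CacheCleanupSketch and both method signatures are assumptions, and a plain close() stands in for IOUtils.closeQuietly(fin) so the example has no Flink dependency.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for the relevant parts of ChangelogStreamHandleReaderWithCache.
final class CacheCleanupSketch {

    // Copies a remote stream into a local cache file and removes the
    // partially written file if the copy fails.
    static void downloadToCacheFile(InputStream remote, File file) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = remote.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            // try-with-resources has already closed `out` when this catch runs,
            // so the delete is not blocked by an open handle.
            if (!file.delete() && file.exists()) {
                e.addSuppressed(new IOException("Could not delete partial cache file " + file));
            }
            throw e;
        }
    }

    // Opens the cache file and seeks to `offset`; closes the stream again
    // if positioning fails so the file descriptor is not leaked.
    static FileInputStream openAndSeek(File file, long offset) throws IOException {
        FileInputStream fin = new FileInputStream(file);
        try {
            fin.getChannel().position(offset);
            return fin;
        } catch (IOException | RuntimeException e) {
            // Assumption: runtime failures (e.g. a negative offset) are cleaned up too;
            // the PR description only mentions IOException.
            try {
                fin.close();
            } catch (IOException closeError) {
                e.addSuppressed(closeError);
            }
            throw e;
        }
    }
}
```

In both methods the original exception is rethrown unchanged, so callers see the same failure as before and only the cleanup behavior differs.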

Verifying this change

This change is already covered by existing tests, such as the Changelog State Backend recovery tests that implicitly exercise filesystem operations and DFS stream caching logic.
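
Since the existing tests exercise these paths only implicitly, a more targeted check for the partial-file case could look roughly like the sketch below. Everything in it is hypothetical and JDK-only: the Downloader interface, FailingInputStream, and leavesNoPartialFile are illustrative names that do not exist in Flink, and wiring the check to the real downloadToCacheFile is left open.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

// Hypothetical harness; none of these names come from the Flink code base.
final class PartialFileCheckSketch {

    // Abstraction over the download step under test.
    interface Downloader {
        void download(InputStream remote, File target) throws IOException;
    }

    // Yields a fixed number of bytes and then fails, imitating a DFS read error mid-copy.
    static final class FailingInputStream extends InputStream {
        private int remaining;

        FailingInputStream(int bytesBeforeFailure) {
            this.remaining = bytesBeforeFailure;
        }

        @Override
        public int read() throws IOException {
            if (remaining-- > 0) {
                return 'x';
            }
            throw new IOException("simulated DFS failure");
        }
    }

    // Returns true only if the downloader propagates the failure
    // and leaves no file behind at the target path.
    static boolean leavesNoPartialFile(Downloader downloader) throws IOException {
        File dir = Files.createTempDirectory("changelog-cache-sketch").toFile();
        File target = new File(dir, "partial.tmp");
        try {
            downloader.download(new FailingInputStream(16), target);
            return false; // the simulated failure should have propagated
        } catch (IOException expected) {
            return !target.exists();
        } finally {
            // Best-effort cleanup of the scratch directory.
            target.delete();
            dir.delete();
        }
    }
}
```

A downloader that deletes its target on IOException should make leavesNoPartialFile return true, while one that only closes the output stream should make it return false.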

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes (Improves resilience of Checkpointing recovery on TaskManagers)
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

… on seek errors in ChangelogStreamHandleReaderWithCache
flinkbot (Collaborator) commented Apr 19, 2026

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

spuru9 (Contributor) left a comment

@Manishnemade12 Can you add [hotfix] to the PR title and update the PR description as per the new guidelines?

Manishnemade12 changed the title from "Fix : disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache" to "[hotfix] : disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache" on Apr 29, 2026
github-actions bot added the community-reviewed label (PR has been reviewed by the community.) on Apr 29, 2026
Manishnemade12 changed the title from "[hotfix] : disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache" to "[hotfix] disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache" on Apr 29, 2026

Manishnemade12 (Author) commented Apr 29, 2026

> @Manishnemade12 Can you add [hotfix] to the PR and update the PR description as per the new guidelines.

@spuru9 I have made the changes. Can you please review it now?

spuru9 (Contributor) left a comment

You need to run mvn spotless:apply.

…urce cleanup in ChangelogStreamHandleReaderWithCache
Manishnemade12 (Author) commented May 2, 2026

@spuru9 I have addressed all the comments. Can you please take another look?
[image]
The build is also passing.
