
fix(captions): prevent Metal memory exhaustion when generating subtitles#1949

Open
ManthanNimodiya wants to merge 3 commits into
CapSoftware:main from
ManthanNimodiya:fix/subtitle-memory-leak

ManthanNimodiya commented Jun 28, 2026


On Apple Silicon, each WhisperState allocates ~700 MB of Metal (unified) memory. Without serialisation, rapid re-clicks on the subtitle button spawned N concurrent transcription sessions, exhausting RAM (44 GB observed for ~60 retries).

Changes:

  • Added TRANSCRIPTION_LOCK: Mutex<()> to ensure at most one transcription runs at a time
  • Acquired the lock in transcribe_audio before entering the engine match
  • Released the cached WhisperContext immediately after Whisper finishes so Metal buffers (~500 MB) are freed rather than held until the editor closes
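The serialisation described in the first two bullets can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes a `std::sync::Mutex` and plain threads rather than the async command in `captions.rs`, and `transcribe_stub`, `peak_sessions_after`, `ACTIVE` and `PEAK` are hypothetical names that exist only to make the effect observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

// Stand-in for the PR's TRANSCRIPTION_LOCK (the real one may be an async mutex).
static TRANSCRIPTION_LOCK: Mutex<()> = Mutex::new(());

// Instrumentation for this sketch only: sessions alive now, and the peak seen.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for one memory-heavy transcription session.
fn transcribe_stub() {
    // The guard lives until the end of the function, so sessions cannot overlap.
    let _guard = TRANSCRIPTION_LOCK.lock().unwrap_or_else(|p| p.into_inner());
    let now = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
    PEAK.fetch_max(now, Ordering::SeqCst);
    thread::sleep(Duration::from_millis(5)); // pretend to run the model
    ACTIVE.fetch_sub(1, Ordering::SeqCst);
}

/// Simulates `clicks` rapid re-clicks and returns the peak number of
/// sessions that were alive at the same time.
fn peak_sessions_after(clicks: usize) -> usize {
    let handles: Vec<_> = (0..clicks)
        .map(|_| thread::spawn(transcribe_stub))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    PEAK.load(Ordering::SeqCst)
}
```

With the lock in place, `peak_sessions_after(8)` reports a peak of one session regardless of how many clicks arrive; without the `_guard` line the peak would typically climb towards the click count.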

Greptile Summary

This PR changes subtitle transcription to reduce ML memory pressure. The main changes are:

  • A global mutex serializes Whisper and Parakeet transcription.
  • Whisper transcription releases the cached context after each run.
  • The comments document the Apple Silicon Metal memory spike that motivated the change.

Confidence Score: 4/5

The cancellation path for transcription needs a fix before merging.

  • Dropping the async command can release the global slot while the blocking ML worker continues.
  • A retry can then start a second worker and recreate overlapping memory-heavy sessions.
  • The unconditional Whisper cache eviction is a smaller cross-platform performance regression.

apps/desktop/src-tauri/src/captions.rs

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/desktop/src-tauri/src/captions.rs | Adds transcription serialization and Whisper cache eviction, but the lock can be released before the blocking ML worker exits if the command is cancelled. |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/desktop/src-tauri/src/captions.rs:1119
**Cancelled Commands Release The Slot**

When the async command is dropped while the `spawn_blocking` transcription is still running, this guard is dropped but the blocking worker keeps using the ML resources. A retry can then acquire `TRANSCRIPTION_LOCK` and start another Whisper or Parakeet worker, so cancel-and-retry or window-close-and-retry can still create overlapping transcription sessions and hit the same memory exhaustion this lock is meant to prevent.

### Issue 2 of 2
apps/desktop/src-tauri/src/captions.rs:1158-1159
**Whisper Cache Always Evicted**

This clears the cached Whisper context after every Whisper run on all platforms, although the memory issue described here is specific to Apple Silicon Metal buffers. On Windows, Linux, and Intel macOS, repeated subtitle generation now reloads the Whisper model from disk each time instead of reusing the warmed `WHISPER_CONTEXT`, causing avoidable latency and memory churn with no Metal-memory benefit.

Reviews (1): Last reviewed commit: "fix(captions): serialise transcription a..."

Greptile also left 2 inline comments on this PR.

Context used:

  • CLAUDE.md (source)
  • AGENTS.md (source)

Comment thread apps/desktop/src-tauri/src/captions.rs Outdated
Comment on lines +1154 to +1160
// Release the cached context immediately after use so Metal buffers
// (~500 MB on Apple Silicon) are freed rather than held until the
// editor closes. The next call will reload the model as needed.
{
let mut ctx = WHISPER_CONTEXT.lock().await;
*ctx = None;
}

Releasing the cached context on every platform means we’ll reload the model for every Whisper run (potential perf regression on non-Apple Silicon). If the memory pressure issue is specifically Apple Silicon, consider gating this to macos/aarch64.

Suggested change
- // Release the cached context immediately after use so Metal buffers
- // (~500 MB on Apple Silicon) are freed rather than held until the
- // editor closes. The next call will reload the model as needed.
- {
-     let mut ctx = WHISPER_CONTEXT.lock().await;
-     *ctx = None;
- }
+ // Release the cached context immediately after use so Metal buffers
+ // (~500 MB on Apple Silicon) are freed rather than held until the
+ // editor closes. The next call will reload the model as needed.
+ #[cfg(all(target_os = "macos", target_arch = "aarch64"))]
+ {
+     let mut ctx = WHISPER_CONTEXT.lock().await;
+     *ctx = None;
+ }

Comment thread apps/desktop/src-tauri/src/captions.rs Outdated
// WhisperState / Parakeet session exists at a time. Without this, rapid
// re-clicks spawn N concurrent sessions each consuming ~700 MB of Metal
// (unified) memory on Apple Silicon, which produced the observed 44 GB spike.
let _transcription_guard = TRANSCRIPTION_LOCK.lock().await;

Minor thought: if rapid re-clicks also trigger redundant extract_audio_from_video work, you may want to acquire TRANSCRIPTION_LOCK earlier (before extraction) so only one click does the full pipeline at a time. Current placement still prevents concurrent model sessions, but multiple extractions can run in parallel.

Comment thread apps/desktop/src-tauri/src/captions.rs Outdated
// WhisperState / Parakeet session exists at a time. Without this, rapid
// re-clicks spawn N concurrent sessions each consuming ~700 MB of Metal
// (unified) memory on Apple Silicon, which produced the observed 44 GB spike.
let _transcription_guard = TRANSCRIPTION_LOCK.lock().await;

P1 Cancelled Commands Release The Slot

When the async command is dropped while the spawn_blocking transcription is still running, this guard is dropped but the blocking worker keeps using the ML resources. A retry can then acquire TRANSCRIPTION_LOCK and start another Whisper or Parakeet worker, so cancel-and-retry or window-close-and-retry can still create overlapping transcription sessions and hit the same memory exhaustion this lock is meant to prevent.
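One way to address this is to acquire the lock inside the blocking worker, so that the guard's lifetime follows the worker rather than the caller. The sketch below illustrates that idea under stated assumptions: it uses `std::thread` in place of `tokio::task::spawn_blocking`, and `start_worker` and `cancel_and_retry_peak` are hypothetical helpers, not functions from `captions.rs`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Stand-in for the PR's TRANSCRIPTION_LOCK, here a std (blocking) mutex.
static TRANSCRIPTION_LOCK: Mutex<()> = Mutex::new(());

/// Starts a worker that owns the guard for its whole lifetime.
/// `active` and `peak` are instrumentation for this sketch only.
fn start_worker(active: Arc<AtomicUsize>, peak: Arc<AtomicUsize>) -> JoinHandle<()> {
    thread::spawn(move || {
        // Acquired inside the worker: a caller that goes away cannot release it.
        let _guard = TRANSCRIPTION_LOCK.lock().unwrap_or_else(|p| p.into_inner());
        let now = active.fetch_add(1, Ordering::SeqCst) + 1;
        peak.fetch_max(now, Ordering::SeqCst);
        thread::sleep(Duration::from_millis(20)); // pretend to run the model
        active.fetch_sub(1, Ordering::SeqCst);
    })
}

/// Simulates cancel-and-retry: the retry is started without waiting for the
/// first worker. Returns the peak number of workers running the "model" at once.
fn cancel_and_retry_peak() -> usize {
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let first = start_worker(active.clone(), peak.clone());
    // "Cancellation": nothing waits on `first` here, yet it keeps running.
    let retry = start_worker(active.clone(), peak.clone());

    // Joined only so the sketch can read a final measurement.
    first.join().unwrap();
    retry.join().unwrap();
    peak.load(Ordering::SeqCst)
}
```

Because the retry's worker blocks on the same mutex until the first worker returns, the two never overlap even though no caller is waiting on the first one.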

Comment on lines +1158 to +1159
let mut ctx = WHISPER_CONTEXT.lock().await;
*ctx = None;

P2 Whisper Cache Always Evicted

This clears the cached Whisper context after every Whisper run on all platforms, although the memory issue described here is specific to Apple Silicon Metal buffers. On Windows, Linux, and Intel macOS, repeated subtitle generation now reloads the Whisper model from disk each time instead of reusing the warmed WHISPER_CONTEXT, causing avoidable latency and memory churn with no Metal-memory benefit.
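A platform-gated eviction could look roughly like the sketch below. It is an illustration only: `ModelStub`, `WHISPER_CACHE`, `EVICT_AFTER_RUN` and `run_once` are hypothetical names, the cache uses a `std::sync::Mutex` rather than the async `WHISPER_CONTEXT`, and the `cfg!` predicate mirrors the one proposed in the earlier suggested change.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a loaded Whisper model.
struct ModelStub;

// Stand-in for the cached context; `None` means "not loaded".
static WHISPER_CACHE: Mutex<Option<ModelStub>> = Mutex::new(None);

/// True only on Apple Silicon macOS, where the Metal memory spike was reported.
const EVICT_AFTER_RUN: bool = cfg!(all(target_os = "macos", target_arch = "aarch64"));

/// Loads the model if needed, "runs" it, then evicts the cache only on the
/// platform where eviction frees Metal buffers.
/// Returns whether the model is still cached afterwards.
fn run_once() -> bool {
    let mut cache = WHISPER_CACHE.lock().unwrap_or_else(|p| p.into_inner());
    if cache.is_none() {
        *cache = Some(ModelStub); // pretend to load the model from disk
    }
    // ... transcription would happen here ...
    if EVICT_AFTER_RUN {
        *cache = None;
    }
    cache.is_some()
}
```

On Windows, Linux, and Intel macOS the model stays cached between runs, so repeated subtitle generation keeps the warm context; on Apple Silicon the cache is cleared after each run.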

Comment thread apps/desktop/src-tauri/src/captions.rs Outdated
.await
.map_err(|e| format!("Parakeet task panicked: {e}"))?
tokio::task::spawn_blocking(move || {
let _guard = TRANSCRIPTION_LOCK.lock().unwrap();

If a prior transcription panicked while holding the lock, std::sync::Mutex::lock() returns a PoisonError; unwrap() would then panic on every later call and permanently break subtitles until restart. It might be worth recovering the guard here (the same applies to the Whisper lock below).

Suggested change
- let _guard = TRANSCRIPTION_LOCK.lock().unwrap();
+ let _guard = TRANSCRIPTION_LOCK
+     .lock()
+     .unwrap_or_else(|poisoned| poisoned.into_inner());
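The poisoning behaviour and the suggested recovery can be reproduced in isolation. The sketch below is a standalone illustration using only the standard library; `poison` and `read_recovering` are hypothetical helpers, and the `u32` payload stands in for whatever the real mutex protects.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poisons `lock` by panicking on another thread while holding the guard.
fn poison(lock: &Arc<Mutex<u32>>) {
    let lock = Arc::clone(lock);
    let _ = thread::spawn(move || {
        let _guard = lock.lock().unwrap();
        panic!("simulated transcription panic");
    })
    .join(); // the Err from the panicked thread is expected and ignored
}

/// Reads the protected value even if the mutex is poisoned, by taking the
/// guard out of the PoisonError instead of unwrapping.
fn read_recovering(lock: &Mutex<u32>) -> u32 {
    let guard = lock.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    *guard
}
```

After `poison(&lock)`, a plain `lock.lock().unwrap()` would panic on every call, whereas `read_recovering(&lock)` still returns the stored value. Recovering this way is reasonable for a `Mutex<()>` used purely as a gate, since there is no protected state that a panic could have left inconsistent.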
