**Describe the bug**
CI jobs occasionally fail with `OSError: [Errno 28] No space left on device`, often because pip downloads huge CUDA packages. This is partly addressed by recent commits but needs to be fully fixed in conjunction with #8561.
Copilot output from one example failure:
The job is failing because the runner disk fills up while pip is trying to create an isolated build environment and pull in large CUDA-related wheels as build dependencies:
- Failure point (from logs): during "Installing build dependencies" for `pip install -e .` (with `BUILD_MONAI=1`), pip errors with `OSError: [Errno 28] No space left on device`.
- The downloaded build deps include multiple large `nvidia-*cu12` packages, which is what typically blows the GitHub-hosted runner disk.
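To confirm which downloads actually filled the disk, a small helper like the following can list the largest files under a directory such as pip's cache or temp build dir (a hypothetical diagnostic sketch, not part of the workflow):

```python
import os

def largest_files(root: str, top_n: int = 5) -> list[tuple[int, str]]:
    """Walk `root` and return the top_n (size_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                pass  # file vanished or unreadable; skip it
    return sorted(sizes, reverse=True)[:top_n]

# e.g. largest_files("/tmp") to inspect pip's temporary build envs
```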
**Fix:** stop pip from pulling CUDA build deps (use CPU-only build deps) and/or avoid build isolation during the compiled install.
**1) Ensure the compiled install uses `--no-build-isolation`**
In `.github/workflows/pythonapp.yml`, the "Run compiled" step currently does:

```yaml
- name: Run compiled (${{ runner.os }})
  run: |
    python -m pip uninstall -y monai
    BUILD_MONAI=1 python -m pip install -e .  # compile the cpp extensions
  shell: bash
```
Change it to:

```yaml
- name: Run compiled (${{ runner.os }})
  run: |
    python -m pip uninstall -y monai
    # avoid creating an isolated env that may pull huge CUDA build deps
    BUILD_MONAI=1 python -m pip install --no-build-isolation -e .
  shell: bash
```
Why this helps: build isolation forces pip to resolve and install whatever is listed in `pyproject.toml`'s `build-system` requirements into a temporary env. If that includes (directly or indirectly) CUDA-enabled Torch dependencies, the runner can run out of space.
**2) Pin the build dependency Torch to CPU wheels in `pyproject.toml` (recommended)**
If MONAI’s `pyproject.toml` build-system requires `torch` (common for extension builds), ensure it resolves to CPU wheels on CI by either:
- removing `torch` from `build-system.requires` if it’s not strictly necessary at build time, or
- constraining it so it doesn’t drag in CUDA packages.
Concretely, in `pyproject.toml` under:

```toml
[build-system]
requires = [ ... ]
```

Prefer a minimal set (example):

```toml
[build-system]
requires = ["setuptools", "wheel", "pybind11"]
build-backend = "setuptools.build_meta"
```
If `torch` must remain there for compilation logic, the workflow should install the CPU torch first (it already does), and you should rely on `--no-build-isolation` (Fix #1) so the build uses that already-installed CPU torch instead of re-resolving a potentially CUDA variant.
**3) Add a quick disk cleanup before the compiled install (optional, but useful)**
You already clean `/opt/hostedtoolcache` in some jobs/steps; add similar cleanup right before compiling on Linux to maximize free space:

```yaml
- if: runner.os == 'Linux'
  name: Free disk space (Linux only)
  run: |
    df -h
    sudo rm -rf /opt/hostedtoolcache/CodeQL || true
    sudo rm -rf /opt/hostedtoolcache/go || true
    sudo rm -rf /opt/hostedtoolcache/Java* || true
    sudo rm -rf /opt/hostedtoolcache/Ruby* || true
    df -h
  shell: bash
```
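To fail fast with a clearer message than a mid-install `Errno 28`, a pre-install guard could also check free space up front (a hypothetical sketch using only the stdlib; the 10 GB threshold is an assumption, not a measured requirement):

```python
import shutil
import sys

def check_free_space(path: str = "/", min_free_gb: float = 10.0) -> bool:
    """Return True if `path` has at least `min_free_gb` gigabytes free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= min_free_gb

if __name__ == "__main__":
    # run before the compiled install to abort early with a clear error
    if not check_free_space():
        sys.exit("Insufficient disk space for compiled install")
```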
**Minimal change that should unblock PRs**
If you want the smallest, highest-signal fix: apply Fix #1 (add `--no-build-isolation` to the compiled install step). That directly targets the failing phase shown in the logs ("Installing build dependencies") and typically prevents pip from downloading the large `nvidia-*cu12` wheels that exhaust disk space.