Update qwen3.5-bf16-mi300x-sglang SGLang image to v0.5.12-rocm720-mi30x#1426

Merged
functionstackx merged 5 commits into main from claude/issue-1154-qwen3.5-bf16-mi300x-sglang
May 18, 2026

Conversation

@Klaud-Cold
Collaborator

Updates SGLang image for qwen3.5-bf16-mi300x-sglang from v0.5.10-rocm720-mi30x to v0.5.12-rocm720-mi30x.
Ref #1154

Generated with Claude Code

Ref #1154

Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR before we merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

3 similar comments

Contributor

@claude claude Bot left a comment

LGTM, straightforward SGLang image version bump.

Extended reasoning...

Overview

This PR bumps the SGLang image tag for the qwen3.5-bf16-mi300x-sglang config in .github/configs/amd-master.yaml from v0.5.10-rocm720-mi30x to v0.5.12-rocm720-mi30x and adds a corresponding entry to perf-changelog.yaml. It mirrors the pattern of several recent image-bump PRs (e.g. #1402, #1408, #1409) merged to main.

Security risks

None. This is a single image tag string change in a benchmark config plus a YAML changelog entry. No code paths, auth, secrets, or permissions are touched.

Level of scrutiny

Low. The change is mechanical, isolated to a single config block (only the image: field is touched, all sweep parameters are unchanged), and follows the established convention in this repo. The image tag follows the same naming scheme as the prior pin and as the sibling qwen3.5-bf16-mi325x-sglang entry, so the only real risk is whether the new SGLang version itself behaves correctly at runtime — which the full-sweep CI will exercise (the full-sweep-enabled label is set).
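
The "same naming scheme" observation above can be checked mechanically. The following is a minimal sketch, not part of this PR: the tag layout `v<major>.<minor>.<patch>-<rocm build>-<gpu family>` is an assumption inferred only from the two tags quoted here, and `parse_tag` and `is_pure_version_bump` are hypothetical helper names.

```python
import re

# Assumed layout, inferred from the two tags quoted in this PR only:
# v<major>.<minor>.<patch>-<rocm build>-<gpu family>
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-(rocm\d+)-(\w+)$")


def parse_tag(tag):
    """Split an image tag into a numeric version tuple and platform parts."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized image tag: {tag!r}")
    major, minor, patch, rocm, gpu = match.groups()
    return {
        "version": (int(major), int(minor), int(patch)),
        "rocm": rocm,
        "gpu": gpu,
    }


def is_pure_version_bump(old_tag, new_tag):
    """True if only the version increased and the platform suffix is unchanged."""
    old, new = parse_tag(old_tag), parse_tag(new_tag)
    same_platform = old["rocm"] == new["rocm"] and old["gpu"] == new["gpu"]
    # Compare as integer tuples so that, e.g., 0.5.10 sorts after 0.5.9.
    return same_platform and new["version"] > old["version"]


print(is_pure_version_bump("v0.5.10-rocm720-mi30x", "v0.5.12-rocm720-mi30x"))
```

A check like this only confirms that the tag string is well formed and that the platform suffix did not change; whether the new image behaves correctly at runtime is still left to the sweep.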

Other factors

The pr-link: XXX placeholder is consistent with the existing entry at line 2502 of perf-changelog.yaml, which suggests it is the project's convention for auto-generated changelog stubs. No bugs were flagged by the bug-hunting system, there are no outstanding reviewer comments, and the timeline contains only duplicated recipe-reminder bot messages.

@github-actions
Contributor

# Conflicts:
#	perf-changelog.yaml

functionstackx added a commit that referenced this pull request May 18, 2026
Three of the nine mi300x compute nodes are currently unusable:
  - chi-mi300x-033, chi-mi300x-037: down (Not responding)
  - chi-mi300x-049: drained for persistent /nvme_home disk-full
    (kept down by a watchdog re-applying State=DOWN every 10s)

Without a nodelist filter, salloc sometimes lands a job on a node
that's about to be drained or that has a half-extracted enroot dir,
causing 'pyxis: failed to create container filesystem (No space left
on device)' / 'srun: Node failure' / 'manifest unknown'-style errors
visible in PRs #1426 and #1403.

Add an explicit --nodelist of the 6 healthy nodes (mirroring how
runners/launch_b300-nv.sh:336 pins to the known-good B300 set).
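
As an illustration of the node filtering described in this commit message, the sketch below builds a `--nodelist` argument from a node inventory minus the nodes known to be unusable. It is an assumption-laden sketch rather than the actual launcher change: `build_nodelist_flag` is a hypothetical helper, and the `healthy-node-*` names are placeholders because the commit message does not name the six healthy nodes. It only assembles the flag string; how Slurm treats the list for a given job size is not modeled here.

```python
def build_nodelist_flag(all_nodes, unusable_nodes):
    """Return a Slurm --nodelist flag naming every node not known to be unusable."""
    unusable = set(unusable_nodes)
    # Preserve the inventory order so the generated flag is stable between runs.
    healthy = [node for node in all_nodes if node not in unusable]
    if not healthy:
        raise ValueError("no healthy nodes left to allocate on")
    return "--nodelist=" + ",".join(healthy)


# The three unusable nodes named in the commit message.
unusable = ["chi-mi300x-033", "chi-mi300x-037", "chi-mi300x-049"]

# Placeholder names: the real healthy node names are not given in the source.
all_nodes = unusable + [f"healthy-node-{i}" for i in range(1, 7)]

print(build_nodelist_flag(all_nodes, unusable))
```

Deriving the list from an inventory and a deny-list, instead of hard-coding it, would keep the pin easy to update when a node returns to service, at the cost of maintaining the deny-list somewhere.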

@functionstackx
Collaborator

/reuse-sweep-run

@functionstackx functionstackx merged commit 97ac477 into main May 18, 2026
3 of 5 checks passed
@functionstackx functionstackx deleted the claude/issue-1154-qwen3.5-bf16-mi300x-sglang branch May 18, 2026 06:10