[Klaud Cold] Update glm5-fp8-b200-sglang (+mtp) SGLang image to v0.5.12-cu130#1447
functionstackx wants to merge 2 commits into
Drop the 33/29-day-old pinned nightly lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448 in favor of the stable v0.5.12-cu130 release tag used by other b200 cu130 sglang recipes on main.
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If a re-run is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26006132068
- config-keys:
    - glm5-fp8-b200-sglang
    - glm5-fp8-b200-sglang-mtp
🔴 The PR also bumps the image for glm5-fp8-b200-sglang-agentic at nvidia-master.yaml:2237 (same change as the two siblings), but the new perf-changelog entry only lists glm5-fp8-b200-sglang and glm5-fp8-b200-sglang-mtp under config-keys. Add - glm5-fp8-b200-sglang-agentic to the list (and ideally mention it in the title/description) so the changelog reflects all three bumps and downstream changelog-driven sweep generation picks up the agentic variant on merge.
Extended reasoning...
What the bug is
This PR bumps the image tag for three config entries in .github/configs/nvidia-master.yaml from lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448 to lmsysorg/sglang:v0.5.12-cu130:
- glm5-fp8-b200-sglang (line 2191)
- glm5-fp8-b200-sglang-mtp (line 2210)
- glm5-fp8-b200-sglang-agentic (line 2237)
However, the new perf-changelog.yaml entry (lines 2632-2638) only lists the first two under config-keys. The PR title ("Update glm5-fp8-b200-sglang (+mtp)"), description ("33/29 days old" — only two age values), and commit message also omit the agentic sibling, strongly suggesting this is an oversight rather than an intentional exclusion.
Why it matters
Per AGENTS.md lines 113-124, every image bump in a *-master.yaml must be paired with a perf-changelog.yaml entry ("required - triggers benchmarks"). The utils/process_changelog.py helper get_config_keys_from_master() resolves the listed config-keys and feeds them to generate_sweep_configs.py test-config --config-keys .... Configs that are missing from the config-keys list are simply skipped by the changelog-driven sweep generation.
The full-sweep-enabled label on this PR happens to cover all three configs for the PR-time sweep, so functionally the PR itself will benchmark the agentic variant. But the merge-time / post-merge baseline refresh path triggered by run-sweep.yml on paths: perf-changelog.yaml keys off the added entry's config-keys — so once merged, the agentic variant's image bump will not get a baseline benchmark refresh from this changelog entry, and any downstream consumer reading the changelog history (e.g. for release notes or change tracking) will miss it.
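The skipping behavior described above can be pictured with a minimal sketch. It is not the actual code in utils/process_changelog.py; the `MASTER` and `ENTRY` dictionaries and the `select_for_sweep` function are hypothetical stand-ins, assuming only that configs are selected by exact key match against the entry's config-keys list.

```python
# Hypothetical stand-in for nvidia-master.yaml: config key -> settings.
MASTER = {
    "glm5-fp8-b200-sglang": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
    "glm5-fp8-b200-sglang-mtp": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
    "glm5-fp8-b200-sglang-agentic": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
}

# The changelog entry as currently written in this PR (two keys only).
ENTRY = {
    "config-keys": ["glm5-fp8-b200-sglang", "glm5-fp8-b200-sglang-mtp"],
}


def select_for_sweep(master, entry):
    """Return only the configs named in the entry's config-keys (illustrative)."""
    wanted = entry["config-keys"]
    return {key: master[key] for key in wanted if key in master}


selected = select_for_sweep(MASTER, ENTRY)
# Anything in the master config but not selected gets no benchmark from this entry.
skipped = sorted(set(MASTER) - set(selected))
print("benchmarked:", sorted(selected))
print("skipped:", skipped)
```

Under that assumption, the agentic variant never reaches the sweep generator, even though its image changed in the same commit.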
Step-by-step proof
1. git show 09a23b0 -- .github/configs/nvidia-master.yaml: three blocks are modified, including the one at line 2237 for glm5-fp8-b200-sglang-agentic. The replacement string is identical to the one applied to the two siblings.
2. perf-changelog.yaml lines 2632-2638 list only glm5-fp8-b200-sglang and glm5-fp8-b200-sglang-mtp under config-keys.
3. Compare with the previous dsr1 entry at lines 2614-2619 (also a multi-sibling bump, from PR #1416, "Update dsr1-fp8-b200-sglang and -mtp SGLang image to v0.5.12-cu130"): it correctly lists both dsr1-fp8-b200-sglang and dsr1-fp8-b200-sglang-mtp. The convention is to list every config whose image was bumped.
4. utils/process_changelog.py reads added entries from this file and uses config-keys directly as the --config-keys argument to generate_sweep_configs.py. A config absent from the list is silently not benchmarked from that entry.
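The comparison in the proof above could also be automated. The following is a rough sketch of such a check, not an existing script in the repository: `bumped_keys` and `missing_from_changelog` are hypothetical helpers, and the before/after mappings are hand-written here rather than parsed from nvidia-master.yaml.

```python
OLD_IMAGE = "lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448"
NEW_IMAGE = "lmsysorg/sglang:v0.5.12-cu130"

KEYS = [
    "glm5-fp8-b200-sglang",
    "glm5-fp8-b200-sglang-mtp",
    "glm5-fp8-b200-sglang-agentic",
]

# Hand-written stand-ins for the image field before and after the commit.
before = {key: OLD_IMAGE for key in KEYS}
after = {key: NEW_IMAGE for key in KEYS}

# config-keys as listed in the new changelog entry.
listed = ["glm5-fp8-b200-sglang", "glm5-fp8-b200-sglang-mtp"]


def bumped_keys(before, after):
    """Config keys whose image differs between the two snapshots."""
    return sorted(key for key in after if before.get(key) != after[key])


def missing_from_changelog(before, after, listed):
    """Bumped config keys that the changelog entry does not mention."""
    listed_set = set(listed)
    return [key for key in bumped_keys(before, after) if key not in listed_set]


print(missing_from_changelog(before, after, listed))
```

A check of this shape would flag the agentic variant before merge instead of relying on a reviewer to spot the mismatch.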
How to fix
Append - glm5-fp8-b200-sglang-agentic to the config-keys list in the new entry, and update the PR title/description (and optionally the changelog description string) to mention the agentic variant. The diff would be:
- config-keys:
    - glm5-fp8-b200-sglang
    - glm5-fp8-b200-sglang-mtp
    - glm5-fp8-b200-sglang-agentic
  description:
    - "Update SGLang image from nightly-dev-cu13-20260317-1eea7448 (33d/29d old) to v0.5.12-cu130"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1447
Summary
glm5-fp8-b200-sglang and glm5-fp8-b200-sglang-mtp from lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448 (33/29 days old) to lmsysorg/sglang:v0.5.12-cu130 (matches other b200 cu130 sglang recipes on main).
Test plan
full-sweep-enabled label.
🤖 Generated with Claude Code