[Klaud Cold] Update glm5-fp8-b200-sglang (+mtp) SGLang image to v0.5.12-cu130 by functionstackx · Pull Request #1447 · SemiAnalysisAI/InferenceX

functionstackx · 2026-05-17T23:41:56Z

Summary

Bumps glm5-fp8-b200-sglang and glm5-fp8-b200-sglang-mtp from lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448 (33/29 days old) to lmsysorg/sglang:v0.5.12-cu130 (matches other b200 cu130 sglang recipes on main).

Test plan

Full sweep passes with full-sweep-enabled label.

🤖 Generated with Claude Code

Drop the 33/29-day-old pinned nightly lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448 in favor of the stable v0.5.12-cu130 release tag used by other b200 cu130 sglang recipes on main.

github-actions · 2026-05-17T23:42:03Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs usually fixes them. If failed jobs are re-run, PR authors are responsible for ensuring that they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-17T23:42:28Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26006132068
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26006132068

claude · 2026-05-17T23:48:03Z

+- config-keys:
+    - glm5-fp8-b200-sglang
+    - glm5-fp8-b200-sglang-mtp


🔴 The PR also bumps the image for glm5-fp8-b200-sglang-agentic at nvidia-master.yaml:2237 (same change as the two siblings), but the new perf-changelog entry only lists glm5-fp8-b200-sglang and glm5-fp8-b200-sglang-mtp under config-keys. Add - glm5-fp8-b200-sglang-agentic to the list (and ideally mention it in the title/description) so the changelog reflects all three bumps and downstream changelog-driven sweep generation picks up the agentic variant on merge.

Extended reasoning...

What the bug is

This PR bumps the image tag for three config entries in .github/configs/nvidia-master.yaml from lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448 to lmsysorg/sglang:v0.5.12-cu130:

glm5-fp8-b200-sglang (line 2191)

glm5-fp8-b200-sglang-mtp (line 2210)

glm5-fp8-b200-sglang-agentic (line 2237)

However, the new perf-changelog.yaml entry (lines 2632-2638) only lists the first two under config-keys. The PR title ("Update glm5-fp8-b200-sglang (+mtp)"), description ("33/29 days old" — only two age values), and commit message also omit the agentic sibling, strongly suggesting this is an oversight rather than an intentional exclusion.
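The mismatch described above can be caught mechanically. The following is a minimal sketch, assuming the master config (before and after the PR) and the changelog entry have already been parsed into plain Python dicts and lists. The function name `find_unlisted_bumps` and the dict layout (config name mapping to an `image` field) are illustrative assumptions, not code from the repository.

```python
def find_unlisted_bumps(old_master, new_master, changelog_keys):
    """Return config names whose image changed but that the changelog entry omits."""
    bumped = {
        name
        for name, cfg in new_master.items()
        if name in old_master and old_master[name].get("image") != cfg.get("image")
    }
    return sorted(bumped - set(changelog_keys))


# Image tags and config names are taken from the PR; the dict shape is assumed.
OLD_IMAGE = "lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448"
NEW_IMAGE = "lmsysorg/sglang:v0.5.12-cu130"
NAMES = [
    "glm5-fp8-b200-sglang",
    "glm5-fp8-b200-sglang-mtp",
    "glm5-fp8-b200-sglang-agentic",
]

old_master = {name: {"image": OLD_IMAGE} for name in NAMES}
new_master = {name: {"image": NEW_IMAGE} for name in NAMES}
changelog_keys = ["glm5-fp8-b200-sglang", "glm5-fp8-b200-sglang-mtp"]

missing = find_unlisted_bumps(old_master, new_master, changelog_keys)
print(missing)  # ['glm5-fp8-b200-sglang-agentic']
```

A check along these lines could run in CI to flag a changelog entry that does not cover every bumped config, though where such a check would live in this repository is left open here.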

Why it matters

Per AGENTS.md lines 113-124, every image bump in a *-master.yaml must be paired with a perf-changelog.yaml entry ("required - triggers benchmarks"). The utils/process_changelog.py helper get_config_keys_from_master() resolves the listed config-keys and feeds them to generate_sweep_configs.py test-config --config-keys .... Configs that are missing from the config-keys list are simply skipped by the changelog-driven sweep generation.

The full-sweep-enabled label on this PR happens to cover all three configs for the PR-time sweep, so functionally the PR itself will benchmark the agentic variant. But the merge-time / post-merge baseline refresh path triggered by run-sweep.yml on paths: perf-changelog.yaml keys off the added entry's config-keys — so once merged, the agentic variant's image bump will not get a baseline benchmark refresh from this changelog entry, and any downstream consumer reading the changelog history (e.g. for release notes or change tracking) will miss it.

Step-by-step proof

git show 09a23b0 -- .github/configs/nvidia-master.yaml — three blocks are modified, including the one at line 2237 for glm5-fp8-b200-sglang-agentic. The replacement string is identical to the one applied to the two siblings.

perf-changelog.yaml lines 2632-2638 list only glm5-fp8-b200-sglang and glm5-fp8-b200-sglang-mtp under config-keys.

Compare with the previous dsr1 entry at lines 2614-2619 (also a multi-sibling bump, from PR #1416, "Update dsr1-fp8-b200-sglang and -mtp SGLang image to v0.5.12-cu130"): it correctly lists both dsr1-fp8-b200-sglang and dsr1-fp8-b200-sglang-mtp. The convention is to list every config whose image was bumped.

utils/process_changelog.py reads added entries from this file and uses config-keys directly as the --config-keys argument to generate_sweep_configs.py. A config absent from the list is silently not benchmarked from that entry.
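To illustrate the last step, here is a hedged sketch of how a changelog entry's config-keys might be turned into the generator invocation. The helper `build_sweep_command` is hypothetical and is not the actual code in utils/process_changelog.py; only the `test-config --config-keys` form comes from the description above, and the script path is an assumption.

```python
def build_sweep_command(entry):
    """Build an argv list for the sweep generator from one changelog entry.

    Hypothetical helper: the real process_changelog.py logic may differ.
    """
    keys = entry.get("config-keys", [])
    if not keys:
        raise ValueError("changelog entry has no config-keys")
    # Only the keys listed in the entry are passed through; nothing else
    # is looked up, so an unlisted config never reaches the generator.
    return ["python", "generate_sweep_configs.py", "test-config", "--config-keys", *keys]


entry = {
    "config-keys": ["glm5-fp8-b200-sglang", "glm5-fp8-b200-sglang-mtp"],
    "pr-link": "https://github.com/SemiAnalysisAI/InferenceX/pull/1447",
}

cmd = build_sweep_command(entry)
print(" ".join(cmd))
```

Under this sketch the agentic variant never appears in the command line, which is the silent skip described above.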

How to fix

Append - glm5-fp8-b200-sglang-agentic to the config-keys list in the new entry, and update the PR title/description (and optionally the changelog description string) to mention the agentic variant. The diff would be:

- config-keys:
    - glm5-fp8-b200-sglang
    - glm5-fp8-b200-sglang-mtp
    - glm5-fp8-b200-sglang-agentic
  description:
    - "Update SGLang image from nightly-dev-cu13-20260317-1eea7448 (33d/29d old) to v0.5.12-cu130"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1447

Update glm5-fp8-b200-sglang (+mtp) SGLang image to v0.5.12-cu130

09a23b0

Drop the 33/29-day-old pinned nightly lmsysorg/sglang:nightly-dev-cu13-20260317-1eea7448 in favor of the stable v0.5.12-cu130 release tag used by other b200 cu130 sglang recipes on main.

functionstackx requested a review from a team May 17, 2026 23:41

functionstackx added the full-sweep-enabled label May 17, 2026

functionstackx requested review from jgangani and kedarpotdar-nv as code owners May 17, 2026 23:41

github-project-automation Bot added this to InferenceMAX Board May 17, 2026

chore: fill pr-link for #1447

6c3182d

functionstackx changed the title ~~Update glm5-fp8-b200-sglang (+mtp) SGLang image to v0.5.12-cu130~~ [Klaud Cold] Update glm5-fp8-b200-sglang (+mtp) SGLang image to v0.5.12-cu130 May 17, 2026

claude Bot reviewed May 17, 2026

functionstackx wants to merge 2 commits into main from update-glm5-fp8-b200-sglang-v0.5.12
