Update glm5-fp8-b300-sglang and -mtp SGLang image to v0.5.12-cu130 by Klaud-Cold · Pull Request #1421 · SemiAnalysisAI/InferenceX

Klaud-Cold · 2026-05-17T03:16:15Z

Updates SGLang image for glm5-fp8-b300-sglang and glm5-fp8-b300-sglang-mtp from v0.5.11-cu130 to v0.5.12-cu130.
Ref #1154

Generated with Claude Code

github-actions · 2026-05-17T03:16:22Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-17T03:20:25Z

+    - glm5-fp8-b300-sglang-mtp
+  description:
+    - "Update SGLang image from v0.5.11-cu130 to v0.5.12-cu130"
+  pr-link: XXX


🔴 The new perf-changelog entry at line 2557 has pr-link: XXX instead of an actual PR URL. This placeholder was left in by PR generation tooling — it should be https://github.com/SemiAnalysisAI/InferenceX/pull/1421 to match the convention used by every other entry in the file.

Extended reasoning...

What the bug is: The newly added entry to perf-changelog.yaml (lines 2552–2557) ends with pr-link: XXX rather than a valid GitHub PR URL. The literal string XXX is the unsubstituted placeholder from the template documented in AGENTS.md (pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX), which the PR generation tooling is expected to replace with the actual PR number before committing.

Code path that triggers it: When the changelog entry was generated for this PR, the tooling produced the template skeleton but never substituted XXX with the assigned PR number (1421). The diff committed the template as-is.

Why existing code doesn't prevent it: The Pydantic ChangelogEntry validator (utils/matrix_logic/validation.py) only requires pr_link to be a non-empty string, so XXX passes schema validation. There is no regex or URL-format check that would have caught this at CI time.
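A URL-format check of the kind described above could look roughly like the following. This is a minimal sketch using only the standard library, not the project's actual validator: the names `PR_LINK_RE` and `is_valid_pr_link` are hypothetical, and the pattern assumes the `https://github.com/SemiAnalysisAI/InferenceX/pull/<num>` convention seen in the existing entries.

```python
import re

# Assumed convention, taken from the existing changelog entries:
# https://github.com/SemiAnalysisAI/InferenceX/pull/<number>
PR_LINK_RE = re.compile(
    r"https://github\.com/SemiAnalysisAI/InferenceX/pull/[0-9]+"
)


def is_valid_pr_link(value: str) -> bool:
    """Return True only if value is a full PR URL ending in a numeric PR id."""
    # fullmatch rejects bare placeholders ("XXX") as well as the
    # unsubstituted template URL (".../pull/XXX").
    return PR_LINK_RE.fullmatch(value) is not None
```

A check like this could be called from the existing validator for `pr_link`; how it is wired into the Pydantic model is left open here because the model's definition is not shown in this thread.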

Step-by-step proof:

1. Open perf-changelog.yaml and look at line 2557: it reads pr-link: XXX.

2. Compare with the immediately preceding entry at line 2550: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1402.

3. Lines 2496, 2508, 2520, 2526, 2532, 2538, 2544, and 2550 all use the full https://github.com/SemiAnalysisAI/InferenceX/pull/<num> form. This is the established convention across all ~30 recent entries (and consistent with the entire 2500+ line file).

4. The PR number for this change is 1421 (per the PR metadata), so the correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1421.
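The manual comparison in the steps above could also be automated by scanning the changelog text for `pr-link:` lines that deviate from the convention. The sketch below is illustrative only: `find_bad_pr_links` is a hypothetical helper, it works line by line rather than parsing YAML, and it assumes unquoted `pr-link` values as in the entries quoted in this review.

```python
import re

# Matches a "pr-link: <value>" line and captures the value without
# surrounding whitespace. Quoted values are not unwrapped (assumption:
# the changelog writes pr-link values unquoted).
PR_LINK_LINE_RE = re.compile(r"^\s*pr-link:\s*(?P<value>.*?)\s*$")
EXPECTED_RE = re.compile(
    r"https://github\.com/SemiAnalysisAI/InferenceX/pull/[0-9]+"
)


def find_bad_pr_links(text: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for each pr-link that is not a full PR URL."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK_LINE_RE.match(line)
        if match and not EXPECTED_RE.fullmatch(match.group("value")):
            bad.append((lineno, match.group("value")))
    return bad
```

Run against the contents of perf-changelog.yaml, a helper like this would report the line number of each offending entry, so a CI step could fail with a pointer to the exact line.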

Impact: The changelog's traceability purpose — linking each benchmark trigger to its originating PR — is broken for this entry. A future reader running git blame or scanning the changelog cannot navigate from the entry back to the PR that introduced the SGLang image bump. It doesn't break CI or runtime behavior, but it leaves obviously incomplete metadata in a tracked, referenced artifact.

Fix: Replace pr-link: XXX on line 2557 with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1421. One-line edit.

github-actions · 2026-05-17T07:21:27Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25980017364
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25980017364

github-actions · 2026-05-17T18:34:29Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25984496952
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25984496952

…0 regression sgl-project/sglang#25551: v0.5.12-cu130 ships a DeepGemm whose TMA-descriptor init crashes with CUDA_ERROR_ILLEGAL_ADDRESS during CUDA graph capture on B300 (sm_120). The crash hits the shared-experts FP8 GEMM path on the very first batch size of every TP rank.

Set SGL_ENABLE_JIT_DEEPGEMM=0 in both glm5_fp8_b300.sh and glm5_fp8_b300_mtp.sh. This bypasses the JIT-compiled DeepGemm kernel and falls back to the non-JIT path, sidestepping the regression while keeping the image bump (v0.5.11-cu130 -> v0.5.12-cu130) intact. Revert to SGL_ENABLE_JIT_DEEPGEMM=1 once the upstream fix lands.

Update glm5-fp8-b300-sglang and glm5-fp8-b300-sglang-mtp SGLang image…

db458da

… to v0.5.12-cu130 Ref #1154 Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>

Klaud-Cold requested a review from a team May 17, 2026 03:16

Klaud-Cold added the full-sweep-enabled label May 17, 2026

Klaud-Cold requested review from jgangani and kedarpotdar-nv as code owners May 17, 2026 03:16

github-project-automation Bot added this to InferenceMAX Board May 17, 2026

Klaud-Cold mentioned this pull request May 17, 2026

[Auto] Docker Image Updates Available - 2026-04-25 #1154

Open

claude Bot reviewed May 17, 2026

Merge remote-tracking branch 'origin/main' into HEAD

9c0438f

# Conflicts: # perf-changelog.yaml

Klaud-Cold wants to merge 3 commits into main from claude/issue-1154-glm5-fp8-b300-sglang
