Update glm5-fp8-b300-sglang and -mtp SGLang image to v0.5.12-cu130 #1421

Open: Klaud-Cold wants to merge 3 commits into main from claude/issue-1154-glm5-fp8-b300-sglang

@Klaud-Cold (Collaborator)

Updates SGLang image for glm5-fp8-b300-sglang and glm5-fp8-b300-sglang-mtp from v0.5.11-cu130 to v0.5.12-cu130.

Ref #1154

Generated with Claude Code

… to v0.5.12-cu130

Ref #1154

Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please first open a PR to update that documentation before we merge your single-node PR into the main branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Review comment on perf-changelog.yaml (outdated)
- glm5-fp8-b300-sglang-mtp
description:
- "Update SGLang image from v0.5.11-cu130 to v0.5.12-cu130"
pr-link: XXX
Contributor

🔴 The new perf-changelog entry at line 2557 has pr-link: XXX instead of an actual PR URL. This placeholder was left in by PR generation tooling — it should be https://github.com/SemiAnalysisAI/InferenceX/pull/1421 to match the convention used by every other entry in the file.

Extended reasoning...

What the bug is: The newly added entry to perf-changelog.yaml (lines 2552–2557) ends with pr-link: XXX rather than a valid GitHub PR URL. The literal string XXX is the unsubstituted placeholder from the template documented in AGENTS.md (pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX), which the PR generation tooling is expected to replace with the actual PR number before committing.

Code path that triggers it: When the changelog entry was generated for this PR, the tooling produced the template skeleton but never substituted XXX with the assigned PR number (1421). The diff committed the template as-is.

Why existing code doesn't prevent it: The Pydantic ChangelogEntry validator (utils/matrix_logic/validation.py) only requires pr_link to be a non-empty string, so XXX passes schema validation. There is no regex or URL-format check that would have caught this at CI time.
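As an illustration of the missing format check, here is a minimal sketch of a hypothetical CI guard (`check_pr_links` is not part of the InferenceX repository). It flags every `pr-link` value that is not the full `https://github.com/SemiAnalysisAI/InferenceX/pull/<num>` form, assuming each entry keeps `pr-link:` on its own line as in the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of InferenceX: report every pr-link line in a
# changelog file that is not a full InferenceX PR URL (e.g. the "XXX" placeholder).
# Exit status: 0 if every pr-link line is valid, 1 otherwise.
check_pr_links() {
  local file="$1"
  awk '
    /^[ \t]*pr-link:/ {
      # Accept only https://github.com/SemiAnalysisAI/InferenceX/pull/<digits>
      if ($0 !~ /^[ \t]*pr-link:[ \t]*https:\/\/github\.com\/SemiAnalysisAI\/InferenceX\/pull\/[0-9]+[ \t]*$/) {
        printf "%s:%d: invalid pr-link: %s\n", FILENAME, FNR, $0
        bad = 1
      }
    }
    END { if (bad) exit 1; exit 0 }
  ' "$file"
}
```

Run as `check_pr_links perf-changelog.yaml` in CI, a guard like this would have rejected the `pr-link: XXX` entry. The same pattern could instead be enforced inside the Pydantic `ChangelogEntry` validator, which would keep all schema rules in one place.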

Step-by-step proof:

  1. Open perf-changelog.yaml and look at line 2557 — it reads pr-link: XXX.
  2. Compare with the immediately preceding entry at line 2550: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1402.
  3. Lines 2496, 2508, 2520, 2526, 2532, 2538, 2544, and 2550 all use the full https://github.com/SemiAnalysisAI/InferenceX/pull/<num> form — this is the established convention across all ~30 recent entries (and consistent with the entire 2500+ line file).
  4. The PR number for this change is 1421 (per the PR metadata), so the correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1421.

Impact: The changelog's traceability purpose — linking each benchmark trigger to its originating PR — is broken for this entry. A future reader running git blame or scanning the changelog cannot navigate from the entry back to the PR that introduced the SGLang image bump. It doesn't break CI or runtime behavior, but it leaves obviously incomplete metadata in a tracked, referenced artifact.

Fix: Replace pr-link: XXX on line 2557 with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1421. One-line edit.
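One way to apply this fix without hand-editing is a pattern-based substitution. The sketch below assumes GNU sed (`-i` behaves differently on BSD/macOS sed), and the helper name `fix_pr_link_placeholder` is hypothetical. Matching on `pr-link: XXX` rather than on line 2557 avoids depending on a line number that can shift when changelog conflicts are resolved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a placeholder "pr-link: XXX" line with the full
# InferenceX PR URL for the given PR number, editing the file in place (GNU sed).
fix_pr_link_placeholder() {
  local file="$1" pr_number="$2"
  # Refuse anything that is not a plain PR number.
  [[ "$pr_number" =~ ^[0-9]+$ ]] || return 1
  # Keep the original indentation and key (\1); swap only the XXX value.
  sed -i -E "s#^([[:space:]]*pr-link:)[[:space:]]*XXX[[:space:]]*\$#\\1 https://github.com/SemiAnalysisAI/InferenceX/pull/${pr_number}#" "$file"
}

# Usage for this PR:
# fix_pr_link_placeholder perf-changelog.yaml 1421
```

Only lines whose value is exactly `XXX` are touched, so existing entries with real URLs are left unchanged.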

# Conflicts:
#	perf-changelog.yaml
@github-actions (Contributor)

@github-actions (Contributor)

…0 regression

sgl-project/sglang#25551: v0.5.12-cu130 ships a DeepGemm whose
TMA-descriptor init crashes with CUDA_ERROR_ILLEGAL_ADDRESS during
CUDA graph capture on B300 (sm_103). The crash hits the shared-experts
FP8 GEMM path on the very first batch size of every TP rank.

Set SGL_ENABLE_JIT_DEEPGEMM=0 in both glm5_fp8_b300.sh and
glm5_fp8_b300_mtp.sh — this bypasses the JIT-compiled DeepGemm kernel
and falls back to the non-JIT path, sidestepping the regression while
keeping the image bump (v0.5.11-cu130 -> v0.5.12-cu130) intact.

Switch back to SGL_ENABLE_JIT_DEEPGEMM=1 once the upstream fix lands.
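The workaround amounts to one exported variable in each launch script. Below is a minimal sketch of how that line might look in glm5_fp8_b300.sh and glm5_fp8_b300_mtp.sh. The rest of those scripts is not shown on this page, so placing it before the SGLang server launch is an assumption.

```shell
#!/usr/bin/env bash
# Workaround for sgl-project/sglang#25551: the DeepGemm shipped in
# v0.5.12-cu130 crashes with CUDA_ERROR_ILLEGAL_ADDRESS during CUDA graph
# capture on B300. Disabling JIT DeepGemm sidesteps the regression.
# TODO: switch back to SGL_ENABLE_JIT_DEEPGEMM=1 once the upstream fix lands.
export SGL_ENABLE_JIT_DEEPGEMM=0
```

Exporting the variable, rather than only assigning it, matters because it must be visible to the SGLang server process that the script launches as a child.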