[CI] Disable auto CUDA arch injection to avoid duplicate gencode flags#7513
EmmonsCurse wants to merge 3 commits into
Thanks for your contribution!
/skip-ci ci_iluvatar
Codecov Report
✅ All modified and coverable lines are covered by tests.
| Coverage Diff | develop | #7513 | +/- |
|---|---|---|---|
| Coverage | ? | 73.58% | |
| Files | ? | 398 | |
| Lines | ? | 54988 | |
| Branches | ? | 8616 | |
| Hits | ? | 40462 | |
| Misses | ? | 11817 | |
| Partials | ? | 2709 | |
/skip-ci all
PaddlePaddle-bot left a comment
🤖 AI Code Review | 2026-04-23 22:11:56
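📋 Review Summary
PR overview: Disable Paddle's automatic CUDA arch injection so that duplicate gencode flags no longer cause compile/link errors.
Scope of changes: `custom_ops/setup_ops.py` (build-script patch), `scripts/check_approval.sh` (CI approval check)
Impact tags: CI OP
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | custom_ops/setup_ops.py:47 | The flag-matching condition is too broad; `"compute_" in flag` may match flags that are not architecture flags |
| ❓ Question | custom_ops/setup_ops.py:56 | The patch has no effect when `cflags=None`; please confirm whether it covers every scenario that triggers injection |
Overall assessment
`check_approval.sh` now uses `--merge-base` consistently instead of a direct diff, which fixes false positives in the CI check when merge commits are present; this change is correct. The workaround in `setup_ops.py` is reasonable, but the flag-matching logic should be narrowed to reduce the risk of false matches. It would also help to state whether the `cflags=None` case is already covered by the second line of defense (the `_add_cuda_arch_flags` override).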
    """
    Patched version that returns empty list when user-provided gencode flags are detected.

    This prevents Paddle from auto-adding duplicate gencode flags based on
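🟡 Suggestion: the flag-matching condition is too loose and risks false positives
The current condition `"compute_" in flag or "sm_" in flag` fires for any flag that contains these substrings (for example a compiler optimization parameter such as `--use_compute_mode=sm_xx`, logging parameters, or other non-architecture flags), so the patch may return an empty list when nothing should be suppressed.
Consider narrowing the match with a more precise prefix or regular expression:

    import re
    if isinstance(flag, str) and (
        flag.startswith("-gencode")
        or re.match(r'^-arch=compute_\d+', flag)
        or re.match(r'^-code=sm_\d+', flag)
    ):
        return []

Or at least switch to a stricter prefix check (such as `"-arch=compute_"` / `"-code=sm_"`) to avoid substring false matches.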
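The narrowed check suggested above can be sketched as a standalone helper. This is only an illustration: `has_user_arch_flags` and `_ARCH_FLAG_RE` are hypothetical names that do not come from the PR, and the pattern covers just the three flag shapes named in the suggestion.

```python
import re

# Hypothetical helper, not the PR's code: the real check lives inside the
# patched _get_cuda_arch_flags in custom_ops/setup_ops.py and may differ.
# Anchored at the start of the flag, so substrings elsewhere do not match.
_ARCH_FLAG_RE = re.compile(r"^(?:-gencode\b|-arch=compute_\d+|-code=sm_\d+)")


def has_user_arch_flags(cflags):
    """Return True if any entry looks like an explicit CUDA architecture flag."""
    if not cflags:  # covers both None and an empty list
        return False
    return any(
        isinstance(flag, str) and _ARCH_FLAG_RE.match(flag) is not None
        for flag in cflags
    )


# A flag that merely contains "compute_" and "sm_" is no longer treated as an
# architecture flag, unlike with the original substring test.
print(has_user_arch_flags(["--use_compute_mode=sm_xx"]))
print(has_user_arch_flags(["-O3", "-gencode", "arch=compute_90a,code=sm_90a"]))
```

Anchoring the pattern at the start of the string is what separates this from the substring test; the `\d+` prefix still accepts suffixed architectures such as `90a`.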
            return []
        return _original_get_cuda_arch_flags(cflags)
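❓ Question: the patch logic does not take effect when `cflags=None`; is there already a fallback?
The current implementation passes an empty or None `cflags` straight through to the original function. If Paddle internally calls `_get_cuda_arch_flags()` without `cflags` (for example, as an independent call inside `_add_cuda_arch_flags`), the first patch has no effect and automatic gencode injection is still triggered.
The second line of defense (`_add_cuda_arch_flags = lambda self, flags: flags`) relies on `extension_utils.CUDAExtension` having that method; if the Paddle version does not satisfy the `hasattr` condition, both lines of defense fail.
Please state explicitly in the comments:
- whether the `cflags=None` case never occurs in the actual call chain;
- or how the two lines of defense complement each other to cover different Paddle versions.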
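The concern about the two lines of defense can be reproduced without Paddle. The sketch below uses a stand-in namespace in place of `extension_utils`; `make_fake_utils`, `apply_patches`, and the call chain inside the fake `_add_cuda_arch_flags` are assumptions made for illustration, not Paddle's actual internals.

```python
import types


def make_fake_utils():
    """Build a stand-in for extension_utils (assumed shape, not Paddle's code)."""
    utils = types.SimpleNamespace()

    def _get_cuda_arch_flags(cflags=None):
        # Pretend Paddle would auto-inject this architecture.
        return ["-gencode", "arch=compute_80,code=sm_80"]

    class CUDAExtension:
        def _add_cuda_arch_flags(self, flags):
            # Hypothetical internal call that does NOT pass cflags.
            return flags + utils._get_cuda_arch_flags()

    utils._get_cuda_arch_flags = _get_cuda_arch_flags
    utils.CUDAExtension = CUDAExtension
    return utils


def apply_patches(utils, override_method=True):
    """Install both patches; return True if the second one was installed."""
    original = utils._get_cuda_arch_flags

    def patched(cflags=None):
        # First line of defense: only effective when cflags is passed in.
        if cflags and any(
            isinstance(f, str) and f.startswith("-gencode") for f in cflags
        ):
            return []
        return original(cflags)

    utils._get_cuda_arch_flags = patched

    # Second line of defense: only possible if the method exists.
    ext = getattr(utils, "CUDAExtension", None)
    if override_method and ext is not None and hasattr(ext, "_add_cuda_arch_flags"):
        ext._add_cuda_arch_flags = lambda self, flags: flags
        return True
    return False


user_flags = ["-gencode", "arch=compute_90a,code=sm_90a"]

first_only = make_fake_utils()
apply_patches(first_only, override_method=False)
# With only the first patch, the cflags-less internal call still injects flags.
print(first_only.CUDAExtension()._add_cuda_arch_flags(list(user_flags)))

both = make_fake_utils()
apply_patches(both)
# With both patches, the user flags pass through unchanged.
print(both.CUDAExtension()._add_cuda_arch_flags(list(user_flags)))
```

Under these assumptions the first patch alone leaves the `cflags=None` path open, which is why the answer to the question depends on whether the method override is guaranteed to be installed on the Paddle versions in use.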
Motivation
Recent changes in Paddle #78704 modified the behavior of `CUDAExtension`, introducing automatic CUDA architecture flag injection via `PADDLE_CUDA_ARCH_LIST`, even when custom `-gencode` flags are already specified.
This results in duplicated CUDA arch flags during compilation, increasing binary size and potentially causing linker errors such as:

    relocation truncated to fit

To maintain stable builds and avoid unnecessary code generation, a workaround is required.
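To make the duplication concrete, here is a minimal sketch that scans a combined flag list for repeated gencode specs. `find_duplicate_gencode` is a hypothetical helper, and the "auto-injected" flags are invented sample values rather than output captured from Paddle.

```python
from collections import Counter


def find_duplicate_gencode(flags):
    """Return gencode specs that occur more than once, in first-seen order."""
    specs = []
    it = iter(flags)
    for flag in it:
        if flag == "-gencode":
            # Two-token form: the spec is the next list entry.
            spec = next(it, None)
            if spec is not None:
                specs.append(spec)
        elif flag.startswith("-gencode="):
            # Single-token form: the spec follows the "=".
            specs.append(flag[len("-gencode="):])
    counts = Counter(specs)
    # dict.fromkeys keeps one copy of each spec while preserving order.
    return [spec for spec in dict.fromkeys(specs) if counts[spec] > 1]


# Sample values only: flags a build script might pass, plus flags that
# automatic injection might append for the same architecture.
user_flags = ["-gencode", "arch=compute_90a,code=sm_90a"]
auto_flags = [
    "-gencode", "arch=compute_90a,code=sm_90a",
    "-gencode", "arch=compute_80,code=sm_80",
]
print(find_duplicate_gencode(user_flags + auto_flags))
```

A check of this kind could run in CI against the final nvcc flag list to catch a regression if automatic injection reappears.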
Modifications
- Patched `extension_utils._get_cuda_arch_flags` to return an empty list when user-defined `-gencode` flags are detected, preventing Paddle from auto-injecting CUDA arch flags.
- Overrode `CUDAExtension._add_cuda_arch_flags` to ensure no additional arch flags are appended internally.
- […] `get_gencode_flags`, avoiding reliance on `PADDLE_CUDA_ARCH_LIST`.
- […] `-gencode` entries.
- […] `arch=compute_xxa,code=sm_xxa` pairs (e.g., `90a`, `100a`) and avoided incomplete flags like `arch=compute_90a`.
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
- […] [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- […] `pre-commit` before commit.
- […] `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.