
[CI] Disable auto CUDA arch injection to avoid duplicate gencode flags#7513

Closed
EmmonsCurse wants to merge 3 commits into PaddlePaddle:develop from EmmonsCurse:fix_build_error_in_90or100



EmmonsCurse (Collaborator) commented Apr 20, 2026

Motivation

A recent change in Paddle (#78704) modified the behavior of CUDAExtension: it now injects CUDA architecture flags automatically via PADDLE_CUDA_ARCH_LIST, even when custom -gencode flags are already specified.

This results in duplicated CUDA arch flags during compilation, increasing binary size and potentially causing linker errors such as:

  • relocation truncated to fit

To maintain stable builds and avoid unnecessary code generation, a workaround is required.

Modifications

  • Patched extension_utils._get_cuda_arch_flags to return an empty list when user-defined -gencode flags are detected, preventing Paddle from auto-injecting CUDA arch flags.
  • Added a secondary safeguard by overriding CUDAExtension._add_cuda_arch_flags to ensure no additional arch flags are appended internally.
  • Explicitly controlled CUDA architecture flags via get_gencode_flags, avoiding reliance on PADDLE_CUDA_ARCH_LIST.
  • Effectively disabled Paddle’s automatic CUDA arch injection mechanism to prevent duplicated -gencode entries.
  • Ensured correct generation of arch=compute_xxa,code=sm_xxa pairs (e.g., 90a, 100a) and avoided incomplete flags like arch=compute_90a.
  • Reduced the risk of compilation and linking issues (e.g., relocation overflow) caused by conflicting or duplicated CUDA arch flags.
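
The first modification above can be sketched as a small monkey-patch. This is a minimal illustration, not the PR's actual code: `extension_utils` here is a stand-in namespace so the example runs without Paddle installed, the simulated auto-injected flag is an assumption about Paddle's behavior, and `build_gencode_flags` is a hypothetical helper (the PR's own `get_gencode_flags` may have a different signature).

```python
import types


def _fake_get_cuda_arch_flags(cflags=None):
    # Assumption: simulates Paddle auto-injecting an arch from PADDLE_CUDA_ARCH_LIST.
    return ["-gencode", "arch=compute_90,code=sm_90"]


# Stand-in for paddle.utils.cpp_extension.extension_utils (not the real module).
extension_utils = types.SimpleNamespace(_get_cuda_arch_flags=_fake_get_cuda_arch_flags)


def build_gencode_flags(archs):
    """Hypothetical helper: emit complete arch=compute_X,code=sm_X pairs."""
    flags = []
    for arch in archs:
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    return flags


_original_get_cuda_arch_flags = extension_utils._get_cuda_arch_flags


def _patched_get_cuda_arch_flags(cflags=None):
    """Return no auto arch flags when the caller already passes -gencode."""
    if cflags and any(
        isinstance(f, str) and f.startswith("-gencode") for f in cflags
    ):
        return []
    # No user-defined gencode flags: keep the original behavior.
    return _original_get_cuda_arch_flags(cflags)


extension_utils._get_cuda_arch_flags = _patched_get_cuda_arch_flags

user_flags = ["-O3"] + build_gencode_flags(["90a", "100a"])
# With the patch in place, nothing extra is appended to the user's flags.
final_flags = user_flags + extension_utils._get_cuda_arch_flags(user_flags)
```

With user-supplied `-gencode` entries, `final_flags` contains only the two explicit architectures; a call without any `-gencode` flag still falls back to the original function.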

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot Bot commented Apr 20, 2026

Thanks for your contribution!

EmmonsCurse (Collaborator, Author) commented Apr 20, 2026

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci stable_test
/skip-ci pre_ce_test


@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@a0c39cc).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7513   +/-   ##
==========================================
  Coverage           ?   73.58%           
==========================================
  Files              ?      398           
  Lines              ?    54988           
  Branches           ?     8616           
==========================================
  Hits               ?    40462           
  Misses             ?    11817           
  Partials           ?     2709           
Flag Coverage Δ
GPU 73.58% <ø> (?)


EmmonsCurse (Collaborator, Author) commented

/skip-ci all


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-23 22:11:56

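📋 Review Summary

PR overview: Disables Paddle's automatic CUDA arch injection to avoid compile/link errors caused by duplicate gencode flags.
Scope of changes: custom_ops/setup_ops.py (build script patch), scripts/check_approval.sh (CI approval check)
Impact tags: [CI], [OP]

Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | custom_ops/setup_ops.py:47 | Flag matching is too broad; "compute_" in flag may match non-architecture flags |
| ❓ Question | custom_ops/setup_ops.py:56 | The patch has no effect when cflags=None; confirm whether all trigger scenarios are covered |

Overall assessment

In check_approval.sh, consistently using --merge-base instead of a direct diff fixes the CI check's misjudgments when merge commits are present; the change is correct. The workaround in setup_ops.py is reasonable, but the flag-matching logic should be narrowed to reduce the risk of false matches. It would also help to state whether the cflags=None case is already covered by the second line of defense (the _add_cuda_arch_flags override).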

Comment thread custom_ops/setup_ops.py
"""
Patched version that returns empty list when user-provided gencode flags are detected.

This prevents Paddle from auto-adding duplicate gencode flags based on

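🟡 Suggestion: the flag-matching condition is too loose and risks false positives

The current condition, "compute_" in flag or "sm_" in flag, fires for any flag that contains these substrings (for example, non-architecture flags such as a compiler optimization option like --use_compute_mode=sm_xx, or logging options), so the patch may unexpectedly return an empty list when injection should not be suppressed.

Consider narrowing the match with a more precise prefix or a regular expression: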

import re

if isinstance(flag, str) and (
    flag.startswith("-gencode")
    or re.match(r'^-arch=compute_\d+', flag)
    or re.match(r'^-code=sm_\d+', flag)
):
    return []

Or at least switch to a stricter prefix check (such as "-arch=compute_" / "-code=sm_") to avoid substring false matches.
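
To make the difference concrete, here is a small sketch comparing the loose substring test with a stricter anchored pattern. The function names and the exact regular expression are illustrative assumptions, not the PR's code; `loose_match` reproduces only the part of the condition quoted in this review, and `--use_compute_mode=sm_xx` is the reviewer's hypothetical flag rather than a real nvcc option.

```python
import re

# Anchored pattern: a -gencode token, -arch=/-code= options, or the
# "arch=compute_NN,..." value that follows a standalone -gencode token.
_ARCH_FLAG_RE = re.compile(
    r"^(-gencode\b|-arch=(compute|sm)_\d+|-code=(compute|sm)_\d+|arch=compute_\d+)"
)


def loose_match(flag):
    """Substring test quoted in the review: matches anywhere in the flag."""
    return isinstance(flag, str) and ("compute_" in flag or "sm_" in flag)


def strict_match(flag):
    """Anchored test: only flags that start like an architecture flag."""
    return isinstance(flag, str) and _ARCH_FLAG_RE.match(flag) is not None


candidates = [
    "-gencode",
    "arch=compute_90a,code=sm_90a",
    "-arch=compute_90",
    "--use_compute_mode=sm_xx",  # hypothetical non-architecture flag
    "-O3",
]
results = {flag: (loose_match(flag), strict_match(flag)) for flag in candidates}
```

In `results`, the hypothetical `--use_compute_mode=sm_xx` flag is accepted by the loose test but rejected by the strict one, while genuine architecture flags pass the strict test.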

Comment thread custom_ops/setup_ops.py
return []
return _original_get_cuda_arch_flags(cflags)



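❓ Question: the patch logic has no effect when cflags=None; is there already a fallback?

The current implementation passes straight through to the original function when cflags is empty or None. If Paddle internally calls _get_cuda_arch_flags() without passing cflags (for example, through an independent call inside _add_cuda_arch_flags), the first patch is ineffective and automatic gencode injection is still triggered.

The second line of defense (_add_cuda_arch_flags = lambda self, flags: flags) depends on extension_utils.CUDAExtension having that method; if the Paddle version does not satisfy the hasattr condition, both defenses fail.

Please state explicitly in a comment:

  1. whether the cflags=None case never occurs in the actual call chain;
  2. or how the two defenses complement each other to cover different Paddle versions.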
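
The interaction between the two guards can be sketched with stand-in objects. Everything here is an assumption for illustration: `make_fake_utils` builds a fake `extension_utils` namespace (it is not Paddle's module), the "new-style" variant models the worst case raised above where `_add_cuda_arch_flags` calls the helper without cflags, and the "old-style" variant models a version without that method.

```python
import types


def make_fake_utils(has_add_method):
    """Build a stand-in for extension_utils; the real layout is assumed."""
    utils = types.SimpleNamespace()

    def _get_cuda_arch_flags(cflags=None):
        # Simulated auto-injection from PADDLE_CUDA_ARCH_LIST.
        return ["-gencode", "arch=compute_90,code=sm_90"]

    class CUDAExtension:
        pass

    if has_add_method:
        def _add_cuda_arch_flags(self, flags):
            # Worst case from the review: helper called without cflags.
            return flags + utils._get_cuda_arch_flags()
        CUDAExtension._add_cuda_arch_flags = _add_cuda_arch_flags

    utils._get_cuda_arch_flags = _get_cuda_arch_flags
    utils.CUDAExtension = CUDAExtension
    return utils


def apply_patches(utils):
    """Install both guards; return the names of the guards that applied."""
    applied = []
    original = utils._get_cuda_arch_flags

    def patched(cflags=None):
        if cflags and any(
            isinstance(f, str) and f.startswith("-gencode") for f in cflags
        ):
            return []
        return original(cflags)  # cflags=None falls through unchanged

    utils._get_cuda_arch_flags = patched
    applied.append("_get_cuda_arch_flags")

    if hasattr(utils.CUDAExtension, "_add_cuda_arch_flags"):
        # Second guard: never append anything to the caller's flags.
        utils.CUDAExtension._add_cuda_arch_flags = lambda self, flags: flags
        applied.append("_add_cuda_arch_flags")
    return applied


new_style = make_fake_utils(has_add_method=True)
old_style = make_fake_utils(has_add_method=False)
applied_new = apply_patches(new_style)
applied_old = apply_patches(old_style)
user_flags = ["-gencode", "arch=compute_90a,code=sm_90a"]
```

Under these assumptions, the second guard covers the no-cflags call in the new-style layout, while the old-style layout relies on the first guard alone, which only helps when cflags is actually passed. Whether the real call chain matches either variant is exactly what the question above asks the author to document.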
