[CI] Disable auto CUDA arch injection to avoid duplicate gencode flags by EmmonsCurse · Pull Request #7513 · PaddlePaddle/FastDeploy

EmmonsCurse · 2026-04-20T07:11:44Z

Motivation

A recent change in Paddle #78704 modified the behavior of CUDAExtension: it now injects CUDA architecture flags automatically via PADDLE_CUDA_ARCH_LIST, even when custom -gencode flags are already specified.

This results in duplicated CUDA arch flags during compilation, increasing binary size and potentially causing linker errors such as:

relocation truncated to fit

To maintain stable builds and avoid unnecessary code generation, a workaround is required.

Modifications

Patched extension_utils._get_cuda_arch_flags to return an empty list when user-defined -gencode flags are detected, preventing Paddle from auto-injecting CUDA arch flags.
Added a secondary safeguard by overriding CUDAExtension._add_cuda_arch_flags to ensure no additional arch flags are appended internally.
Explicitly controlled CUDA architecture flags via get_gencode_flags, avoiding reliance on PADDLE_CUDA_ARCH_LIST.
Effectively disabled Paddle’s automatic CUDA arch injection mechanism to prevent duplicated -gencode entries.
Ensured correct generation of arch=compute_xxa,code=sm_xxa pairs (e.g., 90a, 100a) and avoided incomplete flags like arch=compute_90a.
Reduced the risk of compilation and linking issues (e.g., relocation overflow) caused by conflicting or duplicated CUDA arch flags.
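
The first safeguard described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `extension_utils` here is a stand-in namespace rather than Paddle's real module, `make_gencode_flags` is a hypothetical helper (the real `get_gencode_flags` is not shown in this PR page), and the substring check mirrors the condition quoted in the review.

```python
import types


def make_gencode_flags(archs):
    """Hypothetical helper: emit a complete -gencode pair per arch (e.g. "90a")."""
    flags = []
    for arch in archs:
        # Always emit both halves, never a bare "arch=compute_90a".
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    return flags


def _auto_arch_flags(cflags=None):
    """Stand-in for Paddle's auto-injection; the returned value is made up."""
    return ["-gencode", "arch=compute_90,code=sm_90"]


# Stand-in for the real extension_utils module (assumption: not imported here).
extension_utils = types.SimpleNamespace(_get_cuda_arch_flags=_auto_arch_flags)
_original_get_cuda_arch_flags = extension_utils._get_cuda_arch_flags


def _patched_get_cuda_arch_flags(cflags=None):
    """Return no auto flags once the caller has supplied its own arch flags."""
    for flag in cflags or []:
        if isinstance(flag, str) and ("compute_" in flag or "sm_" in flag):
            return []
    # No user-provided arch flags: defer to the original behavior.
    return _original_get_cuda_arch_flags(cflags)


extension_utils._get_cuda_arch_flags = _patched_get_cuda_arch_flags

user_flags = make_gencode_flags(["90a", "100a"])
auto_flags_with_user_flags = extension_utils._get_cuda_arch_flags(user_flags)
auto_flags_without_cflags = extension_utils._get_cuda_arch_flags(None)
```

With user-supplied pairs the patched function contributes nothing, so each architecture appears once on the nvcc command line; with no `cflags` it falls through to the original function unchanged.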

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-20T07:11:51Z

Thanks for your contribution!

EmmonsCurse · 2026-04-20T07:11:54Z

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci stable_test
/skip-ci pre_ce_test

codecov-commenter · 2026-04-20T10:20:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@a0c39cc). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7513   +/-   ##
==========================================
  Coverage           ?   73.58%           
==========================================
  Files              ?      398           
  Lines              ?    54988           
  Branches           ?     8616           
==========================================
  Hits               ?    40462           
  Misses             ?    11817           
  Partials           ?     2709

Flag	Coverage Δ
GPU	`73.58% <ø> (?)`

EmmonsCurse · 2026-04-23T13:41:01Z

/skip-ci all

PaddlePaddle-bot

🤖 AI Code Review | 2026-04-23 22:11:56

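📋 Review Summary

PR overview: Disables Paddle's automatic CUDA arch injection to avoid compile/link errors caused by duplicate gencode flags.
Scope of changes: custom_ops/setup_ops.py (build-script patch), scripts/check_approval.sh (CI approval check)
Impact tags: CI OP

Issues

Level	File	Summary
🟡 Suggestion	`custom_ops/setup_ops.py:47`	The flag-matching condition is too broad; `"compute_" in flag` may match non-architecture flags
❓ Question	`custom_ops/setup_ops.py:56`	The patch has no effect when `cflags=None`; whether it covers every trigger scenario needs confirmation

Overall assessment

check_approval.sh now consistently uses --merge-base instead of a direct diff, which fixes false positives in the CI check when merge commits are present; the change is correct. The workaround in setup_ops.py is sound, but the flag-matching logic should be narrowed to reduce the risk of false matches. It would also help to state whether the cflags=None case is already covered by the second line of defense (the _add_cuda_arch_flags override).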

PaddlePaddle-bot · 2026-04-23T14:13:33Z

+    """
+    Patched version that returns empty list when user-provided gencode flags are detected.
+
+    This prevents Paddle from auto-adding duplicate gencode flags based on


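🟡 Suggestion: the flag-matching condition is too loose and risks false positives

The current condition "compute_" in flag or "sm_" in flag fires for any flag that contains these substrings (for example, non-architecture flags such as a compiler option --use_compute_mode=sm_xx, or logging options), so the patch may return an empty list when injection should not be suppressed.

Consider narrowing the match with a more precise prefix or a regular expression:

    import re

    if isinstance(flag, str) and (
        flag.startswith("-gencode")
        or re.match(r'^-arch=compute_\d+', flag)
        or re.match(r'^-code=sm_\d+', flag)
    ):
        return []

Or at least switch to stricter prefix checks (such as "-arch=compute_" / "-code=sm_") to avoid substring false matches.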
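
To make the difference concrete, the sketch below compares the substring check with a stricter, anchored pattern. The regular expression is an assumption written for this illustration, not code from the PR or from Paddle; it also accepts the bare `arch=...,code=...` token, on the assumption that `-gencode` and its value are passed as separate list items (a pattern anchored on `-arch=` alone would not match that token).

```python
import re

# Assumed pattern: matches -gencode / --generate-code and arch=/code= tokens only
# when they appear at the start of the flag.
_ARCH_FLAG = re.compile(
    r"^(?:-gencode\b|--generate-code\b"
    r"|-?arch=(?:compute|sm)_\d+[a-z]?"
    r"|-?code=(?:compute|sm)_\d+[a-z]?)"
)


def is_arch_flag_loose(flag):
    """Substring check as quoted in the review; matches anywhere in the flag."""
    return isinstance(flag, str) and ("compute_" in flag or "sm_" in flag)


def is_arch_flag_strict(flag):
    """Anchored check: only flags that start like a CUDA arch flag match."""
    return isinstance(flag, str) and _ARCH_FLAG.match(flag) is not None


samples = [
    "-gencode",
    "arch=compute_90a,code=sm_90a",
    "-arch=sm_80",
    "--use_compute_mode=sm_xx",  # non-arch example flag taken from the review
    "-O3",
]
comparison = {s: (is_arch_flag_loose(s), is_arch_flag_strict(s)) for s in samples}
```

Only `--use_compute_mode=sm_xx` is classified differently by the two checks: the loose version treats it as an architecture flag, the strict version does not.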

PaddlePaddle-bot · 2026-04-23T14:13:33Z

+                return []
+    return _original_get_cuda_arch_flags(cflags)
+
+


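❓ Question: the patch logic has no effect when cflags=None; is there already a fallback?

The current implementation passes an empty or None cflags straight through to the original function. If Paddle calls _get_cuda_arch_flags() internally without passing cflags (for example, in a separate call inside _add_cuda_arch_flags), the first patch has no effect and automatic gencode injection still triggers.

The second line of defense (_add_cuda_arch_flags = lambda self, flags: flags) relies on extension_utils.CUDAExtension having that method; if the Paddle version does not satisfy the hasattr condition, both lines of defense fail.

Consider stating explicitly in a comment:

whether the cflags=None case cannot occur in the actual call chain;

or how the two lines of defense complement each other to cover different Paddle versions.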
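
The hasattr-guarded second safeguard discussed here might look roughly like the sketch below. The class and module are stand-ins: the shape of `extension_utils.CUDAExtension` and its `_add_cuda_arch_flags` method is taken from the review text, not verified against Paddle, and `install_second_safeguard` is a hypothetical name.

```python
import types


class _StandInCUDAExtension:
    """Stand-in class; the appended flag below is made up for illustration."""

    def _add_cuda_arch_flags(self, flags):
        return list(flags) + ["-gencode", "arch=compute_80,code=sm_80"]


# Stand-in for a Paddle version that exposes the method.
extension_utils = types.SimpleNamespace(CUDAExtension=_StandInCUDAExtension)


def install_second_safeguard(module):
    """Override _add_cuda_arch_flags with a pass-through if it exists.

    Returns True when the override was installed and False otherwise, so the
    build script can log or fail loudly when this safeguard is unavailable.
    """
    ext_cls = getattr(module, "CUDAExtension", None)
    if ext_cls is not None and hasattr(ext_cls, "_add_cuda_arch_flags"):
        ext_cls._add_cuda_arch_flags = lambda self, flags: flags
        return True
    return False


installed = install_second_safeguard(extension_utils)
flags_after_override = _StandInCUDAExtension()._add_cuda_arch_flags(["-O3"])

# A module without CUDAExtension models a version where the guard fails.
installed_on_other_version = install_second_safeguard(types.SimpleNamespace())
```

Returning a boolean is one way to address the reviewer's concern: when it is False, the first patch is the only protection left, and a call with `cflags=None` would still reach the original injection logic.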

[CI] Disable auto CUDA arch injection to avoid duplicate gencode flags

4d89772

EmmonsCurse had a problem deploying to Metax_ci April 20, 2026 07:11 — with GitHub Actions Error

[CI] Workaround for auto CUDA arch injection

d796e33

EmmonsCurse had a problem deploying to Metax_ci April 20, 2026 08:15 — with GitHub Actions Failure

Update check_approval.sh

11886c0

EmmonsCurse had a problem deploying to Metax_ci April 23, 2026 13:42 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Apr 23, 2026

EmmonsCurse closed this Apr 27, 2026

EmmonsCurse wants to merge 3 commits into PaddlePaddle:develop from EmmonsCurse:fix_build_error_in_90or100

3 participants
