[PD] Fix PD interaction and error response by juncaipeng · Pull Request #7500 · PaddlePaddle/FastDeploy

juncaipeng · 2026-04-20T05:15:17Z

Motivation

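Fix several issues in PD-disaggregated (prefill/decode) inference:

On the decode side, tasks_list was registered too early, so a request could be handled by batch output processing before prefill finished, causing a null-pointer error;
Error messages were inconsistent, which made it hard for the Router layer to identify retryable errors;
Non-streaming requests had no automatic retry when preempted on decode.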

Modifications

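Defer tasks_list registration from preallocate_resource_in_d to add_prefilled_request, and add a None check in _process_batch_output, so that useless output generated by the worker no longer causes errors;
Unify the error message prefix as PD Error; the Router and Serving layers use it to identify PD errors and set finish_reason=pd_reschedule, which lets the router support rescheduling;
Router adds the preempt_retry_count / preempt_retry_exclude_decode parameters; non-streaming requests are retried automatically when preempted on decode;
serving_chat keeps already-generated tokens on the error path and returns a partial result instead of raising an exception;
splitwise_connector fixes a logic error in send_cache_info_to_prefill so that the P instance is notified promptly when resources are insufficient.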
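
The first modification can be illustrated with a minimal sketch. It is not the FastDeploy implementation: `ToyDecodeResourceManager`, `Request`, and `process_batch_output` are hypothetical stand-ins that only borrow the names `tasks_list`, `preallocate_resource_in_d`, and `add_prefilled_request` from the description above. It assumes the output path indexes `tasks_list` by slot and should skip slots that have not been registered yet.

```python
from typing import Dict, List, Optional


class Request:
    """Hypothetical request object; only the fields needed for the sketch."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.idx: Optional[int] = None


class ToyDecodeResourceManager:
    """Hypothetical stand-in for a decode-side resource manager."""

    def __init__(self, max_num_seqs: int) -> None:
        self.tasks_list: List[Optional[Request]] = [None] * max_num_seqs
        self.requests: Dict[str, Request] = {}

    def preallocate_resource_in_d(self, request: Request) -> bool:
        # Reserve a slot, but do not expose the request in tasks_list yet.
        reserved = {r.idx for r in self.requests.values()}
        for idx, task in enumerate(self.tasks_list):
            if task is None and idx not in reserved:
                request.idx = idx
                self.requests[request.request_id] = request
                return True
        return False

    def add_prefilled_request(self, request_id: str) -> None:
        # Register in tasks_list only once prefill has finished.
        request = self.requests[request_id]
        self.tasks_list[request.idx] = request


def process_batch_output(manager: ToyDecodeResourceManager, token_ids: List[int]) -> Dict[str, int]:
    """Map per-slot tokens to request ids, skipping unregistered slots."""
    results: Dict[str, int] = {}
    for idx, token_id in enumerate(token_ids):
        task = manager.tasks_list[idx]
        if task is None:  # the None check: slot reserved or empty, ignore output
            continue
        results[task.request_id] = token_id
    return results
```

In this sketch, output produced for a slot between preallocation and `add_prefilled_request` is dropped rather than dereferencing a missing task.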

paddle-bot · 2026-04-20T05:15:22Z

Thanks for your contribution!

Copilot

Pull request overview

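This PR aims to improve PD interaction and error responses in Splitwise (Prefill/Decode disaggregation) mode: when the decode side runs out of resources or is preempted, a "PD Error" is propagated upstream more consistently, and the Router attempts a retry and returns a friendlier error response.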

Changes:

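Router gains preempt retry in splitwise mode, with corresponding new CLI parameters.
Error propagation along the PD path is unified and strengthened: the decode->prefill cache_sync sending logic, plus engine-side and output-side error codes and error text.
The OpenAI protocol layer extends finish_reason; on error responses the API layer returns already-generated content where possible and marks it pd_reschedule.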

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

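File	Description
fastdeploy/splitwise/splitwise_connector.py	Adjusts how decode aggregates and sends cache_sync to prefill, widening the coverage of error notifications.
fastdeploy/router/router.py	Adds preempt retry parameters and the main retry flow (optionally excluding the previous decode instance).
fastdeploy/output/token_processor.py	Adjusts the error code and error message when prefill fails to send cache (introduces the "PD Error" text).
fastdeploy/input/base_processor.py	Skips token decoding for error responses and passes them straight upstream.
fastdeploy/entrypoints/openai/serving_chat.py	In non-streaming error cases, fills in outputs and returns the text generated so far; adds the pd_reschedule finish_reason check.
fastdeploy/entrypoints/openai/protocol.py	Extends the allowed OpenAI protocol finish_reason values with pd_reschedule.
fastdeploy/engine/common_engine.py	Adjusts PD-related error logs, error response text, and error codes (including the preempted case).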
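
The Router retry flow summarized above could look roughly like the following sketch. Only the parameter names `preempt_retry_count` and `preempt_retry_exclude_decode`, the `PD Error` prefix, and the `error_code != 200` check come from this PR; `route_with_preempt_retry`, `is_pd_error`, the `send` callable, the response shape, and the pick-first selection policy are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

PD_ERROR_PREFIX = "PD Error"


def is_pd_error(response: Dict) -> bool:
    """Treat a non-200 response whose message starts with the PD prefix as retryable."""
    if response.get("error_code", 200) == 200:
        return False
    return str(response.get("error_msg", "")).startswith(PD_ERROR_PREFIX)


def route_with_preempt_retry(
    send: Callable[[str], Dict],
    decode_instances: List[str],
    preempt_retry_count: int = 1,
    preempt_retry_exclude_decode: bool = True,
) -> Dict:
    """Send a non-streaming request, retrying on PD errors (hypothetical helper)."""
    excluded: Set[str] = set()
    response: Dict = {}
    for _attempt in range(preempt_retry_count + 1):
        # Fall back to all instances if exclusion would leave none.
        candidates = [d for d in decode_instances if d not in excluded] or list(decode_instances)
        decode = candidates[0]  # assumed policy: pick the first candidate
        response = send(decode)
        if not is_pd_error(response):
            return response
        if preempt_retry_exclude_decode:
            excluded = {decode}  # exclude only the decode instance that just failed
    return response  # retries exhausted: surface the last error
```

With two decode instances and one retry, a request preempted on the first instance is sent once more to the second; if every attempt fails, the caller receives the last PD error unchanged.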

codecov-commenter · 2026-04-21T04:12:09Z

Codecov Report

❌ Patch coverage is 41.88034% with 68 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@9236d0c). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/router/router.py	20.00%	51 Missing and 1 partial ⚠️
fastdeploy/entrypoints/openai/serving_chat.py	57.89%	5 Missing and 3 partials ⚠️
fastdeploy/input/base_processor.py	28.57%	3 Missing and 2 partials ⚠️
...astdeploy/entrypoints/openai/serving_completion.py	75.00%	1 Missing and 1 partial ⚠️
fastdeploy/splitwise/splitwise_connector.py	75.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7500   +/-   ##
==========================================
  Coverage           ?   72.23%           
==========================================
  Files              ?      419           
  Lines              ?    57845           
  Branches           ?     9072           
==========================================
  Hits               ?    41785           
  Misses             ?    13210           
  Partials           ?     2850

Flag	Coverage Δ
GPU	`72.23% <41.88%> (?)`

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Jiang-Jia-Jun · 2026-04-21T08:15:28Z

-                        raise ValueError("{}".format(data["error_msg"]))
                    idx = int(data["request_id"].split("_")[-1])
-                    # api_server_logger.debug(f"Client {request_id} received: {data}")
+                    if data.get("error_code", 200) != 200:


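Decoding directly to text here is risky: the normal decoding logic handles garbled output (a single token decoded on its own can be garbled and only decodes correctly together with its neighbors).

Only serving_chat was changed; the completion endpoint has not been adapted.

I suggest not decoding here and returning directly instead:

An identifier should mark the request as rescheduled; finish_reason can serve as that identifier.

Return without decoding, i.e. text="", and add completion_token_ids to the response.

When the Router module receives such a response:

It creates a new request (same structure and content as the original, whether chat or completion).

The request gets an added field generated_token_ids (set to the received completion_token_ids; all multimodal and internal models already support this, and @ liyukun will submit support for the open-source models shortly).

This keeps both endpoints compatible and reuses the existing internal logic.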
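
A minimal sketch of the proposal above, assuming OpenAI-style response dictionaries. The field names `finish_reason`, `pd_reschedule`, `completion_token_ids`, and `generated_token_ids` come from the comment; the helper `build_reschedule_request` and the placement of `completion_token_ids` inside `choices[0]` are assumptions for illustration only.

```python
import copy
from typing import Any, Dict


def build_reschedule_request(original_request: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, Any]:
    """Clone the original request and attach the tokens generated so far."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "pd_reschedule":
        raise ValueError("response is not marked for rescheduling")
    # Same structure and content as the original, for chat or completion alike.
    retry_request = copy.deepcopy(original_request)
    # Undecoded token ids are carried over; the text field stays empty upstream.
    retry_request["generated_token_ids"] = list(choice.get("completion_token_ids") or [])
    return retry_request
```

Because the helper copies the request body without inspecting it, the same path would serve both the chat and completion endpoints, which is the compatibility the reviewer asks for.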

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

fastdeploy/engine/sched/resource_manager_v1.py:1520

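preallocate_resource_in_d no longer writes the request into tasks_list; only add_prefilled_request does. As a result, while the decode side has allocated blocks but has not yet received the first token from prefill, statistics such as update_metrics/available_gpu_block_num miss these occupied blocks, because the current implementation collects block_tables only from tasks_list. Monitoring metrics may then look significantly too optimistic and mislead anyone troubleshooting resource problems. At a minimum, metrics should aggregate block_tables from self.requests (or another collection that covers preallocated requests), or occupancy should be recorded at preallocation time so the metrics stay accurate.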

            request.block_tables = self._allocate_gpu_blocks(request, need_prealloc_prefill_blocks)
            request.num_computed_tokens = request.need_prefill_tokens
            request.disaggregate_info["block_tables"] = request.block_tables
            allocated_position = self.get_available_position()
            request.idx = allocated_position
            self.stop_flags[request.idx] = False
            self.requests[request.request_id] = request
            self.req_dict[request.request_id] = allocated_position
        return True
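
The undercount described in this comment can be reproduced with a toy model. `ToyRequest` and the two counting helpers below are hypothetical; they only mirror the idea that `tasks_list` omits preallocated requests while `self.requests` includes them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyRequest:
    request_id: str
    block_tables: List[int] = field(default_factory=list)


def used_blocks_from_tasks_list(tasks_list: List[Optional[ToyRequest]]) -> int:
    """Count blocks only for requests already registered in tasks_list."""
    return sum(len(task.block_tables) for task in tasks_list if task is not None)


def used_blocks_from_requests(requests: Dict[str, ToyRequest]) -> int:
    """Count blocks for every known request, including preallocated ones."""
    return sum(len(request.block_tables) for request in requests.values())


# One request is still waiting for its first token from prefill, one is running.
preallocated = ToyRequest("req_0", block_tables=[0, 1, 2])
running = ToyRequest("req_1", block_tables=[3, 4])
requests = {"req_0": preallocated, "req_1": running}
tasks_list: List[Optional[ToyRequest]] = [running, None]
```

Here the `tasks_list` view reports 2 used blocks while the `requests` view reports 5, so a metric built on the former would overstate free capacity by the 3 preallocated blocks.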

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-04-24 15:32:20

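📋 Review Summary

PR overview: Fixes several issues in PD-disaggregated inference, including the timing of tasks_list registration on the decode side, unified error messages, automatic retry for preempted non-streaming requests, and related serving/connector logic fixes.
Scope of changes: fastdeploy/engine/, fastdeploy/entrypoints/openai/, fastdeploy/router/, fastdeploy/splitwise/, fastdeploy/input/, fastdeploy/output/
Impact Tags: [PD Disaggregation] [Engine] [APIServer]

📝 PR Convention Check

The title uses the unofficial tag [PD]; the closest official tags are [BugFix] (the PR is mainly a bug fix) or [PD Disaggregation]. The PR description lacks the Usage or Command, Accuracy Tests, and Checklist sections.

Suggested title (can be copied directly):

[BugFix] Fix PD interaction race condition and error response handling

Suggested PR description (can be copied directly):

## Motivation
Fix several issues in PD-disaggregated inference:
1. On the decode side, tasks_list was registered too early, so a request could be handled by batch output processing before prefill finished, causing a null-pointer error;
2. Error messages were inconsistent, which made it hard for the Router layer to identify retryable errors;
3. Non-streaming requests had no automatic retry when preempted on decode;
4. send_cache_info_to_prefill in splitwise_connector had a logic error, so the P instance was not notified promptly when resources were insufficient.

## Modifications
1. `resource_manager_v1.py`: defer `tasks_list` registration from `preallocate_resource_in_d` to `add_prefilled_request`, and add a None check in `_process_batch_output`;
2. `common_engine.py` / `token_processor.py`: unify the error message prefix as `PD Error`; the Router and Serving layers use it to identify PD errors and set `finish_reason=pd_reschedule`;
3. `router.py`: add the `preempt_retry_count` / `preempt_retry_exclude_decode` parameters; non-streaming requests are retried automatically when preempted on decode;
4. `serving_chat.py` / `serving_completion.py`: keep already-generated tokens on the error path and return a partial result instead of raising an exception;
5. `splitwise_connector.py`: fix the `send_cache_info_to_prefill` logic so that the P instance is notified promptly even when resources are insufficient.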

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.

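Issues

Severity	File	Summary
🟡 Suggestion	`fastdeploy/entrypoints/openai/serving_chat.py:661`	On the error path, `completion_tokens` is an empty string, inconsistent with the type on the normal path
🟡 Suggestion	`fastdeploy/entrypoints/openai/serving_completion.py:331`	On the error path, `completion_tokens` is an empty string, inconsistent with the type on the normal path

Overall Assessment

This PR fixes several key issues in the PD-disaggregated scenario (tasks_list registration timing, None check, unified error messages, connector notification logic). The overall design is sound, as are the retry mechanism and the path that passes completion_token_ids along; test coverage is fairly complete and there are no blocking issues. Two details concerning the type consistency of completion_tokens could be improved.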

PaddlePaddle-bot · 2026-04-24T07:39:50Z

+                        # Error response - include already-generated tokens in the response
+                        data["outputs"] = {
+                            "text": "",
+                            "completion_tokens": "",


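🟡 Suggestion: completion_tokens is the empty string "" and should be None or 0.

When finish_reason == "pd_reschedule", the response returns completion_token_ids (a token list) but completion_tokens (the token count) is "". The two are semantically inconsistent and may mislead API callers.

Suggested change: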

"completion_tokens": None,

PaddlePaddle-bot · 2026-04-24T07:39:50Z

-                        raise ValueError("{}".format(data["error_msg"]))
+                        data["outputs"] = {
+                            "text": "",
+                            "completion_tokens": "",


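🟡 Suggestion: completion_tokens is the empty string "" and should be None or 0.

In the outputs returned on the error path, the type of completion_tokens is inconsistent (on the normal path it is a number or None); suggest standardizing on None: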

"completion_tokens": None,
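
As a sketch of what the suggested error-path payload might look like once the type is made consistent: the `text` and `completion_tokens` keys appear in the diff above, while the helper name `build_error_outputs` and the placement of `completion_token_ids` inside the same dictionary are assumptions, not the merged code.

```python
from typing import Any, Dict, List, Optional


def build_error_outputs(generated_token_ids: Optional[List[int]] = None) -> Dict[str, Any]:
    """Build outputs for an error response without decoding any text."""
    return {
        "text": "",  # no decoding on the error path
        "completion_tokens": None,  # None instead of "" to match the normal path's type
        "completion_token_ids": list(generated_token_ids or []),  # assumed location
    }
```

A caller that checks `outputs["completion_tokens"] is None` can then tell "no value" apart from a real value without special-casing an empty string.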

Copilot AI review requested due to automatic review settings April 20, 2026 05:15

juncaipeng had a problem deploying to Metax_ci April 20, 2026 05:15 — with GitHub Actions Failure

Copilot started reviewing on behalf of juncaipeng April 20, 2026 05:15 View session

Copilot AI reviewed Apr 20, 2026

juncaipeng force-pushed the refine_pd branch from 12bc2ef to cdf030c Compare April 20, 2026 10:05

juncaipeng had a problem deploying to Metax_ci April 20, 2026 10:05 — with GitHub Actions Failure

juncaipeng closed this Apr 21, 2026

juncaipeng reopened this Apr 21, 2026

juncaipeng force-pushed the refine_pd branch from cdf030c to 461f037 Compare April 21, 2026 02:42

juncaipeng had a problem deploying to Metax_ci April 21, 2026 02:42 — with GitHub Actions Failure

Copilot AI review requested due to automatic review settings April 21, 2026 05:09

juncaipeng force-pushed the refine_pd branch from 461f037 to a4a201c Compare April 21, 2026 05:09

juncaipeng had a problem deploying to Metax_ci April 21, 2026 05:09 — with GitHub Actions Failure

Copilot started reviewing on behalf of juncaipeng April 21, 2026 05:09 View session

Copilot AI reviewed Apr 21, 2026

Comment thread fastdeploy/output/token_processor.py Outdated

Comment thread tests/output/test_token_processor.py

Comment thread fastdeploy/engine/common_engine.py Outdated

Comment thread fastdeploy/entrypoints/openai/protocol.py Outdated

Comment thread tests/splitwise/test_splitwise_connector.py

juncaipeng force-pushed the refine_pd branch from a4a201c to 954a83d Compare April 21, 2026 07:19

juncaipeng had a problem deploying to Metax_ci April 21, 2026 07:19 — with GitHub Actions Failure

Jiang-Jia-Jun requested changes Apr 21, 2026

Copilot AI review requested due to automatic review settings April 22, 2026 02:43

juncaipeng force-pushed the refine_pd branch from 954a83d to e0d6c1b Compare April 22, 2026 02:43

juncaipeng had a problem deploying to Metax_ci April 22, 2026 02:43 — with GitHub Actions Failure

Copilot started reviewing on behalf of juncaipeng April 22, 2026 02:44 View session

Copilot AI reviewed Apr 22, 2026

Comment thread fastdeploy/output/token_processor.py

Comment thread tests/splitwise/test_splitwise_connector.py

Comment thread fastdeploy/router/router.py

Comment thread fastdeploy/router/router.py

Comment thread fastdeploy/input/base_processor.py

juncaipeng force-pushed the refine_pd branch from e0d6c1b to 50d8e9e Compare April 22, 2026 09:54

juncaipeng had a problem deploying to Metax_ci April 22, 2026 09:54 — with GitHub Actions Failure

Copilot AI review requested due to automatic review settings April 23, 2026 13:24

juncaipeng force-pushed the refine_pd branch from 50d8e9e to 0023ced Compare April 23, 2026 13:24

juncaipeng had a problem deploying to Metax_ci April 23, 2026 13:24 — with GitHub Actions Error

Copilot started reviewing on behalf of juncaipeng April 23, 2026 13:24 View session

Copilot AI reviewed Apr 23, 2026

Comment thread fastdeploy/entrypoints/openai/serving_chat.py

Comment thread fastdeploy/engine/sched/resource_manager_v1.py

Comment thread tests/splitwise/test_splitwise_connector.py

Comment thread fastdeploy/router/router.py

Comment thread fastdeploy/router/router.py

juncaipeng force-pushed the refine_pd branch from 0023ced to edff727 Compare April 23, 2026 13:45

juncaipeng temporarily deployed to Metax_ci April 23, 2026 13:45 — with GitHub Actions Inactive

Copilot AI review requested due to automatic review settings April 24, 2026 02:57

juncaipeng force-pushed the refine_pd branch from edff727 to a9e6671 Compare April 24, 2026 02:57

juncaipeng temporarily deployed to Metax_ci April 24, 2026 02:57 — with GitHub Actions Inactive

Copilot started reviewing on behalf of juncaipeng April 24, 2026 02:58 View session

Copilot AI reviewed Apr 24, 2026

Comment thread fastdeploy/router/router.py

Comment thread fastdeploy/output/token_processor.py

Comment thread fastdeploy/engine/sched/resource_manager_v1.py

Fix PD interaction and error response

fac0de6

juncaipeng force-pushed the refine_pd branch from a9e6671 to fac0de6 Compare April 24, 2026 05:15

juncaipeng temporarily deployed to Metax_ci April 24, 2026 05:15 — with GitHub Actions Inactive

Jiang-Jia-Jun approved these changes Apr 24, 2026

Jiang-Jia-Jun merged commit ee81b57 into PaddlePaddle:develop Apr 24, 2026
40 of 46 checks passed

PaddlePaddle-bot reviewed Apr 24, 2026

xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026

Fix PD interaction and error response (PaddlePaddle#7500)

8ea885b

sunlei1024 pushed a commit to sunlei1024/FastDeploy that referenced this pull request May 7, 2026

Fix PD interaction and error response (PaddlePaddle#7500)

9ce994b
