[PD] Fix PD interaction and error response #7500

Merged: Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop from juncaipeng:refine_pd on Apr 24, 2026
@juncaipeng (Collaborator) commented Apr 20, 2026

Motivation

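Fix several issues in PD disaggregated inference:

  1. On the decode side, tasks_list was registered too early, so a request could be handled by batch output processing before prefill finished, causing a null-pointer error;
  2. Error messages were inconsistent, which made it hard for the Router layer to identify retryable errors;
  3. Non-streaming requests had no automatic retry mechanism when preempted on decode.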

Modifications

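  1. Defer tasks_list registration from preallocate_resource_in_d to add_prefilled_request, and add a None check in _process_batch_output, so that useless output produced by the worker no longer causes errors;
  2. Unify the error message prefix as PD Error; the Router and Serving layers use it to identify PD errors and set finish_reason=pd_reschedule, which lets the Router support rescheduling;
  3. Add preempt_retry_count / preempt_retry_exclude_decode parameters to the Router, so that non-streaming requests are retried automatically on decode preemption;
  4. In the serving_chat error path, keep the tokens already generated and return a partial result instead of raising an exception;
  5. Fix a logic error in send_cache_info_to_prefill in splitwise_connector, where the P instance was not notified promptly after resources ran out.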
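Item 1 can be illustrated with a minimal sketch. It is a toy model, not the FastDeploy implementation: the `ToyDecodeResourceManager` class, the `Request` dataclass, the `slot` argument and the `process_batch_output` helper are hypothetical; only the names `tasks_list`, `preallocate_resource_in_d` and `add_prefilled_request` come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Request:
    request_id: str
    idx: int = -1  # slot index, assigned at preallocation time


@dataclass
class ToyDecodeResourceManager:
    """Hypothetical stand-in for the decode-side resource manager."""

    max_num_seqs: int
    tasks_list: List[Optional[Request]] = field(default_factory=list)
    requests: Dict[str, Request] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # One slot per sequence; None means "no task visible to output processing".
        self.tasks_list = [None] * self.max_num_seqs

    def preallocate_resource_in_d(self, request: Request, slot: int) -> None:
        # Reserve the slot and remember the request, but do not publish it
        # in tasks_list yet: prefill has not finished.
        request.idx = slot
        self.requests[request.request_id] = request

    def add_prefilled_request(self, request_id: str) -> None:
        # Prefill is done, so the request becomes visible to output processing.
        request = self.requests[request_id]
        self.tasks_list[request.idx] = request


def process_batch_output(manager: ToyDecodeResourceManager, tokens: List[int]) -> Dict[str, int]:
    """Map request_id to its token, skipping slots that hold no task."""
    results: Dict[str, int] = {}
    for slot, token in enumerate(tokens):
        task = manager.tasks_list[slot]
        if task is None:  # the None check: ignore output for unregistered slots
            continue
        results[task.request_id] = token
    return results
```

In this sketch, output that arrives between preallocation and `add_prefilled_request` is dropped rather than dereferencing a missing task.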

Copilot AI review requested due to automatic review settings April 20, 2026 05:15
paddle-bot Bot commented Apr 20, 2026

Thanks for your contribution!

Copilot AI (Contributor) left a comment

Pull request overview

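This PR aims to improve PD interaction and error responses in Splitwise (Prefill/Decode disaggregated) mode: when the decode side runs out of resources or is preempted, the system propagates "PD Error" upstream more consistently, and the Router attempts retries and returns friendlier error responses.

Changes:

  • The Router gains preempt retry support in splitwise mode, with new CLI parameters for it.
  • Error propagation is unified and strengthened along the PD path: the decode->prefill cache_sync sending logic, plus the error codes and error text on the engine and output sides.
  • The OpenAI protocol layer extends finish_reason, and on an error response the API layer returns the content generated so far where possible and marks it pd_reschedule.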

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

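| File | Description |
| --- | --- |
| fastdeploy/splitwise/splitwise_connector.py | Adjusts the aggregation/sending logic for cache_sync messages from decode to prefill and widens the coverage of error notifications. |
| fastdeploy/router/router.py | Adds preempt retry parameters and the main retry flow (including optional exclusion of the previous decode instance). |
| fastdeploy/output/token_processor.py | Adjusts the error code and error message when prefill fails to send the cache (introduces the "PD Error" text). |
| fastdeploy/input/base_processor.py | Skips token decoding for error responses and passes them straight up to the upstream handler. |
| fastdeploy/entrypoints/openai/serving_chat.py | In the non-streaming error case, fills in outputs and returns the text generated so far; adds the pd_reschedule finish_reason check. |
| fastdeploy/entrypoints/openai/protocol.py | Extends the allowed OpenAI protocol finish_reason values with pd_reschedule. |
| fastdeploy/engine/common_engine.py | Adjusts PD-related error logs, error response text and error codes (including the preempted case). |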
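The Router retry flow described for router.py can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code in the PR: `is_pd_error`, `dispatch_with_retry`, the `send` callback and the response dict shape are hypothetical, and matching the prefix with `startswith` is an assumption. Only the parameter names `preempt_retry_count` and `preempt_retry_exclude_decode` and the "PD Error" prefix come from the PR text.

```python
from typing import Callable, Dict, List, Optional

PD_ERROR_PREFIX = "PD Error"  # unified prefix named in the PR description


def is_pd_error(response: Dict) -> bool:
    """Assumed check: a non-200 error code plus the unified message prefix."""
    if response.get("error_code", 200) == 200:
        return False
    return str(response.get("error_msg", "")).startswith(PD_ERROR_PREFIX)


def dispatch_with_retry(
    send: Callable[[str], Dict],
    decode_instances: List[str],
    preempt_retry_count: int = 1,
    preempt_retry_exclude_decode: bool = True,
) -> Dict:
    """Send a non-streaming request, retrying when a PD error comes back."""
    if not decode_instances:
        raise ValueError("no decode instances available")
    last_decode: Optional[str] = None
    response: Dict = {}
    # One initial attempt plus up to preempt_retry_count retries.
    for _ in range(preempt_retry_count + 1):
        candidates = list(decode_instances)
        if preempt_retry_exclude_decode and last_decode is not None:
            filtered = [d for d in candidates if d != last_decode]
            candidates = filtered or candidates  # fall back if only one instance exists
        last_decode = candidates[0]  # placeholder for a real selection policy
        response = send(last_decode)
        if not is_pd_error(response):
            return response
    return response  # retries exhausted: surface the last PD error
```

Errors without the prefix are returned immediately, so only errors the engine has marked as PD errors consume the retry budget.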

Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/entrypoints/openai/serving_chat.py Outdated
Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/router/router.py Outdated
Comment thread fastdeploy/router/router.py Outdated
codecov-commenter commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 41.88034% with 68 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@9236d0c). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/router/router.py | 20.00% | 51 Missing and 1 partial ⚠️ |
| fastdeploy/entrypoints/openai/serving_chat.py | 57.89% | 5 Missing and 3 partials ⚠️ |
| fastdeploy/input/base_processor.py | 28.57% | 3 Missing and 2 partials ⚠️ |
| ...astdeploy/entrypoints/openai/serving_completion.py | 75.00% | 1 Missing and 1 partial ⚠️ |
| fastdeploy/splitwise/splitwise_connector.py | 75.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7500   +/-   ##
==========================================
  Coverage           ?   72.23%           
==========================================
  Files              ?      419           
  Lines              ?    57845           
  Branches           ?     9072           
==========================================
  Hits               ?    41785           
  Misses             ?    13210           
  Partials           ?     2850           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 72.23% <41.88%> (?) |

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Comment thread fastdeploy/output/token_processor.py Outdated
Comment thread tests/output/test_token_processor.py
Comment thread fastdeploy/engine/common_engine.py Outdated
Comment thread fastdeploy/entrypoints/openai/protocol.py Outdated
Comment thread tests/splitwise/test_splitwise_connector.py
raise ValueError("{}".format(data["error_msg"]))
idx = int(data["request_id"].split("_")[-1])
# api_server_logger.debug(f"Client {request_id} received: {data}")
if data.get("error_code", 200) != 200:
@Jiang-Jia-Jun (Collaborator) commented Apr 21, 2026

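  1. Decoding directly to text here is risky. The normal decoding logic handles garbled output (a single token decoded alone can come out garbled and only decodes correctly as part of a consecutive sequence).
  2. Only serving_chat was changed; the completion interface has not been adapted.

I suggest not decoding here and returning directly instead:

  • An identifier should indicate that this request is a reschedule; finish_reason should be able to carry that.
  • Return without decoding, i.e. text="", and add completion_token_ids to the response.
  • When the Router module receives such a response, it should:
    • generate a new request (same structure and content as the original request, whether chat or completion);
    • add a generated_token_ids field to the request (set to the received completion_token_ids; all multimodal models, including internal models, already support this, and @ liyukun will submit support for the open-source models shortly).

This keeps both interfaces compatible while reusing the existing internal logic.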
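The Router-side step proposed above might look like the sketch below. The function name `build_rescheduled_request` and the response shape (a `choices` list whose entries carry `finish_reason` and `completion_token_ids`) are assumptions for illustration; the comment names only the fields `finish_reason`, `completion_token_ids` and `generated_token_ids`.

```python
import copy
from typing import Any, Dict, Optional


def build_rescheduled_request(
    original_request: Dict[str, Any], response: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return a new request carrying the tokens generated so far, or None
    when the response is not a reschedule signal."""
    choice = response["choices"][0]  # assumed response layout
    if choice.get("finish_reason") != "pd_reschedule":
        return None
    # Same structure and content as the original (chat or completion alike).
    new_request = copy.deepcopy(original_request)
    # Hand the undecoded token ids back to the engine instead of decoded text.
    new_request["generated_token_ids"] = list(choice.get("completion_token_ids") or [])
    return new_request
```

Because the request body is copied unchanged and only one field is added, the same helper works for chat and completion payloads, which matches the compatibility argument in the comment.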

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

fastdeploy/engine/sched/resource_manager_v1.py:1520

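  • preallocate_resource_in_d no longer writes the request into tasks_list; only add_prefilled_request does. As a result, while the decode side has allocated blocks but has not yet received the first token from prefill, statistics such as update_metrics / available_gpu_block_num miss these occupied blocks (the current implementation collects block_tables only from tasks_list). Monitoring metrics may therefore look significantly too optimistic and mislead anyone troubleshooting resource problems. Consider at least aggregating block_tables from self.requests (or another collection that covers preallocated requests) when computing metrics, or recording the usage at preallocation time, so that the metrics stay accurate.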
            request.block_tables = self._allocate_gpu_blocks(request, need_prealloc_prefill_blocks)
            request.num_computed_tokens = request.need_prefill_tokens
            request.disaggregate_info["block_tables"] = request.block_tables
            allocated_position = self.get_available_position()
            request.idx = allocated_position
            self.stop_flags[request.idx] = False
            self.requests[request.request_id] = request
            self.req_dict[request.request_id] = allocated_position
        return True
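The undercount described in this suppressed comment can be reproduced with a toy model. The `Req` dataclass and both helper functions below are hypothetical and do not mirror the actual metrics code; they only contrast counting blocks from a `tasks_list`-style list with counting from a `requests`-style dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Req:
    request_id: str
    block_tables: List[int] = field(default_factory=list)


def used_blocks_from_tasks(tasks_list: List[Optional[Req]]) -> int:
    # Counts only requests that are already registered in tasks_list.
    return len({b for task in tasks_list if task is not None for b in task.block_tables})


def used_blocks_from_requests(requests: Dict[str, Req]) -> int:
    # Also covers requests that are preallocated but not yet prefilled.
    return len({b for req in requests.values() for b in req.block_tables})
```

With one preallocated request holding three blocks and an empty `tasks_list`, the first helper reports 0 used blocks while the second reports 3, which is the gap the comment warns about.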

Comment thread fastdeploy/output/token_processor.py
Comment thread tests/splitwise/test_splitwise_connector.py
Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/input/base_processor.py
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Comment thread fastdeploy/entrypoints/openai/serving_chat.py
Comment thread fastdeploy/engine/sched/resource_manager_v1.py
Comment thread tests/splitwise/test_splitwise_connector.py
Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/router/router.py
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Comment thread fastdeploy/router/router.py
Comment thread fastdeploy/output/token_processor.py
Comment thread fastdeploy/engine/sched/resource_manager_v1.py
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit ee81b57 into PaddlePaddle:develop Apr 24, 2026
40 of 46 checks passed
@PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-04-24 15:32:20

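📋 Review Summary

PR overview: Fixes several issues in PD disaggregated inference, including the timing of tasks_list registration on the decode side, unified error messages, automatic retry for preempted non-streaming requests, and related serving/connector logic fixes.
Scope of changes: fastdeploy/engine/, fastdeploy/entrypoints/openai/, fastdeploy/router/, fastdeploy/splitwise/, fastdeploy/input/, fastdeploy/output/
Impact tags: [PD Disaggregation] [Engine] [APIServer]

📝 PR Convention Check

The title uses an unofficial tag, [PD]; the closest tags in the official list are [BugFix] (the PR is mainly a bug fix) or [PD Disaggregation]. The PR description is missing the Usage or Command, Accuracy Tests, and Checklist sections.

Suggested title (can be copied directly):

  • [BugFix] Fix PD interaction race condition and error response handling

Suggested PR description (can be copied directly):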

## Motivation
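Fix several issues in PD disaggregated inference:
1. On the decode side, tasks_list was registered too early, so a request could be handled by batch output processing before prefill finished, causing a null-pointer error;
2. Error messages were inconsistent, which made it hard for the Router layer to identify retryable errors;
3. Non-streaming requests had no automatic retry mechanism when preempted on decode;
4. send_cache_info_to_prefill in splitwise_connector had a logic error, so the P instance was not notified promptly after resources ran out.

## Modifications
1. `resource_manager_v1.py`: defer `tasks_list` registration from `preallocate_resource_in_d` to `add_prefilled_request`, and add a None check in `_process_batch_output`;
2. `common_engine.py` / `token_processor.py`: unify the error message prefix as `PD Error`; the Router and Serving layers use it to identify PD errors and set `finish_reason=pd_reschedule`;
3. `router.py`: add `preempt_retry_count` / `preempt_retry_exclude_decode` parameters, so that non-streaming requests are retried automatically on decode preemption;
4. `serving_chat.py` / `serving_completion.py`: keep the tokens already generated in the error path and return a partial result instead of raising an exception;
5. `splitwise_connector.py`: fix the `send_cache_info_to_prefill` logic so that the P instance is notified promptly even when resources run out.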

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.

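Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | fastdeploy/entrypoints/openai/serving_chat.py:661 | In the error path, completion_tokens is an empty string, which is inconsistent with its type in the normal path |
| 🟡 Suggestion | fastdeploy/entrypoints/openai/serving_completion.py:331 | In the error path, completion_tokens is an empty string, which is inconsistent with its type in the normal path |

Overall assessment

This PR fixes several key issues in PD disaggregation (tasks_list registration timing, the None check, unified error messages, connector notification logic). The overall design is sound, as are the retry mechanism and the completion_token_ids propagation path. Test coverage is fairly complete and there are no blocking issues. Two details around completion_tokens type consistency could be improved.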
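The type-consistency suggestion can be sketched as a small helper. The helper name `build_error_outputs` is hypothetical, and placing `completion_token_ids` inside the same dict as `text` and `completion_tokens` is an assumption; the field names themselves appear in the review.

```python
from typing import Any, Dict, List, Optional


def build_error_outputs(generated_token_ids: Optional[List[int]] = None) -> Dict[str, Any]:
    """Build the outputs dict for an error response without decoding text."""
    token_ids = list(generated_token_ids or [])
    return {
        "text": "",                          # no decoding in the error path
        "completion_tokens": None,           # None instead of "" to match the normal path's type
        "completion_token_ids": token_ids,   # tokens generated before the error
    }
```

Using `None` keeps callers that test `completion_tokens is None` or expect a number from having to handle an empty string as a third case.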

# Error response - include already-generated tokens in the response
data["outputs"] = {
"text": "",
"completion_tokens": "",

🟡 Suggestion: completion_tokens is the empty string ""; it should be None or 0.

When finish_reason == "pd_reschedule", the response returns completion_token_ids (a token list) but completion_tokens (a token count) is "". The two are semantically inconsistent and may mislead API callers.

Suggested change:

"completion_tokens": None,

raise ValueError("{}".format(data["error_msg"]))
data["outputs"] = {
"text": "",
"completion_tokens": "",

🟡 Suggestion: completion_tokens is the empty string ""; it should be None or 0.

In the outputs returned on the error path, the type of completion_tokens is inconsistent with the normal path (a number or None). It should be unified to None:

"completion_tokens": None,

xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026
sunlei1024 pushed a commit to sunlei1024/FastDeploy that referenced this pull request May 7, 2026
5 participants