[Optimization] Change default workers and max-concurrency when launching api-server #7457
Thanks for your contribution!
PaddlePaddle-bot left a comment
🤖 AI Code Review
2026-04-17 12:15 CST
📋 Review Summary
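PR overview: Changes the API-Server default `workers` from 1 to 4 and `max-concurrency` from 512 to 2048, avoiding the performance bottleneck in which the GPU runs fast but is not fed enough requests. In multi_api_server mode, the original values are kept.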
Scope of changes: entrypoints/api_server.py, entrypoints/openai/utils.py
Affected area tag: APIServer
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | openai/utils.py:344 | Environment variable access is inconsistent with project convention; suggest using the envs module |
Overall Assessment
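The change logic is clear: using the FD_ENABLE_MULTI_API_SERVER environment variable to distinguish single-API-server from multi-API-server mode and set different defaults for each is a sound approach. We recommend reading the environment variable through the project's existing envs module to keep the code style consistent.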
```python
def make_arg_parser(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
    _is_multi_server = os.environ.get("FD_ENABLE_MULTI_API_SERVER") == "1"
```
🟡 Suggestion: Environment variable access is inconsistent with project convention
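Everywhere else in the project, FD_ENABLE_MULTI_API_SERVER is accessed through the `fastdeploy.envs` module (e.g. `envs.FD_ENABLE_MULTI_API_SERVER`), whereas here it is read directly with `os.environ.get()`. Although the two are functionally equivalent, the inconsistent access pattern may cause confusion during future maintenance (for example, if caching or validation logic is added to the envs module, this call site will not benefit from it).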
Suggest switching to the already-imported envs module:
```python
_is_multi_server = envs.FD_ENABLE_MULTI_API_SERVER
```

Note: the file already has `import fastdeploy.envs as envs` at the top (line 31), so no additional import is needed.
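To illustrate why routing every read through one module helps, here is a minimal sketch assuming an envs module built as a registry of lazily evaluated parsers. The `_ENV_PARSERS` registry, the `_Envs` class, and the `is_multi_server` helper are hypothetical; we have not checked how `fastdeploy.envs` actually defines FD_ENABLE_MULTI_API_SERVER. Because each read goes through a single parser, any validation added there applies at every call site, which a direct `os.environ.get()` bypasses.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical registry mapping variable names to parsers.
# The real fastdeploy.envs definitions may differ.
_ENV_PARSERS: Dict[str, Callable[[], Any]] = {
    # int() raises ValueError for non-numeric values such as "true",
    # so malformed settings fail loudly instead of silently reading as False.
    "FD_ENABLE_MULTI_API_SERVER": lambda: bool(
        int(os.getenv("FD_ENABLE_MULTI_API_SERVER", "0"))
    ),
}


class _Envs:
    """Centralized access: each attribute read re-parses the current environment."""

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found by normal attribute lookup.
        try:
            parser = _ENV_PARSERS[name]
        except KeyError:
            raise AttributeError(f"unknown environment variable: {name}") from None
        return parser()


envs = _Envs()


def is_multi_server() -> bool:
    # Single point of truth for the mode check (hypothetical helper).
    return envs.FD_ENABLE_MULTI_API_SERVER
```

With `FD_ENABLE_MULTI_API_SERVER=1` set, `is_multi_server()` returns `True`; unset or `0` gives `False`, and a typo in the attribute name raises `AttributeError` rather than quietly returning `None`.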
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             develop    #7457   +/-   ##
==========================================
  Coverage           ?   73.26%
==========================================
  Files              ?      398
  Lines              ?    54976
  Branches           ?     8613
==========================================
  Hits               ?    40279
  Misses             ?    12007
  Partials           ?     2690
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
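As a quick consistency check on the report (our own arithmetic, not part of the Codecov output): Codecov computes coverage as hits over tracked lines, counting partial lines as not covered, and by default rounds down to two decimal places. The reported figures agree:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} = \frac{40279}{40279 + 12007 + 2690} = \frac{40279}{54976} \approx 0.732665 \;\Rightarrow\; 73.26\%\ \text{(rounded down)}
$$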
…ddlePaddle#7457) Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
✅ Cherry-pick successful! Created PR: #7516
… when launch api-server(#7457) (#7516)

* Change default workers and max-concurrency when launch api-server (#7457)
  Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
* [CI] Add --workers=1 to keep test behavior consistent with default change

---------

Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
Motivation
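When the API-Server starts, the default `workers` is changed from 1 to 4, and the service's maximum concurrency is raised accordingly to 2048. This avoids the performance bottleneck in which the GPU runs fast but is not fed enough requests. On the Blackwell architecture, with long-input/short-output data at 512 concurrency, TPS for the GLM4.5-Air model improved by a factor of 1.74.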
In multi_api_server mode, these two parameters remain unchanged.
Modifications
When the API-Server starts, the default `workers` is changed from 1 to 4, and the service's maximum concurrency is changed to 2048 accordingly.
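A minimal sketch of the mode-dependent defaults described above, using plain `argparse.ArgumentParser` in place of FastDeploy's `FlexibleArgumentParser`. The flag names `--workers` and `--max-concurrency` follow the PR title, and reading FD_ENABLE_MULTI_API_SERVER with `os.environ.get` mirrors the reviewed diff; the constant names, help text, and parser wiring are assumptions, not the actual contents of entrypoints/openai/utils.py.

```python
import argparse
import os

# Hypothetical constant names; values taken from the PR description.
SINGLE_SERVER_WORKERS = 4
SINGLE_SERVER_MAX_CONCURRENCY = 2048
MULTI_SERVER_WORKERS = 1            # unchanged from the previous default
MULTI_SERVER_MAX_CONCURRENCY = 512  # unchanged from the previous default


def make_arg_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register API-server options whose defaults depend on the deployment mode."""
    # Assumption: multi-API-server mode is signalled by FD_ENABLE_MULTI_API_SERVER=1,
    # as in the reviewed diff. The variable is read once, when the parser is built.
    is_multi_server = os.environ.get("FD_ENABLE_MULTI_API_SERVER") == "1"
    parser.add_argument(
        "--workers",
        type=int,
        default=MULTI_SERVER_WORKERS if is_multi_server else SINGLE_SERVER_WORKERS,
        help="Number of API-server worker processes (hypothetical help text).",
    )
    parser.add_argument(
        "--max-concurrency",  # argparse stores this as args.max_concurrency
        type=int,
        default=(
            MULTI_SERVER_MAX_CONCURRENCY if is_multi_server else SINGLE_SERVER_MAX_CONCURRENCY
        ),
        help="Maximum number of concurrent requests (hypothetical help text).",
    )
    return parser
```

With the variable unset, `make_arg_parser(argparse.ArgumentParser()).parse_args([])` yields `workers=4` and `max_concurrency=2048`. Passing `--workers=1` explicitly restores the old single-worker behavior, which is what the follow-up CI commit does to keep tests consistent with the new default. Because the environment variable is read when the parser is built, it must be set before `make_arg_parser` is called.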
Checklist
- … [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- … `pre-commit` before commit.
- … `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.