[Cherry-Pick][Optimization] Change default workers and max-concurrency when launching api-server (#7457) #7516
Conversation
Thanks for your contribution!
…ddlePaddle#7457) Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
PaddlePaddle-bot
left a comment
🤖 AI Code Review (2026-04-20 22:40:39)
📋 Review Summary
PR overview: when launching the API-Server, the default workers value is changed from 1 to 4 and max-concurrency from 512 to 2048 to improve GPU utilization; the multi_api_server mode keeps the original values.
Scope of change: entrypoints/api_server.py, entrypoints/openai/utils.py, CI configuration
Impact tags: [APIServer] [CI]
Issues
No blocking issues found.
Overall assessment
The change logic is clear: the FD_ENABLE_MULTI_API_SERVER environment variable cleanly distinguishes the default parameters for single-server and multi-server modes, and the CI tests were adapted by explicitly passing --workers 1 so that the default-value change does not affect existing test cases. The cherry-picked content matches the original PR #7457.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## release/2.6 #7516 +/- ##
==============================================
Coverage ? 74.18%
==============================================
Files ? 376
Lines ? 52999
Branches ? 8279
==============================================
Hits ? 39317
Misses ? 10919
Partials ? 2763
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Cherry-pick of #7457 (authored by @K11OntheBoat) to release/2.6.
dev PR: #7457
Motivation
When launching the API-Server, change the default number of workers from 1 to 4, and the service's max concurrency to 2048 accordingly. This avoids the performance bottleneck where the GPU runs fast but is starved of requests. On the Blackwell architecture, with long-input/short-output data at 512 concurrency, the GLM4.5-Air model's TPS improves by 1.74x.
These two parameters remain unchanged in multi_api_server.
Modifications
When launching the API-Server, change the default workers from 1 to 4, and the service's max concurrency to 2048 accordingly.
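The mode-dependent defaults described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `resolve_defaults` and the constant names are assumptions; only the env var `FD_ENABLE_MULTI_API_SERVER` and the default values (4 workers / 2048 concurrency for single-server, 1 / 512 for multi_api_server) come from the PR description.

```python
import os

# Assumed constant names for illustration; values are from the PR description.
SINGLE_SERVER_WORKERS = 4
SINGLE_SERVER_MAX_CONCURRENCY = 2048
MULTI_SERVER_WORKERS = 1          # multi_api_server keeps the old defaults
MULTI_SERVER_MAX_CONCURRENCY = 512

def resolve_defaults(env=None):
    """Return (workers, max_concurrency) defaults, switching on
    the FD_ENABLE_MULTI_API_SERVER environment variable."""
    env = os.environ if env is None else env
    if env.get("FD_ENABLE_MULTI_API_SERVER", "0") == "1":
        return MULTI_SERVER_WORKERS, MULTI_SERVER_MAX_CONCURRENCY
    return SINGLE_SERVER_WORKERS, SINGLE_SERVER_MAX_CONCURRENCY
```

Explicit CLI flags (e.g. `--workers 1`, as used in the adapted CI tests) would still override these defaults at argument-parsing time.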
Checklist
PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.