Skip to content

fix: keep failed chat continuations out of conversation cache#2947

Merged
dgageot merged 4 commits into
docker:mainfrom
dgageot:board/53a31f9447d73c9b
Jun 2, 2026
Merged

fix: keep failed chat continuations out of conversation cache#2947
dgageot merged 4 commits into
docker:mainfrom
dgageot:board/53a31f9447d73c9b

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented Jun 1, 2026

Summary

Fixes #2890.

  • Make X-Conversation-Id continuations transactional: work on a cloned cached session and commit it back only when the run succeeds.
  • Reject continuation requests that do not include a usable new user message, instead of replaying the previous turn.
  • Add Session.Clone() as a faithful deep-copy primitive and harden clone helpers so cloned sessions do not share mutable message/config/eval/tool-definition state.
  • Add package tests and an e2e test covering failed-turn cache behavior.

Validation

  • task lint
  • task test
  • task build

@dgageot dgageot requested a review from a team as a code owner June 1, 2026 11:59
@aheritier aheritier added area/api For features/issues/fixes related to the usage of the cagent API area/sessions For features/issues/fixes related to session lifecycle (resume, persistence, export) kind/fix PR fixes a bug (maps to fix: commit prefix) labels Jun 1, 2026
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

This PR correctly implements the transactional conversation-cache pattern. The changes were reviewed across all 8 modified files.

What was reviewed:

  • Transactional clone logic in pkg/chatserver/server.go (resolveSession / commitConversation): the clone→run→commit-on-success-only flow is correct and race-free.
  • Deep-copy completeness in pkg/session/session.go and pkg/session/branch.go: all mutable slice/map/pointer fields (messages, tool definitions, eval criteria, eval results, configs) are properly deep-copied; cloned sessions do not share mutable state with the original.
  • Error handling for the "no usable new user message" rejection path: correctly returns 400 for empty/missing user messages rather than silently replaying the prior turn.
  • Test coverage in conversations_test.go, conversations_transaction_test.go, clone_test.go, and e2e/chatserver_test.go: tests are well-structured and assert the intended behavior.
  • The return nil in the refactored non-streaming handler path is intentional and correct — chatCompletion already writes the error response directly, so returning the error to Echo would cause a double-response. The tradeoff is acknowledged in the inline comment.

No confirmed bugs found in the changed code.

Copy link
Copy Markdown
Contributor

@aheritier aheritier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. CI is green for head SHA 72b361c2a50e47925fa43a4124aefee83edd892c (build-and-test, build-image, license-check, lint, PR Review, and save-context all succeeded; build-and-push-image was skipped).

I reviewed the cache/transaction semantics and concurrency path in detail:

  • Continuations now work on a clone of the cached session before appending the latest user message (pkg/chatserver/server.go:423-430), so failed turns cannot mutate the canonical cached pointer.
  • The cache is advanced only on successful runs (pkg/chatserver/server.go:449-453), while both streaming and non-streaming handlers pass the actual agent run error into that commit decision (pkg/chatserver/server.go:398-407).
  • Per-conversation locking still prevents two concurrent continuations for the same X-Conversation-Id from racing (pkg/chatserver/server.go:373-377, pkg/chatserver/conversation_lock.go:25-39).
  • The clone path deep-copies mutable session/message state rather than sharing slices/maps/pointers (pkg/session/session.go:339-404), with regression coverage for config, eval fields, tool definitions, sub-sessions, and item metadata.
  • Test coverage includes focused transaction tests (pkg/chatserver/conversations_transaction_test.go) and an e2e regression proving a failed continuation is not replayed on the next successful turn (e2e/chatserver_test.go:88-104).

No blocking findings.

@dgageot dgageot merged commit b3dd62a into docker:main Jun 2, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api For features/issues/fixes related to the usage of the cagent API area/sessions For features/issues/fixes related to session lifecycle (resume, persistence, export) kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chatserver: failed X-Conversation-Id turns mutate cached session and affect retries

3 participants