
[agentserver] Spec compliance: error shapes, session headers, isolation, diagnostic logging, startup config logging, Foundry User-Agent #46364

Merged
RaviPidaparthi merged 31 commits into main from feature/spec-compliance-error-shapes
Apr 18, 2026
@RaviPidaparthi RaviPidaparthi commented Apr 17, 2026

What

Spec compliance improvements for the agentserver family — azure-ai-agentserver-responses v1.0.0b2, azure-ai-agentserver-core v2.0.0b2, azure-ai-agentserver-invocations v1.0.0b2.

Changes

Error shapes

  • Error code field now uses spec-compliant values: "invalid_request_error" for 400/404, "server_error" for 500.
  • Deleted-resource errors return HTTP 404 (was 400).
  • Cancel terminal-state message updated to "Cannot cancel a response in terminal state.".
  • SSE replay rejection messages use spec-compliant wording.
  • Foundry storage errors explicitly caught and mapped to appropriate HTTP status codes.
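
The status-to-code mapping above can be sketched as a small helper. This is a minimal illustration rather than the package's implementation: the name `error_body`, the envelope layout (`error.code`, `error.message`, optional `error.param`), and the fallback for unmapped statuses are all assumptions.

```python
from typing import Any, Optional

# Mapping described in this PR: 400/404 -> invalid_request_error, 500 -> server_error.
_CODE_BY_STATUS = {
    400: "invalid_request_error",
    404: "invalid_request_error",
    500: "server_error",
}


def error_body(status: int, message: str, param: Optional[str] = None) -> dict[str, Any]:
    """Build an illustrative error envelope (the layout is an assumption)."""
    error: dict[str, Any] = {
        # Falling back to server_error for unmapped statuses is a choice made for this sketch.
        "code": _CODE_BY_STATUS.get(status, "server_error"),
        "message": message,
    }
    if param is not None:
        error["param"] = param
    return {"error": error}
```

A deleted resource would then be reported with `error_body(404, ...)`, which yields the same `invalid_request_error` code as a 400.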

B40 malformed ID error shape

  • All endpoints reject malformed response IDs with HTTP 400 before touching storage.
  • Error shape updated per spec: code: "invalid_parameters", param: "responseId{<value>}" (was code: "invalid_request_error", param: "response_id").
  • previous_response_id body-field validation also uses code: "invalid_parameters".
  • New invalid_parameters_response() builder in _validation.py.
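
A rough sketch of the validate-before-storage flow described above. The prefix, the minimum length, the message wording, and the helper names are placeholders; the real rules live in the package's IdGenerator and `_validation.py`. The sketch also assumes the braces in `responseId{<value>}` are literal.

```python
from typing import Any, Optional

# Placeholder rules: the real prefix and length are defined by the package, not here.
ASSUMED_PREFIX = "resp_"
ASSUMED_MIN_LENGTH = 12


def is_well_formed(response_id: str) -> bool:
    """Reject IDs with the wrong prefix or that are too short (assumed rules)."""
    return response_id.startswith(ASSUMED_PREFIX) and len(response_id) >= ASSUMED_MIN_LENGTH


def invalid_parameters_body(response_id: str) -> tuple[int, dict[str, Any]]:
    """Return (status, body) for a malformed ID using the B40 code and param shape."""
    body = {
        "error": {
            "code": "invalid_parameters",
            # Assumes the braces in "responseId{<value>}" are literal characters.
            "param": f"responseId{{{response_id}}}",
            "message": f"Malformed response ID: {response_id!r}",  # wording is an assumption
        }
    }
    return 400, body


def validate_before_storage(response_id: str) -> Optional[tuple[int, dict[str, Any]]]:
    """Return an error tuple for malformed IDs, or None when storage may be consulted."""
    return None if is_well_formed(response_id) else invalid_parameters_body(response_id)
```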

x-agent-session-id response header (§8)

  • x-agent-session-id HTTP response header now set on every protocol endpoint response (success and error).
  • POST /responses: header echoes the per-request B39-resolved session ID (payload → env var → derivation → random).
  • GET / DELETE / CANCEL / INPUT_ITEMS: header echoes FOUNDRY_AGENT_SESSION_ID from host.config.
  • New _session_headers() helper on _ResponseEndpointHandler for consistent header propagation.
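
The resolution order for POST /responses (payload → env var → derivation → random) and the header helper can be sketched as follows. The payload field name `agent_session_id`, the `derive` callback, and the random ID format are assumptions made for illustration; only the header name and the `FOUNDRY_AGENT_SESSION_ID` setting come from this PR.

```python
import uuid
from typing import Callable, Mapping, Optional

SESSION_HEADER = "x-agent-session-id"


def resolve_session_id(
    payload: Mapping[str, object],
    env: Mapping[str, str],
    derive: Optional[Callable[[Mapping[str, object]], Optional[str]]] = None,
) -> str:
    """Resolve a session ID in the order payload -> env var -> derivation -> random."""
    from_payload = payload.get("agent_session_id")  # hypothetical field name
    if isinstance(from_payload, str) and from_payload:
        return from_payload
    from_env = env.get("FOUNDRY_AGENT_SESSION_ID")
    if from_env:
        return from_env
    derived = derive(payload) if derive is not None else None
    if derived:
        return derived
    return uuid.uuid4().hex  # random fallback; the real format is not specified here


def session_headers(session_id: Optional[str]) -> dict[str, str]:
    """Headers to merge into every response, success or error."""
    return {SESSION_HEADER: session_id} if session_id else {}
```

Passing the environment as a mapping keeps the sketch testable; a caller could pass `os.environ`.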

Eager eviction

  • Terminal responses (completed, failed, cancelled, incomplete) are immediately evicted from in-memory runtime state after persistence.
  • Subsequent GET/DELETE/Cancel operations fall through to the provider (storage) path.
  • store=false responses are also evicted (nothing to fall back to → 404).
  • New try_evict() and mark_deleted() methods on _RuntimeState.
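
The eviction behaviour above might look roughly like this. `RuntimeStateSketch` and `Record` are illustrative stand-ins, not the package's `_RuntimeState`; in particular, what `mark_deleted()` tracks internally is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


@dataclass
class Record:
    response_id: str
    status: str


class RuntimeStateSketch:
    """Illustrative in-memory state; not the package's _RuntimeState."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}
        self._deleted: set[str] = set()

    def add(self, record: Record) -> None:
        self._records[record.response_id] = record

    def get(self, response_id: str) -> Optional[Record]:
        # A None result means the caller should fall through to the storage provider.
        return self._records.get(response_id)

    def try_evict(self, response_id: str) -> bool:
        """Drop the record only if it has reached a terminal status."""
        record = self._records.get(response_id)
        if record is None or record.status not in TERMINAL_STATUSES:
            return False
        del self._records[response_id]
        return True

    def mark_deleted(self, response_id: str) -> None:
        """Remember a deletion so later lookups can answer 404 (assumed behaviour)."""
        self._records.pop(response_id, None)
        self._deleted.add(response_id)

    def is_deleted(self, response_id: str) -> bool:
        return response_id in self._deleted
```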

Chat isolation enforcement

  • When a response is created with x-agent-chat-isolation-key, all subsequent operations must include the same key.
  • Mismatched or missing keys return an indistinguishable 404.
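
A minimal sketch of the key check, assuming the commit notes elsewhere in this PR (no enforcement when the response was created without a key). The constant-time comparison and the `status_for_lookup` helper are choices made for this example, not confirmed details of the implementation.

```python
import hmac
from typing import Optional


def check_chat_isolation(stored_key: Optional[str], request_key: Optional[str]) -> bool:
    """Return True when the request may access the response."""
    if stored_key is None:
        # Created without a key: no enforcement (backward compatible).
        return True
    if request_key is None:
        return False
    # Constant-time comparison is a choice made for this sketch.
    return hmac.compare_digest(stored_key.encode(), request_key.encode())


def status_for_lookup(found: bool, stored_key: Optional[str], request_key: Optional[str]) -> int:
    """A key mismatch and a missing response map to the same 404."""
    if not found or not check_chat_isolation(stored_key, request_key):
        return 404
    return 200
```

Returning the same status for both cases means a caller cannot use the error to learn whether a response exists under another key.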

Stream flag removal

  • Removed stream mode flag stamping on persisted Response objects — the persisted response no longer carries the stream flag.

Diagnostic logging

  • InboundRequestLoggingMiddleware — pure-ASGI middleware logging every inbound HTTP request (method, path, status, duration, correlation headers, OTel trace ID). Status >= 400 → WARNING. Query strings excluded. Moved from responses to core so all protocol hosts get it automatically.
  • Handler-level INFO logs at all 5 endpoints (create, get, delete, cancel, input_items) with response ID, status, output count.
  • Orchestrator handler invocation log with handler function name and response ID.
  • Isolation key presence (has_user_isolation_key, has_chat_isolation_key) logged at all endpoint entry points and in FoundryStorageLoggingPolicy.
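
For readers unfamiliar with the pure-ASGI style, the request-logging behaviour can be sketched without any framework base class. This is not the `InboundRequestLoggingMiddleware` source: the class name, logger name, and log format are placeholders, and OTel trace ID extraction is omitted.

```python
import logging
import time

logger = logging.getLogger("example.inbound")  # logger name is a placeholder
CORRELATION_HEADERS = (b"x-request-id", b"x-ms-client-request-id")


class RequestLoggingSketch:
    """Pure-ASGI wrapper: it only relies on the (scope, receive, send) protocol."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        status = 0
        start = time.monotonic()

        async def send_wrapper(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        except Exception:
            status = 500  # an unhandled exception is logged as a 500
            raise
        finally:
            headers = dict(scope.get("headers") or [])
            correlation = {
                name.decode(): headers[name].decode("latin-1")
                for name in CORRELATION_HEADERS
                if name in headers
            }
            level = logging.WARNING if status >= 400 else logging.INFO
            # scope["path"] excludes the query string, which ASGI carries separately.
            logger.log(
                level,
                "%s %s -> %d in %.1f ms %s",
                scope["method"],
                scope["path"],
                status,
                (time.monotonic() - start) * 1000,
                correlation,
            )
```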

Startup configuration logging

  • Core: AgentServerHost lifespan emits 3 INFO-level log lines at startup:
    1. Platform environment: is_hosted, agent_name, agent_version, port, session_id, sse_keepalive_interval
    2. Connectivity (masked): project_endpoint (scheme://host only), otlp_endpoint (scheme://host only), appinsights_configured (boolean — connection string is never logged)
    3. Host options: shutdown_timeout, registered protocols
  • Responses: Logs storage_provider type, default_model, default_fetch_history_count, shutdown_grace_period at construction.
  • Invocations: Logs openapi_spec_configured at construction.
  • New _mask_uri() helper: strips URIs to scheme://host, returns "(not set)" for empty, "(redacted)" for unparseable input.
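
The masking rules for `_mask_uri()` can be approximated with the standard library. The sketch below follows the three documented outcomes; dropping the port and any userinfo from the host is an assumption, and the function is named `mask_uri` to make clear it is not the package's helper.

```python
from typing import Optional
from urllib.parse import urlsplit


def mask_uri(value: Optional[str]) -> str:
    """Reduce a URI to scheme://host so paths, queries, and credentials never reach logs."""
    if not value:
        return "(not set)"
    try:
        parts = urlsplit(value)
        host = parts.hostname
    except ValueError:
        return "(redacted)"
    if not parts.scheme or not host:
        return "(redacted)"
    # Dropping the port and userinfo is a choice made for this sketch.
    return f"{parts.scheme}://{host}"
```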

Storage logging & Foundry User-Agent

  • FoundryStorageLoggingPolicy — Azure Core pipeline policy for Foundry HTTP call logging with masked URLs (everything before /storage redacted), api-version query param preserved.
  • _ServerVersionUserAgentPolicy — lightweight SansIOHTTPPolicy that lazily evaluates a callback to set the User-Agent header on outbound Foundry storage HTTP requests. Uses self._build_server_version (same source of truth as x-platform-server response header) so both headers are always identical, including segments from core, responses, and additional_server_version.
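
The storage URL masking described above (everything before /storage redacted, only api-version kept from the query) might be sketched like this. The `***` placeholder, the `(redacted)` fallback for URLs without a /storage segment, and the function name are assumptions, not the actual `_mask_storage_url` output format.

```python
from urllib.parse import parse_qs, urlsplit


def mask_storage_url(url: str) -> str:
    """Keep only the /storage/... resource path and the api-version query parameter."""
    parts = urlsplit(url)
    index = parts.path.find("/storage")
    if index == -1:
        return "(redacted)"  # fallback chosen for this sketch
    masked = "***" + parts.path[index:]  # scheme, host, and project prefix are hidden
    api_version = parse_qs(parts.query).get("api-version")
    if api_version:
        masked += f"?api-version={api_version[0]}"
    return masked
```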

Pyright fixes

  • Fixed 13 pre-existing pyright errors in production source (0 remaining).
  • OTel type stubs → # type: ignore, Starlette Middleware factory → # type: ignore[arg-type].

Ruff lint fixes

  • Fixed all I001 (import sorting) and E501 (line too long) across all 3 packages.

Housekeeping

  • Removed .NET references from code comments and docstrings.
  • Added cspell ignore words for internal abbreviations (hdrs, myproj, myhost).

Packages

| Package | Version | Key changes |
| --- | --- | --- |
| azure-ai-agentserver-responses | 1.0.0b2 | Error shapes, isolation, session headers, eager eviction, diagnostic logging, startup config logging, storage logging, Foundry User-Agent |
| azure-ai-agentserver-core | 2.0.0b2 | InboundRequestLoggingMiddleware, startup config logging, _mask_uri(), pyright fixes |
| azure-ai-agentserver-invocations | 1.0.0b2 | Startup config logging, core dep >=2.0.0b2 |

Tests

  • 1055 tests pass (responses: 861, core: 89, invocations: 105), 5 skipped.
  • New test files: test_chat_isolation_enforcement.py, test_malformed_id_validation.py, test_eager_eviction.py, test_inbound_request_logging.py, test_foundry_logging_policy.py, test_startup_logging.py (13 tests: 8 for _mask_uri, 5 for startup log assertions including secret non-leakage).
  • Updated: test_session_id_resolution.py (header assertions on all endpoints), test_malformed_id_validation.py (B40 shape assertions).

Align error payloads with the container-spec behaviour contract:

Error code compliance:
- error.code uses 'invalid_request_error' for 400/404 (was 'invalid_request',
  'not_found', 'invalid_mode')
- error.code uses 'server_error' for 500 (was 'internal_error')
- RequestValidationError default code updated to 'invalid_request_error'

Post-delete behaviour (spec alignment with .NET PR #58252):
- GET, input_items, and second DELETE on deleted responses now return 404
  (was 400)
- deleted_response() factory now delegates to not_found_response()

Cancel/SSE message alignment:
- Cancel incomplete: 'Cannot cancel a response in terminal state.'
  (was 'Cannot cancel an incomplete response.')
- SSE replay non-bg: 'This response cannot be streamed because it was not
  created with background=true.'
- SSE replay non-stream: '...stream=true.'

Storage error propagation:
- FoundryStorageError subclasses now explicitly caught in GET, cancel, and
  input_items handlers instead of being swallowed by broad except clauses
- FoundryResourceNotFoundError -> 404, FoundryBadRequestError -> 400,
  FoundryApiError -> error_response (500)
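
The explicit catch order can be illustrated with local stand-in classes. The exception names mirror those in the commit message, but the classes below are defined only for this sketch; the real hierarchy and handler signatures may differ, and `handle_get_sketch` is hypothetical.

```python
class FoundryStorageError(Exception):
    """Local stand-in for the package's base storage error."""


class FoundryResourceNotFoundError(FoundryStorageError):
    pass


class FoundryBadRequestError(FoundryStorageError):
    pass


class FoundryApiError(FoundryStorageError):
    pass


def status_for_storage_error(exc: FoundryStorageError) -> int:
    """Map a storage error to the HTTP status listed in the commit message."""
    if isinstance(exc, FoundryResourceNotFoundError):
        return 404
    if isinstance(exc, FoundryBadRequestError):
        return 400
    return 500  # FoundryApiError and any other storage error


def handle_get_sketch(load):
    """Call a storage loader and translate storage errors before any broad handler."""
    try:
        return 200, load()
    except FoundryStorageError as exc:
        return status_for_storage_error(exc), None
```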

Storage call logging:
- FoundryStorageLoggingPolicy: per-retry pipeline policy logging method, URI,
  status code, duration (ms), and correlation headers at the
  azure.ai.agentserver logger
- Replaces built-in HttpLoggingPolicy to avoid double-logging

Tests:
- Added error.code assertions to all existing error tests across
  cancel, delete, get, create, and input_items endpoint tests
- Updated post-delete tests from expecting 400 to 404
- Added new tests: SSE replay unknown ID, 404 message contains ID,
  500 error body shape, SSE replay message variants
- Added FoundryStorageLoggingPolicy unit tests (4 tests)
- 791 tests passing

Version bumped to 1.0.0b2.
Copilot AI review requested due to automatic review settings April 17, 2026 03:39
@github-actions github-actions Bot added the Hosted Agents sdk/agentserver/* label Apr 17, 2026

Copilot AI left a comment

Pull request overview

Aligns azure-ai-agentserver-responses behavior and error payload shapes with the container-spec contract (mirroring the .NET implementation), including post-delete 404 semantics and improved Foundry storage observability.

Changes:

  • Standardizes error.code values (invalid_request_error for 400/404, server_error for 500) and updates related endpoint behaviors/messages.
  • Changes post-delete behavior so GET/input_items/second DELETE return HTTP 404 and routes deleted responses through not_found_response().
  • Introduces FoundryStorageLoggingPolicy and wires it into the Foundry storage pipeline; expands handler mapping for Foundry storage errors.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_foundry_logging_policy.py | Adds unit coverage for the new Foundry storage logging policy. |
| sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_input_items_endpoint.py | Updates error envelope assertions and post-delete expectations (400 → 404). |
| sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_get_endpoint.py | Adds/updates 404 error-shape and SSE replay rejection message assertions. |
| sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_delete_endpoint.py | Updates delete-related error-code assertions and post-delete GET expectation (400 → 404). |
| sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_cross_api_e2e.py | Adds assertions for spec-aligned cancel error message/code. |
| sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_create_endpoint.py | Adds error.code assertions and validates 500 error envelope shape. |
| sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_cancel_endpoint.py | Extends test helper to assert error.code and updates expected messages. |
| sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/store/_foundry_provider.py | Replaces built-in HTTP logging policy with FoundryStorageLoggingPolicy in the async pipeline. |
| sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/store/_foundry_logging_policy.py | Adds the custom per-retry logging policy implementation. |
| sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/models/errors.py | Updates RequestValidationError default code. |
| sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/hosting/_validation.py | Updates error code mapping and makes deleted responses return 404 via not_found_response(). |
| sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/hosting/_endpoint_handler.py | Updates server error code shape, SSE replay rejection messages, and expands Foundry error handling. |
| sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/_version.py | Bumps package version to 1.0.0b2. |
| sdk/agentserver/azure-ai-agentserver-responses/CHANGELOG.md | Documents breaking changes and new logging policy for 1.0.0b2. |

Comment thread sdk/agentserver/azure-ai-agentserver-responses/CHANGELOG.md Outdated
Chat isolation key enforcement:
- Store chat_isolation_key on ResponseExecution and _RuntimeState
- Enforce key matching on GET, DELETE, Cancel, and InputItems endpoints
- Mismatched/missing keys return indistinguishable 404
- Backward-compatible: no enforcement when created without a key

Malformed ID validation:
- All endpoints reject malformed response_id path params (wrong prefix,
  too short) with 400 before touching storage
- previous_response_id in POST body also validated
- Update existing tests using fake IDs to use well-formed IdGenerator IDs

14 chat isolation tests + 19 malformed ID tests (33 new, 824 total)
@RaviPidaparthi RaviPidaparthi changed the title [agentserver-responses] Spec compliance: error shapes, post-delete 404, storage logging [agentserver-responses] Spec compliance: error shapes, chat isolation, malformed ID validation, storage logging Apr 17, 2026
Port eager eviction from .NET PR #58252. After a response reaches a
terminal state (completed, failed, cancelled, incomplete), the in-memory
record is removed from RuntimeState so that subsequent GET, DELETE,
Cancel, and SSE replay requests fall through to the durable storage
provider.

Key changes:
- RuntimeState.try_evict(): removes terminal records while preserving
  chat isolation keys for provider-fallback enforcement
- RuntimeState.mark_deleted(): supports DELETE provider fallback
- Eviction wired into all 5 orchestrator terminal paths
  (bg non-stream, sync, bg+stream Path A, non-bg stream Path B, cancel)
- Provider fallback paths added to handle_get, handle_delete,
  handle_cancel for evicted responses
- B1 background check in cancel provider fallback (matches .NET)
- Cancel idempotency: cancelled responses return 200 via provider
- B2 stream/background checks in SSE replay provider fallback
- background + stream mode flags stamped on all persisted responses
- SSE events saved for replay after eviction (including fallback events)
- store=false cancel returns 404 (matching .NET)
- SSE datetime serialization fix in _build_sse_frame
- 9 new eager eviction unit tests
…sted responses

The stream flag is not part of the ResponseObject contract and should
not be persisted.  After eager eviction, the server cannot distinguish
bg+non-stream from bg+stream-with-expired-TTL, so the SSE replay
fallback now uses a combined error message matching .NET's
SseReplayResult:

  'This response cannot be streamed because it was not created with
   stream=true or the stream TTL has expired.'

Added TODO documenting the deliberate spec violation — the container
spec prescribes distinct error messages but the provider doesn't carry
enough context to distinguish the two cases.
@RaviPidaparthi RaviPidaparthi changed the title [agentserver-responses] Spec compliance: error shapes, chat isolation, malformed ID validation, storage logging [agentserver-responses] Spec compliance: error shapes, eager eviction, chat isolation, storage logging Apr 17, 2026
…andler diagnostic logging

- Add InboundRequestLoggingMiddleware (pure ASGI): logs method, path, status,
  duration, correlation headers (x-request-id, x-ms-client-request-id), and
  OTel trace ID. Status >= 400 → WARNING; exceptions → forced 500 WARNING.
  Query strings are excluded from logs.
- Add INFO-level handler diagnostic logs to all 5 endpoints: create (params),
  get (entry + retrieval), delete (entry + success), cancel (entry + success),
  input_items (entry).
- Add orchestrator handler invocation log with handler function name.
- Wire middleware in ResponsesAgentServerHost via add_middleware().
- 13 new contract tests for middleware and handler logging.
- Update CHANGELOG.md with logging features.

Matches .NET PR #58274 (InboundRequestLoggingMiddleware + handler logging).
@RaviPidaparthi RaviPidaparthi changed the title [agentserver-responses] Spec compliance: error shapes, eager eviction, chat isolation, storage logging [agentserver-responses] Spec compliance: error shapes, eager eviction, chat isolation, storage logging, diagnostic logging Apr 17, 2026
The azure-ai-agentserver-core package was bumped to 2.0.0b1 but the
githubcopilot package still had the old <1.0.0b18 upper bound, causing
the Analyze dependencies CI gate to fail.
Moves the middleware from azure-ai-agentserver-responses to
azure-ai-agentserver-core so all protocol hosts get consistent
inbound request logging automatically.

- Created _middleware.py in core with the middleware class
- Wired into AgentServerHost.__init__ middleware list
- Exported from core __init__.py
- Removed explicit add_middleware() call from ResponsesAgentServerHost
- Updated CHANGELOG to reflect the move

Addresses review feedback from @ankitbko.
- core 2.0.0b2: Added InboundRequestLoggingMiddleware, CHANGELOG updated
- invocations 1.0.0b2: Core dep bumped to >=2.0.0b2, CHANGELOG updated
- responses: Core dep bumped to >=2.0.0b2
… B40)

- Add x-agent-session-id response header on all protocol endpoints per
  container spec §8. POST /responses uses the per-request resolved
  session ID; GET/DELETE/CANCEL/INPUT_ITEMS use the env var.
- Update B40 malformed ID error shape: code 'invalid_parameters',
  param 'responseId{<value>}' matching spec contract.
- Add _session_headers() helper for consistent header propagation
  across all handler code paths including error responses.
- Add invalid_parameters_response() validation helper for B40.
- Update previous_response_id validation to use 'invalid_parameters'.
- Fix header inconsistency: all JSONResponse calls now include headers.
- TDD: tests written first, verified red, then implementation, green.

Tests: 851 responses + 76 core + 105 invocations passing.
- _tracing.py: Suppress OTel type stub gaps (start_as_current_span
  context manager, LoggerProvider API vs SDK mismatch, duck-typed
  BaggageLogRecordProcessor) with targeted type: ignore comments.
- _base.py: Suppress Starlette Middleware factory protocol mismatch
  for pure-ASGI middleware classes (arg-type).

All 13 production-source pyright errors resolved (0 remaining).
Test-only errors from generated model unions are pre-existing.
- Core: AgentServerHost lifespan emits 3 INFO log lines at startup:
  1. Platform environment (is_hosted, agent_name, agent_version, port,
     session_id, sse_keepalive_interval)
  2. Connectivity (project_endpoint masked, otlp_endpoint masked,
     appinsights_configured boolean — connection string never logged)
  3. Host options (shutdown_timeout, registered protocols)
- Core: _mask_uri() helper strips URI to scheme://host, returns
  '(not set)' for empty values, '(redacted)' for unparseable input
- Responses: Logs storage_provider type, default_model,
  default_fetch_history_count, shutdown_grace_period at construction
- Invocations: Logs openapi_spec_configured at construction
- 13 new unit tests (8 for _mask_uri, 5 for startup log assertions)
- Updated all 3 CHANGELOGs
- Fix I001 import sorting in _base.py, _config.py, _tracing.py,
  _endpoint_handler.py, _sse.py (auto-fixed via ruff --fix)
- Fix E501 line-too-long in _orchestrator.py (2 occurrences, wrapped
  long getattr expressions)
@RaviPidaparthi RaviPidaparthi changed the title [agentserver] Spec compliance: error shapes, session headers, isolation, diagnostic logging [agentserver] Spec compliance: error shapes, session headers, isolation, diagnostic logging, startup config logging Apr 17, 2026
@RaviPidaparthi RaviPidaparthi requested a review from Copilot April 17, 2026 23:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 5 comments.

Comment thread sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_base.py Outdated
…rdening

- _mask_uri docstring: Include '(redacted)' return case in :return: docs
- FoundryStorageLoggingPolicy: Mask everything before /storage in URLs
  (host, scheme, /api/projects/{name} prefix all redacted); query params
  stripped. Only /storage/... resource path is logged for debugging.
- SSE session headers: _parse_starting_after now accepts headers param;
  _build_live_stream_response and _try_replay_persisted_stream merge
  session headers into SSE headers so x-agent-session-id is present on
  all streaming responses and cursor-parse error responses.
- test_valid_format_nonexistent_previous_response_id: Strengthened to
  explicitly assert code != 'invalid_parameters' and 'Malformed' not in
  message when status is 400, instead of weak OR-based assertion.
- 6 new _mask_storage_url unit tests covering project path redaction,
  query stripping, /storage path preservation, edge cases.
_mask_storage_url now keeps the api-version query parameter in the
masked output for debugging while still stripping all other params.
Updated docstring example and tests accordingly.
@RaviPidaparthi RaviPidaparthi changed the title [agentserver] Spec compliance: error shapes, session headers, isolation, diagnostic logging, startup config logging [agentserver] Spec compliance: error shapes, session headers, isolation, diagnostic logging, startup config logging, Foundry User-Agent Apr 18, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

The _chat_isolation_keys dict duplicated what ResponseExecution.chat_isolation_key
already carries.  For in-flight responses the record is available; for
post-eviction responses Foundry storage enforces isolation server-side.

- Remove _chat_isolation_keys dict and all add/delete/evict bookkeeping
- Make check_chat_isolation a @staticmethod(stored_key, request_key)
- Endpoint callers pass record.chat_isolation_key directly
- Remove redundant INPUT_ITEMS KeyError fallback check (already checked above)
- Update eviction test to verify static helper + record lifecycle
- FoundryStorageProvider: update get_server_version=None docstring to say
  'uses Azure Core default User-Agent policy' (matches actual behavior)
- FoundryStorageLoggingPolicy: remove misleading AsyncHTTPPolicy generic
  type args; use bare AsyncHTTPPolicy with type: ignore[type-arg]

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

- Restore sdk_moniker='ai-agentserver-responses/{VERSION}' in default
  UserAgentPolicy path (was dropped when adding _ServerVersionUserAgentPolicy)
- Add exc_info=True to logging policy transport-failure warning
- Rename test_malformed_previous_response_id_returns_400_with_details to
  _returns_400; assert error.type/code/param (no details array in wire shape)
- Update module docstring to match actual contract
Core:
- _base.py: break long Middleware() line (127 > 120 chars)

Responses (_endpoint_handler.py):
- Extract _handle_get_fallback(): provider fallback for evicted/missing records
- Extract _handle_get_stream(): stream=true path for in-flight records
- Extract _handle_cancel_fallback(): provider fallback for cancel
- Extract _check_cancel_terminal_status(): shared terminal-status logic
  used by both in-flight and provider-fallback cancel paths
- handle_get: 61 -> ~30 statements
- handle_cancel: 55 statements / 22 branches -> ~30 / ~12
- Add pylint disable for too-many-positional-arguments on constructors
  and get_input_items protocol methods (6 sites)
- Add pylint disable for import-error/no-name-in-module on cross-package
  azure.ai.agentserver.core imports (4 sites)
…d by pylint 3.2.7

CI uses pylint==3.2.7 which does not have the too-many-positional-arguments
check (added in 3.3+). The inline pylint disable comments were triggering
unknown-option-value warnings (W0012), causing exit code 4.
@RaviPidaparthi RaviPidaparthi merged commit a3381c6 into main Apr 18, 2026
19 checks passed
@RaviPidaparthi RaviPidaparthi deleted the feature/spec-compliance-error-shapes branch April 18, 2026 09:16
VenkataAnilKumar pushed a commit to VenkataAnilKumar/azure-sdk-for-python that referenced this pull request Apr 18, 2026
…on, diagnostic logging, startup config logging, Foundry User-Agent (Azure#46364)

* Spec compliance: error shapes, post-delete 404, storage logging

Align error payloads with the container-spec behaviour contract:

Error code compliance:
- error.code uses 'invalid_request_error' for 400/404 (was 'invalid_request',
  'not_found', 'invalid_mode')
- error.code uses 'server_error' for 500 (was 'internal_error')
- RequestValidationError default code updated to 'invalid_request_error'

Post-delete behaviour (spec alignment with .NET PR #58252):
- GET, input_items, and second DELETE on deleted responses now return 404
  (was 400)
- deleted_response() factory now delegates to not_found_response()

Cancel/SSE message alignment:
- Cancel incomplete: 'Cannot cancel a response in terminal state.'
  (was 'Cannot cancel an incomplete response.')
- SSE replay non-bg: 'This response cannot be streamed because it was not
  created with background=true.'
- SSE replay non-stream: '...stream=true.'

Storage error propagation:
- FoundryStorageError subclasses now explicitly caught in GET, cancel, and
  input_items handlers instead of being swallowed by broad except clauses
- FoundryResourceNotFoundError -> 404, FoundryBadRequestError -> 400,
  FoundryApiError -> error_response (500)

Storage call logging:
- FoundryStorageLoggingPolicy: per-retry pipeline policy logging method, URI,
  status code, duration (ms), and correlation headers at the
  azure.ai.agentserver logger
- Replaces built-in HttpLoggingPolicy to avoid double-logging

Tests:
- Added error.code assertions to all existing error tests across
  cancel, delete, get, create, and input_items endpoint tests
- Updated post-delete tests from expecting 400 to 404
- Added new tests: SSE replay unknown ID, 404 message contains ID,
  500 error body shape, SSE replay message variants
- Added FoundryStorageLoggingPolicy unit tests (4 tests)
- 791 tests passing

Version bumped to 1.0.0b2.

* Address PR review: add FoundryBadRequestError handlers, clean up imports, fix pyright/pylint, reclassify changelog

* Set 1.0.0b2 release date to 2026-04-17

* feat: chat isolation enforcement & malformed ID validation

Chat isolation key enforcement:
- Store chat_isolation_key on ResponseExecution and _RuntimeState
- Enforce key matching on GET, DELETE, Cancel, and InputItems endpoints
- Mismatched/missing keys return indistinguishable 404
- Backward-compatible: no enforcement when created without a key

Malformed ID validation:
- All endpoints reject malformed response_id path params (wrong prefix,
  too short) with 400 before touching storage
- previous_response_id in POST body also validated
- Update existing tests using fake IDs to use well-formed IdGenerator IDs

14 chat isolation tests + 19 malformed ID tests (33 new, 824 total)

* feat(agentserver-responses): eager eviction of terminal response records

Port eager eviction from .NET PR #58252. After a response reaches a
terminal state (completed, failed, cancelled, incomplete), the in-memory
record is removed from RuntimeState so that subsequent GET, DELETE,
Cancel, and SSE replay requests fall through to the durable storage
provider.

Key changes:
- RuntimeState.try_evict(): removes terminal records while preserving
  chat isolation keys for provider-fallback enforcement
- RuntimeState.mark_deleted(): supports DELETE provider fallback
- Eviction wired into all 5 orchestrator terminal paths
  (bg non-stream, sync, bg+stream Path A, non-bg stream Path B, cancel)
- Provider fallback paths added to handle_get, handle_delete,
  handle_cancel for evicted responses
- B1 background check in cancel provider fallback (matches .NET)
- Cancel idempotency: cancelled responses return 200 via provider
- B2 stream/background checks in SSE replay provider fallback
- background + stream mode flags stamped on all persisted responses
- SSE events saved for replay after eviction (including fallback events)
- store=false cancel returns 404 (matching .NET)
- SSE datetime serialization fix in _build_sse_frame
- 9 new eager eviction unit tests

* fix(agentserver-responses): remove stream mode flag stamping on persisted responses

The stream flag is not part of the ResponseObject contract and should
not be persisted.  After eager eviction, the server cannot distinguish
bg+non-stream from bg+stream-with-expired-TTL, so the SSE replay
fallback now uses a combined error message matching .NET's
SseReplayResult:

  'This response cannot be streamed because it was not created with
   stream=true or the stream TTL has expired.'

Added TODO documenting the deliberate spec violation — the container
spec prescribes distinct error messages but the provider doesn't carry
enough context to distinguish the two cases.

* feat(agentserver-responses): inbound request logging middleware and handler diagnostic logging

- Add InboundRequestLoggingMiddleware (pure ASGI): logs method, path, status,
  duration, correlation headers (x-request-id, x-ms-client-request-id), and
  OTel trace ID. Status >= 400 → WARNING; exceptions → forced 500 WARNING.
  Query strings are excluded from logs.
- Add INFO-level handler diagnostic logs to all 5 endpoints: create (params),
  get (entry + retrieval), delete (entry + success), cancel (entry + success),
  input_items (entry).
- Add orchestrator handler invocation log with handler function name.
- Wire middleware in ResponsesAgentServerHost via add_middleware().
- 13 new contract tests for middleware and handler logging.
- Update CHANGELOG.md with logging features.

Matches .NET PR #58274 (InboundRequestLoggingMiddleware + handler logging).

* chore: remove .NET references from code comments and docstrings

* fix: add type ignore for Starlette add_middleware typing

* fix: update githubcopilot core dependency to >=2.0.0b1

The azure-ai-agentserver-core package was bumped to 2.0.0b1 but the
githubcopilot package still had the old <1.0.0b18 upper bound, causing
the Analyze dependencies CI gate to fail.

* Revert "fix: update githubcopilot core dependency to >=2.0.0b1"

This reverts commit 903e498.

* Move InboundRequestLoggingMiddleware to core

Moves the middleware from azure-ai-agentserver-responses to
azure-ai-agentserver-core so all protocol hosts get consistent
inbound request logging automatically.

- Created _middleware.py in core with the middleware class
- Wired into AgentServerHost.__init__ middleware list
- Exported from core __init__.py
- Removed explicit add_middleware() call from ResponsesAgentServerHost
- Updated CHANGELOG to reflect the move

Addresses review feedback from @ankitbko.

* Bump core to 2.0.0b2, invocations to 1.0.0b2 for middleware move

- core 2.0.0b2: Added InboundRequestLoggingMiddleware, CHANGELOG updated
- invocations 1.0.0b2: Core dep bumped to >=2.0.0b2, CHANGELOG updated
- responses: Core dep bumped to >=2.0.0b2

* feat(responses): add x-agent-session-id header + B40 error shape (§8, B40)

- Add x-agent-session-id response header on all protocol endpoints per
  container spec §8. POST /responses uses the per-request resolved
  session ID; GET/DELETE/CANCEL/INPUT_ITEMS use the env var.
- Update B40 malformed ID error shape: code 'invalid_parameters',
  param 'responseId{<value>}' matching spec contract.
- Add _session_headers() helper for consistent header propagation
  across all handler code paths including error responses.
- Add invalid_parameters_response() validation helper for B40.
- Update previous_response_id validation to use 'invalid_parameters'.
- Fix header inconsistency: all JSONResponse calls now include headers.
- TDD: tests written first, verified red, then implementation, green.

Tests: 851 responses + 76 core + 105 invocations passing.

* Address PR review: fix CHANGELOG B40 wire shape, use public API in test

* Log isolation key presence (not values) in endpoint handler diagnostics

* Log isolation header presence in Foundry storage logging policy

Add has_user_isolation_key/has_chat_isolation_key booleans to both
success and failure log lines. Values are never logged. Includes
3 new unit tests covering presence, absence, and failure paths.

* Fix pyright errors in agentserver source code

- _tracing.py: Suppress OTel type stub gaps (start_as_current_span
  context manager, LoggerProvider API vs SDK mismatch, duck-typed
  BaggageLogRecordProcessor) with targeted type: ignore comments.
- _base.py: Suppress Starlette Middleware factory protocol mismatch
  for pure-ASGI middleware classes (arg-type).

All 13 production-source pyright errors resolved (0 remaining).
Test-only errors from generated model unions are pre-existing.

* Add startup configuration logging across all agentserver packages

- Core: AgentServerHost lifespan emits 3 INFO log lines at startup:
  1. Platform environment (is_hosted, agent_name, agent_version, port,
     session_id, sse_keepalive_interval)
  2. Connectivity (project_endpoint masked, otlp_endpoint masked,
     appinsights_configured boolean — connection string never logged)
  3. Host options (shutdown_timeout, registered protocols)
- Core: _mask_uri() helper strips URI to scheme://host, returns
  '(not set)' for empty values, '(redacted)' for unparseable input
- Responses: Logs storage_provider type, default_model,
  default_fetch_history_count, shutdown_grace_period at construction
- Invocations: Logs openapi_spec_configured at construction
- 13 new unit tests (8 for _mask_uri, 5 for startup log assertions)
- Updated all 3 CHANGELOGs
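
The masking rule for `_mask_uri()` can be sketched roughly as follows. This is a hypothetical stand-in, not the package's implementation; dropping the port and any userinfo is an assumption, since the commit message only says "scheme://host".

```python
from urllib.parse import urlsplit


def mask_uri(value):
    """Reduce a URI to scheme://host so paths, queries, and secrets stay out of logs."""
    if not value:
        return "(not set)"
    try:
        parts = urlsplit(value)
        host = parts.hostname  # assumption: drop port and userinfo
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 brackets)
        return "(redacted)"
    if not parts.scheme or not host:
        return "(redacted)"
    return f"{parts.scheme}://{host}"
```

Returning a fixed placeholder for unparseable input, rather than echoing it, means a malformed value containing a credential is never written to the log.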

* Fix ruff lint issues across agentserver packages

- Fix I001 import sorting in _base.py, _config.py, _tracing.py,
  _endpoint_handler.py, _sse.py (auto-fixed via ruff --fix)
- Fix E501 line-too-long in _orchestrator.py (2 occurrences, wrapped
  long getattr expressions)

* Address PR review comments: URL masking, SSE session headers, test hardening

- _mask_uri docstring: Include '(redacted)' return case in :return: docs
- FoundryStorageLoggingPolicy: Mask everything before /storage in URLs
  (host, scheme, /api/projects/{name} prefix all redacted); query params
  stripped. Only /storage/... resource path is logged for debugging.
- SSE session headers: _parse_starting_after now accepts headers param;
  _build_live_stream_response and _try_replay_persisted_stream merge
  session headers into SSE headers so x-agent-session-id is present on
  all streaming responses and cursor-parse error responses.
- test_valid_format_nonexistent_previous_response_id: Strengthened to
  explicitly assert code != 'invalid_parameters' and 'Malformed' not in
  message when status is 400, instead of weak OR-based assertion.
- 6 new _mask_storage_url unit tests covering project path redaction,
  query stripping, /storage path preservation, edge cases.

* Preserve api-version query param in Foundry storage URL masking

_mask_storage_url now keeps the api-version query parameter in the
masked output for debugging while still stripping all other params.
Updated docstring example and tests accordingly.
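
As a rough illustration of the storage URL masking, assuming a `***` placeholder for the redacted prefix and a `(redacted)` fallback when no `/storage` segment is found (both are assumptions, as is the function name):

```python
from urllib.parse import parse_qs, urlsplit


def mask_storage_url(url):
    """Keep only the /storage/... path and the api-version query parameter."""
    parts = urlsplit(url)
    idx = parts.path.find("/storage")
    if idx < 0:
        return "(redacted)"  # assumed fallback when there is no /storage segment
    masked = "***" + parts.path[idx:]  # scheme, host, and project prefix are dropped
    api_versions = parse_qs(parts.query).get("api-version")
    if api_versions:
        masked += f"?api-version={api_versions[0]}"  # every other parameter is stripped
    return masked
```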

* Foundry storage User-Agent matches x-platform-server via lazy callback; cspell fixes

* Fix post-eviction chat isolation: let Foundry storage enforce instead of local 404

* Remove redundant _chat_isolation_keys dict — use record directly

The _chat_isolation_keys dict duplicated what ResponseExecution.chat_isolation_key
already carries.  For in-flight responses the record is available; for
post-eviction responses Foundry storage enforces isolation server-side.

- Remove _chat_isolation_keys dict and all add/delete/evict bookkeeping
- Make check_chat_isolation a @staticmethod(stored_key, request_key)
- Endpoint callers pass record.chat_isolation_key directly
- Remove redundant INPUT_ITEMS KeyError fallback check (already checked above)
- Update eviction test to verify static helper + record lifecycle
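
A sketch of what a static isolation check taking `(stored_key, request_key)` might look like. The class name and return convention are assumptions; the "no enforcement when created without a key" rule comes from the earlier chat isolation commit.

```python
class IsolationSketch:
    """Hypothetical holder for the static check; not the package's class."""

    @staticmethod
    def check_chat_isolation(stored_key, request_key):
        """Return True when the request may access the record."""
        if stored_key is None:
            # Record was created without a key: backward-compatible, no enforcement.
            return True
        # Missing or mismatched request keys are both rejected.
        return stored_key == request_key
```

A caller would pass `record.chat_isolation_key` as `stored_key` and answer a `False` result with the same 404 used for unknown IDs, so the two cases stay indistinguishable.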

* Address PR review: fix docstring and logging policy generics

- FoundryStorageProvider: update get_server_version=None docstring to say
  'uses Azure Core default User-Agent policy' (matches actual behavior)
- FoundryStorageLoggingPolicy: remove misleading AsyncHTTPPolicy generic
  type args; use bare AsyncHTTPPolicy with type: ignore[type-arg]

* Address PR review round 4: sdk_moniker, exc_info, test name

- Restore sdk_moniker='ai-agentserver-responses/{VERSION}' in default
  UserAgentPolicy path (was dropped when adding _ServerVersionUserAgentPolicy)
- Add exc_info=True to logging policy transport-failure warning
- Rename test_malformed_previous_response_id_returns_400_with_details to
  _returns_400; assert error.type/code/param (no details array in wire shape)
- Update module docstring to match actual contract

* Fix pylint: extract helpers to reduce statement/branch counts

Core:
- _base.py: break long Middleware() line (127 > 120 chars)

Responses (_endpoint_handler.py):
- Extract _handle_get_fallback(): provider fallback for evicted/missing records
- Extract _handle_get_stream(): stream=true path for in-flight records
- Extract _handle_cancel_fallback(): provider fallback for cancel
- Extract _check_cancel_terminal_status(): shared terminal-status logic
  used by both in-flight and provider-fallback cancel paths
- handle_get: 61 -> ~30 statements
- handle_cancel: 55 statements / 22 branches -> ~30 / ~12

* Suppress remaining pylint warnings in agentserver-responses

- Add pylint disable for too-many-positional-arguments on constructors
  and get_input_items protocol methods (6 sites)
- Add pylint disable for import-error/no-name-in-module on cross-package
  azure.ai.agentserver.core imports (4 sites)

* fix: remove too-many-positional-arguments disable comments unsupported by pylint 3.2.7

CI uses pylint==3.2.7 which does not have the too-many-positional-arguments
check (added in 3.3+). The inline pylint disable comments were triggering
unknown-option-value warnings (W0012), causing exit code 4.
fafhrd91 pushed a commit to fafhrd91/azure-sdk-for-python that referenced this pull request Apr 28, 2026
…on, diagnostic logging, startup config logging, Foundry User-Agent (Azure#46364)

* Spec compliance: error shapes, post-delete 404, storage logging

Align error payloads with the container-spec behaviour contract:

Error code compliance:
- error.code uses 'invalid_request_error' for 400/404 (was 'invalid_request',
  'not_found', 'invalid_mode')
- error.code uses 'server_error' for 500 (was 'internal_error')
- RequestValidationError default code updated to 'invalid_request_error'

Post-delete behaviour (spec alignment with .NET PR #58252):
- GET, input_items, and second DELETE on deleted responses now return 404
  (was 400)
- deleted_response() factory now delegates to not_found_response()

Cancel/SSE message alignment:
- Cancel incomplete: 'Cannot cancel a response in terminal state.'
  (was 'Cannot cancel an incomplete response.')
- SSE replay non-bg: 'This response cannot be streamed because it was not
  created with background=true.'
- SSE replay non-stream: '...stream=true.'

Storage error propagation:
- FoundryStorageError subclasses now explicitly caught in GET, cancel, and
  input_items handlers instead of being swallowed by broad except clauses
- FoundryResourceNotFoundError -> 404, FoundryBadRequestError -> 400,
  FoundryApiError -> error_response (500)
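
The mapping above could be expressed along these lines. The exception classes here are local stand-ins that only mirror the names in the commit message, and `map_storage_error` is a hypothetical helper.

```python
class FoundryStorageError(Exception):
    """Stand-in base class for storage failures."""


class FoundryResourceNotFoundError(FoundryStorageError):
    pass


class FoundryBadRequestError(FoundryStorageError):
    pass


class FoundryApiError(FoundryStorageError):
    pass


def map_storage_error(exc):
    """Return (http_status, error_code) for a storage exception."""
    if isinstance(exc, FoundryResourceNotFoundError):
        return 404, "invalid_request_error"
    if isinstance(exc, FoundryBadRequestError):
        return 400, "invalid_request_error"
    # FoundryApiError and any other storage failure surface as a server error.
    return 500, "server_error"
```

Catching these classes before any broad `except Exception` clause is what keeps a storage 404 from being reported as a generic failure.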

Storage call logging:
- FoundryStorageLoggingPolicy: per-retry pipeline policy logging method, URI,
  status code, duration (ms), and correlation headers at the
  azure.ai.agentserver logger
- Replaces built-in HttpLoggingPolicy to avoid double-logging

Tests:
- Added error.code assertions to all existing error tests across
  cancel, delete, get, create, and input_items endpoint tests
- Updated post-delete tests from expecting 400 to 404
- Added new tests: SSE replay unknown ID, 404 message contains ID,
  500 error body shape, SSE replay message variants
- Added FoundryStorageLoggingPolicy unit tests (4 tests)
- 791 tests passing

Version bumped to 1.0.0b2.

* Address PR review: add FoundryBadRequestError handlers, clean up imports, fix pyright/pylint, reclassify changelog

* Set 1.0.0b2 release date to 2026-04-17

* feat: chat isolation enforcement & malformed ID validation

Chat isolation key enforcement:
- Store chat_isolation_key on ResponseExecution and _RuntimeState
- Enforce key matching on GET, DELETE, Cancel, and InputItems endpoints
- Mismatched/missing keys return indistinguishable 404
- Backward-compatible: no enforcement when created without a key

Malformed ID validation:
- All endpoints reject malformed response_id path params (wrong prefix,
  too short) with 400 before touching storage
- previous_response_id in POST body also validated
- Update existing tests using fake IDs to use well-formed IdGenerator IDs

14 chat isolation tests + 19 malformed ID tests (33 new, 824 total)

* feat(agentserver-responses): eager eviction of terminal response records

Port eager eviction from .NET PR #58252. After a response reaches a
terminal state (completed, failed, cancelled, incomplete), the in-memory
record is removed from RuntimeState so that subsequent GET, DELETE,
Cancel, and SSE replay requests fall through to the durable storage
provider.

Key changes:
- RuntimeState.try_evict(): removes terminal records while preserving
  chat isolation keys for provider-fallback enforcement
- RuntimeState.mark_deleted(): supports DELETE provider fallback
- Eviction wired into all 5 orchestrator terminal paths
  (bg non-stream, sync, bg+stream Path A, non-bg stream Path B, cancel)
- Provider fallback paths added to handle_get, handle_delete,
  handle_cancel for evicted responses
- B1 background check in cancel provider fallback (matches .NET)
- Cancel idempotency: cancelled responses return 200 via provider
- B2 stream/background checks in SSE replay provider fallback
- background + stream mode flags stamped on all persisted responses
- SSE events saved for replay after eviction (including fallback events)
- store=false cancel returns 404 (matching .NET)
- SSE datetime serialization fix in _build_sse_frame
- 9 new eager eviction unit tests
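
The core of eager eviction can be sketched as below. The storage layout, the `add` method, and the boolean return value are assumptions; only the terminal status names and the idea of removing terminal records come from the commit message.

```python
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled", "incomplete"})


class RuntimeStateSketch:
    """Hypothetical in-memory registry of response records."""

    def __init__(self):
        self._records = {}

    def add(self, response_id, status):
        self._records[response_id] = {"status": status}

    def try_evict(self, response_id):
        """Remove the record if it is terminal; return True when something was evicted."""
        record = self._records.get(response_id)
        if record is None or record["status"] not in TERMINAL_STATUSES:
            return False
        del self._records[response_id]
        return True
```

After a successful eviction, a lookup miss in memory is the signal for a handler to fall through to the durable storage provider.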

* fix(agentserver-responses): remove stream mode flag stamping on persisted responses

The stream flag is not part of the ResponseObject contract and should
not be persisted.  After eager eviction, the server cannot distinguish
bg+non-stream from bg+stream-with-expired-TTL, so the SSE replay
fallback now uses a combined error message matching .NET's
SseReplayResult:

  'This response cannot be streamed because it was not created with
   stream=true or the stream TTL has expired.'

Added TODO documenting the deliberate spec violation — the container
spec prescribes distinct error messages but the provider doesn't carry
enough context to distinguish the two cases.

* feat(agentserver-responses): inbound request logging middleware and handler diagnostic logging

- Add InboundRequestLoggingMiddleware (pure ASGI): logs method, path, status,
  duration, correlation headers (x-request-id, x-ms-client-request-id), and
  OTel trace ID. Status >= 400 → WARNING; exceptions → forced 500 WARNING.
  Query strings are excluded from logs.
- Add INFO-level handler diagnostic logs to all 5 endpoints: create (params),
  get (entry + retrieval), delete (entry + success), cancel (entry + success),
  input_items (entry).
- Add orchestrator handler invocation log with handler function name.
- Wire middleware in ResponsesAgentServerHost via add_middleware().
- 13 new contract tests for middleware and handler logging.
- Update CHANGELOG.md with logging features.

Matches .NET PR #58274 (InboundRequestLoggingMiddleware + handler logging).
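
For readers unfamiliar with pure ASGI middleware, a simplified sketch of the logging behaviour follows. The class and logger names are hypothetical, and correlation headers and the OTel trace ID are omitted for brevity.

```python
import logging
import time

logger = logging.getLogger("example.inbound")  # hypothetical logger name


class InboundLoggingSketch:
    """Pure ASGI middleware that logs method, path, status, and duration."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.monotonic()
        status = {"code": None}

        async def send_wrapper(message):
            # The status code is carried by the response-start message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        except Exception:
            status["code"] = 500  # unhandled exceptions are logged as 500
            raise
        finally:
            code = status["code"]
            duration_ms = (time.monotonic() - start) * 1000
            level = logging.WARNING if code is None or code >= 400 else logging.INFO
            # scope["path"] excludes the query string, which stays out of the log.
            logger.log(level, "%s %s -> %s (%.1f ms)",
                       scope["method"], scope["path"], code, duration_ms)
```

Wrapping `send` rather than subclassing a request/response helper is what keeps the middleware compatible with streaming responses such as SSE.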

* chore: remove .NET references from code comments and docstrings

* fix: add type ignore for Starlette add_middleware typing

* fix: update githubcopilot core dependency to >=2.0.0b1

The azure-ai-agentserver-core package was bumped to 2.0.0b1 but the
githubcopilot package still had the old <1.0.0b18 upper bound, causing
the Analyze dependencies CI gate to fail.

Labels

Hosted Agents sdk/agentserver/*

4 participants