Skip to content

feat(governance): Policy Agent batch settings + per-task manual-grant reason#28746

Open
yan-3005 wants to merge 8 commits into
mainfrom
policy-agent-dar-concurrency
Open

feat(governance): Policy Agent batch settings + per-task manual-grant reason#28746
yan-3005 wants to merge 8 commits into
mainfrom
policy-agent-dar-concurrency

Conversation

@yan-3005

@yan-3005 yan-3005 commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Fixes #28759

Part of the Data Access Request concurrency work — open-metadata/openmetadata-collate#4414

What

Supporting OpenMetadata changes for the Collate Data Access Request (DAR) concurrency work — clubbing concurrent policy-agent grants into a single ingestion run per pipeline (Collate PR links to this one).

  1. workflowSettings — add policyAgentConfiguration:
    • pollingIntervalSeconds (default 30), batchWindowSeconds (default 60, the scheduler/window), batchMaxSize (default 200), batchWorkerThreads (default 4, the coordinator's worker-pool size — bounds how many distinct pipelines provision concurrently). All bounded minimum: 1.
    • Read by the Collate batch coordinator.
  2. policyAgentTaskDefinition — add an optional per-node pollingIntervalSeconds. When unset it inherits the server-wide policyAgentConfiguration.pollingIntervalSeconds (then the built-in default); it has no own default so "unset" stays meaningful.
  3. CreateTask — optionally carry a workflow-supplied reason into the created task's payload as manualGrantReason (its own structured field), instead of overloading the description.

Generic approval workflows are unaffected

The CreateTask change is a strict no-op for any workflow that doesn't set the manualGrantReason workflow variable (Glossary approval, tag/description suggestions, test-case resolution, …): the read is wrapped in try/catch + instanceof, and mergeManualGrantReason(payload, null) returns the payload unchanged. So those task types are byte-for-byte identical to before.

Tests

  • CreateTaskManualGrantReasonTest — no-op for null/blank reason, sets payload.manualGrantReason, preserves existing payload keys, overwrites on stage re-entry.
  • CreateTaskTest (44 existing) still pass — the generic task-creation path is unchanged.

🤖 Generated with Claude Code

  1. ingestion (api/step.py) — add an opt-in report_all_failures flag. Summary.from_step caps reported failure details to the first 10 (status payload size); the Collate Policy Agent needs the complete per-policy failure list because the batch coordinator decides each clubbed request's outcome (grant vs manual) from exactly which policy ids failed — a truncated list silently granted the un-reported failures. Default stays False, so every other connector keeps the 10-cap.

Greptile Summary

This PR wires in OpenMetadata-side configuration and task-payload support for the Collate Policy Agent batch concurrency work. It adds policyAgentConfiguration to workflowSettings (four tunable integers with safe defaults), a per-node pollingIntervalSeconds override on policyAgentTaskDefinition, and a manualGrantReason field stamped into the task payload on the DAR approval path — all generated TypeScript types are updated to match.

  • workflowSettings.json + TS: New policyAgentConfiguration object (pollingIntervalSeconds, batchWindowSeconds, batchMaxSize, batchWorkerThreads) — all optional with sensible defaults; read by the Collate batch coordinator.
  • CreateTask.java: resolveManualGrantReason + mergeManualGrantReason stamp a workflow variable into payload.manualGrantReason; both are strict no-ops for any workflow that doesn't set the variable, guarded by try/catch and blank-check.
  • step.py: Opt-in report_all_failures flag (via getattr with False default) lets Policy Agent steps bypass the 10-failure cap so the coordinator can make accurate per-policy grant/manual decisions without silent truncation.

Confidence Score: 4/5

Safe to merge; all changes are additive and backward-compatible — generic approval workflows are byte-for-byte unaffected by the manualGrantReason path.

The two non-generated files with behavioral changes (step.py and CreateTask.java) both have minor observability gaps: the report_all_failures opt-in is undeclared on the base class (a Policy Agent step that forgets to set it silently reverts to truncated failures), and resolveManualGrantReason swallows exceptions without any log trace (making a DAR-path variable resolution failure invisible). Neither gap causes incorrect behavior for existing workflows, but both could complicate debugging if something goes wrong in the batch coordinator integration.

ingestion/src/metadata/ingestion/api/step.py and CreateTask.java are the two files worth a second look — the rest are generated TypeScript and JSON schema additions.

Important Files Changed

Filename Overview
ingestion/src/metadata/ingestion/api/step.py Adds opt-in report_all_failures flag via getattr to bypass the 10-failure cap; the flag is not declared on the base Step class, making it easy to silently miss on subclasses.
openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTask.java Adds resolveManualGrantReason + mergeManualGrantReason to stamp a DAR-path reason into the task payload; logic is safe for generic workflows, but exception swallowing in the resolver is entirely silent on the DAR path.
openmetadata-service/src/test/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTaskManualGrantReasonTest.java New unit tests covering null/blank no-op, null payload with reason, key preservation, and overwrite-on-update cases for mergeManualGrantReason.
openmetadata-spec/src/main/resources/json/schema/configuration/workflowSettings.json Adds policyAgentConfiguration definition with four integer fields (all minimum:1, all with defaults) and wires it into the top-level properties; no issues found.
openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/policyAgentTaskDefinition.json Adds optional pollingIntervalSeconds (minimum:1, no default intentionally) to the config object so per-node override stays distinguishable from unset.
openmetadata-ui/src/main/resources/ui/src/generated/configuration/workflowSettings.ts Generated TypeScript interface for PolicyAgentConfiguration — mirrors the JSON schema accurately.
openmetadata-ui/src/main/resources/ui/src/generated/settings/settings.ts Generated settings TypeScript — adds PolicyAgentConfiguration interface and policyAgentConfiguration field consistent with the schema.
openmetadata-ui/src/main/resources/ui/src/generated/api/governance/createWorkflowDefinition.ts Generated TypeScript — adds pollingIntervalSeconds to NodeConfiguration matching the updated policy agent task definition schema.
openmetadata-ui/src/main/resources/ui/src/generated/governance/workflows/elements/nodes/automatedTask/policyAgentTaskDefinition.ts Generated TypeScript — adds optional pollingIntervalSeconds to Config interface in the policy agent task definition.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant WF as Workflow Engine
    participant CT as CreateTask.java
    participant PA as Policy Agent Step (Python)
    participant Coord as Batch Coordinator (Collate)

    WF->>CT: notify(delegateTask) [DAR path]
    CT->>CT: resolveManualGrantReason(delegateTask)
    CT->>CT: mergeManualGrantReason(payload, reason)
    CT-->>WF: Task saved with payload.manualGrantReason

    WF->>PA: run pipeline batch
    PA->>PA: "report_all_failures=True → Summary.from_step() returns full failure list"
    PA-->>Coord: StepSummary (all failures, not capped at 10)
    Coord->>Coord: per-policy grant vs manual-review decision
    Coord-->>WF: batch result (policyAgentConfiguration settings applied)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant WF as Workflow Engine
    participant CT as CreateTask.java
    participant PA as Policy Agent Step (Python)
    participant Coord as Batch Coordinator (Collate)

    WF->>CT: notify(delegateTask) [DAR path]
    CT->>CT: resolveManualGrantReason(delegateTask)
    CT->>CT: mergeManualGrantReason(payload, reason)
    CT-->>WF: Task saved with payload.manualGrantReason

    WF->>PA: run pipeline batch
    PA->>PA: "report_all_failures=True → Summary.from_step() returns full failure list"
    PA-->>Coord: StepSummary (all failures, not capped at 10)
    Coord->>Coord: per-policy grant vs manual-review decision
    Coord-->>WF: batch result (policyAgentConfiguration settings applied)
Loading

Reviews (1): Last reviewed commit: "feat(ingestion): let a step opt out of t..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

… reason

Supporting OpenMetadata changes for the Collate Data Access Request (DAR)
concurrency work — clubbing concurrent policy-agent grants into a single
ingestion run.

- workflowSettings: add policyAgentConfiguration (pollingIntervalSeconds,
  batchWindowSeconds, batchMaxSize), read by the Collate batch coordinator.
- policyAgentTaskDefinition: add pollingIntervalSeconds node config.
- CreateTask: optionally carry a workflow-supplied reason into the created
  task's payload as `manualGrantReason` (its own structured field). Strict
  no-op (try/catch + instanceof) for workflows that do not set it, so generic
  approval workflows (Glossary, etc.) are byte-for-byte unaffected.
- Tests: CreateTaskManualGrantReasonTest; the 44 generic CreateTaskTest pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 09:19
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@yan-3005 yan-3005 added safe to test Add this label to run secure Github workflows on PRs backend labels Jun 5, 2026
@yan-3005 yan-3005 self-assigned this Jun 5, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends OpenMetadata governance/workflow capabilities to support Policy Agent batching for Data Access Request (DAR) concurrency, and adds support for carrying a workflow-supplied manual-grant reason into created Task payloads as a structured field.

Changes:

  • Added policyAgentConfiguration to workflowSettings (polling interval + batching window/size controls) for the Policy Agent batch coordinator.
  • Extended PolicyAgentTaskDefinition config with pollingIntervalSeconds (per-node override).
  • Updated CreateTask to optionally merge a workflow-supplied manualGrantReason into task payload, with new unit tests covering merge behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/policyAgentTaskDefinition.json Adds per-node polling interval setting for Policy Agent task nodes.
openmetadata-spec/src/main/resources/json/schema/configuration/workflowSettings.json Introduces server-wide Policy Agent batching configuration under workflow settings.
openmetadata-service/src/test/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTaskManualGrantReasonTest.java Adds unit tests validating manualGrantReason payload merge behavior.
openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTask.java Merges optional workflow manualGrantReason into task payload during create/update.

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

@github-actions github-actions Bot requested a review from a team as a code owner June 5, 2026 09:26
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (11 flaky)

✅ 4293 passed · ❌ 0 failed · 🟡 11 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 299 0 2 4
✅ Shard 2 812 0 0 9
🟡 Shard 3 809 0 3 8
🟡 Shard 4 847 0 3 12
🟡 Shard 5 733 0 1 47
🟡 Shard 6 793 0 2 8
🟡 11 flaky test(s) (passed on retry)
  • Flow/Tour.spec.ts › Tour should work from help section (shard 1, 2 retries)
  • Pages/Roles.spec.ts › Roles page should work properly (shard 1, 1 retry)
  • Features/KnowledgeCenterList.spec.ts › Knowledge Center List - Verify list visibility and card functionality (shard 3, 1 retry)
  • Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
  • Features/Table.spec.ts › Search for column, copy link, and verify side panel behavior (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Sql Query (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Entity Reference List (shard 4, 1 retry)
  • Pages/Entity.spec.ts › User as Owner with unsorted list (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage service type filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

- workflowSettings.policyAgentConfiguration: add `minimum: 1` to
  pollingIntervalSeconds / batchWindowSeconds / batchMaxSize; add
  `batchWorkerThreads` (default 4) so the coordinator's worker-pool size
  is configurable instead of a hard-coded cap.
- policyAgentTaskDefinition: add `minimum: 1` to pollingIntervalSeconds and
  correct the fallback doc path to
  workflowSettings.policyAgentConfiguration.pollingIntervalSeconds.
- Regenerate UI TypeScript for WorkflowSettings and PolicyAgentTaskDefinition
  so the generated types stay in sync with the schema.
- Align the new test's license-header year (2026) with neighbouring tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 12:05

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 8 changed files in this pull request and generated 2 comments.

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

…er-wide default

Removing the per-node default (30) makes the generated field nullable, so an
unset node value is now distinguishable from a configured one. The Collate
PolicyAgentTask resolves it to the node override → server-wide
policyAgentConfiguration.pollingIntervalSeconds → built-in default, which is
what the field description already promised. (minimum:1 stays.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 12:30

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 8 changed files in this pull request and generated 1 comment.

mergeManualGrantReason called JsonUtils.getMap (a Jackson convertValue) even
when the payload was already a Map, which can re-type nested values and can
throw. Copy the existing map entries directly (mirroring withGrantExpirationDate)
and only fall back to a guarded conversion for the unexpected non-Map case, so
task creation is never broken by a conversion issue.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 62%
62.44% (67087/107426) 43.95% (37140/84492) 45.82% (11474/25037)

Copilot AI review requested due to automatic review settings June 12, 2026 10:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 8 changed files in this pull request and generated 1 comment.

Comment on lines +969 to +972
LOG.warn(
"[CreateTask] Could not merge manualGrantReason into payload of type {}",
payload.getClass().getSimpleName());
return payload;
Summary.from_step caps the reported failure details to the first 10 (status
payload size). That's fine for diagnostics but unsafe when downstream logic needs
the COMPLETE per-item failure list — e.g. the Collate Policy Agent batch
coordinator, which decides each clubbed request's outcome (grant vs manual) from
exactly which policy ids failed; a truncated list silently grants the un-reported
failures. Add an opt-in `report_all_failures` flag (default False → every
existing connector keeps the 10-cap). The error COUNT was already accurate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yan-3005 yan-3005 requested a review from a team as a code owner June 17, 2026 06:49
@gitar-bot

gitar-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown
Code Review 👍 Approved with suggestions 4 resolved / 5 findings

Implements Data Access Request concurrency settings and adds manual-grant reason support, addressing issues with interval bounds and task state management. Consider updating the error logging in supersedePriorApprovalTask to capture stack traces for better visibility.

💡 Quality: Best-effort catch logs only getMessage(), dropping stack trace

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTask.java:655-660

The catch block in supersedePriorApprovalTask logs e.getMessage() without the exception object, so the stack trace is lost. Because this path is intentionally best-effort (swallows all exceptions so it never aborts new-task creation), the log line is the only signal that supersession failed; without the stack trace these failures are very hard to diagnose in production. Pass the throwable as the last SLF4J argument so the stack trace is captured: LOG.warn("[CreateTask] Failed to supersede prior approval task(s) for entity '{}'", entity.getFullyQualifiedName(), e);.

Pass the throwable to SLF4J so the full stack trace is logged.
} catch (Exception e) {
  LOG.warn(
      "[CreateTask] Failed to supersede prior approval task(s) for entity '{}'",
      entity.getFullyQualifiedName(),
      e);
}
✅ 4 resolved
Quality: Schema doc references wrong settings path

📄 openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/policyAgentTaskDefinition.json:29
The new per-node pollingIntervalSeconds description says it "Falls back to the server-wide default (workflowSettings.policyAgentPollingIntervalSeconds) when unset." That path does not exist. The field added in workflowSettings.json is nested under policyAgentConfiguration, so the correct reference is workflowSettings.policyAgentConfiguration.pollingIntervalSeconds. Update the description to match the actual schema to avoid confusing operators configuring the agent.

Edge Case: No minimum bound on new integer interval settings

📄 openmetadata-spec/src/main/resources/json/schema/configuration/workflowSettings.json:86-99 📄 openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/policyAgentTaskDefinition.json:27-31
The new integer settings (pollingIntervalSeconds, batchWindowSeconds, batchMaxSize) have defaults but no minimum constraint. A value of 0 or a negative number for the polling interval or batch window could cause tight polling loops, divide-by-zero, or no batching in the coordinator that reads them. Consider adding "minimum": 1 to these fields so invalid configurations are rejected at the schema layer rather than producing degenerate runtime behavior in the Collate coordinator.

Edge Case: Concurrent runs can leave two live approvals despite invariant

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTask.java:615-629
supersedePriorApprovalTask enforces "one live approval per (entity, workflow)" by listing committed non-terminal tasks (listNonTerminalTasksByEntityAndCategory) and superseding those from earlier runs. This is not atomic: two concurrent CreateTask.notify executions for the same (entity, workflow) can each list the prior tasks before the other's taskRepository.create(...) has committed. In that window neither run observes the other's task, so neither is superseded and the system ends up with two simultaneously-live approval tasks — exactly the invariant the comment at lines 615-619 claims to hold. Given the whole PR is about DAR concurrency (clubbing concurrent grants), this race is directly in the target scenario. Consider guarding the supersede/create against a DB-level uniqueness constraint or a serialized lock keyed on (entityFqn, workflowDefinitionId), or at minimum soften the comment from an absolute "invariant" to a best-effort guarantee so downstream code does not rely on uniqueness.

Edge Case: Superseded Flowable process left running if task read non-terminal

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTask.java:707-721
terminateSupersededInstance re-reads the prior task via findCommittedTask and only terminates the Flowable instance when isTerminalTaskStatus(prior.getStatus()) is true (and skips when prior == null). The safe direction is intentional (don't orphan a still-live approval), but it has the opposite failure mode: if the async worker observes a stale/non-terminal status (e.g. the closeTask commit is not yet visible on this connection, or the prior task was concurrently reopened) it silently leaves the Flowable process intact while the OM approval task is closed. The result is a workflow process that believes the approval is still pending but has no actionable OM task — a stuck/orphaned process that is never retried (the dispatch is fire-and-forget). Consider adding a bounded retry on the committed-read (mirroring the existing WORKFLOW_MANAGED_DRAFT_LOOKUP retry pattern in this class) before giving up, or a reconciliation sweep for closed tasks whose instance is still live.

🤖 Prompt for agents
Code Review: Implements Data Access Request concurrency settings and adds manual-grant reason support, addressing issues with interval bounds and task state management. Consider updating the error logging in `supersedePriorApprovalTask` to capture stack traces for better visibility.

1. 💡 Quality: Best-effort catch logs only getMessage(), dropping stack trace
   Files: openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/userTask/CreateTask.java:655-660

   The catch block in `supersedePriorApprovalTask` logs `e.getMessage()` without the exception object, so the stack trace is lost. Because this path is intentionally best-effort (swallows all exceptions so it never aborts new-task creation), the log line is the only signal that supersession failed; without the stack trace these failures are very hard to diagnose in production. Pass the throwable as the last SLF4J argument so the stack trace is captured: `LOG.warn("[CreateTask] Failed to supersede prior approval task(s) for entity '{}'", entity.getFullyQualifiedName(), e);`.

   Fix (Pass the throwable to SLF4J so the full stack trace is logged.):
   } catch (Exception e) {
     LOG.warn(
         "[CreateTask] Failed to supersede prior approval task(s) for entity '{}'",
         entity.getFullyQualifiedName(),
         e);
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Comment on lines +938 to +946
private static String resolveManualGrantReason(DelegateTask delegateTask) {
try {
Object reason =
new WorkflowVariableHandler(delegateTask)
.getNamespacedVariable(GLOBAL_NAMESPACE, "manualGrantReason");
return reason instanceof String s ? s : null;
} catch (Exception e) {
return null;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent exception swallowing on the DAR path

resolveManualGrantReason swallows all exceptions without any trace. For non-DAR workflows this is intentional (the variable simply isn't set), but on the Policy Agent path a genuine error — e.g., a type mismatch or a deserialization failure — would silently return null, causing manualGrantReason to be dropped from the task payload with no indication in the logs. A LOG.debug or LOG.trace call in the catch block would make this scenario diagnosable without adding noise for the common non-DAR case.

Comment on lines +125 to +128
failures=(
(step.status.failures if getattr(step, "report_all_failures", False) else step.status.failures[0:10])
if step.status.failures
else None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 report_all_failures is undeclared on the base Step class

The attribute is consumed via getattr(step, "report_all_failures", False), but Step never declares it (not as a class variable, not in __init__, and there's no Protocol or type annotation). Any step that forgets to set it will silently revert to the 10-failure cap. For the Policy Agent use case, where truncation causes incorrect security decisions (as the comment notes), a missed attribute is indistinguishable from the deliberate opt-out. Declaring report_all_failures: bool = False in Step.__init__ would make the contract explicit and allow static analysis to catch misuse.

@sonarqubecloud

Copy link
Copy Markdown

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Governance workflow support for Data Access Request (DAR) concurrency batching

2 participants