feat(data-retention): granular PII redaction stages (input + block outputs)#5272
feat(data-retention): granular PII redaction stages (input + block outputs)#5272TheodoreSpeaks wants to merge 19 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
PR SummaryHigh Risk Overview Runtime: Workflow input is masked before execution; block outputs are masked before compaction, downstream blocks, agent memory, and child workflows. Streaming blocks can drain without forwarding raw chunks when block-output redaction is on. In-flight stages use Presidio & throughput: New Config: Reviewed by Cursor Bugbot for commit 965eb65. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR adds granular PII redaction stages for workflow execution and logs. The main changes are:
Confidence Score: 4/5This is close, but the restore path should be fixed before merging.
apps/sim/lib/workflows/executor/execution-core.ts
|
| Filename | Overview |
|---|---|
| apps/sim/lib/workflows/executor/execution-core.ts | Adds execution-time policy resolution and restore masking, but restored offloaded block outputs can still be hydrated as raw data. |
| apps/sim/lib/billing/retention.ts | Resolves stored PII rules into per-stage effective policies. |
| apps/sim/executor/execution/block-executor.ts | Masks block outputs before compaction and buffers streaming output when block-output redaction is enabled. |
| apps/sim/lib/logs/execution/pii-redaction.ts | Adds reusable object-string redaction with scrub or throw failure handling. |
| apps/sim/lib/logs/execution/pii-large-values.ts | Adds hydrate, mask, and re-store handling for offloaded values in log payloads. |
Reviews (14): Last reviewed commit: "feat(data-retention): gate granular PII ..." | Re-trigger Greptile
|
@greptile review |
|
@greptile review |
…nconditionally (no fail-open)
|
@greptile review |
|
@greptile review |
…redaction # Conflicts: # apps/sim/ee/data-retention/components/data-retention-settings.tsx
… = off everywhere
…block-output redaction is off
|
@greptile review |
| // Limitation: this walks inline strings only — values offloaded to | ||
| // large-value storage are still refs here and are not re-masked. In the | ||
| // normal flow that is safe (a run with the stage on masks before offload); | ||
| // the gap is the narrow case of a run that offloaded a large value while | ||
| // the stage was OFF and is resumed after the stage is turned ON. |
There was a problem hiding this comment.
When block-output redaction is enabled after a workflow already offloaded large block outputs, this restore path only masks inline strings in the snapshot. The offloaded payloads stay behind large-value refs. On resume or run-from-block, downstream blocks can still read the raw restored payload, and log persistence can skip the large-value scrub because block-output redaction is now enabled. This leaves raw PII reachable from prior block outputs after the stage is turned on.
| abortSignal: ctx.abortSignal, | ||
| // Propagate in-flight block-output redaction into child workflows so | ||
| // nested blocks mask outputs too (recurses: each child forwards it). | ||
| piiBlockOutputRedaction: ctx.piiBlockOutputRedaction, |
There was a problem hiding this comment.
Child workflows skip input redaction
Medium Severity
The new workflow-input PII stage runs only in executeWorkflowCore on top-level processedInput. Nested child runs are started with a direct Executor and pass childWorkflowInput unchanged. Only the block-output policy is forwarded on the context, so when the input stage is on and block outputs are off, mapped or explicit child input can execute and produce downstream state without in-flight input masking.
Reviewed by Cursor Bugbot for commit 8f86d77. Configure here.
…logs (preserve redacted content)
|
@greptile review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6e9587a. Configure here.
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) |
There was a problem hiding this comment.
Refs stay unmasked When a paused run or run-from-block snapshot contains a large-value ref that was created before block-output redaction was enabled, this call only masks inline strings. Large-value refs are treated as opaque by
redactObjectStrings, so the ref still points at the original offloaded bytes. The later warm-up step can materialize that raw value for downstream blocks, letting them read or send unredacted PII even though the block-output stage is enabled.
…n logs stage is on
|
@greptile review |
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) |
There was a problem hiding this comment.
This restore path still only masks inline strings. When a paused run or run-from-block snapshot contains a large-value ref created before block-output redaction was enabled, redactObjectStrings leaves the ref untouched. The later warm-up can materialize that original offloaded value for downstream blocks, so the resumed workflow can read raw PII even though block-output redaction is now enabled. This path needs to hydrate, mask, and re-store restored refs before downstream state can use them.
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) | ||
| } |
There was a problem hiding this comment.
This restore path still leaves old offloaded block outputs unmasked. It only runs redactObjectStrings over restored blockStates, and that redactor treats large-value refs as opaque, so a paused run or run-from-block snapshot created before block-output redaction was enabled can still point at raw stored bytes. When the restored state is warmed and downstream blocks read that ref, they can receive the original PII even though the block-output stage is enabled. The restore path needs to hydrate, mask, and re-store those refs before exposing the state to execution.
… (env-tunable), remove request timeouts, sync large-value walk
…daction flag - New pii-granular-redaction feature flag (fallback PII_GRANULAR_REDACTION), layered on pii-redaction, gating the execution-altering input + block-output stages - Route returns piiGranularRedactionEnabled and rejects enabling granular stages when off - UI shows only the Logs stage tab unless the flag is on; clamps active stage - Drop the per-search Select all toggle; add a Deselect all action to the PII section header
|
@greptile review |
| if (snapshot.state?.blockStates) { | ||
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) | ||
| } | ||
| if (runFromBlock?.sourceSnapshot?.blockStates) { | ||
| runFromBlock.sourceSnapshot.blockStates = await redactObjectStrings( | ||
| runFromBlock.sourceSnapshot.blockStates, | ||
| blockOutputOpts | ||
| ) |
There was a problem hiding this comment.
This restore path masks only inline strings. redactObjectStrings leaves large-value refs untouched, and the snapshot warm-up runs after this block. When a paused run or run-from-block snapshot contains a ref created before block-output redaction was enabled, this code keeps the ref pointing at the original offloaded value. The later warm-up can materialize raw PII into blockStates, so downstream blocks can read unmasked data even though block-output redaction is enabled. The restore path needs to hydrate, mask, and re-store these refs before execution can use the restored state.


Summary
Type of Change
Testing
Tested manually. Unit tests for resolver back-compat,
redactObjectStrings+ failure modes, and the contract schema.bun run lint,check:api-validation:strict, andcheck:migrations origin/stagingall pass.Checklist