Hardening batch: fail-loudly invariants, audit visibility, migration rails, test depth#18
Open
Anmolnoor wants to merge 13 commits into
Open
Hardening batch: fail-loudly invariants, audit visibility, migration rails, test depth#18Anmolnoor wants to merge 13 commits into
Anmolnoor wants to merge 13 commits into
Conversation
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
python -O strips assert statements, so the executor's 16 invariant guards (kind/payload narrowing and file/git service wiring) could silently vanish in optimized runs. Replace them with a _require helper raising ExecutorInvariantError; ActionExecutor.execute() converts violations into typed FAILED ExecutionResults instead of interpreter crashes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The kind/payload validator already enforced payload presence but accepted a stray question payload on EXPLANATION/SHELL/TOOL_CALL actions and a stray explanation on QUESTION actions. Every cross-kind payload is now rejected at validation time; failures route through the existing plan repair loop. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Planner and guardrails each maintained an identical 16-entry set; a divergence would let the planner permit what policy blocks. The set now lives once in models/git.py with both services aliasing it, plus an identity test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Event sinks must never break a turn, but their failures were invisible: the observer swallowed sink exceptions, compose_event_sink warned per event forever on a flapping sink, and a crash escaping EventLogWriter.write_event left the sessions index claiming a complete log. Sink failures are now counted and warned about once, a sink is disabled after 3 consecutive failures, writer crashes poison the index row to write_truncated, and gap-phraser fallbacks log their reason with a category instead of failing silently. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Schema migrations ran automatically against the user's real history DB with no backup and no post-check, and the v6 executescript rebuild dropped the source table without validating the copy. Migrations now back up the DB file first (keeping the newest backup), the v6 rebuild validates row counts before dropping anything, and failures raise HistoryMigrationError naming the backup path with the original data intact. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The unified-diff applier silently tolerated bare context lines and always compared with trailing newlines stripped, and never checked a hunk's declared source count against its body. Declared-count mismatches and no-op hunks are now parse-time DIFF_REJECTED errors; bare-context and newline-normalized matching remain as deliberate tolerances but are reported through FileMutationResult.leniency_notes so they land in the trace. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
41 new tests where coverage was thinnest: PlannerService unit tests (prompt injection, validation, repair retries), Codex adapter failure paths mapped to ProviderErrorCodes, and live renderer edge cases (narrow widths, spinner identity, markup/ANSI payloads). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A plan naming a nonexistent capability id raised a plain ValueError from CapabilityRegistry.resolve that escaped both the plan-repair loop and the orchestrator's PlanningError handling, crashing the turn; it is now wrapped as PlanningError so the model gets a repair retry. And render_detail_panel passed user request text to Rich as raw markup, allowing styling injection and a MarkupError crash mid-Live.update; it is now rendered literally via Text(). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mark completed roadmaps (fixes roadmap, v0.1, v2 stages, v4) with status headers, add the hardening-batch CHANGELOG section, and document the write_truncated semantics, sink circuit-breaker, and pre-migration backups in TECHNICAL.md. Stage 8 (orchestrator slimming) is recorded as evaluated-and-descoped: the helpers' only callers live inside the orchestrator, and the constructor params are deliberate DI seams. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Found by the codex smoke test: an assistant message starting with a stray Rich closing tag (e.g. '[/bold]') crashed the concise renderer's console.print and the verbose Assistant panel with MarkupError after the turn had already completed. Model-generated text now renders literally via Text() in both paths, matching the earlier detail-panel fix. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two findings from the codex smoke test. First, a piped one-shot run hitting an approval prompt died with a bare 'Aborted.' (exit 1) when typer.confirm got EOF; the prompt callback now returns None on non-TTY EOF and ApprovalService resolves the action as PENDING, producing the graceful loop stop. A TTY Ctrl-C still aborts. Second, a turn that failed, repaired itself, and completed naturally reported 'stopped: tool failed' plus 'no verification command ran': custom test scripts now classify as verification, cross-iteration verification is decided by the latest attempt instead of worst-wins, and the summary only uses stop framing for abnormal stops — recovered turns read 'Executed N actions, recovered from M earlier failures'. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Owner
Author
|
Pushed one more fix found by interactive use: the live status line's |
On a real TTY the live status line's '?'-toggle reader kept its byte-at-a-time stdin loop running while approval/question prompts were shown — renderer.pause() only stopped the Rich Live widget. The reader raced typer.confirm for every byte, swallowed the user's 'y', and the prompt resolved to its default 'n', blocking the action and stopping the turn. pause() now tears down the reader (thread stopped, termios restored, type-ahead flushed) and resume() reinstalls it. Pinned by a pty-backed regression test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
5a5540b to
e03d417
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements
plans/fcli-hardening-roadmap.md— the staged fix plan from the 2026-06-10 full-project review. The theme: fcli was stricter about what gets in (policy, approval, validation) than about noticing when its own internals fail. This batch closes that gap, one reviewable commit per stage.Gates on every commit: pytest (460 → 542 tests), ruff check + format, mypy strict,
foundation doctor.Stages
ExecutorInvariantError→ FAILED results; survivespython -O.question/explanationcross-payload holes; failures route through plan repair.models/git.py); planner and guardrails can no longer diverge.compose_event_sink), writer crashes poison the session towrite_truncated, phraser fallbacks log a reason category..pre-v<target>.bakbackup, row-count validation before dropping the source table,HistoryMigrationErrornaming the backup.FileMutationResult.leniency_notes.test_planner.py, Codex failure paths, live-rendering edges.Bugs found and fixed along the way
ValueErrorand crashed the turn (found by stage 7 tests).MarkupErrorcrash / styling injection in the live detail panel and both chat renderers (the latter found only by live smoke testing).Verification
Beyond the suite, smoke-tested hard against the real Codex provider (gpt-5.5, ChatGPT login) in a scratch git workspace: file creation, fix-and-verify replan loop (3 iterations, verification "passed"), typed git stage+commit through the approval gate, graceful gap decline, non-TTY scope-block, empty-request guard, markup-injection probe (crashed before the fix, renders literally after), piped-approval probe (aborted before the fix, pending-approval after).
Review note: the roadmap file (
plans/fcli-hardening-roadmap.md) carries a status header per stage including what was rejected on inspection and why — start there.🤖 Generated with Claude Code