Hardening batch: fail-loudly invariants, audit visibility, migration rails, test depth by Anmolnoor · Pull Request #18 · Anmolnoor/fcli

Anmolnoor · 2026-06-11T04:36:46Z

Summary

Implements plans/fcli-hardening-roadmap.md — the staged fix plan from the 2026-06-10 full-project review. The theme: fcli was stricter about what gets in (policy, approval, validation) than about noticing when its own internals fail. This batch closes that gap, one reviewable commit per stage.

Gates on every commit: pytest (460 → 542 tests), ruff check + format, mypy strict, foundation doctor.

Stages

Executor invariants fail loudly — all 16 asserts replaced with typed ExecutorInvariantError → FAILED results; survives python -O.
Plan-action validation — closed the stray question/explanation cross-payload holes; failures route through plan repair.
One source of truth for git mutation subcommands (models/git.py); planner and guardrails can no longer diverge.
Audit-trail failures visible — sink failure counting, disable-after-3 circuit breaker (observer + compose_event_sink), writer crashes poison the session to write_truncated, phraser fallbacks log a reason category.
Migration safety rails — pre-migration .pre-v<target>.bak backup, row-count validation before dropping the source table, HistoryMigrationError naming the backup.
Diff-applier strictness — parse-time rejection of count-mismatched and no-op hunks; bare-context and newline-normalized matching kept but surfaced via FileMutationResult.leniency_notes.
Test depth — 41 new tests: isolated test_planner.py, Codex failure paths, live-rendering edges.
Orchestrator slimming — evaluated and descoped with evidence (helpers' only callers are internal; ctor params are deliberate DI seams). See roadmap.
Docs/plans hygiene — status headers everywhere, CHANGELOG, TECHNICAL.md.

Bugs found and fixed along the way

Hallucinated capability id escaped the repair loop as an unwrapped ValueError and crashed the turn (found by stage 7 tests).
Model text parsed as Rich markup → MarkupError crash / styling injection in the live detail panel and both chat renderers (the latter found only by live smoke testing).
Piped non-TTY runs hard-aborted ("Aborted.", exit 1) at approval prompts → now graceful PENDING_APPROVAL stop; TTY Ctrl-C still aborts.
Successful self-repairing turns reported "stopped: tool failed" + "no verification command ran" → custom test scripts count as verification, latest verification attempt wins across iterations, and recovered turns read "Executed N actions, recovered from M earlier failures".

Verification

Beyond the suite, smoke-tested hard against the real Codex provider (gpt-5.5, ChatGPT login) in a scratch git workspace: file creation, fix-and-verify replan loop (3 iterations, verification "passed"), typed git stage+commit through the approval gate, graceful gap decline, non-TTY scope-block, empty-request guard, markup-injection probe (crashed before the fix, renders literally after), piped-approval probe (aborted before the fix, pending-approval after).

Review note: the roadmap file (plans/fcli-hardening-roadmap.md) carries a status header per stage including what was rejected on inspection and why — start there.

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

python -O strips assert statements, so the executor's 16 invariant guards (kind/payload narrowing and file/git service wiring) could silently vanish in optimized runs. Replace them with a _require helper raising ExecutorInvariantError; ActionExecutor.execute() converts violations into typed FAILED ExecutionResults instead of interpreter crashes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The kind/payload validator already enforced payload presence but accepted a stray question payload on EXPLANATION/SHELL/TOOL_CALL actions and a stray explanation on QUESTION actions. Every cross-kind payload is now rejected at validation time; failures route through the existing plan repair loop. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Planner and guardrails each maintained an identical 16-entry set; a divergence would let the planner permit what policy blocks. The set now lives once in models/git.py with both services aliasing it, plus an identity test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Event sinks must never break a turn, but their failures were invisible: the observer swallowed sink exceptions, compose_event_sink warned per event forever on a flapping sink, and a crash escaping EventLogWriter.write_event left the sessions index claiming a complete log. Sink failures are now counted and warned about once, a sink is disabled after 3 consecutive failures, writer crashes poison the index row to write_truncated, and gap-phraser fallbacks log their reason with a category instead of failing silently. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Schema migrations ran automatically against the user's real history DB with no backup and no post-check, and the v6 executescript rebuild dropped the source table without validating the copy. Migrations now back up the DB file first (keeping the newest backup), the v6 rebuild validates row counts before dropping anything, and failures raise HistoryMigrationError naming the backup path with the original data intact. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The unified-diff applier silently tolerated bare context lines and always compared with trailing newlines stripped, and never checked a hunk's declared source count against its body. Declared-count mismatches and no-op hunks are now parse-time DIFF_REJECTED errors; bare-context and newline-normalized matching remain as deliberate tolerances but are reported through FileMutationResult.leniency_notes so they land in the trace. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

41 new tests where coverage was thinnest: PlannerService unit tests (prompt injection, validation, repair retries), Codex adapter failure paths mapped to ProviderErrorCodes, and live renderer edge cases (narrow widths, spinner identity, markup/ANSI payloads). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

A plan naming a nonexistent capability id raised a plain ValueError from CapabilityRegistry.resolve that escaped both the plan-repair loop and the orchestrator's PlanningError handling, crashing the turn; it is now wrapped as PlanningError so the model gets a repair retry. And render_detail_panel passed user request text to Rich as raw markup, allowing styling injection and a MarkupError crash mid-Live.update; it is now rendered literally via Text(). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Mark completed roadmaps (fixes roadmap, v0.1, v2 stages, v4) with status headers, add the hardening-batch CHANGELOG section, and document the write_truncated semantics, sink circuit-breaker, and pre-migration backups in TECHNICAL.md. Stage 8 (orchestrator slimming) is recorded as evaluated-and-descoped: the helpers' only callers live inside the orchestrator, and the constructor params are deliberate DI seams. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Found by the codex smoke test: an assistant message starting with a stray Rich closing tag (e.g. '[/bold]') crashed the concise renderer's console.print and the verbose Assistant panel with MarkupError after the turn had already completed. Model-generated text now renders literally via Text() in both paths, matching the earlier detail-panel fix. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Two findings from the codex smoke test. First, a piped one-shot run hitting an approval prompt died with a bare 'Aborted.' (exit 1) when typer.confirm got EOF; the prompt callback now returns None on non-TTY EOF and ApprovalService resolves the action as PENDING, producing the graceful loop stop. A TTY Ctrl-C still aborts. Second, a turn that failed, repaired itself, and completed naturally reported 'stopped: tool failed' plus 'no verification command ran': custom test scripts now classify as verification, cross-iteration verification is decided by the latest attempt instead of worst-wins, and the summary only uses stop framing for abnormal stops — recovered turns read 'Executed N actions, recovered from M earlier failures'. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Anmolnoor · 2026-06-11T07:36:07Z

Pushed one more fix found by interactive use: the live status line's ?-toggle keypress reader kept reading stdin during approval prompts, eating the typed y so typer.confirm fell back to the default n and the turn stopped. pause() now fully releases stdin (reader thread stopped, termios restored, type-ahead flushed) and resume() reinstalls the reader — pinned with a pty-backed regression test. Affected all mid-turn prompts (approvals, agent questions, scope grants) on a real TTY. 543 tests green.

On a real TTY the live status line's '?'-toggle reader kept its byte-at-a-time stdin loop running while approval/question prompts were shown — renderer.pause() only stopped the Rich Live widget. The reader raced typer.confirm for every byte, swallowed the user's 'y', and the prompt resolved to its default 'n', blocking the action and stopping the turn. pause() now tears down the reader (thread stopped, termios restored, type-ahead flushed) and resume() reinstalls it. Pinned by a pty-backed regression test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Smoke Test and others added 12 commits June 10, 2026 16:58

Add hardening roadmap from 2026-06-10 project review

1772fd8

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Anmolnoor force-pushed the feat/fcli-hardening branch from 5a5540b to e03d417 Compare June 11, 2026 07:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hardening batch: fail-loudly invariants, audit visibility, migration rails, test depth#18

Hardening batch: fail-loudly invariants, audit visibility, migration rails, test depth#18
Anmolnoor wants to merge 13 commits into
mainfrom
feat/fcli-hardening

Anmolnoor commented Jun 11, 2026

Uh oh!

Anmolnoor commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Anmolnoor commented Jun 11, 2026

Summary

Stages

Bugs found and fixed along the way

Verification

Uh oh!

Anmolnoor commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant