cocoon gc: surface per-module summary (scanned / collected / freed) #33

@CMGS

What

`cocoon gc` currently emits a single line — `GC completed` — regardless of whether it scanned 7 modules and freed 30 GB or did literally nothing. There is no signal in the default log path telling the operator what happened.

Repro on the testbed: `/var/lib/cocoon` was 38 GB, with all blobs/dirs/snapshots referenced by their respective indices. `cocoon gc` finished in 5 ms with the single `GC completed` line. From the operator's side, "GC ran but found no targets" is indistinguishable from "GC short-circuited because of a bug".

Why

Operators reach for `cocoon gc` when disk fills up. If GC truly has nothing to delete (because every blob is index-referenced), the operator should be told that — so they know to `image rm` first instead of poking at GC further. Conversely, if GC reclaims something, they want to see the bytes / objects.

Where the data already exists

`gc/orchestrator.go::Run` already iterates the modules with full per-module knowledge:

```go
// Phase 2: resolve deletion targets.
targets := make(map[string][]string)
for _, m := range locked {
	if ids := m.resolveTargets(...); len(ids) > 0 {
		targets[m.getName()] = ids
	}
}

// Phase 3: collect (skip modules with no targets).
for _, m := range locked {
	ids := targets[m.getName()]
	if len(ids) == 0 {
		continue
	}
	if err := m.collect(ctx, ids); err != nil { ... }
}
```

So we already know per module: `name`, `len(ids) before collect`, `collect error`. We do not yet have bytes freed; that would require either pre-stat'ing the targets or having `collect` return a delta.

Proposed UX

INFO-level output (default), one line per module, whether it collected something, found nothing to collect, or was skipped:

```
gc oci: 0 orphan blobs (16 referenced)
gc cloudimg: 0 orphan blobs (3 referenced)
gc snapshot: 1 stale-pending reclaimed
gc cloudhypervisor: 2 orphan run dirs reclaimed (vm IDs: …)
gc completed: 4 modules, 3 objects collected
```

If a module is skipped because its lock was busy:

```
gc oci: skipped (lock busy)
gc aborted: modules skipped (lock busy): oci
```

(`Run` already returns this as an error; we just want the summary log to mirror it.)
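A minimal sketch of how the two closing lines above could be produced from per-module outcomes. The `result` type and `summaryLines` function are hypothetical names, not part of the existing code. The per-module wording is simplified to a plain count; the real lines would be module-specific ("orphan blobs", "run dirs", and so on). The sketch also assumes that the remaining modules are still reported when one is skipped, which this issue does not settle.

```go
package main

import (
	"fmt"
	"strings"
)

// result is a hypothetical per-module outcome record.
type result struct {
	Name      string
	Collected int  // objects actually deleted
	Skipped   bool // true when the module's lock was busy
}

// summaryLines renders one line per module plus a closing line that is
// either "gc completed: ..." or "gc aborted: ..." when any module was skipped.
func summaryLines(results []result) []string {
	var lines, skipped []string
	total := 0
	for _, r := range results {
		if r.Skipped {
			lines = append(lines, fmt.Sprintf("gc %s: skipped (lock busy)", r.Name))
			skipped = append(skipped, r.Name)
			continue
		}
		lines = append(lines, fmt.Sprintf("gc %s: %d collected", r.Name, r.Collected))
		total += r.Collected
	}
	if len(skipped) > 0 {
		return append(lines, "gc aborted: modules skipped (lock busy): "+strings.Join(skipped, ", "))
	}
	return append(lines, fmt.Sprintf("gc completed: %d modules, %d objects collected", len(results), total))
}

func main() {
	for _, l := range summaryLines([]result{
		{Name: "oci"},
		{Name: "snapshot", Collected: 1},
		{Name: "cloudhypervisor", Collected: 2},
	}) {
		fmt.Println(l)
	}
}
```

Keeping the rendering in one pure function makes the summary easy to unit-test independently of whichever logger `Run` uses.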

Suggested implementation

  1. Make each `gc.Module[S].Collect` return `(int, error)` — count of actually deleted objects (some Collect impls walk a list and skip-on-error; the count returned should reflect successful deletions).
  2. `Orchestrator.Run` accumulates per-module `(name, scanned, collected)` into a small struct and logs them at INFO before returning.
  3. Optional: have each module's `ReadDB` / `Resolve` also surface the "referenced N" count for the noop case so the log is informative when nothing was orphaned.

Out of scope: bytes-freed accounting (would require stat'ing every target, slows GC; can be added later as DEBUG-only if useful).

Priority

Low — purely a UX / observability fix. No correctness bug, no behavior change, no API churn beyond `Module.Collect` return type.
