feat(headroom): add compress worker#4197

Draft
jeanduplessis wants to merge 3 commits into main from jdp/headroom-compress-worker

@jeanduplessis

Contributor

Summary

Adds a Worker-gated Cloudflare Container deployment for Headroom compression only.

Why this change is needed

Headroom needs a restricted Cloudflare-hosted compression endpoint that can be benchmarked without exposing chat, retrieval, dashboard, or stats surfaces. Compress-only traffic does not need durable CCR retrieval storage, so the deployment can stay stateless and route requests through a Worker guard layer.

How this is addressed

  • Adds a headroom-compress Worker service with a Container binding on port 8787 and a pinned Cloudflare Registry image reference.
  • Allows only POST /v1/compress and GET /readyz; rejects chat, messages, responses, retrieve, dashboard, stats, and other routes before container fetch.
  • Adds bearer-token auth, provider/API configuration, and request-size, timeout, and cost guards around compression requests.
  • Adds health, logging, unit coverage, deploy docs, image build docs, and a benchmark script for repeated compression verification.
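As a rough illustration of the guard ordering described above (route allow-list first, then auth, then size limits), the decision could be modelled as a pure function. This is a minimal sketch, not the code in this PR: the `decide` function, the `GuardInput`/`GuardConfig` types, the 413 status, and the assumption that `/readyz` is unauthenticated are all hypothetical. Only the two allowed routes and the 401/404 responses come from the description.

```typescript
// Hypothetical sketch of the Worker guard order; names and the 413 status are assumptions.
type GuardInput = {
  method: string;
  pathname: string;
  authorization: string | null; // raw Authorization header, if any
  contentLength: number;        // request body size in bytes
};

type GuardConfig = {
  expectedToken: string;
  maxRequestBytes: number; // assumed limit, not the PR's actual default
};

type GuardDecision =
  | { forward: true }
  | { forward: false; status: number; reason: string };

// Only these two routes may reach the container (per the PR description).
// Whether /readyz requires auth is an assumption made for this sketch.
const ALLOWED_ROUTES: ReadonlyArray<{ method: string; pathname: string; requiresAuth: boolean }> = [
  { method: "POST", pathname: "/v1/compress", requiresAuth: true },
  { method: "GET", pathname: "/readyz", requiresAuth: false },
];

function decide(input: GuardInput, config: GuardConfig): GuardDecision {
  // 1. Route allow-list: anything else is rejected before container fetch.
  const route = ALLOWED_ROUTES.find(
    (r) => r.method === input.method && r.pathname === input.pathname,
  );
  if (route === undefined) {
    return { forward: false, status: 404, reason: "route not allowed" };
  }
  if (route.requiresAuth) {
    // 2. Bearer-token auth. A real Worker should use a constant-time comparison.
    if (input.authorization !== `Bearer ${config.expectedToken}`) {
      return { forward: false, status: 401, reason: "missing or invalid token" };
    }
    // 3. Request-size guard (status code is an assumption).
    if (input.contentLength > config.maxRequestBytes) {
      return { forward: false, status: 413, reason: "request too large" };
    }
  }
  return { forward: true };
}
```

Keeping the decision separate from the actual container fetch makes the deny paths easy to unit test without a running container.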

Verification

Manual testing performed
  1. Deployed headroom-compress to Cloudflare Workers with the pinned Headroom container image.
  2. Verified GET https://headroom.kiloapps.io/readyz returns 200.
  3. Verified unauthenticated POST /v1/compress returns 401.
  4. Verified authenticated POST /v1/compress returns 200.
  5. Verified denied routes return 404 at the Worker layer.
  6. Ran the benchmark script against the final URL; the logs case compressed 9243 tokens to 333 tokens and passed thresholds.
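For context on the benchmark numbers in step 6, the token reduction can be computed as `1 - after / before`. The sketch below is illustrative only: the `CaseResult` type, the function names, and the `minReduction` parameter are hypothetical, and the thresholds the benchmark script actually enforces are not stated here.

```typescript
// Hypothetical helper for reasoning about a benchmark case; not the PR's script.
type CaseResult = { name: string; tokensBefore: number; tokensAfter: number };

// Fraction of tokens removed by compression, e.g. 0.9 means 90% fewer tokens.
function reduction(result: CaseResult): number {
  if (result.tokensBefore <= 0) {
    throw new Error("tokensBefore must be positive");
  }
  return 1 - result.tokensAfter / result.tokensBefore;
}

// minReduction is an assumed threshold parameter, supplied by the caller.
function passesThreshold(result: CaseResult, minReduction: number): boolean {
  return reduction(result) >= minReduction;
}

// Figures from the manual test above: 9243 tokens compressed to 333.
const logsCase: CaseResult = { name: "logs", tokensBefore: 9243, tokensAfter: 333 };
const logsReduction = reduction(logsCase); // roughly 0.964, i.e. about a 96.4% reduction
```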

Visual Changes

N/A

Reviewer Notes

Human Reviewer

  • Validate that using the mirrored upstream Headroom v0.27.0 amd64 digest is acceptable until a native amd64 source builder is available. Local arm64 Docker could not build the amd64 Rust extension under QEMU.
  • Validate deployed env/secrets in Cloudflare before relying on this endpoint for production benchmarking.

Code Reviewer Agent

Code Reviewer Notes
  • Focus on route enforcement in the Worker before container fetch; only /v1/compress and /readyz should reach the container.
  • Review guard defaults for request bytes, message count, model allow-list, timeout, and estimated cost.
  • Review benchmark script behavior for fixture loading, token discovery from env or .dev.vars, threshold failure exit codes, and JSON report output.
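To make the "token discovery from env or .dev.vars" point concrete for review, here is one plausible shape for that lookup: prefer the process environment, then fall back to parsing the dotenv-style `.dev.vars` file. This is a sketch under assumptions, not the script in this PR. The function names are hypothetical, the variable name is passed in by the caller because the real one is not given here, and file reading is left to the caller so the logic stays pure.

```typescript
// Hypothetical sketch of token discovery; not the benchmark script's actual code.

// Parse dotenv-style KEY=VALUE lines, skipping blanks and # comments.
function parseDevVars(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=" or no "=" at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const quote = value.charAt(0);
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    vars.set(key, value);
  }
  return vars;
}

// Environment wins; .dev.vars contents (or null if the file is absent) are the fallback.
function discoverToken(
  env: Record<string, string | undefined>,
  devVarsText: string | null,
  name: string,
): string | null {
  const fromEnv = env[name];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  if (devVarsText === null) return null;
  return parseDevVars(devVarsText).get(name) ?? null;
}
```

A caller would typically treat a `null` result as a configuration error and exit non-zero before sending any requests.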

2 participants