Simple API responses should be cached server-side and served with HTTP cache headers #1254

@dkliban

Is your feature request related to a problem? Please describe.

Pulp's Simple Repository API responses (both HTML and JSON) are regenerated from the database on every request and include no HTTP cache headers (cache-control, etag, last-modified). This means:

  1. Server-side: Every request for a project's simple index page hits the database and regenerates the response, even though the content only changes when a new repository version is published.
  2. Client-side: Without cache headers, clients like uv cannot cache responses locally and must re-fetch every package index page on every resolve.

By contrast, PyPI pre-generates and caches its simple index pages, and serves them with cache-control: max-age=600, public and an etag. This lets the server avoid redundant work and lets clients skip the network entirely for 10 minutes, then perform a cheap 304 Not Modified revalidation after that.
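
To make the client-side behaviour concrete, here is a minimal Python sketch of the decision a caching client could make from those two headers. It is a simplification, not how uv is actually implemented: the function names `max_age_seconds` and `plan_request` are hypothetical, and directives such as no-cache, no-store and the Age header are ignored.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def plan_request(cache_control: str, etag: Optional[str], age_seconds: float) -> dict:
    """Decide how a stored response can be reused (hypothetical helper).

    - While the entry is younger than max-age, no network request is needed.
    - After that, an ETag allows a conditional request that the server can
      answer with 304 Not Modified and an empty body.
    - Without an ETag, the whole page has to be fetched again.
    """
    max_age = max_age_seconds(cache_control)
    if max_age is not None and age_seconds < max_age:
        return {"action": "use_cache", "headers": {}}
    if etag:
        return {"action": "revalidate", "headers": {"If-None-Match": etag}}
    return {"action": "refetch", "headers": {}}
```

With the PyPI headers shown under Evidence, `plan_request("max-age=600, public", '"5rCf07MprnFSJcmoYLKebQ"', 120)` stays on the local cache, while the same call with an age of 900 seconds yields a conditional request carrying If-None-Match. With Pulp's current responses (no cache-control, no etag) the result is always a full refetch.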

Evidence

```shell
# Pulp simple index: no cache headers
$ curl -sI "https://<pulp>/pulp/content/<domain>/pypi/public/simple/numpy/" | grep -iE 'cache|etag'
# (empty)

# PyPI: proper cache headers
$ curl -sI "https://pypi.org/simple/numpy/" | grep -iE 'cache|etag'
cache-control: max-age=600, public
etag: "5rCf07MprnFSJcmoYLKebQ"
```

Impact

| Scenario | PyPI | Pulp (no caching) |
| --- | --- | --- |
| Single uv pip compile (cached) | 0.3 s | 8.3 s |
| Full lock-file refresh (18 dirs, 8× parallel) | ~30 s | 1 m 49 s |

Describe the solution you'd like

Two layers of caching:

  1. Server-side response caching: Cache the generated Simple API responses (e.g. in Redis or on disk) so that repeated requests for the same project index page don't re-query the database and regenerate the response. The cache should be invalidated when a new repository version is published.

  2. HTTP cache headers: Serve cached responses with cache-control: max-age=600, public and an etag derived from the repository version or publication timestamp. This enables client-side caching and cheap 304 Not Modified revalidation, matching PyPI behavior.

Together, these would make repeated dependency resolution sub-second when packages haven't changed.
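
The two layers above could be sketched roughly as follows. This is a framework-agnostic illustration, not Pulp's actual code: `SimpleIndexCache`, `Response`, `make_etag`, `etag_matches` and the `render` callback are all hypothetical names, the cache is a plain in-process dict standing in for Redis or disk, and the repository version is assumed to be available as a string.

```python
import hashlib
from typing import Callable, Dict, NamedTuple, Optional, Tuple

CACHE_CONTROL = "max-age=600, public"  # value proposed in this issue


class Response(NamedTuple):
    status: int
    headers: Dict[str, str]
    body: bytes


def make_etag(repo_version: str, project: str, fmt: str) -> str:
    """Derive a quoted ETag from the repository version (assumed scheme)."""
    digest = hashlib.sha256(f"{repo_version}:{project}:{fmt}".encode()).hexdigest()
    return f'"{digest[:22]}"'


def etag_matches(if_none_match: Optional[str], etag: str) -> bool:
    """Weak comparison against a comma-separated If-None-Match header."""
    if not if_none_match:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    # A W/ prefix is ignored because If-None-Match uses weak comparison.
    stripped = [c[2:] if c.startswith("W/") else c for c in candidates]
    return "*" in stripped or etag in stripped


class SimpleIndexCache:
    """In-memory stand-in for a Redis or on-disk response cache."""

    def __init__(self, render: Callable[[str, str], bytes]) -> None:
        self._render = render  # hypothetical: (project, fmt) -> response body
        self._entries: Dict[Tuple[str, str, str], bytes] = {}

    def get(self, repo_version: str, project: str, fmt: str,
            if_none_match: Optional[str] = None) -> Response:
        etag = make_etag(repo_version, project, fmt)
        headers = {"Cache-Control": CACHE_CONTROL, "ETag": etag}
        if etag_matches(if_none_match, etag):
            return Response(304, headers, b"")  # no rendering, no body
        key = (repo_version, project, fmt)
        if key not in self._entries:
            self._entries[key] = self._render(project, fmt)  # database hit
        return Response(200, headers, self._entries[key])

    def evict_except(self, current_version: str) -> None:
        """Drop entries that belong to older repository versions."""
        self._entries = {k: v for k, v in self._entries.items()
                         if k[0] == current_version}
```

Because the repository version is part of both the cache key and the ETag, publishing a new version makes old entries unreachable and old ETags stale without an explicit invalidation step; `evict_except` only reclaims memory. A 304 response is answered before the cache is consulted, so revalidation never touches the database.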
