
Use userspace page cache for datalake benchmarks#818

Draft
alexey-milovidov wants to merge 15 commits into main from use-page-cache-for-datalake

Conversation

@alexey-milovidov
Member

Summary

  • Switch the clickhouse-datalake and clickhouse-datalake-partitioned benchmarks from the filesystem cache (backed by /dev/shm/) to the userspace page cache
  • Replace filesystem_caches config with page_cache_size: auto in clickhouse-local.yaml
  • Replace --filesystem_cache_name cache with --use_page_cache_for_object_storage 1 in query invocations
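From the bullets above, the clickhouse-local.yaml diff is presumably along these lines (a sketch: the shape of the removed filesystem_caches section is assumed, not copied from the repo):

```yaml
# Before (assumed shape of the removed section):
# filesystem_caches:
#   cache:
#     path: /dev/shm/...   # cache data lived in tmpfs
#     max_size: ...
#
# After: let ClickHouse size the userspace page cache automatically
page_cache_size: auto
```

Query invocations then drop `--filesystem_cache_name cache` in favour of `--use_page_cache_for_object_storage 1`.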

Test plan

  • Run clickhouse-datalake benchmark and verify hot runs use the page cache
  • Run clickhouse-datalake-partitioned benchmark and verify hot runs use the page cache
  • Compare results against previous filesystem cache numbers

🤖 Generated with Claude Code

…chmarks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov marked this pull request as draft March 13, 2026 21:54
alexey-milovidov and others added 6 commits March 13, 2026 23:12
This ensures the userspace page cache persists across tries.
A fresh process per query group means try 1 is naturally cold
(empty page cache) and tries 2-3 are hot, without needing drop_caches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov marked this pull request as ready for review March 22, 2026 20:40
@alexey-milovidov alexey-milovidov marked this pull request as draft March 22, 2026 20:42
@alexey-milovidov alexey-milovidov marked this pull request as ready for review May 6, 2026 12:10
@alexey-milovidov alexey-milovidov marked this pull request as draft May 6, 2026 12:22
clickgapai pushed a commit to clickgapai/ClickHouse that referenced this pull request May 6, 2026
`CachedInMemoryReadBufferFromFile::populateBlockRange` previously issued
one `in->readBigAt` per missing 1 MiB block. On object storage, each call
is a separate HTTP request, so a cold scan of a 14 GB Parquet file
through the userspace page cache made ~15k requests, each paying the
TCP/TLS round-trip — measurably slower than the filesystem cache, which
fetches in larger segments.

Coalescing was previously implemented in commit 682b070 and reverted
in c178d2a to avoid transient memory spikes from huge temporary
buffers under parallel cold reads.

Re-introduce coalescing with a hard cap on the temporary buffer
(`max_coalesced_bytes` = 16 MiB). Long miss runs are split into multiple
fetches, bounding peak transient memory per call. Single-block misses
still read directly into the cache cell, avoiding the buffer and the
extra `memcpy`.
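The capped split described above can be sketched as follows. `Fetch`, `planFetches`, and the parameter names are illustrative, not the actual identifiers in `CachedInMemoryReadBufferFromFile::populateBlockRange`; the real code additionally reads single-block misses directly into the cache cell rather than through a plan like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative sketch of capped coalescing: one contiguous run of
// missing cache blocks is split into fetches whose temporary buffer
// never exceeds maxCoalescedBytes (16 MiB in the PR), bounding peak
// transient memory while still batching many blocks per HTTP request.
struct Fetch
{
    size_t offset;
    size_t bytes;
};

std::vector<Fetch> planFetches(size_t runOffset, size_t runBytes, size_t maxCoalescedBytes)
{
    std::vector<Fetch> fetches;
    size_t pos = runOffset;
    const size_t end = runOffset + runBytes;
    while (pos < end)
    {
        // Each fetch covers as much of the miss run as the cap allows.
        const size_t chunk = std::min(maxCoalescedBytes, end - pos);
        fetches.push_back({pos, chunk});
        pos += chunk;
    }
    return fetches;
}
```

With a 16 MiB cap, a 100 MiB miss run becomes six 16 MiB fetches plus one 4 MiB tail instead of one hundred 1 MiB requests, while a single-block miss remains a single fetch.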

Measured locally on c8g.24xlarge against the ClickBench
`clickhouse-datalake` queries (43 queries, single 14.7 GB Parquet on S3,
totals over all queries):

  cold runs:  filesystem cache 62.28s -> page cache (default)  56.58s
  hot runs:   filesystem cache 18.57s -> page cache (default)  13.59s

The page cache is now strictly faster than the filesystem cache on both
cold and hot, with no benchmark-script tuning required.

Context: ClickHouse/ClickBench#818

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>