
Use userspace page cache for datalake benchmarks#818

Draft
alexey-milovidov wants to merge 15 commits into main from use-page-cache-for-datalake

Conversation

@alexey-milovidov
Member

Summary

  • Switch the clickhouse-datalake and clickhouse-datalake-partitioned benchmarks from the filesystem cache (backed by /dev/shm/) to the userspace page cache
  • Replace filesystem_caches config with page_cache_size: auto in clickhouse-local.yaml
  • Replace --filesystem_cache_name cache with --use_page_cache_for_object_storage 1 in query invocations
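From the bullets above, the clickhouse-local.yaml diff is presumably along these lines (a sketch: the shape of the removed filesystem_caches section is assumed, not copied from the repo):

```yaml
# Before (assumed shape of the removed section):
# filesystem_caches:
#   cache:
#     path: /dev/shm/...   # cache data lived in tmpfs
#     max_size: ...
#
# After: let ClickHouse size the userspace page cache automatically
page_cache_size: auto
```

Query invocations then drop `--filesystem_cache_name cache` in favour of `--use_page_cache_for_object_storage 1`.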

Test plan

  • Run clickhouse-datalake benchmark and verify hot runs use the page cache
  • Run clickhouse-datalake-partitioned benchmark and verify hot runs use the page cache
  • Compare results against previous filesystem cache numbers

🤖 Generated with Claude Code

…chmarks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov marked this pull request as draft March 13, 2026 21:54
alexey-milovidov and others added 6 commits March 13, 2026 23:12
This ensures the userspace page cache persists across tries.
A fresh process per query group means try 1 is naturally cold
(empty page cache) and tries 2-3 are hot, without needing drop_caches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov marked this pull request as ready for review March 22, 2026 20:40
@alexey-milovidov alexey-milovidov marked this pull request as draft March 22, 2026 20:42
@alexey-milovidov alexey-milovidov marked this pull request as ready for review May 6, 2026 12:10
@alexey-milovidov alexey-milovidov marked this pull request as draft May 6, 2026 12:22
clickgapai pushed a commit to clickgapai/ClickHouse that referenced this pull request May 6, 2026
`CachedInMemoryReadBufferFromFile::populateBlockRange` previously issued
one `in->readBigAt` per missing 1 MiB block. On object storage, each call
is a separate HTTP request, so a cold scan of a 14 GB Parquet file
through the userspace page cache made ~15k requests, each paying the
TCP/TLS round-trip — measurably slower than the filesystem cache, which
fetches in larger segments.

Coalescing was previously implemented in commit 682b070 and reverted
in c178d2a to avoid transient memory spikes from huge temporary
buffers under parallel cold reads.

Re-introduce coalescing with a hard cap on the temporary buffer
(`max_coalesced_bytes` = 16 MiB). Long miss runs are split into multiple
fetches, bounding peak transient memory per call. Single-block misses
still read directly into the cache cell, avoiding the buffer and the
extra `memcpy`.
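The capped split described above can be sketched as follows. `Fetch`, `planFetches`, and the parameter names are illustrative, not the actual identifiers in `CachedInMemoryReadBufferFromFile::populateBlockRange`; the real code additionally reads single-block misses directly into the cache cell rather than through a plan like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative sketch of capped coalescing: one contiguous run of
// missing cache blocks is split into fetches whose temporary buffer
// never exceeds maxCoalescedBytes (16 MiB in the PR), bounding peak
// transient memory while still batching many blocks per HTTP request.
struct Fetch
{
    size_t offset;
    size_t bytes;
};

std::vector<Fetch> planFetches(size_t runOffset, size_t runBytes, size_t maxCoalescedBytes)
{
    std::vector<Fetch> fetches;
    size_t pos = runOffset;
    const size_t end = runOffset + runBytes;
    while (pos < end)
    {
        // Each fetch covers as much of the miss run as the cap allows.
        const size_t chunk = std::min(maxCoalescedBytes, end - pos);
        fetches.push_back({pos, chunk});
        pos += chunk;
    }
    return fetches;
}
```

With a 16 MiB cap, a 100 MiB miss run becomes six 16 MiB fetches plus one 4 MiB tail instead of one hundred 1 MiB requests, while a single-block miss remains a single fetch.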

Measured locally on c8g.24xlarge against the ClickBench
`clickhouse-datalake` queries (43 queries, single 14.7 GB Parquet on S3,
totals over all queries):

  cold runs:  filesystem cache 62.28s -> page cache (default)  56.58s
  hot runs:   filesystem cache 18.57s -> page cache (default)  13.59s

The page cache is now strictly faster than the filesystem cache on both
cold and hot, with no benchmark-script tuning required.

Context: ClickHouse/ClickBench#818

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>