feat(clickhouse): support LowCardinality, FixedString, CODEC, and SAMPLE BY #8
lohanidamodar wants to merge 4 commits into main
Adds a `FixedString(N)` column type for ClickHouse, used for fixed-length string columns whose byte length is known and constant — ISO 3166 country codes, ISO 4217 currency codes, hash digests, and similar values that benefit from ClickHouse's columnar storage of fixed-width data. Exposed via `Table::fixedString($name, $length)` and the matching `Column` forwarder. Length must be at least 1. Other dialects throw `UnsupportedException` at compile time via the dialect `compileColumnType` match — the type has no portable mapping.
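The length rule can be illustrated with a small standalone sketch. The function name `fixedStringType` and the use of `InvalidArgumentException` are assumptions made for illustration; they are not taken from the PR's code, which may validate and compile differently.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the length check described above.
// This is not the library's implementation.
function fixedStringType(int $length): string
{
    if ($length < 1) {
        // Assumption: the real exception class is not named in the source.
        throw new InvalidArgumentException('FixedString length must be at least 1.');
    }

    return sprintf('FixedString(%d)', $length);
}

echo fixedStringType(2), PHP_EOL;  // two-letter country code
echo fixedStringType(3), PHP_EOL;  // three-letter currency code
echo fixedStringType(32), PHP_EOL; // 32-byte raw hash digest
```

Rejecting the length where the type string is built keeps an invalid `FixedString(0)` from ever reaching the emitted DDL.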
Adds a `lowCardinality()` modifier on `Column` that wraps the column type in `LowCardinality(...)` when compiled for ClickHouse. Useful for string columns with a small number of distinct values — status enums, type discriminators, country codes, category labels — where dictionary encoding cuts storage and accelerates reads. `Nullable` is applied outside `LowCardinality` to match ClickHouse's required wrapping order: `Nullable(LowCardinality(String))`. Other dialects throw `UnsupportedException` at compile time via the base `Schema::compileColumnDefinition` (and the matching guards in `PostgreSQL` and `SQLite` overrides).
Adds a `codec()` modifier on `Column` that accumulates codec specs and
emits `CODEC(c1, c2, ...)` after the column type when compiled for
ClickHouse. Multiple calls chain — `->codec('Delta(4)')->codec('LZ4')`
emits `CODEC(Delta(4), LZ4)`.
Each codec string is emitted verbatim; arguments are passed inline
(`'Delta(4)'`, `'ZSTD(3)'`) so the modifier stays a thin wrapper around
the underlying DDL. Empty strings and semicolons are rejected at
configure time.
Other dialects throw `UnsupportedException` at compile time via the
base `Schema::compileColumnDefinition` and the matching guards in the
`PostgreSQL` and `SQLite` overrides.
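The accumulate-then-emit behaviour described above can be sketched in isolation. `CodecListSketch` is a hypothetical class written only for this illustration, and `InvalidArgumentException` is an assumed exception type; the real `Column` class is likely structured differently.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the codec() modifier; names are illustrative only.
final class CodecListSketch
{
    /** @var list<string> */
    private array $codecs = [];

    public function codec(string $spec): self
    {
        // Mirrors the stated rule: reject empty strings and semicolons up front.
        if ($spec === '' || str_contains($spec, ';')) {
            throw new InvalidArgumentException('Codec must be non-empty and must not contain a semicolon.');
        }

        $this->codecs[] = $spec; // stored verbatim, arguments included

        return $this;
    }

    public function compile(string $type): string
    {
        if ($this->codecs === []) {
            return $type;
        }

        return $type . ' CODEC(' . implode(', ', $this->codecs) . ')';
    }
}

$column = (new CodecListSketch())->codec('Delta(4)')->codec('LZ4');
echo $column->compile('UInt32'), PHP_EOL; // UInt32 CODEC(Delta(4), LZ4)
```

Because each spec is passed through unchanged, the sketch needs no knowledge of individual codecs. The trade-off is that a misspelled codec name would only surface when ClickHouse parses the DDL.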
Adds a `Table::sampleBy($expression)` method (with a forwarder on `Column`) that registers a ClickHouse `SAMPLE BY` clause. Emitted after `ORDER BY` and before `TTL` / `SETTINGS` at table creation time, this enables the approximate-query path (`SELECT ... SAMPLE k`) on MergeTree-family engines. The expression is emitted verbatim and must not be empty or contain a semicolon. SAMPLE BY is rejected on engines that don't take an `ORDER BY` clause (`Memory`, `Log`, `TinyLog`, `StripeLog`) since sampling has no meaning there. Other dialects throw `UnsupportedException` at compile time via the base `Schema::compileCreate`.
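The clause ordering and the engine check can be sketched as a standalone function. `tableOptionsSketch`, its parameters, and the exception type are assumptions for illustration; the actual `compileCreate` in the PR is not shown in the source and may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the table-option ordering described above.
 *
 * @param list<string>          $orderBy
 * @param array<string, string> $settings
 */
function tableOptionsSketch(
    string $engine,
    array $orderBy,
    ?string $sampleBy = null,
    ?string $ttl = null,
    array $settings = []
): string {
    // Engines named in the description as not taking an ORDER BY clause.
    $enginesWithoutOrderBy = ['Memory', 'Log', 'TinyLog', 'StripeLog'];

    if ($sampleBy !== null) {
        if ($sampleBy === '' || str_contains($sampleBy, ';')) {
            throw new InvalidArgumentException('SAMPLE BY must be non-empty and must not contain a semicolon.');
        }
        if (in_array($engine, $enginesWithoutOrderBy, true)) {
            throw new InvalidArgumentException("SAMPLE BY is not valid on the {$engine} engine.");
        }
    }

    $parts = ['ENGINE = ' . $engine . '()'];
    if ($orderBy !== []) {
        $parts[] = 'ORDER BY (' . implode(', ', $orderBy) . ')';
    }
    if ($sampleBy !== null) {
        $parts[] = 'SAMPLE BY ' . $sampleBy; // after ORDER BY, before TTL
    }
    if ($ttl !== null) {
        $parts[] = 'TTL ' . $ttl;
    }
    if ($settings !== []) {
        $pairs = [];
        foreach ($settings as $key => $value) {
            $pairs[] = $key . ' = ' . $value;
        }
        $parts[] = 'SETTINGS ' . implode(', ', $pairs);
    }

    return implode(' ', $parts);
}

echo tableOptionsSketch(
    'MergeTree',
    ['event_date', 'intHash32(user_id)'],
    'intHash32(user_id)',
    'event_date + INTERVAL 30 DAY'
), PHP_EOL;
```

In the usage line the sampling expression also appears in the `ORDER BY` list, since ClickHouse requires the sampling key to be part of the primary key.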
Greptile Summary
This PR adds four ClickHouse-specific schema features to the fluent builder: `LowCardinality`, `FixedString`, `CODEC`, and `SAMPLE BY`.
Confidence Score: 3/5
The ClickHouse compilation path and all SQL-family dialects are correct, but MongoDB silently accepts and drops the three new ClickHouse-only modifiers instead of rejecting them. The MongoDB adapter overrides both `compileCreate` and `compileAlter` without calling the parent, and calls only `compileColumnType` per column, skipping every guard added in this PR for `lowCardinality`, codecs, and `sampleBy`. Callers using a MongoDB schema with these modifiers get no error and silently incorrect output, which contradicts the documented compile-time rejection guarantee. `src/Query/Schema/MongoDB.php` needs explicit guards for `isLowCardinality`, `codecs`, and `sampleBy`; `tests/Query/Schema/FluentBuilderTest.php` should gain matching MongoDB throw-tests.
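One possible shape for the guards the review asks for is sketched below. Everything here is hypothetical: `ColumnStateSketch`, `UnsupportedExceptionSketch`, and `assertNoClickHouseOnlyModifiers` are stand-ins invented for illustration, and the real `Column`, `UnsupportedException`, and MongoDB adapter may expose this state differently.

```php
<?php

declare(strict_types=1);

// Hypothetical column state; the property names echo the review comment only.
final class ColumnStateSketch
{
    /** @param list<string> $codecs */
    public function __construct(
        public string $name,
        public bool $isLowCardinality = false,
        public array $codecs = []
    ) {
    }
}

// Local stand-in for the library's UnsupportedException.
final class UnsupportedExceptionSketch extends RuntimeException
{
}

/**
 * Rejects ClickHouse-only state before a dialect that cannot express it compiles anything.
 *
 * @param list<ColumnStateSketch> $columns
 */
function assertNoClickHouseOnlyModifiers(array $columns, ?string $sampleBy): void
{
    if ($sampleBy !== null) {
        throw new UnsupportedExceptionSketch('SAMPLE BY is not supported by this dialect.');
    }

    foreach ($columns as $column) {
        if ($column->isLowCardinality) {
            throw new UnsupportedExceptionSketch("LowCardinality is not supported by this dialect (column {$column->name}).");
        }
        if ($column->codecs !== []) {
            throw new UnsupportedExceptionSketch("CODEC is not supported by this dialect (column {$column->name}).");
        }
    }
}

// A plain column passes silently; a modified one would throw.
assertNoClickHouseOnlyModifiers([new ColumnStateSketch('status')], null);
echo 'ok', PHP_EOL;
```

An adapter that overrides the compile methods without calling the parent could run a check like this at the top of each override, so that misuse fails loudly instead of being dropped.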
Summary
Adds support for four ClickHouse schema features that are common in
production OLAP workloads but currently can't be expressed via the
schema builder, forcing users to drop down to raw DDL — exactly what
a typed schema builder is meant to prevent.
Each addition lives in `src/Query/Schema/` alongside the existing ClickHouse modifiers (`ttl`, `engine`, `orderBy`, `settings`, skip-index algorithms). Other dialects throw `UnsupportedException` at compile time so misuse is caught early.
What's new
`LowCardinality(T)` column modifier
`LowCardinality` is a standard ClickHouse storage modifier for string columns with a bounded number of distinct values: status enums, type discriminators, country/category codes. Dictionary encoding cuts storage and accelerates reads, so leaving it off such columns in a production OLAP schema is an anti-pattern.
`Nullable` is applied outside `LowCardinality` to match ClickHouse's required wrapping order.
`FixedString(N)` column type
Fixed-length strings are strictly more efficient than `String` when the byte length is known and constant: ISO codes, hash digests, fixed-width identifiers. New `Table::fixedString($name, $length)` plus a matching forwarder on `Column`. Length must be at least 1.
Column-level `CODEC(...)` clauses
Multiple `codec()` calls accumulate and emit `CODEC(c1, c2, ...)`. Each codec string is emitted verbatim, so arguments live inline (`'Delta(4)'`, `'ZSTD(3)'`) and the modifier stays a thin wrapper around the underlying DDL. Empty strings and semicolons are rejected at configure time. Column-level codecs are a core ClickHouse feature for tuning storage size and read throughput; the schema builder couldn't express them before this PR.
`SAMPLE BY` table option
`SAMPLE BY` enables approximate-query support (`SELECT ... SAMPLE k`) and must be declared at table creation time. Emitted after `ORDER BY` and before `TTL` / `SETTINGS`. Rejected on engines that don't take an `ORDER BY` clause (`Memory`, `Log`, `TinyLog`, `StripeLog`).
Why these specifically
The schema builder can already model the standard MergeTree shape, but
production ClickHouse schemas almost always reach for one or more of
these modifiers. Without them, users have to fall back to raw DDL,
which defeats the purpose of a typed builder.
The patches follow the same dialect pattern as the existing `ttl`, `engine`, `orderBy`, `settings`, and skip-index features added in #6: state lives on `Column` / `Table`, ClickHouse compiles it, and base `Schema` / `PostgreSQL` / `SQLite` overrides throw `UnsupportedException` so misuse on the wrong dialect is caught at compile time.
Out of scope (planned follow-ups)
- Distinct-count aggregates (`uniqExact`, `uniq`, `uniqCombined`, `uniqHLL12`) on `Builder`: would let users express ClickHouse-native exact and approximate distinct-count aggregates without dropping to raw expressions.
- Time-bucketing functions (`toStartOfHour`, `toStartOfDay`, `toStartOfWeek`, `toStartOfMonth`, `toStartOfMinute`) on `Builder`: for time-series rollups in `SELECT` and `GROUP BY`.
These are query-builder features rather than schema features, so a separate PR keeps this one focused.
Tests
- `tests/Query/Schema/ClickHouseTest.php`: asserts exact DDL output for each feature, plus validation-error coverage (zero length, empty/semicolon codec, empty/semicolon SAMPLE BY, SAMPLE BY on a non-`ORDER BY` engine).
- `tests/Query/Schema/FluentBuilderTest.php`: covers `LowCardinality`, `FixedString`, column `CODEC`, and `SAMPLE BY` on MySQL / PostgreSQL / SQLite.
Test plan
- `composer test` (5197 tests pass)
- `composer lint` (Pint passes)
- `composer check` (PHPStan max passes)