tests: Introduce new fuzz tests & fix bugs found#36772
Draft
def- wants to merge 145 commits into
Draft
Conversation
b3c8005 to
30b37cb
Compare
`X'...'` content is allowed to contain `'` (escaped as `''` by the lexer), but the printer was emitting it verbatim — a value with an embedded quote closed the literal prematurely and produced unparseable output. Escape on the way out, mirroring `Value::String`.
The `.` token has very high precedence and both the lexer and parser greedily extend adjacent tokens: `1.x` tokenizes the number `1.` and leaves `x` as an alias, and `'a'::T.x` consumes `T.x` as a qualified type name. So a receiver must look atomic on the way out — wrapped in delimiters or self-terminating — or the dot reattaches to the receiver on reparse and produces a different AST. Add a `write_dot_receiver` helper that parenthesizes anything outside a whitelist of atom-like exprs, and use it from `FieldAccess` and `WildcardAccess` display.
…r keywords Names like `position`, `extract`, `trim`, `substring`, `normalize`, `map`, `array`, `nullif`, `exists`, `row`, `coalesce`, `greatest`, `least` reach a special parser dispatch when followed by `(` — `POSITION(<expr> IN <expr>)`, `MAP[K => V]`, etc. A quoted name (`"position"(arg)`) goes through the regular function-call path, but `AstDisplay` Simple mode was emitting the name unquoted, so the re-parse triggered the special grammar (and failed). Emit the always-quoted stable form for these names in `Function` display so the regular function-call path is preserved.
`<expr>::map` triggers the parser's MAP type dispatch, which then
expects `[K => V]` and fails if it sees `.` or other syntax. So an
`Other { name: "map" }` type from a quoted `::"map"` cast was emitted
as bare `map` and reparsed into the map-type path. Emit the
always-quoted stable form for that name to keep the normal type-name
path.
Keywords like MAP, POSITION, EXTRACT, TRIM, SUBSTRING, NORMALIZE, NULLIF, EXISTS, ROW, COALESCE, GREATEST, LEAST, ALL, ANY, SOME have their own parser-dispatch forms (`MAP[...]`, `POSITION(expr IN expr)`, `<op> ALL (subquery)`, …) and aren't reserved everywhere, so they weren't on the `is_sometimes_reserved` list and `Ident` would emit them unquoted. But unquoted in expression position those names re- trigger the special grammar at parse time. Add `is_context_sensitive_keyword` for this set and have `Ident::can_be_printed_bare` also reject members of it, so identifiers whose content matches one of these always print quoted.
`<left> <op> ANY/ALL (...)` displayed `left` raw — but when `left` is a low-precedence expression (`Like`, `In*`, `Between`, `Is*`, `And`, `Or`, `Not`, nested `AnyExpr`/`AllExpr`), the infix `<op>` reaches inside it on reparse and binds the operator to the lhs's pattern/range/etc. instead of to the lhs as a whole, producing a different AST. Add a `write_quantified_left` helper that parenthesizes these cases and leaves atom-like lhs (incl. plain `Op`, which has its own precedence handling) unwrapped.
`Decimal::from_packed_bcd` calls the C function `decPackedToNumber`,
which segfaults on empty bcd input. Reachable from untrusted proto
bytes (anyone able to send a `ProtoRow` could crash the process via a
`ProtoNumeric { bcd: [] }` datum). Reject the empty case before
descending into the FFI.
`push_range_with` returned a `Result`, but the proto decode path unconditionally `.expect(...)`ed it — meaning any `ProtoRange` that `push_range_with` rejects (e.g. lower > upper from an attacker-crafted proto) panicked the process. Propagate the error to the caller via `?`, matching the rest of the proto decode path.
`<CheckedTimestamp<_> as RustType<ProtoNaiveDateTime>>::from_proto`
constructed the struct directly (`Self { t: ... }`), bypassing the
range validation that `from_timestamplike` enforces. Out-of-range
values pushed into a `Row` cleanly, but `read_datum` then called
`from_timestamplike(...).expect(...)` while iterating and panicked —
reachable from untrusted proto bytes.
Go through `from_timestamplike` in `from_proto` so the value is
rejected at decode time.
Same shape as the `CheckedTimestamp` fix: `Date::from_proto`
constructed `Date { days: proto.days }` directly, bypassing the range
validation that `from_pg_epoch` enforces. Out-of-range days pushed
into a `Row` cleanly, then `read_datum` panicked on
`Date::from_pg_epoch(days).expect(...)` while iterating.
Go through `from_pg_epoch` in `from_proto` so the value is rejected
at decode time.
`SqlScalarType::{List,Record,Map}` from `ProtoScalarType` called
`x.custom_id.map(|id| id.into_rust().unwrap())` — a malformed
`ProtoCatalogItemId` inside any of those three variants panicked the
process. Reachable from untrusted proto bytes (an attacker-crafted
`ProtoRow` containing a list/record/map value with a bad custom_id).
Propagate via `transpose()?` instead.
The non-migration branch zipped `proto.names` and `proto.metadata` via `zip_eq`, which panics on length mismatch — reachable from untrusted proto bytes. Check the lengths explicitly and return `InvalidFieldError`.
…r display The earlier change made `Ident::can_be_printed_bare` reject members of `is_context_sensitive_keyword` (MAP, POSITION, EXTRACT, ALL, ANY, …) so that round-trip through sql-pretty preserved them. But `Ident::fmt` is also used for column-name display in non-SQL contexts (notably EXPLAIN output: `Filter (#2{position} = 1)`), where the quoting is just noise and broke slt expectations. Revert the global change. The fuzz targets that exercised this round trip get a narrow carve-out (skip on the `Expected left square bracket` / `Expected left parenthesis` / `Expected IN, found ...` reparse errors that come from a context-sensitive keyword landing in a position the parser dispatches on).
`PostgresKeyDesc::cols` is `Vec<u16>` but the proto carries it as
`Vec<u32>`; the decode path did `c.try_into().expect("values roundtrip")`,
which panicked on any value above 65535 — reachable from untrusted
proto bytes (the fuzz target found this from many distinct inputs).
Propagate the conversion error via `TryFromProtoError` instead.
Also fixes two build errors found while bringing up new fuzz crates:
- mysql-util fuzz target was importing via `mz_mysql_util::desc::*`,
but `desc` is a private module; use the top-level re-exports.
- pgwire's `Codec` was a fully-private struct, blocking the
`fuzz_exports` re-export. Make it `pub(crate)` so the same-crate
`pub use` in lib.rs works under `#[cfg(feature = "fuzz")]`.
A MapFilterProject is the map/filter/project pipeline that runs in essentially every dataflow operator, and optimize() fuses/reorders maps and predicates, drops unused maps, and canonicalizes the projection — a miscompile silently corrupts query results. The new mfp_optimize target builds a random well-typed MFP over an int4/bool schema and checks a one-directional oracle on a batch of input rows: every row the original plan passes through cleanly with output `out` must, under the optimized plan, pass through with the byte-identical `out` (optimize may legitimately drop errors or already-rejected rows, so those are not asserted). Added to the fruitful profile. 523k execs clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The string matcher (`is_match_subpatterns`, used for patterns with up to MAX_SUBPATTERNS subpatterns) resolves each `%` by searching for the following suffix and back-tracking over every candidate position. With two or more `%` the back-tracking nests, so matching an adversarial pattern like `%a%a%a` against a long run of `a`s costs time proportional to len(text)^(number of %) — a fuzzer (like_pattern_escape) hit a single input that ran for 1667s. The regex engine matches these patterns in linear time with no back-tracking, so route any pattern with more than one `many` (`%`) subpattern to it, alongside the existing case-insensitive and too-many-subpatterns cases. The string matcher now only handles patterns with at most one `%`, where it is near-linear. The regex and string matchers produce identical results, so this changes only performance. Found by fuzzing. Adds a regression test covering the pathological shape (instant after the fix) and confirming the routing still matches correctly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Materialize's incremental reduce never re-aggregates a group from
scratch; it maintains aggregates by combining partial results. The new
agg_decompose target checks the laws that makes sound, over random
multisets of nullable int4/int8/bool datums:
* permutation invariance for every order-insensitive aggregate (sum,
count, min, max, any, all) — accumulable maintenance applies updates
in arrival order, so order-dependence is a correctness bug;
* hierarchical re-aggregation for min/max/any/all (agg(whole) ==
agg(map(agg, non-empty chunks))), exactly how the bucketed
hierarchical reduce computes min/max;
* additive decomposition for count (count(whole) == sum of per-chunk
counts).
Empty chunks are skipped when re-aggregating: an empty chunk aggregates
to null, and for the three-valued any/all a stray null would corrupt an
otherwise false/true result. Added to the fruitful profile. 4.3M execs
clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend the transform result-equivalence target from FoldConstants / CanonicalizeMfp / UnionBranchCancellation to also drive the structural fusions (Filter, Project, Map, Negate, Union) and ProjectionExtraction, applied pre-order across the whole plan. Each must leave both the output shape and the consolidated (row, diff) multiset unchanged on the random constant-rooted plans the generator builds. 354k execs clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_equiv) Where mir_relation_transforms checks individual transforms in isolation, this runs the entire Optimizer::logical_optimizer pipeline — every transform, in the real order, with all their interactions — and checks that the optimized plan produces the same (row, diff) multiset as the input. The generator builds constant-rooted plans using the bug-rich operators that target omits: Join (with equi-join equivalences), Reduce (group keys + min/max/sum/any/all/count), TopK (totally-ordered so LIMIT/OFFSET is deterministic), Threshold, Union, Negate, Distinct. Because every leaf is constant and FoldConstants evaluates all of these, both input and optimized output fold to actual rows. The oracle is conservative: it only asserts when both fold, and a Typecheck error from the optimizer is a skip, so a surviving divergence or an optimizer panic is a genuine finding. Added to the fruitful profile. 372k execs clean (coverage still climbing past 10.8k features at smoke end). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
buf walks src/ and was choking on .proto files that build-script deps (protobuf-native) extract into the fuzz crates' standalone target/ trees, e.g. transform/fuzz/target/.../protobuf-native/testdata/test.proto. The exclude list was hand-maintained in buf.yaml.template and had gone stale, missing interchange, persist-client, pgcopy, pgrepr, pgtz, and transform. Generate the <crate>/fuzz/target excludes from src/*/fuzz in generate-buf-config.py so every fuzz crate (and any future one) is covered automatically. Regenerated src/buf.yaml now excludes all 16 fuzz targets. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
full_optimizer_equiv roots plans at Constants, so the optimizer folds everything away before the interesting relational planning — join ordering/implementation, predicate and projection pushdown through Gets, key inference — ever runs. This target roots plans at Get nodes (opaque relations, exactly what the optimizer plans against for a real query) so that planning actually happens, while keeping a ground-truth oracle by binding each Get to a fuzzed constant collection in a side table. Oracle: collapse(substitute(plan)) folds the input with its data inlined; optimize(plan) runs the full logical optimizer with Gets still symbolic; collapse(substitute(optimized)) folds the optimized plan with the same data. The multisets must match. substitute replaces only our global Gets; the optimizer's own Let/local-Get bindings (CSE) are collapsed by a FoldConstants + NormalizeLets fixpoint. Conservative (asserts only when both sides fold; optimizer/Typecheck errors skip), so a divergence or a panic is a real finding. Gets are declared fully nullable with no keys so the optimizer makes no assumptions the data could violate. Added to the fruitful profile. 230k execs clean; coverage (12.3k features) exceeds the constant-rooted target, confirming it reaches new planning paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Materialize ingests attacker-controllable bytes from Kafka in several formats, but only Avro was fuzzed. This adds the FORMAT PROTOBUF decoder (mz_interchange::protobuf::Decoder): untrusted message bytes -> Row. A panic reachable from the wire is a source-ingestion availability bug. The message descriptor is built in-process with prost-types (no protoc, no files): one message exercising the scalar, bytes, enum, repeated (list) and nested-message (record) paths of pack_message, plus a scalar-only message it references. It is deliberately non-recursive and map-free (the two shapes the decoder rejects) so from_bytes succeeds and decode is actually exercised — confirmed by coverage climbing into the decode/pack paths. Both the raw and Confluent-framed (magic byte + schema-id header) paths are driven. Added to the fruitful profile. 815k execs clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fuzzing the CSV decoder turned up two ways untrusted source bytes could
panic the decode operator (crashing source ingestion), and one silent
data-corruption path:
1. The error paths (wrong column count, invalid UTF-8) returned without
resetting output_cursor/ends_cursor, unlike the success path. csv_core
resets its own per-record output position, so after a malformed record
the next record read back stale, non-monotonic field offsets — slicing
output[ends[i]..ends[i+1]] then panicked (start > end), or silently
produced rows stitched from leftover buffer bytes. One bad row corrupted
the rest of the source. Fixed by clearing the buffers after every
completed record, success or error.
2. The whole record's concatenated field bytes were validated as UTF-8 in
one shot, then sliced as a &str at each field boundary. A delimiter can
sit between two bytes that, with the delimiter removed, form one
multi-byte character: the concatenation validates, but a field boundary
lands mid-character, panicking the &str slice (and even when it didn't
panic, bytes were silently merged across fields). Fixed by validating
each field's bytes as UTF-8 independently; a field that isn't valid
UTF-8 on its own is now a decode error.
To make the decoder fuzzable without mz-storage's heavy dependency tree
(rocksdb/rdkafka/aws), the pure CsvDecoderState moves to mz-storage-types
alongside the CsvEncoding it consumes; mz-storage re-imports it. Adds
regression tests for both panics.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decode untrusted CSV bytes into Rows via CsvDecoderState, the FORMAT CSV source decoder — first to touch external data, and previously unfuzzed (only Avro was). The first two input bytes pick the config (1..=4 columns, the delimiter, header vs. count); the rest is fed as the CSV stream and drained through decode exactly as the storage decode operator does, covering the buffer-growth, column-mismatch, invalid-UTF-8 and header paths. Added to the fruitful profile. This target found the two panics fixed in the previous commit; it now runs clean (10M+ execs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the upsert `types` module and the v2 value-encoding helpers (`upsert_value_to_row`/`datum_seq_to_upsert_value`) reachable from the fuzz crate by widening `pub(crate)`/`mod` to `#[doc(hidden)] pub`. These are exposed only so the storage fuzz crate (a separate workspace) can exercise the upsert consolidation and value encoding; they are not a stable public API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a mz-storage fuzz crate covering both upsert implementations:
* upsert_consolidate (v1): the XOR/length/checksum accumulator in
StateValue::merge_update + ensure_decoded that collapses a key's
(value, diff) history into its current value. Builds well-formed
histories (canceling (prev,+1)(prev,-1) pairs + at most one (cur,+1))
with arbitrary values, shuffles and merges them, and asserts the
consolidated result is exactly the current value (or absent). Diffs are
Materialize-controlled, so we keep diff_sum in {0,1} rather than
tripping the intentional invalid-state invariant; the values are
untrusted source data.
* upsert_value_roundtrip_v2 (v2): upsert_value_to_row +
datum_seq_to_upsert_value, the tagged-Row encoding the v2 operator
stores in its ValRowSpine. Pushes the encoded Row through a real
DatumContainer and reads it back, asserting decode(encode(v)) == v for
both the Ok(Row) and Err(UpsertError) arms.
The mz-storage tree (rocksdb/rdkafka/aws) builds fine under cargo-fuzz.
Both targets run clean (27M and 55M execs) — the consolidation and
encoding are robust over arbitrary values. Added to the fruitful profile;
buf.yaml regenerated to exclude the new crate's target/.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`GRANT/REVOKE ... ON ALL <type>` displayed the object type by writing its singular form and appending `S`. That works for every type except network policies: `NETWORK POLICY` + `S` is `NETWORK POLICYS`, but the parser accepts the plural keyword `POLICIES` (mapping it to ObjectType:: NetworkPolicy). So `GRANT CREATE ON ALL POLICIES TO j` parsed, displayed as `GRANT CREATE ON ALL NETWORK POLICYS TO j`, and then failed to reparse — a parse/display asymmetry the parse_display_roundtrip fuzzer hit. Pluralize via a small helper that emits `POLICIES` for NetworkPolicy and `<type>S` otherwise. Adds a round-trip regression test. (The SHOW PRIVILEGES paths share the naive `+S`, but their parsers don't accept POLICIES at all, so that asymmetry isn't reachable via parse->display.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every persisted state maintains the invariant that it has at least one rollup (State::latest_rollup .expect()s it). ProtoRollup::into_rust called latest_rollup while decoding, so an untrusted/corrupt rollup proto whose state carries no rollups panicked instead of being rejected — a rollup_proto_roundtrip fuzz crash (and a decode-time availability risk for a corrupt blob). Validate the invariant in from_proto and return TryFromProtoError:: InvalidPersistState when it is violated, before latest_rollup is reached (and before the decoded Rollup is later re-encoded). Adds a regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a `fuzzing` cargo feature (off by default, never in production builds) that compiles the in-memory upsert backend (`upsert::memory`, previously test-only) and a `FuzzUpsertParts` helper. `FuzzUpsertParts` owns the metrics/statistics plumbing `UpsertState` requires and hands out fresh, empty in-memory `UpsertState`s borrowing it — so a fuzz target can drive the real state machine without reconstructing (or leaking) the metrics each iteration. `StateValue::memory_size` is widened from test-only to the same gate, since the in-memory backend uses it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drive UpsertState::consolidate_chunk + multi_get over the in-memory backend, not just the isolated accumulator. A chunk of differential (key, value, diff) updates spanning several keys is consolidated and the snapshot completed, then each key is read back. The diffs are Materialize-controlled, so per key we generate a well-formed history (canceling (prev,+1)(prev,-1) pairs plus at most one (cur,+1)) with arbitrary untrusted values, shuffle the whole multi-key chunk, and assert each key reads back exactly its current value (or absent): the pairs must XOR/sum away across the chunk regardless of order. Registers the new mz-storage fuzz crate (enables the `fuzzing` feature) and adds it to FUZZ_CRATES + the fruitful profile. 1.2M execs clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The JSON *source* decoder is just Jsonb::from_slice (covered by repr::jsonb_from_slice, which also round-trips). The uncovered JSON logic is the output side: mz_interchange::json::encode_datums_as_json renders a Row's datums as JSON for a FORMAT JSON sink, with real per-type formatting (a JSON Number can't be NaN/Infinity; numerics, intervals, timestamps, dates, bytes each have their own rendering). Generate a random column schema and a matching well-typed Row — values are arbitrary (source data), including the NaN/Infinity floats f64::arbitrary produces — then encode and serialize it; neither step may panic. Added to the fruitful profile. 23M execs clean (the encoder handles NaN/Infinity and the other edge values). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The nightly slice (~85min, sharded across two 16-core machines) barely scratches the deeper bugs — fuzzing rewards uninterrupted depth. Move it to the release-qualification pipeline as a single 24h soak on the largest Hetzner agent (hetzner-x86-64-dedi-48cpu-192gb, 48 cores / 192 GB). The 48 cores host every `fruitful` target in one slot-wave (one fork worker each), and --max-seconds is clamped down to whatever the wall budget leaves after the build, so each target fuzzes for ~23h instead of ~60min. The large memory matters: ~40 ASAN fuzz processes at once. --corpus-sync still carries the minimized corpus between runs so coverage accumulates across releases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
twos_complement_be_to_numeric_inner read input[0] without first checking that the slice is non-empty, panicking with "index out of bounds: the len is 0 but the index is 0" on an empty byte string. That input is reachable from the wire: an Avro `decimal` logical type whose unscaled value is encoded as zero-length `bytes` flows straight from a Kafka message body through AvroFlatDecoder into this function, so a single crafted (or buggy-producer) message crashes Avro source decoding — an availability bug for ingestion. Our own encoder never emits an empty byte string (zero is the single byte 0x00), and Java's Avro decoder likewise rejects empty input (BigInteger throws "Zero length BigInteger"), so the right behaviour is a clean decode error, not a silent zero. The caller already maps the Err to a DecodeError, so the malformed row now surfaces as a decode error instead of aborting the worker. Found by the new avro_decode_fuzzed_schema fuzz target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The existing avro_decode_to_row target decodes against one fixed reader schema, so it only ever walks one set of AvroFlatDecoder paths. The decoder is schema-directed, and the bug-prone paths are the schema-dependent ones a single schema barely touches. avro_decode_fuzzed_schema generates a top-level record schema spanning everything validate_schema_2 accepts — decimal (bytes- and fixed-backed), fixed, enum, uuid, the logical date/timestamp types, and nested records/arrays/maps/unions — builds a Decoder from it, and decodes arbitrary remaining bytes as the message body. This immediately found the empty-unscaled-bytes decimal panic fixed in the previous commit. Also two strconv decoders for untrusted text (COPY FROM, text-format wire parameters): strconv_parse_bytes exercises the hand-written bytea octal/hex escape parser (the byte-at-a-time escape-decoding shape that hid the CSV bug), and strconv_parse_uuid is a regression guard for the uuid-crate error-construction panic that parse_uuid already pre-screens against. Both clean over tens of millions of execs. All three added to the fruitful profile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Now that fuzzing runs only as the 24h release-qualification soak with --profile=fruitful, the profile is the entire CI fuzz scope, so its cores should go where bugs still hide. Drop eight targets that fuzz clean round after round over well-tested, stable code and have saturated: the arithmetic/range oracles (interval_arith, numeric_arith, range_ops), the aggregate-decomposition laws (agg_decompose), the internal Row<->proto round-trip (row_proto_roundtrip, still covered by row_arrow/row_codec), the AT TIME ZONE math (timezone_convert), and the simple jsonb access operators (jsonb_get, jsonb_path). They still exist and can be run by name or re-added if the code under them changes. Kept despite no recent finds: the untrusted-input parsers/decoders (unbounded adversarial input) and the optimizer/upsert/persist targets (regression insurance on actively-developed, catastrophic-if-wrong code). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'auto' job sizer capped concurrency at the core count and queued the rest. That is fine for a short run, but the release-qualification run gives each target a ~23h --max-seconds: the first `cores` targets would consume the entire wall budget and the queued ones would never start. With the suite already near the agent's core count (and the full profile well over it), that means whole targets silently never run. Launch every target at once instead (one fork worker each, oversubscribing the CPU). Fork workers tolerate oversubscription and the OS time-slices, so every target makes progress — strictly better than some never running. A caller who needs to bound concurrency can still pass --max-parallel. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two more schema/type-directed untrusted-input surfaces, following the lesson from avro_decode_fuzzed_schema (fuzz the schema, not just the body): protobuf_decode_fuzzed_schema generates a FileDescriptorSet — the user-supplied USING SCHEMA blob of a FORMAT PROTOBUF source — with messages that reference each other freely (so cycles arise), enums, and the full scalar set, then runs DecodedDescriptors::from_bytes + decode. The fixed-descriptor target never exercises the hand-written derive_inner_type validator (recursion detection, the Kind mapping) with anything but one well-formed schema. avro_schema_parse drives mz_avro::Schema::from_str over generated schema JSON that can nest past MAX_SCHEMA_DEPTH (so the stack-overflow guard must fire cleanly) and re-references already-defined names (recursive definitions), with logical types — breadth that schema_resolve and the decode targets never generate. This is the hostile-schema-registry surface. Both clean over a smoke run; added to the fruitful profile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR MaterializeInc#36764 made sql-pretty print role passwords losslessly via per-statement doc printers (the AstDisplay redaction stays as the global safety net for logs/catalog; a pretty-printer that round-trips user SQL must keep the secret). But DECLARE and PREPARE wrap an inner statement and fell through to the generic doc_display fallback, which prints the whole wrapper via AstDisplay -- so a secret in the inner statement was redacted again: DECLARE c CURSOR FOR ALTER ROLE adb PASSWORD '2' printed the password as '<REDACTED>', which then reparses to the string "<REDACTED>" instead of "2", changing the AST. Give DECLARE/PREPARE their own doc printers that recurse into the inner statement's doc, so its secrets are printed by the lossless path. This needs the inner statement's concrete type, so to_pretty/to_doc now bound NestedStatement = Statement<Raw> -- satisfied by both AstInfo impls (Raw and Aug), which is exactly the existing invariant. Found by the parse_pretty_roundtrip fuzzer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decoding an Avro array/map reads a per-block element count off the wire and then builds a Vec of that many elements. The count was only checked against MAX_BLOCK_LEN (16M); nothing tied it to how much input was actually left. When the elements consume ~no bytes each (an empty record, a null), a tiny message can claim a 16M-element block and drive an unbounded allocation: a 163-byte input grew the decoder to >4GB (a single 576MB = 16M * size_of::<Value>() allocation, repeated across fields), OOM-killing the decode. A block can't legitimately hold more elements than there are bytes left to decode them from, so add a cheap `Skip::remaining_input` hint (known for the slice/cursor readers the Kafka path uses, None for opaque streams) and reject any array/map block whose claimed length exceeds it. Normal data has at least one byte per element, so the bound never trips; only the small-message-huge- count amplification is rejected, as a clean decode error. Found by the avro_decode_fuzzed_schema fuzz target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
My earlier guard rejected any rollup proto whose state had no rollups, but that was too broad: it broke applier_version_state, which decodes a freshly initialized state that legitimately has no rollups yet. latest_rollup -- the actual panic site -- is only reached when the proto carries diffs (to validate the diff bounds against it), so move the check there. A rollup-less state without diffs decodes again, while a proto that has diffs but no rollups is still rejected instead of panicking. The regression test is updated to build the diffs-but-no-rollups case it now guards. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two build/CI fixes in mz-storage: - On macOS, objc2's `Retained<T>: Deref<Target = T>` lets the trait solver chase an unbounded `Retained<Retained<...>>` chain while resolving `IntoIterator` for the closures passed to timely's `Stream::inspect`, overflowing the default recursion limit (128). Raise it to 256, as several other crates already do for the same reason. - Exposing `upsert_value_to_row`/`datum_seq_to_upsert_value` as plain `pub` for the fuzz crate made rustdoc document the `upsert_continual_feedback_v2` module, surfacing pre-existing module-doc intra-doc links to the private `UpsertDiff`/`drain_sealed_input` items (`-D rustdoc::private-intra-doc-links`). Add the `#[doc(hidden)]` those fuzz-only helpers were always meant to carry, and demote the two private-item links in the module docs to plain code spans. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A multi-line assert added alongside the DECLARE/PREPARE secret test wasn't canonically formatted, failing `bin/fmt --check`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`.inspect` is ambiguous between timely's `Inspect::inspect` and std's `Iterator::inspect`. Method resolution probes whether the *receiver* is an `Iterator` (i.e. `IntoIterator`), and when the receiver's element type is still an inference variable that probe unifies the variable against objc2's `&Retained<T>: IntoIterator where &T: IntoIterator` blanket impl, building an endless `Retained<Retained<...>>` chain. objc2 is only in the graph on macOS, so the solver overflows there while Linux compiles fine. recursion_limit can't help (the chain is unbounded, not deep), and annotating the closure argument or the expression result doesn't either -- the probe runs on the receiver during method resolution, before those constrain anything. The fix is to bind the receiver to its concrete type in a preceding `let`, so the `IntoIterator` probe sees a concrete (plainly non-iterable) type and terminates. The types come straight from `write_batches`' return and `SourceRender::render`, so a wrong one fails the Linux build -- `cargo check` validates them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Inspired by #36764 and Antithesis PoC
See individual commits