Skip to content

tests: Introduce new fuzz tests & fix bugs found#36772

Draft
def- wants to merge 145 commits into
MaterializeInc:mainfrom
def-:pr-fuzz2
Draft

tests: Introduce new fuzz tests & fix bugs found#36772
def- wants to merge 145 commits into
MaterializeInc:mainfrom
def-:pr-fuzz2

Conversation

@def-
Copy link
Copy Markdown
Contributor

@def- def- commented May 28, 2026

Inspired by #36764 and Antithesis PoC

See individual commits

@def- def- changed the title tests: Introduce new fuzz tests tests: Introduce new fuzz tests & fix bugs found May 29, 2026
@def- def- force-pushed the pr-fuzz2 branch 14 times, most recently from b3c8005 to 30b37cb Compare June 3, 2026 10:22
def- added 14 commits June 4, 2026 00:15
`X'...'` content is allowed to contain `'` (escaped as `''` by the
lexer), but the printer was emitting it verbatim — a value with an
embedded quote closed the literal prematurely and produced unparseable
output. Escape on the way out, mirroring `Value::String`.
The `.` token has very high precedence and both the lexer and parser
greedily extend adjacent tokens: `1.x` tokenizes the number `1.` and
leaves `x` as an alias, and `'a'::T.x` consumes `T.x` as a qualified
type name. So a receiver must look atomic on the way out — wrapped in
delimiters or self-terminating — or the dot reattaches to the receiver
on reparse and produces a different AST.

Add a `write_dot_receiver` helper that parenthesizes anything outside
a whitelist of atom-like exprs, and use it from `FieldAccess` and
`WildcardAccess` display.
…r keywords

Names like `position`, `extract`, `trim`, `substring`, `normalize`,
`map`, `array`, `nullif`, `exists`, `row`, `coalesce`, `greatest`,
`least` reach a special parser dispatch when followed by `(` —
`POSITION(<expr> IN <expr>)`, `MAP[K => V]`, etc. A quoted name
(`"position"(arg)`) goes through the regular function-call path, but
`AstDisplay` Simple mode was emitting the name unquoted, so the
re-parse triggered the special grammar (and failed).

Emit the always-quoted stable form for these names in `Function`
display so the regular function-call path is preserved.
`<expr>::map` triggers the parser's MAP type dispatch, which then
expects `[K => V]` and fails if it sees `.` or other syntax. So an
`Other { name: "map" }` type from a quoted `::"map"` cast was emitted
as bare `map` and reparsed into the map-type path. Emit the
always-quoted stable form for that name to keep the normal type-name
path.
Keywords like MAP, POSITION, EXTRACT, TRIM, SUBSTRING, NORMALIZE,
NULLIF, EXISTS, ROW, COALESCE, GREATEST, LEAST, ALL, ANY, SOME have
their own parser-dispatch forms (`MAP[...]`, `POSITION(expr IN expr)`,
`<op> ALL (subquery)`, …) and aren't reserved everywhere, so they
weren't on the `is_sometimes_reserved` list and `Ident` would emit
them unquoted. But unquoted in expression position those names re-
trigger the special grammar at parse time.

Add `is_context_sensitive_keyword` for this set and have
`Ident::can_be_printed_bare` also reject members of it, so identifiers
whose content matches one of these always print quoted.
`<left> <op> ANY/ALL (...)` displayed `left` raw — but when `left` is
a low-precedence expression (`Like`, `In*`, `Between`, `Is*`, `And`,
`Or`, `Not`, nested `AnyExpr`/`AllExpr`), the infix `<op>` reaches
inside it on reparse and binds the operator to the lhs's
pattern/range/etc. instead of to the lhs as a whole, producing a
different AST.

Add a `write_quantified_left` helper that parenthesizes these cases
and leaves atom-like lhs (incl. plain `Op`, which has its own
precedence handling) unwrapped.
`Decimal::from_packed_bcd` calls the C function `decPackedToNumber`,
which segfaults on empty bcd input. Reachable from untrusted proto
bytes (anyone able to send a `ProtoRow` could crash the process via a
`ProtoNumeric { bcd: [] }` datum). Reject the empty case before
descending into the FFI.
`push_range_with` returned a `Result`, but the proto decode path
unconditionally `.expect(...)`ed it — meaning any `ProtoRange` that
`push_range_with` rejects (e.g. lower > upper from an attacker-crafted
proto) panicked the process. Propagate the error to the caller via
`?`, matching the rest of the proto decode path.
`<CheckedTimestamp<_> as RustType<ProtoNaiveDateTime>>::from_proto`
constructed the struct directly (`Self { t: ... }`), bypassing the
range validation that `from_timestamplike` enforces. Out-of-range
values pushed into a `Row` cleanly, but `read_datum` then called
`from_timestamplike(...).expect(...)` while iterating and panicked —
reachable from untrusted proto bytes.

Go through `from_timestamplike` in `from_proto` so the value is
rejected at decode time.
Same shape as the `CheckedTimestamp` fix: `Date::from_proto`
constructed `Date { days: proto.days }` directly, bypassing the range
validation that `from_pg_epoch` enforces. Out-of-range days pushed
into a `Row` cleanly, then `read_datum` panicked on
`Date::from_pg_epoch(days).expect(...)` while iterating.

Go through `from_pg_epoch` in `from_proto` so the value is rejected
at decode time.
`SqlScalarType::{List,Record,Map}` from `ProtoScalarType` called
`x.custom_id.map(|id| id.into_rust().unwrap())` — a malformed
`ProtoCatalogItemId` inside any of those three variants panicked the
process. Reachable from untrusted proto bytes (an attacker-crafted
`ProtoRow` containing a list/record/map value with a bad custom_id).
Propagate via `transpose()?` instead.
The non-migration branch zipped `proto.names` and `proto.metadata`
via `zip_eq`, which panics on length mismatch — reachable from
untrusted proto bytes. Check the lengths explicitly and return
`InvalidFieldError`.
…r display

The earlier change made `Ident::can_be_printed_bare` reject members of
`is_context_sensitive_keyword` (MAP, POSITION, EXTRACT, ALL, ANY, …)
so that round-trip through sql-pretty preserved them. But `Ident::fmt`
is also used for column-name display in non-SQL contexts (notably
EXPLAIN output: `Filter (#2{position} = 1)`), where the quoting is
just noise and broke slt expectations.

Revert the global change. The fuzz targets that exercised this round
trip get a narrow carve-out (skip on the `Expected left square
bracket` / `Expected left parenthesis` / `Expected IN, found ...`
reparse errors that come from a context-sensitive keyword landing in a
position the parser dispatches on).
`PostgresKeyDesc::cols` is `Vec<u16>` but the proto carries it as
`Vec<u32>`; the decode path did `c.try_into().expect("values roundtrip")`,
which panicked on any value above 65535 — reachable from untrusted
proto bytes (the fuzz target found this from many distinct inputs).
Propagate the conversion error via `TryFromProtoError` instead.

Also fixes two build errors found while bringing up new fuzz crates:

- mysql-util fuzz target was importing via `mz_mysql_util::desc::*`,
  but `desc` is a private module; use the top-level re-exports.
- pgwire's `Codec` was a fully-private struct, blocking the
  `fuzz_exports` re-export. Make it `pub(crate)` so the same-crate
  `pub use` in lib.rs works under `#[cfg(feature = "fuzz")]`.
def- and others added 29 commits June 4, 2026 07:28
A MapFilterProject is the map/filter/project pipeline that runs in
essentially every dataflow operator, and optimize() fuses/reorders maps
and predicates, drops unused maps, and canonicalizes the projection — a
miscompile silently corrupts query results. The new mfp_optimize target
builds a random well-typed MFP over an int4/bool schema and checks a
one-directional oracle on a batch of input rows: every row the original
plan passes through cleanly with output `out` must, under the optimized
plan, pass through with the byte-identical `out` (optimize may legitimately
drop errors or already-rejected rows, so those are not asserted). Added to
the fruitful profile. 523k execs clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The string matcher (`is_match_subpatterns`, used for patterns with up to
MAX_SUBPATTERNS subpatterns) resolves each `%` by searching for the
following suffix and back-tracking over every candidate position. With
two or more `%` the back-tracking nests, so matching an adversarial
pattern like `%a%a%a` against a long run of `a`s costs time proportional
to len(text)^(number of %) — a fuzzer (like_pattern_escape) hit a single
input that ran for 1667s.

The regex engine matches these patterns in linear time with no
back-tracking, so route any pattern with more than one `many` (`%`)
subpattern to it, alongside the existing case-insensitive and
too-many-subpatterns cases. The string matcher now only handles patterns
with at most one `%`, where it is near-linear. The regex and string
matchers produce identical results, so this changes only performance.

Found by fuzzing. Adds a regression test covering the pathological shape
(instant after the fix) and confirming the routing still matches
correctly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Materialize's incremental reduce never re-aggregates a group from
scratch; it maintains aggregates by combining partial results. The new
agg_decompose target checks the laws that makes sound, over random
multisets of nullable int4/int8/bool datums:

  * permutation invariance for every order-insensitive aggregate (sum,
    count, min, max, any, all) — accumulable maintenance applies updates
    in arrival order, so order-dependence is a correctness bug;
  * hierarchical re-aggregation for min/max/any/all (agg(whole) ==
    agg(map(agg, non-empty chunks))), exactly how the bucketed
    hierarchical reduce computes min/max;
  * additive decomposition for count (count(whole) == sum of per-chunk
    counts).

Empty chunks are skipped when re-aggregating: an empty chunk aggregates
to null, and for the three-valued any/all a stray null would corrupt an
otherwise false/true result. Added to the fruitful profile. 4.3M execs
clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend the transform result-equivalence target from FoldConstants /
CanonicalizeMfp / UnionBranchCancellation to also drive the structural
fusions (Filter, Project, Map, Negate, Union) and ProjectionExtraction,
applied pre-order across the whole plan. Each must leave both the output
shape and the consolidated (row, diff) multiset unchanged on the random
constant-rooted plans the generator builds. 354k execs clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_equiv)

Where mir_relation_transforms checks individual transforms in isolation,
this runs the entire Optimizer::logical_optimizer pipeline — every
transform, in the real order, with all their interactions — and checks
that the optimized plan produces the same (row, diff) multiset as the
input. The generator builds constant-rooted plans using the bug-rich
operators that target omits: Join (with equi-join equivalences), Reduce
(group keys + min/max/sum/any/all/count), TopK (totally-ordered so
LIMIT/OFFSET is deterministic), Threshold, Union, Negate, Distinct.

Because every leaf is constant and FoldConstants evaluates all of these,
both input and optimized output fold to actual rows. The oracle is
conservative: it only asserts when both fold, and a Typecheck error from
the optimizer is a skip, so a surviving divergence or an optimizer panic
is a genuine finding. Added to the fruitful profile. 372k execs clean
(coverage still climbing past 10.8k features at smoke end).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
buf walks src/ and was choking on .proto files that build-script deps
(protobuf-native) extract into the fuzz crates' standalone target/ trees,
e.g. transform/fuzz/target/.../protobuf-native/testdata/test.proto. The
exclude list was hand-maintained in buf.yaml.template and had gone stale,
missing interchange, persist-client, pgcopy, pgrepr, pgtz, and transform.

Generate the <crate>/fuzz/target excludes from src/*/fuzz in
generate-buf-config.py so every fuzz crate (and any future one) is covered
automatically. Regenerated src/buf.yaml now excludes all 16 fuzz targets.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
full_optimizer_equiv roots plans at Constants, so the optimizer folds
everything away before the interesting relational planning — join
ordering/implementation, predicate and projection pushdown through Gets,
key inference — ever runs. This target roots plans at Get nodes (opaque
relations, exactly what the optimizer plans against for a real query) so
that planning actually happens, while keeping a ground-truth oracle by
binding each Get to a fuzzed constant collection in a side table.

Oracle: collapse(substitute(plan)) folds the input with its data inlined;
optimize(plan) runs the full logical optimizer with Gets still symbolic;
collapse(substitute(optimized)) folds the optimized plan with the same
data. The multisets must match. substitute replaces only our global Gets;
the optimizer's own Let/local-Get bindings (CSE) are collapsed by a
FoldConstants + NormalizeLets fixpoint. Conservative (asserts only when
both sides fold; optimizer/Typecheck errors skip), so a divergence or a
panic is a real finding. Gets are declared fully nullable with no keys so
the optimizer makes no assumptions the data could violate. Added to the
fruitful profile. 230k execs clean; coverage (12.3k features) exceeds the
constant-rooted target, confirming it reaches new planning paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Materialize ingests attacker-controllable bytes from Kafka in several
formats, but only Avro was fuzzed. This adds the FORMAT PROTOBUF decoder
(mz_interchange::protobuf::Decoder): untrusted message bytes -> Row. A
panic reachable from the wire is a source-ingestion availability bug.

The message descriptor is built in-process with prost-types (no protoc,
no files): one message exercising the scalar, bytes, enum, repeated
(list) and nested-message (record) paths of pack_message, plus a
scalar-only message it references. It is deliberately non-recursive and
map-free (the two shapes the decoder rejects) so from_bytes succeeds and
decode is actually exercised — confirmed by coverage climbing into the
decode/pack paths. Both the raw and Confluent-framed (magic byte +
schema-id header) paths are driven. Added to the fruitful profile.
815k execs clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fuzzing the CSV decoder turned up two ways untrusted source bytes could
panic the decode operator (crashing source ingestion), and one silent
data-corruption path:

  1. The error paths (wrong column count, invalid UTF-8) returned without
     resetting output_cursor/ends_cursor, unlike the success path. csv_core
     resets its own per-record output position, so after a malformed record
     the next record read back stale, non-monotonic field offsets — slicing
     output[ends[i]..ends[i+1]] then panicked (start > end), or silently
     produced rows stitched from leftover buffer bytes. One bad row corrupted
     the rest of the source. Fixed by clearing the buffers after every
     completed record, success or error.

  2. The whole record's concatenated field bytes were validated as UTF-8 in
     one shot, then sliced as a &str at each field boundary. A delimiter can
     sit between two bytes that, with the delimiter removed, form one
     multi-byte character: the concatenation validates, but a field boundary
     lands mid-character, panicking the &str slice (and even when it didn't
     panic, bytes were silently merged across fields). Fixed by validating
     each field's bytes as UTF-8 independently; a field that isn't valid
     UTF-8 on its own is now a decode error.

To make the decoder fuzzable without mz-storage's heavy dependency tree
(rocksdb/rdkafka/aws), the pure CsvDecoderState moves to mz-storage-types
alongside the CsvEncoding it consumes; mz-storage re-imports it. Adds
regression tests for both panics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decode untrusted CSV bytes into Rows via CsvDecoderState, the FORMAT CSV
source decoder — first to touch external data, and previously unfuzzed
(only Avro was). The first two input bytes pick the config (1..=4 columns,
the delimiter, header vs. count); the rest is fed as the CSV stream and
drained through decode exactly as the storage decode operator does,
covering the buffer-growth, column-mismatch, invalid-UTF-8 and header
paths. Added to the fruitful profile.

This target found the two panics fixed in the previous commit; it now runs
clean (10M+ execs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the upsert `types` module and the v2 value-encoding helpers
(`upsert_value_to_row`/`datum_seq_to_upsert_value`) reachable from the
fuzz crate by widening `pub(crate)`/`mod` to `#[doc(hidden)] pub`. These
are exposed only so the storage fuzz crate (a separate workspace) can
exercise the upsert consolidation and value encoding; they are not a
stable public API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a mz-storage fuzz crate covering both upsert implementations:

  * upsert_consolidate (v1): the XOR/length/checksum accumulator in
    StateValue::merge_update + ensure_decoded that collapses a key's
    (value, diff) history into its current value. Builds well-formed
    histories (canceling (prev,+1)(prev,-1) pairs + at most one (cur,+1))
    with arbitrary values, shuffles and merges them, and asserts the
    consolidated result is exactly the current value (or absent). Diffs are
    Materialize-controlled, so we keep diff_sum in {0,1} rather than
    tripping the intentional invalid-state invariant; the values are
    untrusted source data.

  * upsert_value_roundtrip_v2 (v2): upsert_value_to_row +
    datum_seq_to_upsert_value, the tagged-Row encoding the v2 operator
    stores in its ValRowSpine. Pushes the encoded Row through a real
    DatumContainer and reads it back, asserting decode(encode(v)) == v for
    both the Ok(Row) and Err(UpsertError) arms.

The mz-storage tree (rocksdb/rdkafka/aws) builds fine under cargo-fuzz.
Both targets run clean (27M and 55M execs) — the consolidation and
encoding are robust over arbitrary values. Added to the fruitful profile;
buf.yaml regenerated to exclude the new crate's target/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`GRANT/REVOKE ... ON ALL <type>` displayed the object type by writing its
singular form and appending `S`. That works for every type except network
policies: `NETWORK POLICY` + `S` is `NETWORK POLICYS`, but the parser
accepts the plural keyword `POLICIES` (mapping it to ObjectType::
NetworkPolicy). So `GRANT CREATE ON ALL POLICIES TO j` parsed, displayed as
`GRANT CREATE ON ALL NETWORK POLICYS TO j`, and then failed to reparse —
a parse/display asymmetry the parse_display_roundtrip fuzzer hit.

Pluralize via a small helper that emits `POLICIES` for NetworkPolicy and
`<type>S` otherwise. Adds a round-trip regression test. (The SHOW
PRIVILEGES paths share the naive `+S`, but their parsers don't accept
POLICIES at all, so that asymmetry isn't reachable via parse->display.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every persisted state maintains the invariant that it has at least one
rollup (State::latest_rollup .expect()s it). ProtoRollup::into_rust called
latest_rollup while decoding, so an untrusted/corrupt rollup proto whose
state carries no rollups panicked instead of being rejected — a
rollup_proto_roundtrip fuzz crash (and a decode-time availability risk for
a corrupt blob).

Validate the invariant in from_proto and return TryFromProtoError::
InvalidPersistState when it is violated, before latest_rollup is reached
(and before the decoded Rollup is later re-encoded). Adds a regression
test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a `fuzzing` cargo feature (off by default, never in production builds)
that compiles the in-memory upsert backend (`upsert::memory`, previously
test-only) and a `FuzzUpsertParts` helper. `FuzzUpsertParts` owns the
metrics/statistics plumbing `UpsertState` requires and hands out fresh,
empty in-memory `UpsertState`s borrowing it — so a fuzz target can drive
the real state machine without reconstructing (or leaking) the metrics each
iteration. `StateValue::memory_size` is widened from test-only to the same
gate, since the in-memory backend uses it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drive UpsertState::consolidate_chunk + multi_get over the in-memory backend,
not just the isolated accumulator. A chunk of differential (key, value, diff)
updates spanning several keys is consolidated and the snapshot completed,
then each key is read back. The diffs are Materialize-controlled, so per key
we generate a well-formed history (canceling (prev,+1)(prev,-1) pairs plus at
most one (cur,+1)) with arbitrary untrusted values, shuffle the whole
multi-key chunk, and assert each key reads back exactly its current value (or
absent): the pairs must XOR/sum away across the chunk regardless of order.

Registers the new mz-storage fuzz crate (enables the `fuzzing` feature) and
adds it to FUZZ_CRATES + the fruitful profile. 1.2M execs clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The JSON *source* decoder is just Jsonb::from_slice (covered by
repr::jsonb_from_slice, which also round-trips). The uncovered JSON logic
is the output side: mz_interchange::json::encode_datums_as_json renders a
Row's datums as JSON for a FORMAT JSON sink, with real per-type formatting
(a JSON Number can't be NaN/Infinity; numerics, intervals, timestamps,
dates, bytes each have their own rendering).

Generate a random column schema and a matching well-typed Row — values are
arbitrary (source data), including the NaN/Infinity floats f64::arbitrary
produces — then encode and serialize it; neither step may panic. Added to
the fruitful profile. 23M execs clean (the encoder handles NaN/Infinity and
the other edge values).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The nightly slice (~85min, sharded across two 16-core machines) barely
scratches the deeper bugs — fuzzing rewards uninterrupted depth. Move it to
the release-qualification pipeline as a single 24h soak on the largest
Hetzner agent (hetzner-x86-64-dedi-48cpu-192gb, 48 cores / 192 GB).

The 48 cores host every `fruitful` target in one slot-wave (one fork worker
each), and --max-seconds is clamped down to whatever the wall budget leaves
after the build, so each target fuzzes for ~23h instead of ~60min. The large
memory matters: ~40 ASAN fuzz processes at once. --corpus-sync still carries
the minimized corpus between runs so coverage accumulates across releases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
twos_complement_be_to_numeric_inner read input[0] without first checking
that the slice is non-empty, panicking with "index out of bounds: the len
is 0 but the index is 0" on an empty byte string.

That input is reachable from the wire: an Avro `decimal` logical type whose
unscaled value is encoded as zero-length `bytes` flows straight from a Kafka
message body through AvroFlatDecoder into this function, so a single crafted
(or buggy-producer) message crashes Avro source decoding — an availability
bug for ingestion. Our own encoder never emits an empty byte string (zero is
the single byte 0x00), and Java's Avro decoder likewise rejects empty input
(BigInteger throws "Zero length BigInteger"), so the right behaviour is a
clean decode error, not a silent zero. The caller already maps the Err to a
DecodeError, so the malformed row now surfaces as a decode error instead of
aborting the worker.

Found by the new avro_decode_fuzzed_schema fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The existing avro_decode_to_row target decodes against one fixed reader
schema, so it only ever walks one set of AvroFlatDecoder paths. The decoder
is schema-directed, and the bug-prone paths are the schema-dependent ones a
single schema barely touches. avro_decode_fuzzed_schema generates a top-level
record schema spanning everything validate_schema_2 accepts — decimal (bytes-
and fixed-backed), fixed, enum, uuid, the logical date/timestamp types, and
nested records/arrays/maps/unions — builds a Decoder from it, and decodes
arbitrary remaining bytes as the message body. This immediately found the
empty-unscaled-bytes decimal panic fixed in the previous commit.

Also two strconv decoders for untrusted text (COPY FROM, text-format wire
parameters): strconv_parse_bytes exercises the hand-written bytea octal/hex
escape parser (the byte-at-a-time escape-decoding shape that hid the CSV
bug), and strconv_parse_uuid is a regression guard for the uuid-crate
error-construction panic that parse_uuid already pre-screens against. Both
clean over tens of millions of execs. All three added to the fruitful
profile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Now that fuzzing runs only as the 24h release-qualification soak with
--profile=fruitful, the profile is the entire CI fuzz scope, so its cores
should go where bugs still hide. Drop eight targets that fuzz clean round
after round over well-tested, stable code and have saturated: the
arithmetic/range oracles (interval_arith, numeric_arith, range_ops), the
aggregate-decomposition laws (agg_decompose), the internal Row<->proto
round-trip (row_proto_roundtrip, still covered by row_arrow/row_codec), the
AT TIME ZONE math (timezone_convert), and the simple jsonb access operators
(jsonb_get, jsonb_path).

They still exist and can be run by name or re-added if the code under them
changes. Kept despite no recent finds: the untrusted-input parsers/decoders
(unbounded adversarial input) and the optimizer/upsert/persist targets
(regression insurance on actively-developed, catastrophic-if-wrong code).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'auto' job sizer capped concurrency at the core count and queued the
rest. That is fine for a short run, but the release-qualification run gives
each target a ~23h --max-seconds: the first `cores` targets would consume the
entire wall budget and the queued ones would never start. With the suite
already near the agent's core count (and the full profile well over it), that
means whole targets silently never run.

Launch every target at once instead (one fork worker each, oversubscribing
the CPU). Fork workers tolerate oversubscription and the OS time-slices, so
every target makes progress — strictly better than some never running. A
caller who needs to bound concurrency can still pass --max-parallel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two more schema/type-directed untrusted-input surfaces, following the lesson
from avro_decode_fuzzed_schema (fuzz the schema, not just the body):

protobuf_decode_fuzzed_schema generates a FileDescriptorSet — the user-supplied
USING SCHEMA blob of a FORMAT PROTOBUF source — with messages that reference
each other freely (so cycles arise), enums, and the full scalar set, then runs
DecodedDescriptors::from_bytes + decode. The fixed-descriptor target never
exercises the hand-written derive_inner_type validator (recursion detection,
the Kind mapping) with anything but one well-formed schema.

avro_schema_parse drives mz_avro::Schema::from_str over generated schema JSON
that can nest past MAX_SCHEMA_DEPTH (so the stack-overflow guard must fire
cleanly) and re-references already-defined names (recursive definitions), with
logical types — breadth that schema_resolve and the decode targets never
generate. This is the hostile-schema-registry surface.

Both clean over a smoke run; added to the fruitful profile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR MaterializeInc#36764 made sql-pretty print role passwords losslessly via per-statement
doc printers (the AstDisplay redaction stays as the global safety net for
logs/catalog; a pretty-printer that round-trips user SQL must keep the secret).
But DECLARE and PREPARE wrap an inner statement and fell through to the generic
doc_display fallback, which prints the whole wrapper via AstDisplay -- so a
secret in the inner statement was redacted again:

  DECLARE c CURSOR FOR ALTER ROLE adb PASSWORD '2'

printed the password as '<REDACTED>', which then reparses to the string
"<REDACTED>" instead of "2", changing the AST.

Give DECLARE/PREPARE their own doc printers that recurse into the inner
statement's doc, so its secrets are printed by the lossless path. This needs
the inner statement's concrete type, so to_pretty/to_doc now bound
NestedStatement = Statement<Raw> -- satisfied by both AstInfo impls (Raw and
Aug), which is exactly the existing invariant.

Found by the parse_pretty_roundtrip fuzzer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decoding an Avro array/map reads a per-block element count off the wire and
then builds a Vec of that many elements. The count was only checked against
MAX_BLOCK_LEN (16M); nothing tied it to how much input was actually left. When
the elements consume ~no bytes each (an empty record, a null), a tiny message
can claim a 16M-element block and drive an unbounded allocation: a 163-byte
input grew the decoder to >4GB (a single 576MB = 16M * size_of::<Value>()
allocation, repeated across fields), OOM-killing the decode.

A block can't legitimately hold more elements than there are bytes left to
decode them from, so add a cheap `Skip::remaining_input` hint (known for the
slice/cursor readers the Kafka path uses, None for opaque streams) and reject
any array/map block whose claimed length exceeds it. Normal data has at least
one byte per element, so the bound never trips; only the small-message-huge-
count amplification is rejected, as a clean decode error.

Found by the avro_decode_fuzzed_schema fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
My earlier guard rejected any rollup proto whose state had no rollups, but
that was too broad: it broke applier_version_state, which decodes a freshly
initialized state that legitimately has no rollups yet. latest_rollup -- the
actual panic site -- is only reached when the proto carries diffs (to validate
the diff bounds against it), so move the check there. A rollup-less state
without diffs decodes again, while a proto that has diffs but no rollups is
still rejected instead of panicking. The regression test is updated to build
the diffs-but-no-rollups case it now guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two build/CI fixes in mz-storage:

- On macOS, objc2's `Retained<T>: Deref<Target = T>` lets the trait solver
  chase an unbounded `Retained<Retained<...>>` chain while resolving
  `IntoIterator` for the closures passed to timely's `Stream::inspect`,
  overflowing the default recursion limit (128). Raise it to 256, as several
  other crates already do for the same reason.

- Exposing `upsert_value_to_row`/`datum_seq_to_upsert_value` as plain `pub`
  for the fuzz crate made rustdoc document the `upsert_continual_feedback_v2`
  module, surfacing pre-existing module-doc intra-doc links to the private
  `UpsertDiff`/`drain_sealed_input` items (`-D rustdoc::private-intra-doc-links`).
  Add the `#[doc(hidden)]` those fuzz-only helpers were always meant to carry,
  and demote the two private-item links in the module docs to plain code spans.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A multi-line assert added alongside the DECLARE/PREPARE secret test wasn't
canonically formatted, failing `bin/fmt --check`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`.inspect` is ambiguous between timely's `Inspect::inspect` and std's
`Iterator::inspect`. Method resolution probes whether the *receiver* is an
`Iterator` (i.e. `IntoIterator`), and when the receiver's element type is still
an inference variable that probe unifies the variable against objc2's
`&Retained<T>: IntoIterator where &T: IntoIterator` blanket impl, building an
endless `Retained<Retained<...>>` chain. objc2 is only in the graph on macOS,
so the solver overflows there while Linux compiles fine.

recursion_limit can't help (the chain is unbounded, not deep), and annotating
the closure argument or the expression result doesn't either -- the probe runs
on the receiver during method resolution, before those constrain anything. The
fix is to bind the receiver to its concrete type in a preceding `let`, so the
`IntoIterator` probe sees a concrete (plainly non-iterable) type and
terminates. The types come straight from `write_batches`' return and
`SourceRender::render`, so a wrong one fails the Linux build -- `cargo check`
validates them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant