feat(lazy): structural patching for decode+modify+encode#44
Conversation
Add lazy patch support to avoid materializing the entire root container when modifying a few fields. Instead of triggering full materialization on __newindex, we now record patches and splice the original buffer during encode. Changes: - Add qjson_cursor_field_bytes FFI function to get field value byte spans - Modify LazyObject.__newindex to record patches instead of materializing - Modify LazyObject.__index to check patches first - Add encode_with_patches for splice-based encoding - Add lazy_patch_spec.lua tests Performance: decode+modify+encode on 100KB payload - Before: ~30 μs/op (full materialization + walking encode) - After: ~32 μs/op (WIP - splice path needs optimization) See docs/lazy-patch-spec.md for detailed design.
There was a problem hiding this comment.
Pull request overview
Adds a "lazy patch" path so that mutating a few fields on a qjson.decode() proxy no longer materializes the whole root container. Writes are recorded in _patches/_deleted side tables, and qjson.encode() splices the original buffer (with new helper FFI qjson_cursor_field_bytes) instead of walking a materialized table.
Changes:
- New FFI
qjson_cursor_field_bytes(Rust + C header + LuaJIT cdef) returning the byte span of a field's value in the original buffer. - Rewrites
LazyObject.__newindex/__index/__pairsto record/read patches and deletions; addsencode_with_patchesandencode_lazy_object_walking_with_patchesto encode lazy objects with patches. - Adds
tests/lua/lazy_patch_spec.luaanddocs/lazy-patch-spec.md.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ffi.rs | Adds qjson_cursor_field_bytes returning a field value's byte range. |
| include/qjson.h | Declares the new FFI entry point. |
| lua/qjson.lua | Adds matching LuaJIT cdef for qjson_cursor_field_bytes. |
| lua/qjson/table.lua | Replaces materialize-on-write with patch recording; new patch-aware encode and iteration paths. |
| tests/lua/lazy_patch_spec.lua | Spec covering basic patch/read/delete/new-field/iteration/nesting/edge cases. |
| docs/lazy-patch-spec.md | Detailed design doc explaining problem, algorithm, edge cases, and benchmarks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if has_deleted or #new_fields > 0 then | ||
| -- Fall back to walking for complex cases | ||
| return encode_lazy_object_walking_with_patches(t, patches, deleted) | ||
| end | ||
|
|
||
| -- Sort replacements by start offset | ||
| table.sort(replacements, function(a, b) return a.start < b.start end) | ||
|
|
||
| -- Build output by splicing | ||
| local parts = {} | ||
| local pos = t._bs + 1 -- 1-based Lua index | ||
|
|
||
| for _, r in ipairs(replacements) do | ||
| -- Copy unchanged portion (convert 0-based to 1-based) | ||
| local r_start_1based = r.start + 1 | ||
| if r_start_1based > pos then | ||
| parts[#parts + 1] = buf:sub(pos, r_start_1based - 1) | ||
| end | ||
| -- Insert replacement | ||
| parts[#parts + 1] = r.value | ||
| pos = r.end_ + 1 -- end_ is exclusive, so +1 for 1-based | ||
| end | ||
|
|
||
| -- Copy remaining portion | ||
| if pos <= t._be then | ||
| parts[#parts + 1] = buf:sub(pos, t._be) | ||
| end | ||
|
|
||
| return table.concat(parts) | ||
| end | ||
|
|
||
| -- Walk a LazyObject with patches, handling deletions and new fields | ||
| local function encode_lazy_object_walking_with_patches(t, patches, deleted) |
There was a problem hiding this comment.
Fixed in 176558a — encode_lazy_object_walking_with_patches is now defined ahead of encode_with_patches. Verified: tests/lua/lazy_patch_spec.lua 26/26 pass; full Lua suite 112/112.
| -- Check if a lazy view has patches (but not necessarily dirty children) | ||
| local function has_patches(v) | ||
| local patches = rawget(v, "_patches") | ||
| return patches and #patches > 0 | ||
| end | ||
|
|
||
| -- Check if a lazy view has deletions | ||
| local function has_deletions(v) | ||
| local deleted = rawget(v, "_deleted") | ||
| if not deleted then return false end | ||
| for _ in pairs(deleted) do return true end | ||
| return false | ||
| end |
There was a problem hiding this comment.
Confirmed unused. grep -n "has_patches\|has_deletions" lua/qjson/table.lua only matches the definitions. Removing both.
| LazyObject.__newindex = function(t, k, v) | ||
| local contents = materialize_object_contents(t) | ||
| -- Snapshot user-key cache BEFORE nilling internals. | ||
| -- Use next() for raw iteration: pairs() invokes __pairs on lazy tables, | ||
| -- walking the full JSON via FFI instead of the Lua-side rawget cache. | ||
| local cache = {} | ||
| local ck, cv = next(t) | ||
| while ck ~= nil do | ||
| if not INTERNAL_KEYS[ck] then | ||
| cache[ck] = cv | ||
| end | ||
| ck, cv = next(t, ck) | ||
| -- Initialize patches table if needed | ||
| if not rawget(t, "_patches") then | ||
| rawset(t, "_patches", {}) | ||
| rawset(t, "_deleted", {}) | ||
| end | ||
| t._doc, t._cur_box, t._cur, t._bs, t._be = nil, nil, nil, nil, nil | ||
| setmetatable(t, nil) | ||
| for _, kv in ipairs(contents) do | ||
| rawset(t, kv[1], cache[kv[1]] or kv[2]) | ||
|
|
||
| local patches = rawget(t, "_patches") | ||
| local deleted = rawget(t, "_deleted") | ||
|
|
||
| if v == nil then | ||
| -- Mark key as deleted | ||
| deleted[k] = true | ||
| -- Remove from patches if previously patched | ||
| for i, p in ipairs(patches) do | ||
| if p.key == k then | ||
| table.remove(patches, i) | ||
| break | ||
| end | ||
| end | ||
| else | ||
| -- Encode the new value (we need both encoded and lua_value) | ||
| local encoded = encode(v) | ||
|
|
||
| -- Update or add patch | ||
| local found = false | ||
| for _, p in ipairs(patches) do | ||
| if p.key == k then | ||
| p.encoded_value = encoded | ||
| p.lua_value = v | ||
| found = true | ||
| break | ||
| end | ||
| end | ||
| if not found then | ||
| patches[#patches + 1] = { key = k, encoded_value = encoded, lua_value = v } | ||
| end | ||
|
|
||
| -- Remove from deleted if previously deleted | ||
| deleted[k] = nil | ||
| end | ||
| rawset(t, k, v) | ||
| end |
There was a problem hiding this comment.
Real bug. t[1] = "x" on a LazyObject records {key=1, ...}, then read_object_field returns nil for non-string keys (line 117) so the read is silently lost, and encode_with_patches errors on #p.key later. Fix: raise in LazyObject.__newindex for non-string keys — matches the existing encode_object policy at line 774.
| -- For each patch, find the field's byte range in the original buffer | ||
| for _, p in ipairs(patches) do | ||
| local rc = C.qjson_cursor_field_bytes(t._cur, p.key, #p.key, sz_a, sz_b) | ||
| if rc == QJSON_OK then | ||
| -- Existing field: replace value | ||
| replacements[#replacements + 1] = { | ||
| start = tonumber(sz_a[0]), | ||
| end_ = tonumber(sz_b[0]), | ||
| value = p.encoded_value, | ||
| } | ||
| end | ||
| -- If NOT_FOUND, it's a new field - handled separately below | ||
| end | ||
|
|
||
| -- Collect deleted field spans (we need to find the full field span including key) | ||
| -- For simplicity, we'll handle deletions by walking and skipping deleted keys | ||
| local has_deleted = false | ||
| for _ in pairs(deleted) do has_deleted = true; break end | ||
|
|
||
| -- If we have deletions or new fields, fall back to walking | ||
| -- (splicing deletions is complex due to comma handling) | ||
| local new_fields = {} | ||
| for _, p in ipairs(patches) do | ||
| local rc = C.qjson_cursor_field_bytes(t._cur, p.key, #p.key, sz_a, sz_b) | ||
| if rc == QJSON_NOT_FOUND then | ||
| new_fields[#new_fields + 1] = p | ||
| end | ||
| end | ||
|
|
||
| if has_deleted or #new_fields > 0 then | ||
| -- Fall back to walking for complex cases | ||
| return encode_lazy_object_walking_with_patches(t, patches, deleted) | ||
| end |
There was a problem hiding this comment.
Valid. Collapsing the two passes into one — classify each patch into replacements or new_fields in a single qjson_cursor_field_bytes call.
| -- Check patches first | ||
| local patches = rawget(self, "_patches") | ||
| if patches then | ||
| for _, p in ipairs(patches) do | ||
| if p.key == key then | ||
| return p.lua_value | ||
| end | ||
| end | ||
| end | ||
|
|
||
| -- Check deleted | ||
| local deleted = rawget(self, "_deleted") | ||
| if deleted and deleted[key] then | ||
| return nil | ||
| end |
There was a problem hiding this comment.
Declining for now. Typical patch counts in this workflow are 1–3 modified fields; a 3-entry linear scan is faster than maintaining a parallel hash table (the FFI cost from encode/qjson_cursor_field_bytes already dominates). The O(n·m) only matters when m grows large, which the lazy-patch design discourages (assigning many fields → user is better off materializing). Happy to revisit if a profile shows it becoming hot; adding to README "Roadmap / Deferred" as a noted optimization.
|
Note Currently processing new changes in this PR. This may take a few minutes, please wait... ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds lazy patch tracking to qjson lazy proxies, a new FFI cursor API to obtain field byte spans, patch-aware encoding paths (splicing or walking fallback), a design/spec doc, and comprehensive Lua tests validating read/write/iterate/encode behavior. ChangesLazy Patching Implementation
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can disable sequence diagrams in the walkthrough.Disable the |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (3)
src/ffi.rs (1)
840-867: 💤 Low valueConsider extracting the byte-range calculation into a shared helper.
The match block for computing
[start, end)from a cursor (lines 846-867) duplicates the logic inqjson_cursor_bytes(lines 738-759). Both check the lead byte, handle containers/strings by usingidx_end, and delegate scalars toscalar_byte_range.Extracting this into a helper like
cursor_byte_range(d: &Document, cur: Cursor) -> Result<(usize, usize), qjson_err>would reduce duplication and ensure consistency.♻️ Suggested helper extraction
/// Compute the byte range [start, end) for a cursor's value in the original buffer. unsafe fn cursor_byte_range(d: &Document<'_>, cur: Cursor) -> Result<(usize, usize), qjson_err> { let pos = d.indices[cur.idx_start as usize] as usize; let lead = match d.buf.get(pos) { Some(b) => *b, None => return Err(qjson_err::QJSON_PARSE_ERROR), }; match lead { b'{' | b'[' | b'"' => { let end = d.indices[cur.idx_end as usize] as usize; if end >= d.buf.len() { return Err(qjson_err::QJSON_PARSE_ERROR); } Ok((pos, end + 1)) } _ => scalar_byte_range(d, cur), } }Then both
qjson_cursor_bytesandqjson_cursor_field_bytescan call this helper.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ffi.rs` around lines 840 - 867, The byte-range computation for a cursor is duplicated; extract it into a shared unsafe helper like cursor_byte_range(d: &Document<'_>, cur: Cursor) -> Result<(usize, usize), qjson_err> that performs the lead-byte check, handles containers/strings by using cur.idx_end and bounds-checking against d.buf.len(), and delegates scalars to scalar_byte_range; then replace the duplicated logic in qjson_cursor_bytes and qjson_cursor_field_bytes (or any other sites) to call cursor_byte_range and map its Ok/Err to the existing *value_start/*value_end and qjson_err return conventions.tests/lua/lazy_patch_spec.lua (1)
3-44: ⚡ Quick winMissing test case: deletion of non-existent field.
The test suite covers deleting existing fields and re-adding them, but doesn't test deleting a field that was never in the original JSON. This edge case should verify that deleting a non-existent field doesn't cause errors and doesn't affect encoding.
🧪 Proposed additional test
it("deletes a non-existent field without error", function() local t = qjson.decode('{"a":1}') t.b = nil -- b never existed assert.is_nil(t.b) assert.are.equal(1, t.a) local out = qjson.encode(t) local cjson = require("cjson") local parsed = cjson.decode(out) assert.is_nil(parsed.b) assert.are.equal(1, parsed.a) end)Also applies to: 72-97
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/lua/lazy_patch_spec.lua` around lines 3 - 44, Add a new test case named "deletes a non-existent field without error" to the Lazy Patch suite that decodes '{"a":1}' via qjson.decode, sets t.b = nil (where b was never present), asserts t.b is nil and t.a remains 1, then qjson.encode(t) and cjson.decode the output and assert parsed.b is nil and parsed.a equals 1; place this alongside the other "it" blocks so the behavior of qjson.decode/encode and the lazy-patch logic for deleting non-existent fields is validated.docs/lazy-patch-spec.md (1)
196-208: ⚡ Quick winThe pseudocode helper functions in the specification need clarification or mapping to actual implementation.
Lines 196 and 212 reference
find_field_value_span()andfind_field_span()which are not defined in the spec. While the lazy patch feature is implemented and functional (confirmed bytests/lua/lazy_patch_spec.lua), the actual implementation uses C FFI calls (C.qjson_cursor_field_bytes()) rather than these abstract function names.The specification should either:
- Define what these abstract helper functions conceptually do (for algorithmic clarity)
- Map them explicitly to the actual C FFI interface used in the implementation
- Clarify upfront that this is pseudocode describing the algorithm rather than directly executable pseudocode
This will help developers understand the algorithm without needing to cross-reference the Lua implementation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/lazy-patch-spec.md` around lines 196 - 208, The spec references undefined helpers find_field_value_span and find_field_span; update the document to either (a) add brief conceptual definitions of those helpers (what inputs they take and that they return byte offsets for a field's value/span), (b) explicitly map them to the real implementation (e.g., call sites using C.qjson_cursor_field_bytes() via FFI) and show the equivalent return values, or (c) add a clear note at the top that the listing is high-level pseudocode and that the real implementation uses C FFI (see tests/lua/lazy_patch_spec.lua); include the helper names find_field_value_span and find_field_span and the C symbol C.qjson_cursor_field_bytes in the explanation so readers can locate the implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/lazy-patch-spec.md`:
- Around line 332-341: Specify and add a concrete implementation for
LazyObject.__pairs that merges original fields with patch entries and excludes
deletions: read raw _patches and _deleted tables, build a patch_map mapping
patch.key -> lua_value and a seen set, iterate the original pairs (using the
preserved original__pairs iterator) yielding keys not marked deleted and not
overridden by patches (returning patch value when key exists in patch_map), then
iterate remaining patch_map keys to yield new/overridden fields, ensuring each
key is yielded only once; reference LazyObject.__pairs, rawget(t, "_patches"),
rawget(t, "_deleted"), and original__pairs to locate and implement this
behavior.
- Around line 254-256: The current splice logic uses result:find("}%s*$") to
locate the closing brace after concatenating parts, which is fragile; modify the
code handling variables result, parts and close_pos so that if
result:find("}%s*$") returns nil you immediately fallback by returning
encode_lazy_object_walking(t) (or otherwise abort the splice), ensuring you do
not proceed with unsafe splicing when the closing brace cannot be reliably
located; update any callers that rely on the splice to handle this early return
and reference t._be only when the brace is confirmed.
- Around line 244-264: The prefix handling can produce invalid JSON when all
original fields were deleted and new_fields is non-empty; in the block that
computes result/close_pos/prefix (using variables result, close_pos, prefix,
parts, new_fields and function find_field_value_span), strip any trailing commas
and whitespace from prefix before appending new_fields (e.g., normalize prefix
by removing trailing "," and spaces so you never produce "{,"), or alternatively
compute whether any fields remain after applying deletions and only insert a
comma when appropriate; update the logic that currently uses
prefix:match("[^{%s]%s*$") to perform this normalization and add unit tests for
the edge case "all fields deleted + new fields added."
- Around line 86-143: The patch structure is inconsistent: LazyObject.__newindex
currently stores only encoded_value but LazyObject.__index expects p.lua_value;
modify __newindex so that when setting a non-nil value it stores both
encoded_value = encode_value(v) and lua_value = v in the patch entry (and when
updating an existing patch update both fields), and ensure deletions still
remove patch entries and clear deleted[k]; also update the patch-record
documentation to list lua_value alongside encoded_value and key so
readers/consumers know both fields exist.
In `@lua/qjson/table.lua`:
- Around line 712-717: The loop calling C.qjson_cursor_field_bytes on t._cur
should handle unexpected return codes the same way other calls do: after getting
rc, keep the existing branch for QJSON_NOT_FOUND that appends the new part, but
add an elseif checking for rc not equal to QJSON_OK (and not QJSON_NOT_FOUND)
and propagate/report the error (e.g., return nil / raise error / log with rc and
context) so failures from C.qjson_cursor_field_bytes are surfaced; locate the
loop over patches and modify the rc handling around C.qjson_cursor_field_bytes,
referencing variables/function names: patches, p, C.qjson_cursor_field_bytes,
QJSON_NOT_FOUND, parts, encode_string, p.encoded_value, t._cur, sz_a, sz_b.
- Around line 613-623: The call to C.qjson_cursor_field_bytes in the loop
(inside the function handling patches) only checks for QJSON_OK and ignores any
other non-NOT_FOUND return codes; update the loop to explicitly handle
unexpected error codes (e.g., QJSON_PARSE_ERROR) returned by
qjson_cursor_field_bytes: after calling C.qjson_cursor_field_bytes(t._cur,
p.key, `#p.key`, sz_a, sz_b) check rc and if rc is QJSON_OK proceed to add the
replacement as now, if rc is QJSON_NOT_FOUND skip to the new-field handling as
before, but for any other rc propagate or surface the error (for example return
nil plus an error message, or log and abort) so the caller of the function can
react instead of silently continuing; reference the symbols
qjson_cursor_field_bytes, QJSON_OK, QJSON_NOT_FOUND (and QJSON_PARSE_ERROR) and
the replacements table when making the change.
- Around line 209-213: The call to C.qjson_cursor_field_bytes (inside the loop
using view._cur and p.key) only handles QJSON_NOT_FOUND and silently ignores
other error codes; update the code to check rc for non-success values (e.g.,
QJSON_TYPE_MISMATCH, QJSON_PARSE_ERROR) and handle them explicitly instead of
skipping the field: after calling qjson_cursor_field_bytes, if rc ==
QJSON_NOT_FOUND keep the existing behavior, else if rc != 0 surface the error
(return an error tuple or raise error with a descriptive message including rc
and p.key, or log it via the module logger) so callers can react to parse/type
errors rather than silently skipping fields. Ensure to reference
qjson_cursor_field_bytes, view._cur, p.key, and the QJSON_* constants when
implementing the check.
In `@tests/lua/lazy_patch_spec.lua`:
- Around line 229-275: Add a new test case inside the describe("Lazy Patch -
edge cases", ...) block named it("handles all fields deleted with new fields
added") that uses qjson.decode to load '{"a":1,"b":2}', sets t.a = nil and t.b =
nil, then assigns t.c = 3 and t.d = 4, asserts the nil and value expectations,
calls qjson.encode and parses with cjson.decode and asserts parsed.a and
parsed.b are nil and parsed.c and parsed.d equal 3 and 4; this verifies
qjson.encode handles the "all fields deleted then new fields added" edge case
without emitting invalid JSON.
- Around line 246-254: The test "handles unicode in values" currently only uses
ASCII; update the test inside the it block that calls qjson.decode/qjson.encode
to use a multi-byte UTF-8 string (e.g., include accented characters and CJK or
emoji) for t.a so the round-trip via qjson.encode and cjson.decode actually
validates Unicode handling; ensure the asserts (assert.are.equal) compare the
original Unicode string to both t.a after assignment and parsed.a after
re-decoding to confirm multi-byte sequences are preserved.
- Around line 240-244: The test "handles special characters in keys" decodes
'{"a.b":1}', patches t["a.b"]=10 and only checks the in-memory value; extend the
test to also call qjson.encode(t) and assert the encoded string matches the
expected JSON with the special key correctly quoted/escaped (e.g.,
'{"a.b":10}'), so the splice/encoding logic in qjson.encode is validated; update
the it block that uses qjson.decode, the local variable t, and add an assertion
against qjson.encode(t).
---
Nitpick comments:
In `@docs/lazy-patch-spec.md`:
- Around line 196-208: The spec references undefined helpers
find_field_value_span and find_field_span; update the document to either (a) add
brief conceptual definitions of those helpers (what inputs they take and that
they return byte offsets for a field's value/span), (b) explicitly map them to
the real implementation (e.g., call sites using C.qjson_cursor_field_bytes() via
FFI) and show the equivalent return values, or (c) add a clear note at the top
that the listing is high-level pseudocode and that the real implementation uses
C FFI (see tests/lua/lazy_patch_spec.lua); include the helper names
find_field_value_span and find_field_span and the C symbol
C.qjson_cursor_field_bytes in the explanation so readers can locate the
implementation.
In `@src/ffi.rs`:
- Around line 840-867: The byte-range computation for a cursor is duplicated;
extract it into a shared unsafe helper like cursor_byte_range(d: &Document<'_>,
cur: Cursor) -> Result<(usize, usize), qjson_err> that performs the lead-byte
check, handles containers/strings by using cur.idx_end and bounds-checking
against d.buf.len(), and delegates scalars to scalar_byte_range; then replace
the duplicated logic in qjson_cursor_bytes and qjson_cursor_field_bytes (or any
other sites) to call cursor_byte_range and map its Ok/Err to the existing
*value_start/*value_end and qjson_err return conventions.
In `@tests/lua/lazy_patch_spec.lua`:
- Around line 3-44: Add a new test case named "deletes a non-existent field
without error" to the Lazy Patch suite that decodes '{"a":1}' via qjson.decode,
sets t.b = nil (where b was never present), asserts t.b is nil and t.a remains
1, then qjson.encode(t) and cjson.decode the output and assert parsed.b is nil
and parsed.a equals 1; place this alongside the other "it" blocks so the
behavior of qjson.decode/encode and the lazy-patch logic for deleting
non-existent fields is validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 39a453a2-e4f7-49ae-9d3e-5ab1c9602e90
📒 Files selected for processing (6)
docs/lazy-patch-spec.mdinclude/qjson.hlua/qjson.lualua/qjson/table.luasrc/ffi.rstests/lua/lazy_patch_spec.lua
| -- Handle new fields (not in original) | ||
| local new_fields = {} | ||
| for _, p in ipairs(patches) do | ||
| if not find_field_value_span(t, p.key) then | ||
| new_fields[#new_fields + 1] = '"' .. p.key .. '":' .. p.encoded_value | ||
| end | ||
| end | ||
|
|
||
| if #new_fields > 0 then | ||
| -- Insert before closing brace | ||
| local result = table.concat(parts) | ||
| local close_pos = result:find("}%s*$") | ||
| if close_pos then | ||
| local prefix = result:sub(1, close_pos - 1) | ||
| -- Add comma if there are existing fields | ||
| if prefix:match("[^{%s]%s*$") then | ||
| prefix = prefix .. "," | ||
| end | ||
| return prefix .. table.concat(new_fields, ",") .. "}" | ||
| end | ||
| end |
There was a problem hiding this comment.
Risk of invalid JSON when all fields are deleted.
When all original fields are deleted and new fields are added, the logic at line 259 checks if prefix:match("[^{%s]%s*$") to decide whether to add a comma. However, if the prefix is just {, this pattern won't match, and no comma is added. But if replacements included deletions that left trailing commas, you could end up with malformed JSON like {,"new_field":1}.
Consider a more robust approach:
- Track whether any fields remain after applying deletions
- Normalize comma handling by removing any trailing comma in the prefix before appending new fields
- Add unit tests for this specific edge case (all fields deleted + new fields added)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/lazy-patch-spec.md` around lines 244 - 264, The prefix handling can
produce invalid JSON when all original fields were deleted and new_fields is
non-empty; in the block that computes result/close_pos/prefix (using variables
result, close_pos, prefix, parts, new_fields and function
find_field_value_span), strip any trailing commas and whitespace from prefix
before appending new_fields (e.g., normalize prefix by removing trailing "," and
spaces so you never produce "{,"), or alternatively compute whether any fields
remain after applying deletions and only insert a comma when appropriate; update
the logic that currently uses prefix:match("[^{%s]%s*$") to perform this
normalization and add unit tests for the edge case "all fields deleted + new
fields added."
There was a problem hiding this comment.
The implementation does not use the spec's }%s*$ splice path for this case. When deletions or new fields are present, encode_with_patches falls through to encode_lazy_object_walking_with_patches (table.lua:641-643), which walks original keys (skipping deleted) and then appends new fields — it cannot produce {,. Splice path is only used for the "all-existing-key replacements, no deletions" case. Will add a clarifying note to the spec; adding a regression test for the all-deleted-plus-new edge case (R8) regardless.
There was a problem hiding this comment.
@membphis, thanks for the detailed clarification. That makes sense — since encode_proxy routes any case involving deletions or new fields to encode_lazy_object_walking_with_patches before reaching the splice path, the {, scenario can't arise in practice. My concern was against the spec's pseudo-code in isolation, which doesn't make the fallthrough guard explicit in that section.
Adding a clarifying note to the spec (e.g., a comment in the encode_with_patches block stating "only reached when patches are pure replacements of existing keys with no deletions/new fields") would make the invariant self-documenting and prevent future confusion. The R8 regression test is a good defensive addition regardless.
|
|
||
| ```lua | ||
| tab.model = "gpt4o" | ||
| for k, v in qjson.pairs(tab) do | ||
| print(k, v) | ||
| end | ||
| ``` | ||
|
|
||
| **Handling**: `__pairs` must merge original fields with patches and exclude deleted fields. | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Missing algorithm specification for __pairs iteration.
The edge case section mentions that "__pairs must merge original fields with patches and exclude deleted fields" (line 340), but the algorithm for implementing this is not provided. Given the complexity of correctly merging three sources (original fields, patches, deletions), this algorithm should be specified.
Add a section showing the __pairs implementation, for example:
LazyObject.__pairs = function(t)
local patches = rawget(t, "_patches") or {}
local deleted = rawget(t, "_deleted") or {}
local seen = {}
local patch_map = {}
-- Build patch map
for _, p in ipairs(patches) do
patch_map[p.key] = p.lua_value
seen[p.key] = true
end
-- Iterator state
local orig_iter, orig_state, orig_key = original__pairs(t)
local patch_idx = 1
local patch_keys = {}
for k in pairs(patch_map) do
patch_keys[`#patch_keys` + 1] = k
end
return function(state, k)
-- Iterate original fields first (skipping deleted & patched)
while true do
local next_k, next_v = orig_iter(orig_state, k)
if next_k == nil then break end
if not deleted[next_k] and not seen[next_k] then
seen[next_k] = true
return next_k, patch_map[next_k] or next_v
end
k = next_k
end
-- Then iterate patched/new fields
while patch_idx <= `#patch_keys` do
local pk = patch_keys[patch_idx]
patch_idx = patch_idx + 1
if not seen[pk] then
seen[pk] = true
return pk, patch_map[pk]
end
end
return nil
end, t, nil
end🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/lazy-patch-spec.md` around lines 332 - 341, Specify and add a concrete
implementation for LazyObject.__pairs that merges original fields with patch
entries and excludes deletions: read raw _patches and _deleted tables, build a
patch_map mapping patch.key -> lua_value and a seen set, iterate the original
pairs (using the preserved original__pairs iterator) yielding keys not marked
deleted and not overridden by patches (returning patch value when key exists in
patch_map), then iterate remaining patch_map keys to yield new/overridden
fields, ensuring each key is yielded only once; reference LazyObject.__pairs,
rawget(t, "_patches"), rawget(t, "_deleted"), and original__pairs to locate and
implement this behavior.
There was a problem hiding this comment.
__pairs is implemented at lua/qjson/table.lua:169-225 (lazy_object_iter + LazyObject.__pairs): iterates original entries via qjson_cursor_object_entry_at, skips deleted, returns patched value when a patch covers the key, then iterates remaining patches and yields only those whose key is QJSON_NOT_FOUND in the original. Adding a pointer to this in the spec.
There was a problem hiding this comment.
@membphis, thanks for the clarification! The implementation at lua/qjson/table.lua:169-225 covers the algorithm well — using qjson_cursor_object_entry_at to iterate originals, skipping deleted keys, returning patched values inline, and then draining new-only patches at the end is exactly the right approach. Adding a cross-reference from the spec's edge-case section to that implementation will close the documentation gap nicely.
- Move encode_lazy_object_walking_with_patches before encode_with_patches (Lua local function defined later was unresolvable at compile time, causing nil-call errors on new-field/deletion paths). - Update tests in lazy_patch_spec.lua and lazy_table_spec.lua to match the spec's edge-case-1 design: writing to a LazyObject records a patch on the object instead of materializing it, so the metatable stays LazyObject after assignment.
|
Responses to the three nitpicks in the top-level review:
|
- LazyObject.__newindex: raise on non-string keys instead of silently recording an unreadable patch (Copilot C3). - Remove unused has_patches / has_deletions helpers (Copilot C2). - encode_with_patches: classify each patch in a single qjson_cursor_field_bytes pass (was doing two) and propagate unexpected return codes via check() (Copilot C4 + CodeRabbit R6). - encode_lazy_object_walking_with_patches: propagate unexpected qjson_cursor_field_bytes return codes (CodeRabbit R7). - lazy_object_iter: same error-propagation fix (CodeRabbit R5). - tests/lua/lazy_patch_spec.lua: extend special-key test to verify encoded output, swap the unicode test to multi-byte UTF-8, add "delete non-existent field", "all originals deleted + new fields added", and "non-string key raises" cases (CodeRabbit R8/R9/R10 + nitpick N2). - docs/lazy-patch-spec.md: add upfront note that the listing is design pseudocode (pointing helper names at qjson_cursor_field_bytes), and correct the patch-record example to include lua_value alongside encoded_value (CodeRabbit R1 + nitpick N3).
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/lua/lazy_patch_spec.lua (1)
120-128: ⚡ Quick winAdd a regression test for mutating a patched table after assignment.
Please add a case like
t.a = {x=1}; t.a.x = 2;and assertqjson.encode(t)containsx=2. Current coverage only checks immediate assignment snapshots.Also applies to: 251-260
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/lua/lazy_patch_spec.lua` around lines 120 - 128, Add a regression case to the "changes scalar to object" test in lazy_patch_spec.lua that verifies mutations to a patched table after assignment are preserved: after creating t via qjson.decode and assigning t.a = {x = 1}, mutate the nested field (t.a.x = 2) and then qjson.encode(t) and decode with cjson (or assert the encoded string) to assert the value is 2; update the existing it("changes scalar to object"...) block (and similar block around lines 251-260) to perform t.a = {x=1}; t.a.x = 2; and assert the encoded/decoded result shows x == 2 so the lazy patching preserves later mutations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/lazy-patch-spec.md`:
- Line 472: In docs/lazy-patch-spec.md update the two fenced code blocks that
currently have bare backticks (the blocks showing the
"decode/modify/encode/total" timings around the examples including "decode:
5.97 μs (20%)" and "decode: 5.97 μs (55%)") to include a language identifier
(e.g., change ``` to ```text or ```lua) so the MD040 lint rule is satisfied;
locate the fences that wrap the timing lines and add the chosen language token
to the opening fence.
In `@lua/qjson/table.lua`:
- Around line 351-366: The code stores a one-time snapshot in p.encoded_value
which becomes stale if p.lua_value (a table) is mutated later; instead stop
caching encoded bytes: remove p.encoded_value from patches and always call
encode(p.lua_value) at emit time (replace reads of p.encoded_value with
encode(p.lua_value)). Update the creation/update site (where patches[`#patches`+1]
and the loop that sets p.encoded_value and p.lua_value) to only store lua_value
= v, and modify all emit sites that reference p.encoded_value (including the
places noted around the other ranges) to re-encode via encode(p.lua_value) so
mutations are serialized correctly.
---
Nitpick comments:
In `@tests/lua/lazy_patch_spec.lua`:
- Around line 120-128: Add a regression case to the "changes scalar to object"
test in lazy_patch_spec.lua that verifies mutations to a patched table after
assignment are preserved: after creating t via qjson.decode and assigning t.a =
{x = 1}, mutate the nested field (t.a.x = 2) and then qjson.encode(t) and decode
with cjson (or assert the encoded string) to assert the value is 2; update the
existing it("changes scalar to object"...) block (and similar block around lines
251-260) to perform t.a = {x=1}; t.a.x = 2; and assert the encoded/decoded
result shows x == 2 so the lazy patching preserves later mutations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d05f2b5c-890c-485e-9098-2ca7cf5b41ac
📒 Files selected for processing (4)
docs/lazy-patch-spec.mdlua/qjson/table.luatests/lua/lazy_patch_spec.luatests/lua/lazy_table_spec.lua
Resolve conflict in lua/qjson.lua: adopt main's qjson.lib module structure and add qjson_cursor_field_bytes FFI declaration to lib.lua.
| -- Encode the new value (we need both encoded and lua_value) | ||
| local encoded = encode(v) | ||
|
|
||
| -- Update or add patch | ||
| local found = false | ||
| for _, p in ipairs(patches) do | ||
| if p.key == k then | ||
| p.encoded_value = encoded | ||
| p.lua_value = v | ||
| found = true | ||
| break | ||
| end | ||
| end | ||
| if not found then | ||
| patches[#patches + 1] = { key = k, encoded_value = encoded, lua_value = v } | ||
| end |
There was a problem hiding this comment.
Real bug — same as the CodeRabbit thread on table.lua:362. Reproduced locally:
local t = qjson.decode('{"a":1}')
t.a = {x = 1}; t.a.x = 2
qjson.encode(t) -- '{"a":{"x":1}}' (expected '{"a":{"x":2}}')Fix: stop caching encoded_value in the patch record; re-encode p.lua_value at emit time in encode_with_patches (table.lua:652) and encode_lazy_object_walking_with_patches (table.lua:606, :625).
| } | ||
| }) | ||
| } | ||
|
|
There was a problem hiding this comment.
Adopting. qjson_cursor_field_bytes is the splice path's load-bearing FFI symbol and is currently only exercised through the Lua spec. Extending tests/ffi_cursor_bytes.rs to cover, directly at the FFI boundary:
- value spans for string / number / object / array fields
- missing key →
QJSON_NOT_FOUND - non-object cursor →
QJSON_TYPE_MISMATCH - null
value_start/value_end→QJSON_INVALID_ARG - splice round-trip: extract the span, substitute a different JSON value, re-parse, check semantics
| Current behavior when modifying a field on a lazy table: | ||
|
|
||
| ```lua | ||
| local tab = qjson.decode(json_str) -- Returns LazyObject proxy | ||
| tab.model = "gpt4o" -- Triggers __newindex → materializes entire root | ||
| local out = qjson.encode(tab) -- Walks materialized table + lazy subtrees | ||
| ``` | ||
|
|
||
| The `__newindex` handler (`lua/qjson/table.lua:266-285`) calls `materialize_object_contents()` which: | ||
| 1. Iterates all key/value pairs via FFI | ||
| 2. Clears the metatable | ||
| 3. Copies all values into the plain table | ||
| 4. Sets the new value |
There was a problem hiding this comment.
Adopting. The doc is already labelled as the original design memo (see the note at the top), but the literal lua/qjson/table.lua:266-285 line range in the "Current behavior" section is stale enough to mislead — the pre-patch materialize_object_contents no longer lives at those lines. Replacing the line-number citation with a name-based reference (LazyObject.__newindex / materialize_object_contents) so it doesn't rot again.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lua/qjson/table.lua (2)
105-111:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStale doc comment:
__newindexno longer materializest.ain-place.This block still describes the pre-patch behavior ("__newindex materializes
t.ain-place, and the nextt.alookup retrieves the already-materialized table"). With the new design,__newindexrecords into_patches/_deletedand does not materialize. The rationale for rawset-caching here should be re-stated (identity stability for nestedt.a.x = vso the same lazy child receives the patch).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lua/qjson/table.lua` around lines 105 - 111, Update the stale doc comment in lua/qjson/table.lua: replace the description that "__newindex materializes `t.a` in-place" with the new behavior that __newindex records updates into `_patches` and deletions into `_deleted` without materializing the child; state that rawset-caching of container results is retained to preserve identity stability so nested operations like `t.a.x = v` target the same lazy child proxy and therefore receive the same recorded patch. Mention the relevant symbols `__newindex`, `_patches`, `_deleted`, and the rawset-cached lazy child behavior so readers understand why caching is still required.
112-139:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftContainer rawset cache silently bypasses
__newindexpatch tracking.
read_object_fieldrawset-caches container values at Line 137. In Lua,__newindexonly fires when the raw key is absent, so any subsequent assignment to a previously-read container key skips patch recording entirely:local obj = qjson.decode('{"foo":{"x":1}}') local _ = obj.foo -- triggers __index → rawset(obj, "foo", lazy_proxy) obj.foo = "replaced" -- __newindex NOT called; raw assignment wins -- _patches has no entry for "foo" -- has_dirty_children() ignores non-table cached values -- encode_proxy() takes the slice fast path and returns the ORIGINAL bytes qjson.encode(obj) -- => '{"foo":{"x":1}}' (silently dropped!)Delete is even worse:
obj.foo = nilremoves the raw key, so the next read falls through to the cursor and resurrects the original value, while encode never sees a_deletedentry.This silently corrupts user output for a reasonable access pattern (
read-then-write). A few possible fixes:
- In
__newindex, dorawset(t, k, nil)before recording the patch, and have__newindexitself perform the patch bookkeeping even for keys that were cached — i.e., funnel writes through a helper that always records into_patches/_deleted, by stripping any cached entry first.- Alternatively, do not rawset-cache containers in
read_object_field; instead, keep an internal sidecar table (e.g._child_cache) shielded from user writes so__newindexis always invoked.🛠 Sketch: shadow `_child_cache` so `__newindex` always fires
- local v = decode_cursor(self, child_box) - -- Cache containers so identity is stable and materialization sticks. - if type(v) == "table" then rawset(self, key, v) end - return v + local cache = rawget(self, "_child_cache") + if cache then + local cached = cache[key] + if cached ~= nil then return cached end + end + local v = decode_cursor(self, child_box) + if type(v) == "table" then + if not cache then + cache = {} + rawset(self, "_child_cache", cache) + end + cache[key] = v + end + return v(And update
INTERNAL_KEYS,has_dirty_children, walking encoders, and iter to consult_child_cacheinstead ofrawget(self, k).)Also applies to: 323-367
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lua/qjson/table.lua` around lines 112 - 139, read_object_field currently rawset-caches container values (rawset(self, key, v)) which bypasses __newindex patch tracking and causes read-then-write to silently drop changes; fix by removing the direct rawset caching and instead store child containers in a private sidecar (e.g. _child_cache) or, if keeping the cache, ensure __newindex first clears any cached entry (rawset(self, key, nil)) and then records into _patches/_deleted via a shared helper so writes always create patches; update all places that consulted rawget(self, k) (has_dirty_children, encode_proxy/encoders, iter, INTERNAL_KEYS logic) to consult _child_cache or the helper so identity/materialization and delete resurrection are handled correctly.
🧹 Nitpick comments (1)
lua/qjson/table.lua (1)
186-192: 💤 Low valueIter does an O(n·m) patch scan per original field.
For each original field, the inner
for _, p in ipairs(patches)does a linear scan of all patches.encode_lazy_object_walking_with_patchesalready solved this by building apatched_keyslookup once; consider doing the same here so iteration is O(n + m) instead of O(n·m).♻️ Suggested change
local function lazy_object_iter(state, _prev_key) local view = state.view - local patches = rawget(view, "_patches") + local patches = rawget(view, "_patches") local deleted = rawget(view, "_deleted") + local patched_keys = state.patched_keys + if patches and not patched_keys then + patched_keys = {} + for _, p in ipairs(patches) do patched_keys[p.key] = p end + state.patched_keys = patched_keys + end @@ - if patches then - for _, p in ipairs(patches) do - if p.key == k then - return k, p.lua_value - end - end - end + if patched_keys then + local p = patched_keys[k] + if p then return k, p.lua_value end + end(Build
patched_keysonce inLazyObject.__pairsinstead, to avoid the nil-check on every call.)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lua/qjson/table.lua` around lines 186 - 192, The iterator is performing an O(n·m) scan because for each original field it linearly scans `patches`; fix this by building a `patched_keys` lookup once (as done in `encode_lazy_object_walking_with_patches`) in `LazyObject.__pairs` and have the iterator (`iter`) consult `patched_keys` and `patches` via direct lookup instead of iterating `ipairs(patches)` for each key; update references to `patches`, `patched_keys`, and `p.lua_value` so the iterator returns patched values by key using the lookup rather than the inner loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@lua/qjson/table.lua`:
- Around line 105-111: Update the stale doc comment in lua/qjson/table.lua:
replace the description that "__newindex materializes `t.a` in-place" with the
new behavior that __newindex records updates into `_patches` and deletions into
`_deleted` without materializing the child; state that rawset-caching of
container results is retained to preserve identity stability so nested
operations like `t.a.x = v` target the same lazy child proxy and therefore
receive the same recorded patch. Mention the relevant symbols `__newindex`,
`_patches`, `_deleted`, and the rawset-cached lazy child behavior so readers
understand why caching is still required.
- Around line 112-139: read_object_field currently rawset-caches container
values (rawset(self, key, v)) which bypasses __newindex patch tracking and
causes read-then-write to silently drop changes; fix by removing the direct
rawset caching and instead store child containers in a private sidecar (e.g.
_child_cache) or, if keeping the cache, ensure __newindex first clears any
cached entry (rawset(self, key, nil)) and then records into _patches/_deleted
via a shared helper so writes always create patches; update all places that
consulted rawget(self, k) (has_dirty_children, encode_proxy/encoders, iter,
INTERNAL_KEYS logic) to consult _child_cache or the helper so
identity/materialization and delete resurrection are handled correctly.
---
Nitpick comments:
In `@lua/qjson/table.lua`:
- Around line 186-192: The iterator is performing an O(n·m) scan because for
each original field it linearly scans `patches`; fix this by building a
`patched_keys` lookup once (as done in
`encode_lazy_object_walking_with_patches`) in `LazyObject.__pairs` and have the
iterator (`iter`) consult `patched_keys` and `patches` via direct lookup instead
of iterating `ipairs(patches)` for each key; update references to `patches`,
`patched_keys`, and `p.lua_value` so the iterator returns patched values by key
using the lookup rather than the inner loop.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 04e6e7b1-86e7-4647-ac59-253bcddcdc9a
📒 Files selected for processing (3)
lua/qjson.lualua/qjson/lib.lualua/qjson/table.lua
✅ Files skipped from review due to trivial changes (1)
- lua/qjson.lua
|
@coderabbitai — responses to the three "outside diff range" comments on
Reproduced locally: -- Repro 1: scalar replacement is silently dropped
local obj = qjson.decode('{"foo":{"x":1}}')
local _ = obj.foo -- triggers rawset(obj, "foo", lazy_proxy)
obj.foo = "replaced" -- __newindex NOT called; raw assignment wins
print(qjson.encode(obj)) -- '{"foo":{"x":1}}' (silently dropped)
-- Repro 2: delete is even worse — read resurrects the original
local obj2 = qjson.decode('{"foo":{"x":1},"bar":2}')
local _ = obj2.foo
obj2.foo = nil
print(obj2.foo) -- still '{"x":1}' (rawget falls through to cursor)
print(qjson.encode(obj2)) -- '{"foo":{"x":1},"bar":2}' (delete ignored)Fix: move container caching into a private
Rewriting the comment to describe the new contract:
Consistent with the prior decline on #3258298368 (the same O(n·m) concern on |
|
Tip For best results, initiate chat on the files or code changes.
Would you like to open a follow-up issue to track the performance target (~10 μs/op) and the remaining pending items (delete-field support, array patch support) so they don't get lost after this PR merges? |
…tches at emit Two regressions in the structural-patching path on PR #44: 1. Container reads rawset-cached children onto the view itself, so a subsequent assignment to that key bypassed __newindex (Lua only fires __newindex when the raw key is absent). Patches and deletions silently no-op'd after a prior read. Move the container cache to a `_child_cache` sidecar so the raw key stays absent and __newindex always fires. 2. Patches stored an `encoded_value` snapshot at write time, so a post-assignment mutation like `t.a = {x=1}; t.a.x = 2` was lost. Store only `lua_value` and re-encode at emit time. Regression tests in tests/lua/lazy_patch_spec.lua cover both. Adds direct coverage for qjson_cursor_field_bytes in tests/ffi_cursor_bytes.rs.
Summary
Add lazy patch support to avoid materializing the entire root container when modifying a few fields on a
qjson.decode()result.Problem
Current behavior when modifying a field:
This defeats the lazy decode advantage for the common "decode → modify few fields → encode" workflow.
Solution
Record patches instead of materializing, then splice the original buffer during encode:
Changes
qjson_cursor_field_bytesto get field value byte spansLazyObject.__newindexto record patchesLazyObject.__indexto check patches firstencode_with_patchesfor splice-based encodinglazy_patch_spec.luadocs/lazy-patch-spec.mdwith detailed designStatus
Test Results
Summary by CodeRabbit
New Features
Documentation
Tests