feat(lua): lighten keep_origin provenance by membphis · Pull Request #131 · api7/lua-qjson

membphis · 2026-06-01T01:24:10Z

Summary

add threshold-based keep_origin provenance with origin.complete gating whole-slice reuse
keep partial object origins useful for key order and large-child reuse while incomplete arrays fall back to normal encoding
document the relaxed byte-identical guarantee and add Lua coverage for the new semantics

Fixes #130

Test Plan

PATH="$HOME/.luarocks/bin:$PATH" make test
PATH="$HOME/.luarocks/bin:$PATH" make lint
cargo test --release --no-default-features
cargo test --features test-panic --release

Summary by CodeRabbit

Documentation
- Updated explanation of qjson.materialize() provenance tracking, covering how key order preservation and token reuse are optimized.
Tests
- Expanded test coverage for materialization behavior with various edge cases and reuse scenarios.

coderabbitai · 2026-06-01T01:24:21Z

Warning

Review limit reached

@membphis, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 21 minutes and 58 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: df22c583-5e77-4027-ba96-b5450609edc8

📥 Commits

Reviewing files that changed from the base of the PR and between 7dcfac5 and 1045319.

📒 Files selected for processing (2)

lua/qjson/table.lua
tests/lua/origin_materialize_spec.lua

📝 Walkthrough

Walkthrough

This PR reworks qjson.materialize(t, { keep_origin = true }) to use threshold-based lightweight provenance recording instead of full-tree recording. Materialization now records provenance only for sufficiently large strings and complete child containers, tracking a complete flag to indicate whether all children needed for byte-identical reuse have been captured. Encoding-side origin matching is updated to respect this flag, allowing full-slice reuse only when complete == true.

Changes

Threshold-based lightweight provenance for origin tracking

Layer / File(s)	Summary
Threshold constants and child-recording helper `lua/qjson/table.lua`	Introduces `ORIGIN_STRING_MIN_RAW` and `ORIGIN_TABLE_MIN_RAW` thresholds. New `origin_child_record` helper consolidates the logic for deciding whether a string or child table qualifies for provenance recording based on raw token length and child completeness; otherwise marks recording as incomplete.
Materialization with complete-flag tracking `lua/qjson/table.lua`	Object and array materialization paths now track a `complete` accumulator. Each child is processed through `origin_child_record` to determine recordability. When any qualifying child is omitted or a child table is incomplete, `complete` is set to false. The final `complete` flag is stored in provenance metadata for both container types.
Encoding-side origin matching and reuse rules `lua/qjson/table.lua`	Origin-based full-match checks for objects and arrays now immediately fail unless `complete == true`. `encode_origin_child` only reuses raw JSON slices for matching string origins; all other cases fall back to normal encoding. Array fast-path origin-slice reuse only fires when `complete` is true.
Test coverage and documentation `tests/lua/origin_materialize_spec.lua`, `README.md`	Comprehensive test coverage verifies threshold behavior: small escaped strings don't guarantee token reuse after parent mutation; large strings above threshold do reuse; unmodified small-scalar containers re-emit field-by-field rather than as original slices; unmodified containers with only large complete children return original slices unchanged; partial origins don't hide nested mutations; complete child subtrees are reused when parent is modified. README documents provenance recording thresholds and how `complete` flag affects encoding behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

api7/lua-qjson#129: Introduces the foundational keep_origin provenance feature for qjson.materialize, which this PR optimizes with threshold-based lightweight recording and complete-flag matching.

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
E2e Test Quality Review	⚠️ Warning	Functional test coverage is good (18 tests), but implementation has unaddressed review feedback: eager string allocation defeats optimization goal; no E2E tests verify reduced allocation count.	Fix eager raw-token slicing by passing (source, bs, be) to origin_child_record and only slicing when thresholds are met; add test measuring reduced allocation count for scalar-heavy payloads.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(lua): lighten keep_origin provenance' directly and clearly summarizes the main change—introducing lightweight, threshold-based provenance to reduce memory overhead in the keep_origin feature.
Linked Issues check	✅ Passed	The changes implement all core coding requirements from issue `#130`: threshold-based provenance with origin.complete flag, proper recording policies for strings and tables, updated encoding rules, and comprehensive test coverage for new behaviors.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#130` objectives: README documents new semantics, lua/qjson/table.lua implements lightweight provenance thresholds and encoding updates, and tests validate the new behavior comprehensively.
Security Check	✅ Passed	Security checklist targets API Gateway platforms; qjson is a JSON serialization library. PR `#131` optimizes provenance tracking with no auth, database, secrets, or TLS concerns.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/issue-130-lightweight-origin

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/lua/origin_materialize_spec.lua (1)

37-66: ⚡ Quick win

Add exact-boundary coverage for the >24 / >64 rules.

These tests cover well-below and well-above the thresholds, but the implementation relies on strict > comparisons. A regression to >= would currently slip through. One string token at exactly 24 bytes and one child container at exactly 64 bytes would lock down the contract nicely.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/lua/origin_materialize_spec.lua` around lines 37 - 66, Add two tests
that hit the exact thresholds to lock the >24 and >64 semantics: (1) add an
"exact 24-byte escaped string" test that builds a JSON with a string token whose
escaped/raw length is exactly 24 bytes and verifies materialize/encode behaves
as it does for values below the large-string threshold (i.e. not treated as
"above 24"); reference the existing pattern in origin_materialize_spec.lua (use
qjson.decode, qjson.materialize, qjson.encode and modify/inspect fields as in
the other string tests). (2) add an "exact 64-byte child container" test that
constructs a parent object whose single child container's encoded size is
exactly 64 bytes and verifies materialize/encode behaves as it does for values
below the large-container threshold (i.e. not treated as "above 64"); follow the
existing container tests' style and assertions so a regression from > to >=
would fail.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lua/qjson/table.lua`:
- Around line 561-579: origin_child_record currently forces callers to
materialize raw_token (allocating Lua strings) before checking thresholds;
change it to accept byte-span metadata (bs, be or a span table) instead of
raw_token and only call source:sub(bs, be) inside the branches that actually
keep provenance. Update origin_child_record signature and body to use bs/be for
length checks against ORIGIN_STRING_MIN_RAW and ORIGIN_TABLE_MIN_RAW and to
build the returned record (string raw or table origin) after materializing the
substring; also update callers that currently call cursor_raw_token(...) to
instead fetch the bs/be span (without slicing) and pass that span into
origin_child_record, and ensure TABLE_ORIGIN and child_origin checks remain
unchanged.

---

Nitpick comments:
In `@tests/lua/origin_materialize_spec.lua`:
- Around line 37-66: Add two tests that hit the exact thresholds to lock the >24
and >64 semantics: (1) add an "exact 24-byte escaped string" test that builds a
JSON with a string token whose escaped/raw length is exactly 24 bytes and
verifies materialize/encode behaves as it does for values below the large-string
threshold (i.e. not treated as "above 24"); reference the existing pattern in
origin_materialize_spec.lua (use qjson.decode, qjson.materialize, qjson.encode
and modify/inspect fields as in the other string tests). (2) add an "exact
64-byte child container" test that constructs a parent object whose single child
container's encoded size is exactly 64 bytes and verifies materialize/encode
behaves as it does for values below the large-container threshold (i.e. not
treated as "above 64"); follow the existing container tests' style and
assertions so a regression from > to >= would fail.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 943cd06c-4685-4f6a-8eb9-95d0ebe0c801

📥 Commits

Reviewing files that changed from the base of the PR and between 6225ce2 and 7dcfac5.

📒 Files selected for processing (3)

README.md
lua/qjson/table.lua
tests/lua/origin_materialize_spec.lua

feat(lua): lighten keep_origin provenance

7dcfac5

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread lua/qjson/table.lua Outdated

membphis added 2 commits June 1, 2026 15:04

fix(lua): avoid eager origin token slicing

baf5133

test(lua): keep origin allocation check lint-clean

1045319

membphis merged commit 2173c1a into main Jun 1, 2026
16 checks passed

membphis deleted the codex/issue-130-lightweight-origin branch June 1, 2026 07:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(lua): lighten keep_origin provenance#131

feat(lua): lighten keep_origin provenance#131
membphis merged 3 commits into
mainfrom
codex/issue-130-lightweight-origin

membphis commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

membphis commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

membphis commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading