Skip to content

feat(lua): lighten keep_origin provenance#131

Merged
membphis merged 3 commits into
mainfrom
codex/issue-130-lightweight-origin
Jun 1, 2026
Merged

feat(lua): lighten keep_origin provenance#131
membphis merged 3 commits into
mainfrom
codex/issue-130-lightweight-origin

Conversation

@membphis
Copy link
Copy Markdown
Collaborator

@membphis membphis commented Jun 1, 2026

Summary

  • add threshold-based keep_origin provenance with origin.complete gating whole-slice reuse
  • keep partial object origins useful for key order and large-child reuse while incomplete arrays fall back to normal encoding
  • document the relaxed byte-identical guarantee and add Lua coverage for the new semantics

Fixes #130

Test Plan

  • PATH="$HOME/.luarocks/bin:$PATH" make test
  • PATH="$HOME/.luarocks/bin:$PATH" make lint
  • cargo test --release --no-default-features
  • cargo test --features test-panic --release

Summary by CodeRabbit

  • Documentation

    • Updated explanation of qjson.materialize() provenance tracking, covering how key order preservation and token reuse are optimized.
  • Tests

    • Expanded test coverage for materialization behavior with various edge cases and reuse scenarios.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Warning

Review limit reached

@membphis, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 21 minutes and 58 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: df22c583-5e77-4027-ba96-b5450609edc8

📥 Commits

Reviewing files that changed from the base of the PR and between 7dcfac5 and 1045319.

📒 Files selected for processing (2)
  • lua/qjson/table.lua
  • tests/lua/origin_materialize_spec.lua
📝 Walkthrough

Walkthrough

This PR reworks qjson.materialize(t, { keep_origin = true }) to use threshold-based lightweight provenance recording instead of full-tree recording. Materialization now records provenance only for sufficiently large strings and complete child containers, tracking a complete flag to indicate whether all children needed for byte-identical reuse have been captured. Encoding-side origin matching is updated to respect this flag, allowing full-slice reuse only when complete == true.

Changes

Threshold-based lightweight provenance for origin tracking

Layer / File(s) Summary
Threshold constants and child-recording helper
lua/qjson/table.lua
Introduces ORIGIN_STRING_MIN_RAW and ORIGIN_TABLE_MIN_RAW thresholds. New origin_child_record helper consolidates the logic for deciding whether a string or child table qualifies for provenance recording based on raw token length and child completeness; otherwise marks recording as incomplete.
Materialization with complete-flag tracking
lua/qjson/table.lua
Object and array materialization paths now track a complete accumulator. Each child is processed through origin_child_record to determine recordability. When any qualifying child is omitted or a child table is incomplete, complete is set to false. The final complete flag is stored in provenance metadata for both container types.
Encoding-side origin matching and reuse rules
lua/qjson/table.lua
Origin-based full-match checks for objects and arrays now immediately fail unless complete == true. encode_origin_child only reuses raw JSON slices for matching string origins; all other cases fall back to normal encoding. Array fast-path origin-slice reuse only fires when complete is true.
Test coverage and documentation
tests/lua/origin_materialize_spec.lua, README.md
Comprehensive test coverage verifies threshold behavior: small escaped strings don't guarantee token reuse after parent mutation; large strings above threshold do reuse; unmodified small-scalar containers re-emit field-by-field rather than as original slices; unmodified containers with only large complete children return original slices unchanged; partial origins don't hide nested mutations; complete child subtrees are reused when parent is modified. README documents provenance recording thresholds and how complete flag affects encoding behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • api7/lua-qjson#129: Introduces the foundational keep_origin provenance feature for qjson.materialize, which this PR optimizes with threshold-based lightweight recording and complete-flag matching.
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
E2e Test Quality Review ⚠️ Warning Functional test coverage is good (18 tests), but implementation has unaddressed review feedback: eager string allocation defeats optimization goal; no E2E tests verify reduced allocation count. Fix eager raw-token slicing by passing (source, bs, be) to origin_child_record and only slicing when thresholds are met; add test measuring reduced allocation count for scalar-heavy payloads.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(lua): lighten keep_origin provenance' directly and clearly summarizes the main change—introducing lightweight, threshold-based provenance to reduce memory overhead in the keep_origin feature.
Linked Issues check ✅ Passed The changes implement all core coding requirements from issue #130: threshold-based provenance with origin.complete flag, proper recording policies for strings and tables, updated encoding rules, and comprehensive test coverage for new behaviors.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #130 objectives: README documents new semantics, lua/qjson/table.lua implements lightweight provenance thresholds and encoding updates, and tests validate the new behavior comprehensively.
Security Check ✅ Passed Security checklist targets API Gateway platforms; qjson is a JSON serialization library. PR #131 optimizes provenance tracking with no auth, database, secrets, or TLS concerns.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/issue-130-lightweight-origin

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/lua/origin_materialize_spec.lua (1)

37-66: ⚡ Quick win

Add exact-boundary coverage for the >24 / >64 rules.

These tests cover well-below and well-above the thresholds, but the implementation relies on strict > comparisons. A regression to >= would currently slip through. One string token at exactly 24 bytes and one child container at exactly 64 bytes would lock down the contract nicely.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/lua/origin_materialize_spec.lua` around lines 37 - 66, Add two tests
that hit the exact thresholds to lock the >24 and >64 semantics: (1) add an
"exact 24-byte escaped string" test that builds a JSON with a string token whose
escaped/raw length is exactly 24 bytes and verifies materialize/encode behaves
as it does for values below the large-string threshold (i.e. not treated as
"above 24"); reference the existing pattern in origin_materialize_spec.lua (use
qjson.decode, qjson.materialize, qjson.encode and modify/inspect fields as in
the other string tests). (2) add an "exact 64-byte child container" test that
constructs a parent object whose single child container's encoded size is
exactly 64 bytes and verifies materialize/encode behaves as it does for values
below the large-container threshold (i.e. not treated as "above 64"); follow the
existing container tests' style and assertions so a regression from > to >=
would fail.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lua/qjson/table.lua`:
- Around line 561-579: origin_child_record currently forces callers to
materialize raw_token (allocating Lua strings) before checking thresholds;
change it to accept byte-span metadata (bs, be or a span table) instead of
raw_token and only call source:sub(bs, be) inside the branches that actually
keep provenance. Update origin_child_record signature and body to use bs/be for
length checks against ORIGIN_STRING_MIN_RAW and ORIGIN_TABLE_MIN_RAW and to
build the returned record (string raw or table origin) after materializing the
substring; also update callers that currently call cursor_raw_token(...) to
instead fetch the bs/be span (without slicing) and pass that span into
origin_child_record, and ensure TABLE_ORIGIN and child_origin checks remain
unchanged.

---

Nitpick comments:
In `@tests/lua/origin_materialize_spec.lua`:
- Around line 37-66: Add two tests that hit the exact thresholds to lock the >24
and >64 semantics: (1) add an "exact 24-byte escaped string" test that builds a
JSON with a string token whose escaped/raw length is exactly 24 bytes and
verifies materialize/encode behaves as it does for values below the large-string
threshold (i.e. not treated as "above 24"); reference the existing pattern in
origin_materialize_spec.lua (use qjson.decode, qjson.materialize, qjson.encode
and modify/inspect fields as in the other string tests). (2) add an "exact
64-byte child container" test that constructs a parent object whose single child
container's encoded size is exactly 64 bytes and verifies materialize/encode
behaves as it does for values below the large-container threshold (i.e. not
treated as "above 64"); follow the existing container tests' style and
assertions so a regression from > to >= would fail.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 943cd06c-4685-4f6a-8eb9-95d0ebe0c801

📥 Commits

Reviewing files that changed from the base of the PR and between 6225ce2 and 7dcfac5.

📒 Files selected for processing (3)
  • README.md
  • lua/qjson/table.lua
  • tests/lua/origin_materialize_spec.lua

Comment thread lua/qjson/table.lua Outdated
@membphis membphis merged commit 2173c1a into main Jun 1, 2026
16 checks passed
@membphis membphis deleted the codex/issue-130-lightweight-origin branch June 1, 2026 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize keep_origin provenance with threshold-based lightweight records

1 participant