Add document-format crate#4234
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the document-format crate, which provides a typed handle (Gdd) for the .gdd document format, decoupling on-disk layout from the editor's in-memory runtime types. It implements codecs, export options, path layouts, and session state persistence. The review feedback highlights several critical issues: a randomized history writing order in append_history_deltas that can cause document load failures, a race condition on WASM/OPFS in open_from_archive due to non-blocking writes, a potential runtime panic on WASM from using block_on in garbage_collect, and redundant loads in stream_entries due to duplicate hashes in hashes_from_store.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
98bc8f2 to
bee1f52
Compare
There was a problem hiding this comment.
5 issues found across 14 files
Confidence score: 2/5
- In
document/document-format/src/lib.rs,add_resource_from_pathcommitsAddResourcebefore bytes are copied, so an I/O failure can leave a persisted hash pointing to missing data; merging as-is risks durable document/resource inconsistency — make the operation atomic (copy/verify first, then persist) or add rollback before merging. - In
document/document-format/src/codec.rs,read_singlecan convert a trailing decodeErrintoExpectedSingle, which hides the real parse failure and can send callers down the wrong recovery path — propagate the second-item decode error before returning the multi-value error. - In
document/document-format/src/export.rs, using two independent booleans means an invalid “all excluded” configuration is representable until runtimevalidate(), so bad states can slip through call sites and fail late — replace this with a type that makes invalid combinations unrepresentable (or enforce construction-time validation) before merge. - In
document/document-format/src/io.rs,write_singlereturnsReadErrorwith unreachable variants likeNotFound, which can mislead downstream error handling and obscure real write failure semantics — split read/write error types (or rename/refine) to match actual write behavior.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="document/document-format/src/export.rs">
<violation number="1" location="document/document-format/src/export.rs:1">
P3: Module doc says "re-encoding" but ExportFormat docs explicitly state payloads keep their codecs and export does not re-encode — terminology contradiction</violation>
<violation number="2" location="document/document-format/src/export.rs:22">
P2: Invalid states are representable — two independent bools with runtime validate() instead of a type that makes all-excluded impossible at compile time</violation>
</file>
<file name="document/document-format/src/codec.rs">
<violation number="1" location="document/document-format/src/codec.rs:116">
P2: `read_single` masks trailing decode errors as `ExpectedSingle`. Propagate a second-item `Err` before returning the multi-value error.</violation>
</file>
<file name="document/document-format/src/io.rs">
<violation number="1" location="document/document-format/src/io.rs:56">
P2: ReadError is used for write_single's return type but includes a NotFound variant unreachable from writes. This misleads callers into handling a never-produced error. Rename or use a separate error type for writes.</violation>
</file>
<file name="document/document-format/src/lib.rs">
<violation number="1" location="document/document-format/src/lib.rs:583">
P1: `add_resource_from_path` persists `AddResource` before copying bytes, allowing committed dangling resource hashes on I/O failure.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| let hash = ResourceHash::from(bytes); | ||
|
|
||
| let hot_ops = self.session.stage_embedded_resource(id, hash)?; | ||
| self.append_and_retire(&hot_ops, false)?; |
There was a problem hiding this comment.
P1: add_resource_from_path persists AddResource before copying bytes, allowing committed dangling resource hashes on I/O failure.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/lib.rs, line 583:
<comment>`add_resource_from_path` persists `AddResource` before copying bytes, allowing committed dangling resource hashes on I/O failure.</comment>
<file context>
@@ -0,0 +1,964 @@
+ let hash = ResourceHash::from(bytes);
+
+ let hot_ops = self.session.stage_embedded_resource(id, hash)?;
+ self.append_and_retire(&hot_ops, false)?;
+
+ self.working.write_non_blocking(&self.layout.resource_path(&hash), bytes)?;
</file context>
| @@ -0,0 +1,49 @@ | |||
| //! Export options. Walking the working copy through an archive codec / re-encoding payloads. | |||
There was a problem hiding this comment.
P2: Invalid states are representable — two independent bools with runtime validate() instead of a type that makes all-excluded impossible at compile time
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/export.rs, line 22:
<comment>Invalid states are representable — two independent bools with runtime validate() instead of a type that makes all-excluded impossible at compile time</comment>
<file context>
@@ -0,0 +1,49 @@
+ /// for VCS workflows where the diffable `history.jsonl` is the interesting payload and the
+ /// registry would rewrite whole-file on every retirement. Consumers replay history from an
+ /// empty registry.
+ pub include_registry: bool,
+ /// Whether to include `history.jsonl`. `false` produces a flat snapshot (registry only),
+ /// useful for sharing without revealing edit history and for cutting file size.
</file context>
| pub fn read_single<T: DeserializeOwned>(self, bytes: &[u8]) -> Result<T, CodecError> { | ||
| let mut iter = self.iter::<T>(bytes); | ||
| let first = iter.next().ok_or(CodecError::Empty)??; | ||
| if iter.next().is_some() { |
There was a problem hiding this comment.
P2: read_single masks trailing decode errors as ExpectedSingle. Propagate a second-item Err before returning the multi-value error.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/codec.rs, line 116:
<comment>`read_single` masks trailing decode errors as `ExpectedSingle`. Propagate a second-item `Err` before returning the multi-value error.</comment>
<file context>
@@ -0,0 +1,324 @@
+ pub fn read_single<T: DeserializeOwned>(self, bytes: &[u8]) -> Result<T, CodecError> {
+ let mut iter = self.iter::<T>(bytes);
+ let first = iter.next().ok_or(CodecError::Empty)??;
+ if iter.next().is_some() {
+ return Err(CodecError::ExpectedSingle);
+ }
</file context>
|
|
||
| /// Encode `value` with `codec` and write to `{basename}.{ext}`. Synchronous: the write goes through | ||
| /// the container's sync write surface (durable on folder/memory, enqueued on OPFS). | ||
| pub fn write_single<T: Serialize>(container: &AnyContainer, basename: &str, codec: Codec, value: &T) -> Result<(), ReadError> { |
There was a problem hiding this comment.
P2: ReadError is used for write_single's return type but includes a NotFound variant unreachable from writes. This misleads callers into handling a never-produced error. Rename or use a separate error type for writes.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/io.rs, line 56:
<comment>ReadError is used for write_single's return type but includes a NotFound variant unreachable from writes. This misleads callers into handling a never-produced error. Rename or use a separate error type for writes.</comment>
<file context>
@@ -0,0 +1,60 @@
+
+/// Encode `value` with `codec` and write to `{basename}.{ext}`. Synchronous: the write goes through
+/// the container's sync write surface (durable on folder/memory, enqueued on OPFS).
+pub fn write_single<T: Serialize>(container: &AnyContainer, basename: &str, codec: Codec, value: &T) -> Result<(), ReadError> {
+ let bytes = codec.write_single(value)?;
+ container.write_non_blocking(&path_for(basename, codec), &bytes)?;
</file context>
| @@ -0,0 +1,49 @@ | |||
| //! Export options. Walking the working copy through an archive codec / re-encoding payloads. | |||
There was a problem hiding this comment.
P3: Module doc says "re-encoding" but ExportFormat docs explicitly state payloads keep their codecs and export does not re-encode — terminology contradiction
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/export.rs, line 1:
<comment>Module doc says "re-encoding" but ExportFormat docs explicitly state payloads keep their codecs and export does not re-encode — terminology contradiction</comment>
<file context>
@@ -0,0 +1,49 @@
+//! Export options. Walking the working copy through an archive codec / re-encoding payloads.
+//! Implementation lives on [`crate::Gdd::export`].
+
</file context>
| //! Export options. Walking the working copy through an archive codec / re-encoding payloads. | |
| //! Export options. Walking the working copy through an archive codec, keeping payloads as-is. |
No description provided.