Skip to content

Add document-format crate#4234

Open
TrueDoctor wants to merge 2 commits into
masterfrom
upstream-document-format
Open

Add document-format crate#4234
TrueDoctor wants to merge 2 commits into
masterfrom
upstream-document-format

Conversation

@TrueDoctor

Copy link
Copy Markdown
Member

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the document-format crate, which provides a typed handle (Gdd) for the .gdd document format, decoupling on-disk layout from the editor's in-memory runtime types. It implements codecs, export options, path layouts, and session state persistence. The review feedback highlights several critical issues: a randomized history writing order in append_history_deltas that can cause document load failures, a race condition on WASM/OPFS in open_from_archive due to non-blocking writes, a potential runtime panic on WASM from using block_on in garbage_collect, and redundant loads in stream_entries due to duplicate hashes in hashes_from_store.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread document/document-format/src/lib.rs
Comment thread document/document-format/src/lib.rs
Comment thread document/document-format/src/lib.rs
Comment thread document/document-format/src/lib.rs
@TrueDoctor TrueDoctor force-pushed the upstream-document-format branch from 98bc8f2 to bee1f52 Compare June 16, 2026 14:03

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 14 files

Confidence score: 2/5

  • In document/document-format/src/lib.rs, add_resource_from_path commits AddResource before bytes are copied, so an I/O failure can leave a persisted hash pointing to missing data; merging as-is risks durable document/resource inconsistency — make the operation atomic (copy/verify first, then persist) or add rollback before merging.
  • In document/document-format/src/codec.rs, read_single can convert a trailing decode Err into ExpectedSingle, which hides the real parse failure and can send callers down the wrong recovery path — propagate the second-item decode error before returning the multi-value error.
  • In document/document-format/src/export.rs, using two independent booleans means an invalid “all excluded” configuration is representable until runtime validate(), so bad states can slip through call sites and fail late — replace this with a type that makes invalid combinations unrepresentable (or enforce construction-time validation) before merge.
  • In document/document-format/src/io.rs, write_single returns ReadError with unreachable variants like NotFound, which can mislead downstream error handling and obscure real write failure semantics — split read/write error types (or rename/refine) to match actual write behavior.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="document/document-format/src/export.rs">

<violation number="1" location="document/document-format/src/export.rs:1">
P3: Module doc says "re-encoding" but ExportFormat docs explicitly state payloads keep their codecs and export does not re-encode — terminology contradiction</violation>

<violation number="2" location="document/document-format/src/export.rs:22">
P2: Invalid states are representable — two independent bools with runtime validate() instead of a type that makes all-excluded impossible at compile time</violation>
</file>

<file name="document/document-format/src/codec.rs">

<violation number="1" location="document/document-format/src/codec.rs:116">
P2: `read_single` masks trailing decode errors as `ExpectedSingle`. Propagate a second-item `Err` before returning the multi-value error.</violation>
</file>

<file name="document/document-format/src/io.rs">

<violation number="1" location="document/document-format/src/io.rs:56">
P2: ReadError is used for write_single's return type but includes a NotFound variant unreachable from writes. This misleads callers into handling a never-produced error. Rename or use a separate error type for writes.</violation>
</file>

<file name="document/document-format/src/lib.rs">

<violation number="1" location="document/document-format/src/lib.rs:583">
P1: `add_resource_from_path` persists `AddResource` before copying bytes, allowing committed dangling resource hashes on I/O failure.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

let hash = ResourceHash::from(bytes);

let hot_ops = self.session.stage_embedded_resource(id, hash)?;
self.append_and_retire(&hot_ops, false)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: add_resource_from_path persists AddResource before copying bytes, allowing committed dangling resource hashes on I/O failure.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/lib.rs, line 583:

<comment>`add_resource_from_path` persists `AddResource` before copying bytes, allowing committed dangling resource hashes on I/O failure.</comment>

<file context>
@@ -0,0 +1,964 @@
+		let hash = ResourceHash::from(bytes);
+
+		let hot_ops = self.session.stage_embedded_resource(id, hash)?;
+		self.append_and_retire(&hot_ops, false)?;
+
+		self.working.write_non_blocking(&self.layout.resource_path(&hash), bytes)?;
</file context>

@@ -0,0 +1,49 @@
//! Export options. Walking the working copy through an archive codec / re-encoding payloads.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Invalid states are representable — two independent bools with runtime validate() instead of a type that makes all-excluded impossible at compile time

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/export.rs, line 22:

<comment>Invalid states are representable — two independent bools with runtime validate() instead of a type that makes all-excluded impossible at compile time</comment>

<file context>
@@ -0,0 +1,49 @@
+	/// for VCS workflows where the diffable `history.jsonl` is the interesting payload and the
+	/// registry would rewrite whole-file on every retirement. Consumers replay history from an
+	/// empty registry.
+	pub include_registry: bool,
+	/// Whether to include `history.jsonl`. `false` produces a flat snapshot (registry only),
+	/// useful for sharing without revealing edit history and for cutting file size.
</file context>

pub fn read_single<T: DeserializeOwned>(self, bytes: &[u8]) -> Result<T, CodecError> {
let mut iter = self.iter::<T>(bytes);
let first = iter.next().ok_or(CodecError::Empty)??;
if iter.next().is_some() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: read_single masks trailing decode errors as ExpectedSingle. Propagate a second-item Err before returning the multi-value error.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/codec.rs, line 116:

<comment>`read_single` masks trailing decode errors as `ExpectedSingle`. Propagate a second-item `Err` before returning the multi-value error.</comment>

<file context>
@@ -0,0 +1,324 @@
+	pub fn read_single<T: DeserializeOwned>(self, bytes: &[u8]) -> Result<T, CodecError> {
+		let mut iter = self.iter::<T>(bytes);
+		let first = iter.next().ok_or(CodecError::Empty)??;
+		if iter.next().is_some() {
+			return Err(CodecError::ExpectedSingle);
+		}
</file context>


/// Encode `value` with `codec` and write to `{basename}.{ext}`. Synchronous: the write goes through
/// the container's sync write surface (durable on folder/memory, enqueued on OPFS).
pub fn write_single<T: Serialize>(container: &AnyContainer, basename: &str, codec: Codec, value: &T) -> Result<(), ReadError> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: ReadError is used for write_single's return type but includes a NotFound variant unreachable from writes. This misleads callers into handling a never-produced error. Rename or use a separate error type for writes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/io.rs, line 56:

<comment>ReadError is used for write_single's return type but includes a NotFound variant unreachable from writes. This misleads callers into handling a never-produced error. Rename or use a separate error type for writes.</comment>

<file context>
@@ -0,0 +1,60 @@
+
+/// Encode `value` with `codec` and write to `{basename}.{ext}`. Synchronous: the write goes through
+/// the container's sync write surface (durable on folder/memory, enqueued on OPFS).
+pub fn write_single<T: Serialize>(container: &AnyContainer, basename: &str, codec: Codec, value: &T) -> Result<(), ReadError> {
+	let bytes = codec.write_single(value)?;
+	container.write_non_blocking(&path_for(basename, codec), &bytes)?;
</file context>

@@ -0,0 +1,49 @@
//! Export options. Walking the working copy through an archive codec / re-encoding payloads.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Module doc says "re-encoding" but ExportFormat docs explicitly state payloads keep their codecs and export does not re-encode — terminology contradiction

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/document-format/src/export.rs, line 1:

<comment>Module doc says "re-encoding" but ExportFormat docs explicitly state payloads keep their codecs and export does not re-encode — terminology contradiction</comment>

<file context>
@@ -0,0 +1,49 @@
+//! Export options. Walking the working copy through an archive codec / re-encoding payloads.
+//! Implementation lives on [`crate::Gdd::export`].
+
</file context>
Suggested change
//! Export options. Walking the working copy through an archive codec / re-encoding payloads.
//! Export options. Walking the working copy through an archive codec, keeping payloads as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant