[SEA-NodeJS] Sync execute: eager cancellable handle + close-drives commit; accept-and-ignore unsupported options#419
Conversation
8bb7def to
e81bf9b
Compare
| // cancellation of a long query is done via `runAsync: true` (below). | ||
| // Errors are swallowed here and re-surfaced at first fetch/finished to | ||
| // preserve the existing error-timing contract. | ||
| await op.waitUntilReady().catch(() => undefined); |
There was a problem hiding this comment.
Swallowed errors are not re-surfaced through finished() / status() as the comment claims. Line 211-212's comment says "Errors are swallowed here and re-surfaced at first fetch/finished to preserve the existing error-timing contract." However, this is only true for the fetch path. When waitUntilReady() fails (e.g. a SQL syntax error, kernel failure), it sets fetchHandlePromise to a rejected promise. On subsequent calls through finished() → waitUntilReadyThroughBackend() → backend.waitUntilReady() → waitUntilReadyCancellable(), the early-return guard at SeaOperationBackend.ts:541 (if (this.fetchHandlePromise) { return; }) fires — it checks truthiness of the promise, not whether it resolved or rejected, so it returns successfully without re-throwing the error. This means: (1) finished() silently succeeds for a failed sync query, (2) status() reports Succeeded (line 389 uses the same truthiness check). The fetch path (fetchChunk → getResultSlicer → getFetchHandle) does re-surface the error because it awaits the rejected promise, but callers using finished() to confirm DDL/DML completion — the exact fire-and-forget pattern this PR is designed to support — will miss the failure.
💡 Suggested Fix: In waitUntilReadyCancellable, instead of only checking whether fetchHandlePromise is set, await it so that a rejected promise re-throws. Alternatively, track a separate fetchError field and re-throw it in the early-return path. The status() method similarly needs to distinguish a resolved vs rejected fetchHandlePromise for the sync path (e.g. by storing a terminalState field set in the .then() / .catch() of getFetchHandle).
…mmit; accept-and-ignore unsupported options The default sync path (runAsync:false) now returns the operation handle immediately and kicks off the inline-materialise result() in the background, instead of blocking executeStatement until the statement is terminal. This restores mid-run cancellation on the sync path WITHOUT the runAsync submit/poll/refetch latency tax: the kernel execute() publishes the statement id mid-flight, so a concurrent op.cancel() interrupts the running execute via the StatementCanceller. Fire-and-forget DDL/DML (execute then close without a fetch) and dependent-statement ordering still "just work": the operation backend's close() drives the same memoised result() to terminal before releasing (committing the statement), UNLESS the op was cancelled — in which case it releases without committing. Early-window cancel correctness (cancel issued before the server id is published) depends on the kernel StatementCanceller intent-hold (databricks/databricks-sql-kernel#133): the canceller holds the cancel intent until the id appears, then dispatches the real CancelStatement, so there is no orphaned server statement. No napi surface change. Also retains the option no-ops: accept-and-ignore for unsupported per-statement options (useCloudFetch / useLZ4Compression / stagingAllowedLocalPath) and queryTimeout (no-op + TODO), matching the Python kernel client rather than hard-failing the connection. Validated e2e (pecotesting): CREATE/INSERT fire-and-forget commit, dependent ordering, 100k read, mid-run cancel (late + early intent-hold window). Unit: SEA suite 250 passing. Latency vs Thrift: SELECT 1 ~parity (-2%..+8%), 100k rows -16%..-20% (faster). Concurrency: SELECT 1 conc=40 -19%, 100k conc=10 -7% (faster). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
e81bf9b to
73a8918
Compare
|
Superseded by the directResults approach. Instead of returning the handle eagerly (which forced close-drives), the kernel now exposes a directResults execute ( |
What
Reworks the SEA/kernel sync execute path (
runAsync: false, the default) and folds in the per-statement option no-ops. Three coupled behaviors:Eager cancellable handle.
executeStatementreturns the operation handle immediately and kicks off the inline-materialiseresult()in the background, instead of blocking until the statement is terminal. This restores mid-run cancellation on the sync path without therunAsyncsubmit/poll/refetch latency tax — the kernelexecute()publishes the statement id mid-flight, so a concurrentop.cancel()interrupts the running execute via theStatementCanceller.fetchAll()awaits the same memoisedresult(), so small queries stay inline/fast.close-drives (fire-and-forget commit). Fire-and-forget DDL/DML (
executethenclosewith no fetch) and dependent-statement ordering still work: the operation backend'sclose()drives the same memoisedresult()to terminal before releasing (committing the statement) — unless the op was cancelled, in which case it releases without committing. This is what fixes the earlier "CREATE didn't run" issue.Option no-ops. Accept-and-ignore for unsupported per-statement options (
useCloudFetch/useLZ4Compression/stagingAllowedLocalPath) andqueryTimeout(no-op + TODO), matching the Python kernel client instead of hard-failing the connection.Coupling
Early-window cancel correctness (cancel issued before the server id is published) depends on the kernel
StatementCancellerintent-hold in databricks/databricks-sql-kernel#133 — the canceller holds the cancel intent until the id appears, then dispatches the realCancelStatement, so there is no orphaned server statement. No napi/API surface change;cancel()still returnsbool.Testing
Unit: SEA suite 250 passing (the two close-semantics tests were updated to the new fire-and-forget-commit contract, plus a new cancelled-op-does-not-commit case).
eslintclean.e2e (pecotesting, QRC off, cache-busted):
CancelStatement(OperationStateError) ✓CancelStatement✓Latency vs Thrift (interleaved, median):
SELECT 1: ~parity (−2% … +8% across runs — single-digit % on a sub-200ms query)SELECT … range(100000): −16% … −20% (faster)Concurrency vs Thrift (wall-clock):
SELECT 1conc=20: +9%; conc=40: −19% (faster)No latency regression large enough to matter: tiny-query overhead is within noise, real result sets and most concurrency points are faster.
This pull request and its description were written by Isaac.