gcs: keep bridge alive across live-migration transport swap #2771
rawahars wants to merge 1 commit into
Add SetMigrating / ResumeOnConn on the bridge (plumbed through GuestConnection and Guest) so callers can park the recv/send loops during a UVM migration blackout and swap in the new hvsock without dropping in-flight RPCs. CreateConnection gains a coldStart bool so the migration destination skips the fresh-boot handshake.

Drive-bys: shim Stop honours the caller's ctx, Capabilities is nil-safe, ErrGuestConnectionUnavailable is exported, and session-id/action log fields are added.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
```go
// SetMigrating toggles tolerance of transport-level failures around a
// live-migration blackout. Explicit [bridge.Close] and the RPC timeout
// kill still tear the bridge down.
func (brdg *bridge) SetMigrating(migrating bool) {
```
This seems fine, but I don't get why it's necessary. Why would we want transport-level tolerance only when migrating? I get that this is a local loopback connection, so in practice it likely never disconnects, but doesn't it seem reasonable to just implement the bridge so that it auto-pauses on disconnect and continues on reconnect? No policy needed?
We don't want that under normal circumstances. The shim depends on the invariant that a bridge collapse is a fatal error: all the Waits are released, and the workflow then goes into teardown mode.
Only during migration do we skip that, so that on restore after a rollback we can resume over a fresh socket connection.