[live-migration] adds the saved state proto for the controllers #2711
rawahars wants to merge 1 commit into microsoft:main
4f6f8b5 to dde22da
@@ -0,0 +1,13 @@
//go:build windows

package snapshot
Can we please call this a shim saved state? This is not a snapshot, which has public implications on meaning. A sandbox snapshot is an offline saved sandbox at a point in time for the orchestrator to resume. This is save/restore, I hear you, but we need to call ours something else.
Updated to call it Payload.
// NotConfigured).
// =============================================================================

enum VMStage {
I feel like I would really rather see that each controller defines its own saved state as a proto Any. And each controller is responsible for handing back what it wants saved on Save and restoring what it wants restored on Restore.
Something like
interface SavedState {
SaveController() -> protoAny
RestoreController(protoAny) -> error
}
Then, for each controller that implements it, we have an internal proto.
Let's say SCSI is:
proto ScsiSavedState {
repeated AttachedDisk = 1;
}
message AttachedDisk {
host_path,
config,
ref_counts,
mounts
}
Whatever you want.
But it's now very localized.
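A minimal Go sketch of the interface suggested in this comment, for illustration only. To stay stdlib-only it models `google.protobuf.Any` with a local `Any` struct and uses JSON in place of protobuf encoding. The type URL, the `scsiController` type, the `error` return on `SaveController`, and the two-field `AttachedDisk` are all assumptions, not code from this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Any is a stdlib-only stand-in for google.protobuf.Any.
type Any struct {
	TypeURL string
	Value   []byte
}

// SavedState is the per-controller contract suggested above. The error
// return on SaveController is an assumption added for this sketch.
type SavedState interface {
	SaveController() (*Any, error)
	RestoreController(*Any) error
}

// scsiTypeURL is a hypothetical identifier for the SCSI saved state.
const scsiTypeURL = "example.invalid/scsi.SavedState"

// AttachedDisk carries two of the fields listed in the comment.
type AttachedDisk struct {
	HostPath string
	RefCount uint32
}

// scsiController is a toy controller whose only state is its disk list.
type scsiController struct {
	disks []AttachedDisk
}

// Compile-time check that the toy controller satisfies the interface.
var _ SavedState = (*scsiController)(nil)

// SaveController hands back exactly what this controller wants saved.
func (c *scsiController) SaveController() (*Any, error) {
	b, err := json.Marshal(c.disks) // JSON stands in for proto encoding
	if err != nil {
		return nil, err
	}
	return &Any{TypeURL: scsiTypeURL, Value: b}, nil
}

// RestoreController rebuilds the disk list, rejecting foreign payloads.
func (c *scsiController) RestoreController(a *Any) error {
	if a == nil {
		return errors.New("scsi: nil saved state")
	}
	if a.TypeURL != scsiTypeURL {
		return fmt.Errorf("scsi: unexpected type URL %q", a.TypeURL)
	}
	return json.Unmarshal(a.Value, &c.disks)
}
```

The caller only ever sees the opaque `Any`, so the shape of `AttachedDisk` can change without touching any other controller.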
Updated to have 1 proto per controller and the state is encapsulated within the controller itself.
dde22da to 5cf517b
As part of the live migration workflow, we need to checkpoint the current
state of the shim on the source side and restore it on the destination
side. This change introduces the well-typed proto definitions for the
shim state used during Save/Restore.
A new `internal/controller/<component>/save` package is added per
controller, each owning its own versioned `Payload` message:
- migration: top-level envelope carrying VM and per-pod payloads as
google.protobuf.Any so each controller is independently versioned.
- vm: VM-level state (host compute system, resources, etc.).
- pod: sandbox/pod-level state.
- network: network endpoints/namespaces attached to the sandbox.
- process: process/exec state inside containers.
- linuxcontainer: Linux container state.
- device/scsi, device/vpci, device/plan9: per-device attachment state.
Also registers the new `internal/controller` prefix in Protobuild.toml
(and drops the now-unused `internal/vmservice` prefix) so the generated
.pb.go files are produced with the correct import paths.
Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
5cf517b to 7b26263
Summary
As part of the live migration workflow, the shim needs to checkpoint its
in-memory state on the source side and restore it on the destination side.
This PR introduces the well-typed protobuf definitions for that wire format.
Rather than placing the schema in a single monolithic package, every controller
that owns migratable state gets its own `save/` sub-package containing a
versioned `Payload` message. The top-level `migration` package only owns the
outer envelope and wires the per-controller payloads together via
`google.protobuf.Any`, so each controller is independently versioned and can
evolve its schema without touching its peers.
What's in this change
A new `internal/controller/<component>/save` package is added per controller,
each owning its own versioned `Payload` message (`payload.proto`, generated
`payload.pb.go`, and a small `constants.go` carrying `SchemaVersion` and
`TypeURL`):
- `internal/controller/migration/save`: top-level envelope carrying VM and
  per-pod payloads as `google.protobuf.Any`
- `internal/controller/vm/save`: `Stage`, `SandboxOptions` (incl.
  `ConfidentialConfig`), GCS next vsock port, host-emitted compatibility blob,
  and `Any`-wrapped sub-controller payloads (SCSI, VPCI, Plan9)
- `internal/controller/pod/save`: `Any`-wrapped network payload, and one
  `Any`-wrapped container payload per container
- `internal/controller/network/save`: `Stage`, and a per-NIC `EndpointBinding`
  map
- `internal/controller/linuxcontainer/save`: `Stage`, IO retry timeout, rootfs
  `Layers` + `LayerReservations`, SCSI/Plan9/VPCI reservation cross-refs, and an
  exec-ID-keyed map of `Any`-wrapped process payloads
- `internal/controller/process/save`: `Stage`, guest PID, OCI bundle path + spec
  JSON, exit info, IO retry timeout, and the stdin/stdout/stderr vsock ports
  used by the IO relay
- `internal/controller/device/scsi/save`: `DiskState` (`DiskConfig` +
  `DiskStage` + partition-keyed `MountState`/`MountConfig` + `MountStage`), and
  a reservation-GUID-keyed `Reservation` map
- `internal/controller/device/vpci/save`: `DeviceState` (instance ID, VF index,
  `DeviceStage`, ref count)
- `internal/controller/device/plan9/save`: `no_writable_file_shares` policy,
  monotonic name counter, host-path-keyed `ShareState` (`ShareConfig` +
  `ShareStage` + in-guest `MountState`/`MountStage`), and a reservation-GUID →
  host-path map

The `internal/controller` prefix is also registered in `Protobuild.toml` (and
the now-unused `internal/vmservice` prefix dropped) so the generated `.pb.go`
files are produced with the correct import paths.
Design notes
Controller-oriented layout. The schema mirrors the shim's existing
controller boundaries: each controller serializes its own bookkeeping into a
dedicated `Payload`, and the corresponding controller on the destination
rebuilds its in-memory state from that `Payload`. Adding new migratable
state in the future is a localized change to the controller that owns it.
Any-wrapped composition. Parent payloads embed children as
`google.protobuf.Any` (e.g. `vm.Payload` → SCSI/VPCI/Plan9; `pod.Payload` →
network + containers; `linuxcontainer.Payload` → processes). Each child
fully owns its own schema and `SchemaVersion`, so a bump in one controller
does not force a coordinated bump across the whole tree.
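As a rough illustration of the composition described here, the sketch below shows a parent that carries its children opaquely and hands each one to whichever restorer is registered for its type URL. It is stdlib-only: `Any` is a local stand-in for `google.protobuf.Any`, and `VMPayload`, `restoreFunc`, `RestoreChildren`, and the type URLs used with it are hypothetical names, not the generated types from this PR.

```go
package main

import "fmt"

// Any is a stdlib-only stand-in for google.protobuf.Any.
type Any struct {
	TypeURL string
	Value   []byte
}

// VMPayload is a simplified parent payload: it carries its children
// without knowing anything about their schemas.
type VMPayload struct {
	SchemaVersion uint32
	SCSI          *Any
	VPCI          *Any
	Plan9         *Any
}

// restoreFunc rebuilds one child controller from its opaque bytes.
type restoreFunc func(value []byte) error

// RestoreChildren hands each child to the restorer registered for its
// type URL. The parent never decodes a child's bytes itself.
func RestoreChildren(p *VMPayload, restorers map[string]restoreFunc) error {
	for _, child := range []*Any{p.SCSI, p.VPCI, p.Plan9} {
		if child == nil {
			continue // this controller had nothing to save
		}
		restore, ok := restorers[child.TypeURL]
		if !ok {
			return fmt.Errorf("no restorer registered for %q", child.TypeURL)
		}
		if err := restore(child.Value); err != nil {
			return fmt.Errorf("restore %q: %w", child.TypeURL, err)
		}
	}
	return nil
}
```

Because the parent only routes on the type URL, a schema change inside one child stays invisible to the parent and to its siblings.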
Per-payload schema versioning. Every `Payload` carries `schema_version`
and every `save` package exports a matching `SchemaVersion` constant and a
stable `TypeURL`. The destination MUST reject a payload whose version it
does not recognise, so incompatible producers and consumers fail fast
instead of silently corrupting state.
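The fail-fast rule could look roughly like the following on the destination side. This is a sketch under stated assumptions: the constant's value, the `Payload` struct, `ErrUnsupportedSchema`, and `Validate` are hypothetical, and it assumes the destination recognises exactly one version.

```go
package main

import (
	"errors"
	"fmt"
)

// SchemaVersion is the only version this build understands.
// The value 1 is a placeholder, not taken from the PR.
const SchemaVersion uint32 = 1

// ErrUnsupportedSchema lets callers detect a version mismatch with errors.Is.
var ErrUnsupportedSchema = errors.New("unsupported schema version")

// Payload is a stand-in for a generated message carrying schema_version.
type Payload struct {
	SchemaVersion uint32
}

// Validate rejects any payload whose version is not recognised, so the
// restore path never interprets bytes written under a different schema.
func Validate(p *Payload) error {
	if p == nil {
		return errors.New("nil payload")
	}
	if p.SchemaVersion != SchemaVersion {
		return fmt.Errorf("%w: got %d, want %d",
			ErrUnsupportedSchema, p.SchemaVersion, SchemaVersion)
	}
	return nil
}
```

Calling `Validate` before any field is read keeps the check in one place per controller.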
Lifecycle enums. Each `Stage` (`DiskStage`, `MountStage`, `DeviceStage`,
`ShareStage`, …) mirrors the corresponding Go `State` `iota` in the
controller it represents, with matching zero values so a
default-constructed message maps to the controller's initial state.
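To make the zero-value alignment concrete, here is a small sketch with a Go `State` `iota` and a stand-in for a generated `Stage` enum. Only `NotConfigured` appears in the diff above; the `Created` and `Running` names, the `Stage` stand-in type, and `StageToState` are illustrative assumptions.

```go
package main

import "fmt"

// State is a stand-in for a controller's in-memory lifecycle state.
type State int

const (
	StateNotConfigured State = iota // initial state, zero value
	StateCreated                    // hypothetical
	StateRunning                    // hypothetical
)

// Stage is a stand-in for the generated proto enum. In proto3 the first
// enum value must be 0, which is also the default for an unset field.
type Stage int32

const (
	StageNotConfigured Stage = 0
	StageCreated       Stage = 1
	StageRunning       Stage = 2
)

// StageToState maps the wire enum back to the controller state and
// rejects values this build does not know about.
func StageToState(s Stage) (State, error) {
	switch s {
	case StageNotConfigured:
		return StateNotConfigured, nil
	case StageCreated:
		return StateCreated, nil
	case StageRunning:
		return StateRunning, nil
	default:
		return StateNotConfigured, fmt.Errorf("unknown stage %d", s)
	}
}
```

With this alignment, a message whose stage field was never set decodes to the same state a freshly constructed controller starts in.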
Cross-references via stable string IDs. Container payloads reference
device-controller bookkeeping by ID only: `scsi_reservation_ids` and
`LayerReservation.reservation_id` index into `scsi.Payload.reservations`,
`plan9_reservation_ids` into `plan9.Payload.reservations`, and
`vpci_vmbus_guids` into `vpci.Payload.devices`. This keeps device state
authoritative in one place and makes containers cheap to enumerate.
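One way to picture the ID-based cross-references is a lookup that resolves a container's SCSI reservation IDs against the device controller's map and reports dangling references. The struct shapes, the `HostPath` field, and `ResolveSCSI` are simplified assumptions for illustration, not the generated types.

```go
package main

import "fmt"

// Reservation is a simplified stand-in for one SCSI reservation entry.
type Reservation struct {
	HostPath string // hypothetical field
}

// SCSIPayload holds the authoritative reservation-ID-keyed map.
type SCSIPayload struct {
	Reservations map[string]Reservation
}

// ContainerPayload stores only IDs, never a copy of the device state.
type ContainerPayload struct {
	SCSIReservationIDs []string
}

// ResolveSCSI looks up every ID the container references and fails on
// the first one that the SCSI payload does not contain.
func ResolveSCSI(c *ContainerPayload, s *SCSIPayload) ([]Reservation, error) {
	out := make([]Reservation, 0, len(c.SCSIReservationIDs))
	for _, id := range c.SCSIReservationIDs {
		r, ok := s.Reservations[id]
		if !ok {
			return nil, fmt.Errorf("dangling scsi reservation id %q", id)
		}
		out = append(out, r)
	}
	return out, nil
}
```

A check like this on restore would catch a container payload that refers to device state the SCSI payload never recorded.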
IO continuity. `process.Payload` records the stdin/stdout/stderr
vsock ports of the GCS↔shim IO relay, and `vm.Payload.gcs_next_port` seeds
the destination's port allocator, so IO streams can be reattached on the
same ports without renegotiation.
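The seeding idea can be sketched with a toy allocator. This assumes ports are handed out by a simple monotonic counter, which the PR text does not confirm; `PortAllocator`, its methods, and the starting port used with it are hypothetical.

```go
package main

// PortAllocator is a toy monotonic vsock port allocator.
type PortAllocator struct {
	next uint32
}

// NewPortAllocator starts handing out ports at seed. On the destination
// the seed would come from the saved gcs_next_port value.
func NewPortAllocator(seed uint32) *PortAllocator {
	return &PortAllocator{next: seed}
}

// Allocate returns the next unused port and advances the counter.
func (a *PortAllocator) Allocate() uint32 {
	p := a.next
	a.next++
	return p
}

// Next reports the value that would be saved as gcs_next_port.
func (a *PortAllocator) Next() uint32 {
	return a.next
}
```

Seeding the destination with the saved counter means any port it allocates after restore is above every port already recorded in a process payload, so restored streams keep their original ports.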