
[live-migration] adds the saved state proto for the controllers #2711

Open
rawahars wants to merge 1 commit into microsoft:main from rawahars:lm_shim_state_proto


@rawahars rawahars commented Apr 28, 2026

Summary

As part of the live migration workflow, the shim needs to checkpoint its
in-memory state on the source side and restore it on the destination side.
This PR introduces the well-typed protobuf definitions for that wire format.

Rather than placing the schema in a single monolithic package, every controller
that owns migratable state gets its own save/ sub-package containing a
versioned Payload message. The top-level migration package only owns the
outer envelope and wires the per-controller payloads together via
google.protobuf.Any, so each controller is independently versioned and can
evolve its schema without touching its peers.

What's in this change

A new internal/controller/<component>/save package is added per controller,
each owning its own versioned Payload message (payload.proto, generated
payload.pb.go, and a small constants.go carrying SchemaVersion and
TypeURL):

| Package | Owns |
| --- | --- |
| internal/controller/migration/save | Top-level envelope; carries the VM payload and one payload per pod as google.protobuf.Any |
| internal/controller/vm/save | VM-level state: HCS VM ID, lifecycle Stage, SandboxOptions (incl. ConfidentialConfig), GCS next vsock port, host-emitted compatibility blob, and Any-wrapped sub-controller payloads (SCSI, VPCI, Plan9) |
| internal/controller/pod/save | Per-pod state: pod ID, GCS pod ID, Any-wrapped network payload, and one Any-wrapped container payload per container |
| internal/controller/network/save | Per-pod network state: namespace ID, PBR flag, guest-namespace support flag, lifecycle Stage, and a per-NIC EndpointBinding map |
| internal/controller/linuxcontainer/save | Linux container state: container/GCS IDs, lifecycle Stage, IO retry timeout, rootfs Layers + LayerReservations, SCSI/Plan9/VPCI reservation cross-refs, and an exec-ID-keyed map of Any-wrapped process payloads |
| internal/controller/process/save | Process / exec state: exec ID, lifecycle Stage, guest PID, OCI bundle path + spec JSON, exit info, IO retry timeout, and the stdin/stdout/stderr vsock ports used by the IO relay |
| internal/controller/device/scsi/save | SCSI sub-controller: controller count, slot-keyed DiskState (DiskConfig + DiskStage + partition-keyed MountState/MountConfig + MountStage), and a reservation-GUID-keyed Reservation map |
| internal/controller/device/vpci/save | VPCI sub-controller: VMBus-GUID-keyed DeviceState (instance ID, VF index, DeviceStage, ref count) |
| internal/controller/device/plan9/save | Plan9 sub-controller: no_writable_file_shares policy, monotonic name counter, host-path-keyed ShareState (ShareConfig + ShareStage + in-guest MountState/MountStage), and a reservation-GUID → host-path map |
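
The container entry above holds reservation cross-refs while the SCSI entry holds the reservation-GUID-keyed map they point into. A rough sketch of a consistency check a destination could run before restoring, assuming simplified stand-in structs (`ScsiPayload`, `ContainerPayload`) and a hypothetical helper `danglingScsiRefs`; none of these names come from the generated code.

```go
package main

import "fmt"

// ScsiPayload is a stand-in for the SCSI payload; only the
// reservation-GUID-keyed map is modeled.
type ScsiPayload struct {
	Reservations map[string]struct{}
}

// ContainerPayload is a stand-in for the Linux container payload; only
// the SCSI reservation cross-refs are modeled.
type ContainerPayload struct {
	ContainerID        string
	ScsiReservationIDs []string
}

// danglingScsiRefs returns the reservation IDs a container references
// that are absent from the SCSI payload.
func danglingScsiRefs(c ContainerPayload, scsi ScsiPayload) []string {
	var missing []string
	for _, id := range c.ScsiReservationIDs {
		if _, ok := scsi.Reservations[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	scsi := ScsiPayload{Reservations: map[string]struct{}{"res-a": {}}}
	c := ContainerPayload{
		ContainerID:        "c1",
		ScsiReservationIDs: []string{"res-a", "res-b"},
	}
	// "res-b" has no matching entry in the SCSI payload.
	fmt.Println(danglingScsiRefs(c, scsi))
}
```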

The internal/controller prefix is also registered in Protobuild.toml (and
the now-unused internal/vmservice prefix is dropped) so the generated .pb.go
files are produced with the correct import paths.

Design notes

  • Controller-oriented layout. The schema mirrors the shim's existing
    controller boundaries: each controller serializes its own bookkeeping into a
    dedicated Payload, and the corresponding controller on the destination
    rebuilds its in-memory state from that Payload. Adding new migratable
    state in the future is a localized change to the controller that owns it.

  • Any-wrapped composition. Parent payloads embed children as
    google.protobuf.Any (e.g. vm.Payload → SCSI/VPCI/Plan9; pod.Payload →
    network + containers; linuxcontainer.Payload → processes). Each child
    fully owns its own schema and SchemaVersion, so a bump in one controller
    does not force a coordinated bump across the whole tree.

  • Per-payload schema versioning. Every Payload carries
    schema_version and every save package exports a matching
    SchemaVersion constant and a stable TypeURL. The destination MUST
    reject a payload whose version it does not recognise, so incompatible
    producers and consumers fail fast instead of silently corrupting state.

  • Lifecycle enums. Each Stage (DiskStage, MountStage,
    DeviceStage, ShareStage, …) mirrors the corresponding Go State iota
    in the controller it represents, with matching zero values so a
    default-constructed message maps to the controller's initial state.

  • Cross-references via stable string IDs. Container payloads reference
    device-controller bookkeeping by ID only — scsi_reservation_ids and
    LayerReservation.reservation_id index into
    scsi.Payload.reservations, plan9_reservation_ids into
    plan9.Payload.reservations, and vpci_vmbus_guids into
    vpci.Payload.devices. This keeps device state authoritative in one place
    and makes containers cheap to enumerate.

  • IO continuity. process.Payload records the stdin/stdout/stderr
    vsock ports of the GCS↔shim IO relay, and vm.Payload.gcs_next_port seeds
    the destination's port allocator, so IO streams can be reattached on the
    same ports without renegotiation.

@rawahars rawahars requested a review from a team as a code owner April 28, 2026 06:47
@rawahars rawahars force-pushed the lm_shim_state_proto branch from 4f6f8b5 to dde22da Compare April 28, 2026 09:26
@@ -0,0 +1,13 @@
//go:build windows

package snapshot
Contributor

Can we please call this a shim saved state? This is not a snapshot, which has public implications on meaning: a sandbox snapshot is an offline saved sandbox at a point in time for the orchestrator to resume. This is save/restore, I hear you, but we need to call ours something else.

Contributor Author

Updated to call it Payload.

// NotConfigured).
// =============================================================================

enum VMStage {
Contributor

I feel like I would really rather see that each controller defines its own saved state as a proto Any. And each controller is responsible for handing back what it wants saved on Save and restoring what it wants restored on Restore.

Something like

type SavedState interface {
    SaveController() *anypb.Any
    RestoreController(*anypb.Any) error
}

Then for each controller that implements it, we have an internal proto.

Let's say SCSI is:

message ScsiSavedState {
    repeated AttachedDisk disks = 1;
}

message AttachedDisk {
    host_path,
    config,
    ref_counts,
    mounts
}

Whatever you want.

But it's now very localized.

Contributor Author

Updated to have 1 proto per controller and the state is encapsulated within the controller itself.
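
For illustration, the save/restore interface suggested in this thread might look like the sketch below. It assumes a stdlib-only stand-in for the proto Any and a newline-joined byte encoding in place of protobuf marshaling; `scsiController`, its `disks` field, and the type URL are hypothetical and are not taken from the PR's code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Any is a stand-in for google.protobuf.Any.
type Any struct {
	TypeURL string
	Value   []byte
}

// SavedState follows the shape of the reviewer's sketch.
type SavedState interface {
	SaveController() Any
	RestoreController(Any) error
}

// Hypothetical type URL for the SCSI controller's payload.
const scsiTypeURL = "example.invalid/scsi.save.Payload"

// scsiController is a toy controller tracking attached disk host paths.
type scsiController struct {
	disks []string
}

// SaveController serializes only what this controller owns.
func (c *scsiController) SaveController() Any {
	return Any{TypeURL: scsiTypeURL, Value: []byte(strings.Join(c.disks, "\n"))}
}

// RestoreController rebuilds in-memory state from a payload it produced.
func (c *scsiController) RestoreController(a Any) error {
	if a.TypeURL != scsiTypeURL {
		return errors.New("scsi: unexpected payload type " + a.TypeURL)
	}
	c.disks = nil
	if len(a.Value) > 0 {
		c.disks = strings.Split(string(a.Value), "\n")
	}
	return nil
}

// Compile-time check that the controller satisfies the interface.
var _ SavedState = (*scsiController)(nil)

func main() {
	src := &scsiController{disks: []string{`C:\layers\a.vhdx`, `C:\layers\b.vhdx`}}
	saved := src.SaveController()

	dst := &scsiController{}
	if err := dst.RestoreController(saved); err != nil {
		fmt.Println("restore failed:", err)
		return
	}
	fmt.Println(dst.disks)
}
```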

@rawahars rawahars force-pushed the lm_shim_state_proto branch from dde22da to 5cf517b Compare May 1, 2026 18:06
@rawahars rawahars changed the title from "[live-migration] adds the snapshot proto for the shim state" to "[live-migration] adds the saved state proto for the controllers" May 1, 2026
As part of the live migration workflow, we need to checkpoint the current
state of the shim on the source side and restore it on the destination
side. This change introduces the well-typed proto definitions for the
shim state used during Save/Restore.

A new `internal/controller/<component>/save` package is added per
controller, each owning its own versioned `Payload` message:

  - migration: top-level envelope carrying VM and per-pod payloads as
    google.protobuf.Any so each controller is independently versioned.
  - vm:        VM-level state (host compute system, resources, etc.).
  - pod:       sandbox/pod-level state.
  - network:   network endpoints/namespaces attached to the sandbox.
  - process:   process/exec state inside containers.
  - linuxcontainer: Linux container state.
  - device/scsi, device/vpci, device/plan9: per-device attachment state.

Also registers the new `internal/controller` prefix in Protobuild.toml
(and drops the now-unused `internal/vmservice` prefix) so the generated
.pb.go files are produced with the correct import paths.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
@rawahars rawahars force-pushed the lm_shim_state_proto branch from 5cf517b to 7b26263 Compare May 1, 2026 18:13