Skip to content

guest: unify pod model for V1, virtual pod, and V2 shim support#2699

Open
shreyanshjain7174 wants to merge 1 commit intomicrosoft:mainfrom
shreyanshjain7174:guest-pod-unification-v2
Open

guest: unify pod model for V1, virtual pod, and V2 shim support#2699
shreyanshjain7174 wants to merge 1 commit intomicrosoft:mainfrom
shreyanshjain7174:guest-pod-unification-v2

Conversation

@shreyanshjain7174
Copy link
Copy Markdown
Contributor

@shreyanshjain7174 shreyanshjain7174 commented Apr 22, 2026

The GCS guest runtime (internal/guest/runtime/hcsv2/uvm.go) tracks virtual pods separately from V1 sandbox containers — a dedicated VirtualPod type, seven exported methods, a parent cgroup manager, and a reverse-lookup map. V1 sandboxes have no pod-level tracking at all. Adding V2 shim support would need a third path.

This collapses all three into one: a private uvmPod type and a single pods map on Host. Every sandbox — V1, virtual pod, or V2 shim — goes through createPodInUVM, which allocates a cgroup under /pods/{sandboxID}. Workload containers nest at /pods/{sandboxID}/{containerID}. Container-to-pod membership is tracked via addContainerToPod. Cleanup in RemoveContainer is a single code path: remove the container from the pod, and when the sandbox container itself is removed, delete the pod's cgroup.

Cgroup hierarchy changes from:

/containers/{id}                         (V1 sandbox)
/containers/virtual-pods/{virtualPodID}  (virtual pod)

to:

/pods/{sandboxID}                        (all pod types)
/pods/{sandboxID}/{containerID}          (workload containers)

Standalone (non-CRI) containers keep their own cgroup at /pods/{id} with no pod entry — same isolation as before, just under the new prefix.

Network namespace teardown for virtual pod sandboxes is preserved: RemoveContainer skips RemoveNetworkNamespace for virtual pod sandbox containers since the host-driven path (TearDownNetworkingRemoveNetNSremoveNIC) handles adapter removal first.

cmd/gcs/main.go replaces the /containers/virtual-pods parent cgroup with /pods and drops the InitializeVirtualPodSupport call.

Tested E2E with both shims:

V1 shim (io.containerd.runhcs.v1) V2 shim (io.containerd.lcow.v2)
OCIBundlePath /run/gcs/c/<podId> /run/gcs/pods/<podId>/<podId>
Pod cgroup /sys/fs/cgroup/memory/pods/<podId> /sys/fs/cgroup/memory/pods/<podId>
/containers/virtual-pods/ absent absent

Comment thread cmd/gcs/main.go Outdated
msg = "memory usage for virtual pods cgroup exceeded threshold"
if strings.HasPrefix(cgName, "/pods") {
msg = "memory usage for pods cgroup exceeded threshold"
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this new change, would we ever go through the else condition?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the /containers cgroup entirely — all containers now nest under /pods. Simplified the message check too.

Comment thread cmd/gcs/main.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use containersControl after this change?

Comment thread cmd/gcs/main.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here also.

Comment thread cmd/gcs/main.go Outdated
@@ -430,7 +426,7 @@ func main() {

go readMemoryEvents(startTime, gefdFile, "/gcs", int64(*gcsMemLimitBytes), gcsControl)
go readMemoryEvents(startTime, oomFile, "/containers", containersLimit, containersControl)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too.


type Container struct {
id string
id string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add 1-line concise comment for both ID and sandboxID.

Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
h.podsMutex.Lock()
if c.sandboxID != "" {
if pod, exists := h.pods[c.sandboxID]; exists {
delete(pod.containers, id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behaviour with standalone containers?
Do we delete the entry from map and cgroup for them too?

// Check for virtual pod annotation
virtualPodID, isVirtualPod := settings.OCISpecification.Annotations[annotations.VirtualPodID]
virtualPodID := settings.OCISpecification.Annotations[annotations.VirtualPodID]
isVirtualPod := virtualPodID != ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not compare the id here with virtualPodID

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do — line 412: if isVirtualPod && id == virtualPodID. That's where we force criType = "sandbox" for the first virtual pod container.

Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated

delete(h.containers, id)

// Extract pod cgroup manager under lock, delete cgroup outside lock to
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presently, we delete the cgroup for virtual pods under lock.
Let's continue that behaviour. It would simplify the code below too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — moved cgroup delete back under the lock and merged into containersMutex.

Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
if err := h.AddContainerToVirtualPod(id, virtualPodID); err != nil {
return nil, errors.Wrapf(err, "failed to add container %s to virtual pod %s", id, virtualPodID)
// Determine the sandboxID for this container.
sandboxID := id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this logic to the next switch?
Under line 507, we already extract the sandboxID and check it's not empty. We can set c.sandboxID after that.

virtualPodID is set only when criType is container.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved into the switch — each case sets its own sandboxID now.

Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
entry.WithField("path", vpRootDir).Debug("Removed virtual pod root directory")
}
// addContainerToPod registers a container as belonging to a pod.
func (h *Host) addContainerToPod(sandboxID, containerID string) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we inline this method?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined.

Replace the separate VirtualPod tracking (dedicated type, 7 exported
methods, parent cgroup manager, reverse-lookup map) with a unified
uvmPod type and a single pods map on Host. All pod types (V1 sandbox,
virtual pod, V2 shim) now go through the same code path:

- createPodInUVM allocates a cgroup under /pods/{sandboxID}
- addContainerToPod tracks container→pod membership
- RemoveContainer handles cleanup uniformly

Cgroup hierarchy changes from:
  /containers/{id}                           (V1 sandbox)
  /containers/virtual-pods/{virtualPodID}    (virtual pod)
to:
  /pods/{sandboxID}                          (all pod types)

Workload containers nest under their pod:
  /pods/{sandboxID}/{containerID}

Signed-off-by: Shreyansh Jain <shreyanshjain7174@gmail.com>
Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants