guest: unify pod model for V1, virtual pod, and V2 shim support #2699
shreyanshjain7174 wants to merge 1 commit into microsoft:main
Conversation
Force-pushed from 62fc02c to a724fae
```go
msg = "memory usage for virtual pods cgroup exceeded threshold"
if strings.HasPrefix(cgName, "/pods") {
	msg = "memory usage for pods cgroup exceeded threshold"
} else {
```
With this new change, would we ever go through the else condition?
Removed the /containers cgroup entirely — all containers now nest under /pods. Simplified the message check too.
Do we use containersControl after this change?
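For illustration, the simplified check could reduce to something like this minimal sketch. The function name `memoryEventMessage` and the `/gcs` fallback message are assumptions, not the PR's actual code; only the `/pods` message and prefix check appear in the diff above.

```go
package main

import (
	"fmt"
	"strings"
)

// memoryEventMessage is a hypothetical helper mirroring the check in the
// diff above: with the /containers cgroup removed, only /gcs and /pods
// remain, so a dedicated "containers" else branch is no longer reachable.
func memoryEventMessage(cgName string) string {
	if strings.HasPrefix(cgName, "/pods") {
		return "memory usage for pods cgroup exceeded threshold"
	}
	// Fallback message assumed for the remaining /gcs cgroup.
	return "memory usage for gcs cgroup exceeded threshold"
}

func main() {
	fmt.Println(memoryEventMessage("/pods/sandbox-1"))
	fmt.Println(memoryEventMessage("/gcs"))
}
```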
```diff
@@ -430,7 +426,7 @@ func main() {
 	go readMemoryEvents(startTime, gefdFile, "/gcs", int64(*gcsMemLimitBytes), gcsControl)
-	go readMemoryEvents(startTime, oomFile, "/containers", containersLimit, containersControl)
```
```go
type Container struct {
	id string
```
Please add a concise one-line comment for both id and sandboxID.
```go
h.podsMutex.Lock()
if c.sandboxID != "" {
	if pod, exists := h.pods[c.sandboxID]; exists {
		delete(pod.containers, id)
```
What is the behaviour with standalone containers?
Do we delete the entry from map and cgroup for them too?
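One way the cleanup semantics can behave, sketched minimally below: standalone containers carry an empty sandboxID, so there is no pod-map entry to delete, and only their own cgroup under `/pods/{id}` needs removal. The type and method names here are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

type pod struct {
	containers map[string]struct{}
}

type host struct {
	podsMutex sync.Mutex
	pods      map[string]*pod
}

// removeFromPod sketches the cleanup path discussed above: containers with
// a sandboxID are removed from their pod's map; standalone containers
// (empty sandboxID) have nothing tracked in h.pods.
func (h *host) removeFromPod(id, sandboxID string) {
	h.podsMutex.Lock()
	defer h.podsMutex.Unlock()
	if sandboxID == "" {
		return // standalone: only its own cgroup needs deleting, elsewhere
	}
	if p, exists := h.pods[sandboxID]; exists {
		delete(p.containers, id)
		if id == sandboxID {
			delete(h.pods, sandboxID) // the sandbox container itself was removed
		}
	}
}

func main() {
	h := &host{pods: map[string]*pod{
		"sb": {containers: map[string]struct{}{"sb": {}, "c1": {}}},
	}}
	h.removeFromPod("c1", "sb")
	fmt.Println(len(h.pods["sb"].containers))
}
```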
```diff
 // Check for virtual pod annotation
-virtualPodID, isVirtualPod := settings.OCISpecification.Annotations[annotations.VirtualPodID]
+virtualPodID := settings.OCISpecification.Annotations[annotations.VirtualPodID]
+isVirtualPod := virtualPodID != ""
```
We do not compare the id here with virtualPodID
We do — line 412: if isVirtualPod && id == virtualPodID. That's where we force criType = "sandbox" for the first virtual pod container.
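A minimal sketch of the check being discussed: the annotation's value doubles as the presence flag, and the container whose id equals the virtual pod ID is forced to criType "sandbox". The annotation key string and function name below are illustrative assumptions; the real key is the `annotations.VirtualPodID` constant.

```go
package main

import "fmt"

// virtualPodIDAnnotation stands in for annotations.VirtualPodID; this
// value is illustrative only.
const virtualPodIDAnnotation = "example.virtual-pod-id"

// classify sketches the logic above: isVirtualPod is derived from the
// annotation being non-empty, and the first virtual pod container (its id
// equals the virtual pod ID) is treated as the sandbox.
func classify(id string, annots map[string]string) (criType string, isVirtualPod bool) {
	virtualPodID := annots[virtualPodIDAnnotation]
	isVirtualPod = virtualPodID != ""
	criType = "container"
	if isVirtualPod && id == virtualPodID {
		criType = "sandbox"
	}
	return criType, isVirtualPod
}

func main() {
	ct, ok := classify("vp1", map[string]string{virtualPodIDAnnotation: "vp1"})
	fmt.Println(ct, ok)
}
```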
```go
delete(h.containers, id)

// Extract pod cgroup manager under lock, delete cgroup outside lock to
```
Presently, we delete the cgroup for virtual pods under lock.
Let's continue that behaviour. It would simplify the code below too.
Done — moved cgroup delete back under the lock and merged into containersMutex.
```diff
-if err := h.AddContainerToVirtualPod(id, virtualPodID); err != nil {
-	return nil, errors.Wrapf(err, "failed to add container %s to virtual pod %s", id, virtualPodID)
+// Determine the sandboxID for this container.
+sandboxID := id
```
Can we move this logic to the next switch?
Under line 507, we already extract the sandboxID and check it's not empty. We can set c.sandboxID after that.
virtualPodID is set only when criType is container.
Moved into the switch — each case sets its own sandboxID now.
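The suggestion above could look something like this minimal sketch. The helper name `sandboxIDFor` and the CRI annotation key are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// sandboxIDFor shows each criType case determining its own sandboxID, as
// suggested in the review thread.
func sandboxIDFor(criType, id string, annots map[string]string) (string, error) {
	switch criType {
	case "sandbox":
		// A sandbox container is its own pod.
		return id, nil
	case "container":
		// Workload containers carry the sandbox ID in an annotation
		// (key assumed here for illustration).
		sandboxID := annots["io.kubernetes.cri.sandbox-id"]
		if sandboxID == "" {
			return "", errors.New("empty sandbox-id annotation")
		}
		return sandboxID, nil
	default:
		// Standalone containers act as their own sandbox.
		return id, nil
	}
}

func main() {
	sb, _ := sandboxIDFor("container", "c1", map[string]string{"io.kubernetes.cri.sandbox-id": "sb1"})
	fmt.Println(sb)
}
```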
```go
entry.WithField("path", vpRootDir).Debug("Removed virtual pod root directory")
}

// addContainerToPod registers a container as belonging to a pod.
func (h *Host) addContainerToPod(sandboxID, containerID string) {
```
Can we inline this method?
Replace the separate VirtualPod tracking (dedicated type, 7 exported
methods, parent cgroup manager, reverse-lookup map) with a unified
uvmPod type and a single pods map on Host. All pod types (V1 sandbox,
virtual pod, V2 shim) now go through the same code path:
- createPodInUVM allocates a cgroup under /pods/{sandboxID}
- addContainerToPod tracks container→pod membership
- RemoveContainer handles cleanup uniformly
Cgroup hierarchy changes from:
/containers/{id} (V1 sandbox)
/containers/virtual-pods/{virtualPodID} (virtual pod)
to:
/pods/{sandboxID} (all pod types)
Workload containers nest under their pod:
/pods/{sandboxID}/{containerID}
Signed-off-by: Shreyansh Jain <shreyanshjain7174@gmail.com>
Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
Force-pushed from a724fae to ad3ee5f
The GCS guest runtime (`internal/guest/runtime/hcsv2/uvm.go`) tracks virtual pods separately from V1 sandbox containers: a dedicated `VirtualPod` type, seven exported methods, a parent cgroup manager, and a reverse-lookup map. V1 sandboxes have no pod-level tracking at all. Adding V2 shim support would need a third path.

This collapses all three into one: a private `uvmPod` type and a single `pods` map on `Host`. Every sandbox, whether V1, virtual pod, or V2 shim, goes through `createPodInUVM`, which allocates a cgroup under `/pods/{sandboxID}`. Workload containers nest at `/pods/{sandboxID}/{containerID}`. Container-to-pod membership is tracked via `addContainerToPod`. Cleanup in `RemoveContainer` is a single code path: remove the container from the pod, and when the sandbox container itself is removed, delete the pod's cgroup.

Cgroup hierarchy changes from:

```
/containers/{id}                         (V1 sandbox)
/containers/virtual-pods/{virtualPodID}  (virtual pod)
```

to:

```
/pods/{sandboxID}                (all pod types)
/pods/{sandboxID}/{containerID}  (workload containers)
```

Standalone (non-CRI) containers keep their own cgroup at `/pods/{id}` with no pod entry: the same isolation as before, just under the new prefix.

Network namespace teardown for virtual pod sandboxes is preserved: `RemoveContainer` skips `RemoveNetworkNamespace` for virtual pod sandbox containers, since the host-driven path (`TearDownNetworking` → `RemoveNetNS` → `removeNIC`) handles adapter removal first. `cmd/gcs/main.go` replaces the `/containers/virtual-pods` parent cgroup with `/pods` and drops the `InitializeVirtualPodSupport` call.

Tested E2E with both shims:

- V1 shim (`io.containerd.runhcs.v1`): `/run/gcs/c/<podId>`, `/sys/fs/cgroup/memory/pods/<podId>`
- V2 shim (`io.containerd.lcow.v2`): `/run/gcs/pods/<podId>/<podId>`, `/sys/fs/cgroup/memory/pods/<podId>`
- No `/containers/virtual-pods/` cgroup remains
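The unified model described in this PR can be sketched as follows. This is a minimal illustration under stated assumptions: field names beyond `uvmPod`, `pods`, `createPodInUVM`, and `addContainerToPod` (which the PR names) are invented, locking and error handling are simplified, and no real cgroup is touched.

```go
package main

import (
	"fmt"
	"path"
	"sync"
)

// uvmPod is a minimal stand-in for the unified pod type: one cgroup per
// sandbox under /pods/{sandboxID}, plus a membership map.
type uvmPod struct {
	sandboxID  string
	cgroupPath string
	containers map[string]struct{}
}

// Host holds the single pods map that replaces the separate VirtualPod
// tracking described above.
type Host struct {
	podsMutex sync.Mutex
	pods      map[string]*uvmPod
}

// createPodInUVM allocates the pod's cgroup path under /pods/{sandboxID};
// all pod types (V1 sandbox, virtual pod, V2 shim) share this path.
func (h *Host) createPodInUVM(sandboxID string) *uvmPod {
	h.podsMutex.Lock()
	defer h.podsMutex.Unlock()
	if p, ok := h.pods[sandboxID]; ok {
		return p
	}
	p := &uvmPod{
		sandboxID:  sandboxID,
		cgroupPath: path.Join("/pods", sandboxID),
		containers: map[string]struct{}{},
	}
	h.pods[sandboxID] = p
	return p
}

// addContainerToPod tracks container→pod membership; workload containers
// nest at /pods/{sandboxID}/{containerID}.
func (h *Host) addContainerToPod(sandboxID, containerID string) string {
	h.podsMutex.Lock()
	defer h.podsMutex.Unlock()
	p := h.pods[sandboxID]
	p.containers[containerID] = struct{}{}
	return path.Join(p.cgroupPath, containerID)
}

func main() {
	h := &Host{pods: map[string]*uvmPod{}}
	h.createPodInUVM("sb1")
	fmt.Println(h.addContainerToPod("sb1", "c1"))
}
```

Used this way, a V1 sandbox, a virtual pod, and a V2 shim pod all produce the same `/pods/{sandboxID}/{containerID}` nesting, which is the point of the unification.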