Summary
This issue tracks the KEP for adding a consolidation-aware scale-in heuristic to the ReplicaSet controller's pod deletion algorithm.
When the ConsolidatingScaleDown feature gate is enabled, the ReplicaSet controller prefers deleting pods on nodes with fewer total active pods during scale-down, enabling workload consolidation onto fewer nodes. It also respects do-not-disrupt signals so that pods on protected nodes are deprioritized for deletion.
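To illustrate the intended ordering, here is a minimal sketch, not the proposed implementation. It assumes the controller already knows the active pod count per node and which nodes carry a do-not-disrupt signal; the `Pod` type, `RankForConsolidation`, and both input maps are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a simplified, hypothetical stand-in for a ReplicaSet-owned pod.
type Pod struct {
	Name string
	Node string
}

// RankForConsolidation returns pods ordered so that the preferred deletion
// candidates come first: pods on unprotected nodes before pods on protected
// nodes, then pods on nodes with fewer total active pods.
// activePods and protected are assumed to be precomputed by the caller.
func RankForConsolidation(pods []Pod, activePods map[string]int, protected map[string]bool) []Pod {
	ranked := append([]Pod(nil), pods...) // copy so the caller's slice is untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		pi, pj := protected[ranked[i].Node], protected[ranked[j].Node]
		if pi != pj {
			return !pi // pods on unprotected nodes sort first
		}
		return activePods[ranked[i].Node] < activePods[ranked[j].Node]
	})
	return ranked
}

func main() {
	pods := []Pod{{"web-a", "node-1"}, {"web-b", "node-2"}, {"web-c", "node-3"}}
	activePods := map[string]int{"node-1": 9, "node-2": 2, "node-3": 1}
	protected := map[string]bool{"node-3": true} // e.g. a do-not-disrupt signal

	for _, p := range RankForConsolidation(pods, activePods, protected) {
		fmt.Println(p.Name, p.Node)
	}
}
```

In this example web-b (on the lightly loaded node-2) is deleted first, and web-c is deleted last even though node-3 has the fewest pods, because node-3 is protected.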
Motivation
The current ReplicaSet scale-down algorithm prefers deleting pods on nodes with more colocated replicas (a spreading heuristic). While this promotes even distribution, it actively works against node consolidation. Node autoscalers (Karpenter, cluster-autoscaler) reclaim nodes when workloads consolidate during scale-down, but the spreading heuristic distributes deletions evenly, leaving every node partially occupied, which makes consolidating nodes more disruptive.
KEP-2255 (PodDeletionCost) provides a mechanism to influence deletion order via the controller.kubernetes.io/pod-deletion-cost annotation, but it requires an external controller to continuously update annotations before scale-down events, which is operationally complex and API-server intensive.
Key Design Points
- Feature gate: ConsolidatingScaleDown (kube-controller-manager)
- Owning SIG: sig-apps
- Complementary to KEP-2255 (PodDeletionCost) — both mechanisms coexist, with PodDeletionCost taking precedence in the existing sort order
- Zero overhead when disabled: the node informer is initialized only when the feature gate is enabled
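
The precedence described in the points above could look roughly like the following sketch. It is an assumption-laden illustration: the `candidate` type, `sortCandidates`, and the `gateEnabled` flag are hypothetical, and the real controller sort has additional criteria that are omitted here. Only the annotation key comes from KEP-2255.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Annotation introduced by KEP-2255.
const deletionCostAnnotation = "controller.kubernetes.io/pod-deletion-cost"

// candidate is a hypothetical, simplified view of a pod considered for deletion.
type candidate struct {
	Name           string
	Annotations    map[string]string
	NodeActivePods int // active pods on the pod's node (assumed precomputed)
}

// deletionCost parses the annotation; a missing or invalid value counts as 0.
func deletionCost(c candidate) int64 {
	v, err := strconv.ParseInt(c.Annotations[deletionCostAnnotation], 10, 32)
	if err != nil {
		return 0
	}
	return v
}

// sortCandidates puts preferred deletion candidates first. Deletion cost is
// compared before the consolidation heuristic, and the heuristic is consulted
// only when the (hypothetical) gateEnabled flag is true.
func sortCandidates(cs []candidate, gateEnabled bool) {
	sort.SliceStable(cs, func(i, j int) bool {
		ci, cj := deletionCost(cs[i]), deletionCost(cs[j])
		if ci != cj {
			return ci < cj // lower cost is deleted first
		}
		if gateEnabled {
			return cs[i].NodeActivePods < cs[j].NodeActivePods
		}
		return false // gate off: keep the existing relative order
	})
}

func main() {
	cs := []candidate{
		{Name: "a", Annotations: map[string]string{deletionCostAnnotation: "100"}, NodeActivePods: 1},
		{Name: "b", NodeActivePods: 8},
		{Name: "c", NodeActivePods: 3},
	}
	sortCandidates(cs, true)
	for _, c := range cs {
		fmt.Println(c.Name, deletionCost(c), c.NodeActivePods)
	}
}
```

Pod a sits on the emptiest node but is still deleted last, because its higher deletion cost is compared before the node occupancy tie-breaker.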
Related
- pod-deletion-cost
- kubernetes#123541
KEP
TBD
/sig apps