Setting up Over-Provisioning
It's considered a best practice to create appropriate PriorityClasses for your applications. Now, let's create a global default priority class using the field globalDefault: true. This default PriorityClass will be assigned to any Pods or Deployments that don't specify a priorityClassName.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default
value: 0
globalDefault: true
description: "Default Priority class."
We'll also create a PriorityClass that will be assigned to the pause pods used for over-provisioning, with a priority value of -1.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: pause-pods
value: -1
globalDefault: false
description: "Priority class used by pause-pods for overprovisioning."
Pause pods make sure there are enough nodes available based on how much over-provisioning is needed for your environment. Keep in mind the max-size parameter of the ASG (of the EKS node group): Cluster Autoscaler won't increase the number of nodes beyond the maximum specified in the ASG.
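You can check the current maximum with eksctl, for example (the cluster name below is a placeholder for your own):

eksctl get nodegroup --cluster my-cluster

This lists each node group along with its minimum, maximum and desired sizes. The following Deployment creates the pause pods using the pause-pods priority class: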
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause-pods
  namespace: other
spec:
  replicas: 2
  selector:
    matchLabels:
      run: pause-pods
  template:
    metadata:
      labels:
        run: pause-pods
    spec:
      priorityClassName: pause-pods
      containers:
        - name: reserve-resources
          image: registry.k8s.io/pause
          resources:
            requests:
              memory: "6.5Gi"
In this case each pause pod requests 6.5Gi of memory, which means it will consume almost an entire m5.large instance (which has 8 GiB of memory, part of which is reserved for the system). With 2 replicas, this will result in us always having 2 "spare" worker nodes available.
Apply the updates to your cluster:
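For example, assuming the three manifests above were saved to a local directory named overprovisioning (the path is illustrative), they can be applied in one step:

kubectl apply -f overprovisioning/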
priorityclass.scheduling.k8s.io/default created
priorityclass.scheduling.k8s.io/pause-pods created
deployment.apps/pause-pods created
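Optionally, confirm that both priority classes exist (the output of this command also shows which one is the global default):

kubectl get priorityclass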
Once this completes, the pause pods will be running:
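The pods live in the other namespace used by the Deployment above, so a command like the following lists them:

kubectl get pods -n other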
NAME READY STATUS RESTARTS AGE
pause-pods-7f7669b6d7-v27sl 1/1 Running 0 5m6s
pause-pods-7f7669b6d7-v7hqv 1/1 Running 0 5m6s
And we can see that additional nodes have been provisioned by Cluster Autoscaler:
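Listing the nodes shows the newly provisioned instances alongside the existing ones:

kubectl get nodes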
NAME STATUS ROLES AGE VERSION
ip-10-42-10-159.us-west-2.compute.internal Ready <none> 3d v1.25.6-eks-48e63af
ip-10-42-10-111.us-west-2.compute.internal Ready <none> 33s v1.25.6-eks-48e63af
ip-10-42-10-133.us-west-2.compute.internal Ready <none> 33s v1.25.6-eks-48e63af
ip-10-42-11-143.us-west-2.compute.internal Ready <none> 3d v1.25.6-eks-48e63af
ip-10-42-11-81.us-west-2.compute.internal Ready <none> 3d v1.25.6-eks-48e63af
ip-10-42-12-152.us-west-2.compute.internal Ready <none> 3m11s v1.25.6-eks-48e63af
These two nodes are not running any workloads except for our pause pods, which will be evicted when "real" workloads are scheduled.
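As a rough sketch of how that eviction works (the name, image and request size below are illustrative and not part of this workshop), any Deployment that omits priorityClassName picks up the default priority class with value 0, which outranks the pause pods at -1:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: other
spec:
  replicas: 2
  selector:
    matchLabels:
      run: sample-app
  template:
    metadata:
      labels:
        run: sample-app
    spec:
      # No priorityClassName, so the global default class (value 0) applies
      containers:
        - name: sample-app
          image: nginx
          resources:
            requests:
              memory: "4Gi"

When the cluster runs out of capacity for pods like these, the scheduler preempts the lower-priority pause pods to make room; their replacements go Pending, and Cluster Autoscaler scales out again to restore the spare capacity.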