Pod Affinity and Anti-Affinity
Pods can be constrained to run on specific nodes or under specific circumstances, for example when you want only one application pod running per node or want certain pods paired together on a node. Affinity rules, including node affinity, can be expressed as either preferred or mandatory restrictions.
For this lesson, we'll focus on inter-pod affinity and anti-affinity by scheduling the checkout-redis pods to run only one instance per node, and by scheduling the checkout pods to run only one instance per node and only on nodes where a checkout-redis pod exists. This will ensure that our caching pods (checkout-redis) run locally with a checkout pod instance for best performance.
The first thing we want to do is see that the checkout and checkout-redis pods are running (for example, with kubectl get pods -n checkout):
NAME READY STATUS RESTARTS AGE
checkout-698856df4d-vzkzw 1/1 Running 0 125m
checkout-redis-6cfd7d8787-kxs8r 1/1 Running 0 127m
We can see both applications have one pod running in the cluster. Now, let's find out where they are running:
checkout-698856df4d-vzkzw ip-10-42-11-142.us-west-2.compute.internal
checkout-redis-6cfd7d8787-kxs8r ip-10-42-10-225.us-west-2.compute.internal
Based on the results above, the checkout-698856df4d-vzkzw pod is running on the ip-10-42-11-142.us-west-2.compute.internal node and the checkout-redis-6cfd7d8787-kxs8r pod is running on the ip-10-42-10-225.us-west-2.compute.internal node.
In your environment the pods may be running on the same node initially.
Let's set up a podAffinity and podAntiAffinity policy in the checkout deployment to ensure that one checkout pod runs per node, and that it will only run on nodes where a checkout-redis pod is already running. We'll use requiredDuringSchedulingIgnoredDuringExecution to make this a requirement rather than a preference.
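The difference between a required and a preferred rule can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the Kubernetes scheduler: the function pick_node, the node names, and the representation of a node as a set of app names are all hypothetical. Required rules filter out nodes, while preferred rules only add weight to nodes that remain.

```python
def pick_node(nodes, required, preferred):
    """Pick a node name, or None if the pod would stay Pending.

    nodes:     dict of node name -> set of app names already running there
    required:  predicate on that set; nodes failing it are excluded
    preferred: list of (weight, predicate); matching nodes score higher
    """
    feasible = [name for name in nodes if required(nodes[name])]
    if not feasible:
        return None  # a required rule that no node satisfies blocks scheduling

    def score(name):
        return sum(weight for weight, rule in preferred if rule(nodes[name]))

    # max() keeps the first node on ties, so the order of `nodes` breaks ties.
    return max(feasible, key=score)


# Hypothetical cluster state for illustration only.
NODES = {
    "node-a": {"redis", "checkout"},
    "node-b": set(),
    "node-c": {"redis"},
}


def has_redis(apps):
    return "redis" in apps


def no_checkout(apps):
    return "checkout" not in apps


print(pick_node(NODES, has_redis, []))                   # node-a
print(pick_node(NODES, has_redis, [(50, no_checkout)]))  # node-c
print(pick_node(NODES, lambda apps: "gpu" in apps, []))  # None
```

In this model a preferred rule changes which feasible node wins but never leaves the pod unscheduled, whereas an unsatisfiable required rule does.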
The following kustomization adds an affinity section to the checkout deployment specifying both podAffinity and podAntiAffinity policies:
Kustomize Patch:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  namespace: checkout
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - service
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                      - checkout
              topologyKey: kubernetes.io/hostname

Deployment/checkout:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/type: app
  name: checkout
  namespace: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: service
      app.kubernetes.io/instance: checkout
      app.kubernetes.io/name: checkout
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app.kubernetes.io/component: service
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: checkout
        app.kubernetes.io/name: checkout
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - service
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                      - checkout
              topologyKey: kubernetes.io/hostname
      containers:
        - envFrom:
            - configMapRef:
                name: checkout
          image: public.ecr.aws/aws-containers/retail-store-sample-checkout:0.4.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 3
          name: checkout
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 250m
              memory: 512Mi
          securityContext:
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /tmp
              name: tmp-volume
      securityContext:
        fsGroup: 1000
      serviceAccountName: checkout
      volumes:
        - emptyDir:
            medium: Memory
          name: tmp-volume

Diff:

         app.kubernetes.io/created-by: eks-workshop
         app.kubernetes.io/instance: checkout
         app.kubernetes.io/name: checkout
     spec:
+      affinity:
+        podAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - redis
+              topologyKey: kubernetes.io/hostname
+        podAntiAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - service
+                  - key: app.kubernetes.io/instance
+                    operator: In
+                    values:
+                      - checkout
+              topologyKey: kubernetes.io/hostname
       containers:
         - envFrom:
             - configMapRef:
                 name: checkout
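To make the change, apply the kustomization to the checkout deployment in your cluster (for example with kubectl apply -k, pointed at the directory that contains this patch). The output should look similar to this: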
namespace/checkout unchanged
serviceaccount/checkout unchanged
configmap/checkout unchanged
service/checkout unchanged
service/checkout-redis unchanged
deployment.apps/checkout configured
deployment.apps/checkout-redis unchanged
The podAffinity section ensures that a checkout-redis pod is already running on the node, because we can assume the checkout pod requires checkout-redis to run correctly. The podAntiAffinity section requires that no checkout pods are already running on the node, by matching both the app.kubernetes.io/component=service and app.kubernetes.io/instance=checkout labels. Now, let's scale the checkout deployment up to two replicas to check that the configuration is working (for example with kubectl scale -n checkout deployment/checkout --replicas 2).
Now validate where each pod is running:
checkout-6c7c9cdf4f-p5p6q ip-10-42-10-120.us-west-2.compute.internal
checkout-6c7c9cdf4f-wwkm4
checkout-redis-6cfd7d8787-gw59j ip-10-42-10-120.us-west-2.compute.internal
In this example, the first checkout pod runs on the same node as the existing checkout-redis pod, as that node fulfills the podAffinity rule we set. The second one is still Pending: the podAntiAffinity rule does not allow two checkout pods to be scheduled on the same node, and no other node has a checkout-redis pod running, so the podAffinity rule cannot be satisfied anywhere else.
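The outcome above can be reproduced with a small model of the two rules. This is a minimal sketch, not the scheduler's actual implementation: the functions matches, feasible, and schedule and the node names node-1 and node-2 are hypothetical, only the In operator is modelled, and because topologyKey is kubernetes.io/hostname each node is treated as its own topology domain.

```python
COMPONENT = "app.kubernetes.io/component"
INSTANCE = "app.kubernetes.io/instance"

# Label sets taken from the manifests in this lesson.
REDIS_POD = {COMPONENT: "redis", INSTANCE: "checkout"}
CHECKOUT_POD = {COMPONENT: "service", INSTANCE: "checkout"}

# matchExpressions from the checkout patch, reduced to key and values.
AFFINITY = [{"key": COMPONENT, "values": ["redis"]}]
ANTI_AFFINITY = [
    {"key": COMPONENT, "values": ["service"]},
    {"key": INSTANCE, "values": ["checkout"]},
]


def matches(labels, expressions):
    """A pod matches only if every expression holds (logical AND)."""
    return all(labels.get(e["key"]) in e["values"] for e in expressions)


def feasible(pods_on_node, affinity, anti_affinity):
    """Required affinity needs a matching pod; anti-affinity forbids one."""
    has_partner = any(matches(p, affinity) for p in pods_on_node)
    has_conflict = any(matches(p, anti_affinity) for p in pods_on_node)
    return has_partner and not has_conflict


def schedule(nodes, pod, affinity, anti_affinity):
    """Place the pod on the first feasible node; None means Pending."""
    for name, pods in nodes.items():
        if feasible(pods, affinity, anti_affinity):
            pods.append(pod)
            return name
    return None


# Two nodes, with a checkout-redis pod already running on the first.
nodes = {"node-1": [dict(REDIS_POD)], "node-2": []}

first = schedule(nodes, dict(CHECKOUT_POD), AFFINITY, ANTI_AFFINITY)
second = schedule(nodes, dict(CHECKOUT_POD), AFFINITY, ANTI_AFFINITY)
print(first)   # node-1
print(second)  # None
```

The second replica fails on node-1 because of the anti-affinity conflict and on node-2 because no checkout-redis pod is there, which matches the Pending pod in the listing.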
Next, we'll scale the checkout-redis deployment to two instances for our two nodes, but first let's modify the checkout-redis deployment policy to spread the checkout-redis instances across the nodes. To do this, we only need to create a podAntiAffinity rule.
Kustomize Patch:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-redis
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/team: database
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname

Deployment/checkout-redis:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/team: database
  name: checkout-redis
  namespace: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: redis
      app.kubernetes.io/instance: checkout
      app.kubernetes.io/name: checkout
  template:
    metadata:
      labels:
        app.kubernetes.io/component: redis
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: checkout
        app.kubernetes.io/name: checkout
        app.kubernetes.io/team: database
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
      containers:
        - image: public.ecr.aws/docker/library/redis:6.0-alpine
          imagePullPolicy: IfNotPresent
          name: redis
          ports:
            - containerPort: 6379
              name: redis
              protocol: TCP

Diff:

         app.kubernetes.io/instance: checkout
         app.kubernetes.io/name: checkout
         app.kubernetes.io/team: database
     spec:
+      affinity:
+        podAntiAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - redis
+              topologyKey: kubernetes.io/hostname
       containers:
         - image: public.ecr.aws/docker/library/redis:6.0-alpine
           imagePullPolicy: IfNotPresent
           name: redis
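Apply it in the same way as the previous patch (for example with kubectl apply -k, pointed at the directory that contains this patch). The output should look similar to this: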
namespace/checkout unchanged
serviceaccount/checkout unchanged
configmap/checkout unchanged
service/checkout unchanged
service/checkout-redis unchanged
deployment.apps/checkout unchanged
deployment.apps/checkout-redis configured
The podAntiAffinity section requires that no checkout-redis pods are already running on the node by matching the app.kubernetes.io/component=redis label.
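Once checkout-redis has been scaled to two replicas (for example with kubectl scale -n checkout deployment/checkout-redis --replicas 2), check the running pods to verify that there are now two of each running: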
NAME READY STATUS RESTARTS AGE
checkout-5b68c8cddf-6ddwn 1/1 Running 0 4m14s
checkout-5b68c8cddf-rd7xf 1/1 Running 0 4m12s
checkout-redis-7979df659-cjfbf 1/1 Running 0 19s
checkout-redis-7979df659-pc6m9 1/1 Running 0 22s
We can also verify where the pods are running to ensure the podAffinity and podAntiAffinity policies are being followed:
checkout-5b68c8cddf-bn8bp ip-10-42-11-142.us-west-2.compute.internal
checkout-5b68c8cddf-clnps ip-10-42-12-31.us-west-2.compute.internal
checkout-redis-7979df659-57xcb ip-10-42-11-142.us-west-2.compute.internal
checkout-redis-7979df659-r7kkm ip-10-42-12-31.us-west-2.compute.internal
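A listing like this can also be checked mechanically. The following is a rough sketch that assumes the two-column "pod node" layout shown above and identifies each application by its pod name prefix rather than by its labels; the helpers app_of and check_placement are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Pod-to-node listing copied from the output above.
PLACEMENT = """\
checkout-5b68c8cddf-bn8bp ip-10-42-11-142.us-west-2.compute.internal
checkout-5b68c8cddf-clnps ip-10-42-12-31.us-west-2.compute.internal
checkout-redis-7979df659-57xcb ip-10-42-11-142.us-west-2.compute.internal
checkout-redis-7979df659-r7kkm ip-10-42-12-31.us-west-2.compute.internal
"""


def app_of(pod_name):
    """Assumption: the pod name prefix is enough to tell the two apps apart."""
    if pod_name.startswith("checkout-redis-"):
        return "checkout-redis"
    return "checkout"


def check_placement(text):
    """Return a list of rule violations; an empty list means all rules hold."""
    per_node = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # Pending pods have no node column and are skipped
            pod, node = parts
            per_node[node].append(app_of(pod))

    problems = []
    for node, apps in per_node.items():
        # podAntiAffinity: at most one pod of each app per node.
        for app in ("checkout", "checkout-redis"):
            if apps.count(app) > 1:
                problems.append(f"{node}: more than one {app} pod")
        # podAffinity: checkout may only run next to checkout-redis.
        if "checkout" in apps and "checkout-redis" not in apps:
            problems.append(f"{node}: checkout without checkout-redis")
    return problems


print(check_placement(PLACEMENT))  # []
```

An empty result means every node in the listing hosts at most one pod of each application and every checkout pod shares its node with a checkout-redis pod.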
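All looks good on the pod scheduling, but we can further verify by scaling the checkout deployment again, this time to three replicas, to see where a third pod will deploy (for example with kubectl scale -n checkout deployment/checkout --replicas 3).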
If we check the running pods we can see that the third checkout pod has been placed in a Pending state, since two of the nodes already have a checkout pod deployed and the third node does not have a checkout-redis pod running.
NAME READY STATUS RESTARTS AGE
checkout-5b68c8cddf-bn8bp 1/1 Running 0 4m59s
checkout-5b68c8cddf-clnps 1/1 Running 0 6m9s
checkout-5b68c8cddf-lb69n 0/1 Pending 0 6s
checkout-redis-7979df659-57xcb 1/1 Running 0 35s
checkout-redis-7979df659-r7kkm 1/1 Running 0 2m10s
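Let's finish this section by removing the Pending pod, which can be done by scaling the checkout deployment back down to two replicas (for example with kubectl scale -n checkout deployment/checkout --replicas 2).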