Run Inference on an AWS Inferentia Node using Amazon EKS
Now we can use the compiled model to run an inference workload on an AWS Inferentia node.
Install Device Plugin for AWS Inferentia
In order for our Deep Learning Container (DLC) to use the Neuron cores, they need to be exposed to it. The Neuron device plugin exposes the Neuron cores to the DLC, and its Kubernetes manifest files have been pre-installed into the EKS cluster.
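We can verify that the device plugin is running before continuing. This check assumes the plugin was installed under its default DaemonSet name in the kube-system namespace:

kubectl get daemonset neuron-device-plugin-daemonset -n kube-system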
When a Pod requests the exposed Neuron cores, the Kubernetes scheduler can place it on an Inferentia node. This is the Pod that we will schedule. Note that it has a resource requirement of aws.amazon.com/neuron.
apiVersion: v1
kind: Pod
metadata:
  name: inference
  namespace: aiml
  labels:
    role: inference
spec:
  containers:
    - command:
        - sh
        - -c
        - sleep infinity
      image: ${AIML_DL_IMAGE}
      name: inference
      resources:
        limits:
          aws.amazon.com/neuron: 1
  serviceAccountName: inference
Set up a Karpenter provisioner to launch a node with an Inferentia chip
The lab uses Karpenter to provision an Inferentia node. Karpenter detects the pending Pod that requires Neuron cores and launches an inf1 instance that provides them.
You can learn more about Karpenter in the Karpenter module that's provided in this workshop.
Karpenter has been installed in our EKS cluster, and runs as a deployment:
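We can see it with the following command, which assumes Karpenter was installed into the karpenter namespace:

kubectl get deployment -n karpenter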
NAME READY UP-TO-DATE AVAILABLE AGE
karpenter 1/1 1 1 5m52s
The only setup we need to do is to update the cluster's IAM identity mappings so that nodes provisioned by Karpenter can join the cluster:
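A sketch of this step, assuming eksctl is available and the Karpenter node IAM role ARN is exported as KARPENTER_NODE_ROLE_ARN (a hypothetical variable name):

eksctl create iamidentitymapping \
  --cluster ${EKS_CLUSTER_NAME} \
  --arn ${KARPENTER_NODE_ROLE_ARN} \
  --username "system:node:{{EC2PrivateDNSName}}" \
  --group system:bootstrappers \
  --group system:nodes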
Karpenter requires a provisioner to provision nodes. This is the Karpenter provisioner that we will create:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: aiml
spec:
  labels:
    type: karpenter
  requirements:
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["on-demand"]
    - key: "karpenter.k8s.aws/instance-family"
      operator: In
      values: [inf1]
  providerRef:
    name: aiml
  ttlSecondsAfterEmpty: 30
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: aiml
spec:
  subnetSelector:
    aws-ids: ${AIML_SUBNETS}
  securityGroupSelector:
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
  # Increase the ephemeral storage for hosting a large Deep Learning Container
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true
  tags:
    app.kubernetes.io/created-by: eks-workshop
Apply the provisioner manifest:
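The manifest references environment variables such as ${AIML_SUBNETS}, so one way to apply it, assuming it is saved as provisioner.yaml (a hypothetical file name), is to substitute them first:

envsubst < provisioner.yaml | kubectl apply -f -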
Create a pod for inference
Now we can deploy a Pod for inference:
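The Pod manifest references ${AIML_DL_IMAGE}, so assuming it is saved as inference-pod.yaml (a hypothetical file name) we can substitute the variable and apply it in the same way:

envsubst < inference-pod.yaml | kubectl apply -f -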
Karpenter detects the pending pod which needs Neuron cores and launches an inf1 instance which has the Inferentia chip. Monitor the instance provisioning with the following command:
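This command assumes Karpenter runs in the karpenter namespace; adjust it if your installation differs:

kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter

You should see entries similar to the following: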
2022-10-28T08:24:42.704Z DEBUG controller.provisioning.cloudprovider Created launch template, Karpenter-eks-workshop-cluster-3507260904097783831 {"commit": "37c8653", "provisioner": "default"}
2022-10-28T08:24:45.125Z INFO controller.provisioning.cloudprovider Launched instance: i-09ddba6280017ae4d, hostname: ip-100-64-10-250.ap-northeast-1.compute.internal, type: inf1.xlarge, zone: ap-northeast-1a, capacityType: spot {"commit": "37c8653", "provisioner": "default"}
2022-10-28T08:24:45.136Z INFO controller.provisioning Created node with 1 pods requesting {"aws.amazon.com/neuron":"1","cpu":"125m","pods":"6"} from types inf1.xlarge, inf1.2xlarge, inf1.6xlarge, inf1.24xlarge {"commit": "37c8653", "provisioner": "default"}
2022-10-28T08:24:45.136Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "37c8653"}
The inference pod should be scheduled on the node provisioned by Karpenter. Check if the Pod is in its ready state:
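For example, using the Pod name and namespace from the manifest above (the 480 second timeout matches the provisioning note below):

kubectl wait --for=condition=Ready pod/inference -n aiml --timeout=480s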
It can take up to 8 minutes to provision the node, add it to the EKS cluster, and start the pod.
We can use the following command to get more details on the node that was provisioned to schedule our pod onto:
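This uses the type=karpenter label that our provisioner applies to its nodes, and assumes jq is available for formatting:

kubectl get node -l type=karpenter -o jsonpath='{.items[0].status.capacity}' | jq .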
This output shows the capacity this node has:
{
  "attachable-volumes-aws-ebs": "39",
  "aws.amazon.com/neuron": "1",
  "aws.amazon.com/neuroncore": "4",
  "aws.amazon.com/neurondevice": "1",
  "cpu": "4",
  "ephemeral-storage": "104845292Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "7832960Ki",
  "pods": "38",
  "vpc.amazonaws.com/pod-eni": "38"
}
We can see that this node has an aws.amazon.com/neuron capacity of 1. Karpenter provisioned this node for us because that is the number of Neuron devices the Pod requested.
Run an inference
This is the code that we will be using to run inference using a Neuron core on Inferentia:
import os
import time
import torch
import torch_neuron
import json
import numpy as np
from urllib import request
from torchvision import models, transforms, datasets

## Create an image directory containing a small kitten
os.makedirs("./torch_neuron_test/images", exist_ok=True)
request.urlretrieve("https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg",
                    "./torch_neuron_test/images/kitten_small.jpg")

## Fetch labels to output the top classifications
request.urlretrieve("https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json", "imagenet_class_index.json")
idx2label = []
with open("imagenet_class_index.json", "r") as read_file:
    class_idx = json.load(read_file)
    idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]

## Import a sample image and normalize it into a tensor
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225])

eval_dataset = datasets.ImageFolder(
    os.path.dirname("./torch_neuron_test/"),
    transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.ToTensor(),
        normalize,
    ])
)
image, _ = eval_dataset[0]
image = torch.tensor(image.numpy()[np.newaxis, ...])

## Load model
model_neuron = torch.jit.load('resnet50_neuron.pt')

## Predict
results = model_neuron(image)

# Get the top 5 results
top5_idx = results[0].sort()[1][-5:]

# Lookup and print the top 5 labels
top5_labels = [idx2label[idx] for idx in top5_idx]
print("Top 5 labels:\n {}".format(top5_labels))
This Python code does the following tasks:
- It downloads and stores an image of a small kitten.
- It fetches the labels for classifying the image.
- It then imports this image and normalizes it into a tensor.
- It loads our previously created model.
- It runs the prediction on our small kitten image.
- It gets the top 5 results from the prediction and prints them to the command line.
We copy this code to the Pod, download our previously uploaded model, and run the code:
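A sketch of these steps, assuming the code is saved locally as inference.py and the model was uploaded to an S3 bucket exported as AIML_NEURON_BUCKET_NAME (both hypothetical names):

kubectl cp inference.py aiml/inference:/tmp/inference.py
kubectl exec -n aiml inference -- aws s3 cp s3://${AIML_NEURON_BUCKET_NAME}/resnet50_neuron.pt /tmp/resnet50_neuron.pt
kubectl exec -n aiml inference -- sh -c 'cd /tmp && python inference.py'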
Top 5 labels:
['tiger', 'lynx', 'tiger_cat', 'Egyptian_cat', 'tabby']
As output we get the top 5 labels back. We are running the inference on an image of a small kitten using the pre-trained ResNet-50 model, so these results are expected. As a possible next step we could create our own dataset of images and train a model for our specific use case, which could improve the prediction results.
This concludes this lab on using AWS Inferentia with Amazon EKS.