Inference with AWS Inferentia
Prepare your environment for this section:
This will make the following changes to your lab environment:
- Installs Karpenter in the Amazon EKS cluster
- Creates an S3 Bucket to store results
- Creates an IAM Role for the Pods to use
- Installs the AWS Neuron device plugin
You can view the Terraform that applies these changes here.
AWS Inferentia is an accelerator purpose-built by AWS to speed up deep learning inference workloads.
Each Inferentia chip contains processing cores called NeuronCores, which have high-speed access to models stored in on-chip memory.
Inferentia is straightforward to use on EKS. The AWS Neuron device plugin exposes Neuron cores and devices to Kubernetes as schedulable resources. When a workload requests Neuron cores, the Kubernetes scheduler places it on a node with Inferentia capacity, and Karpenter can even provision such nodes automatically.
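For example, a Pod can request Neuron hardware through the extended resource that the device plugin advertises. The following is a minimal sketch, not the manifest used in this lab; the resource name `aws.amazon.com/neuron` and the container image are assumptions, so check the Neuron device plugin documentation for the exact names your cluster exposes:

```yaml
# Sketch of a Pod that requests Inferentia capacity via the Neuron device plugin.
# The resource name (aws.amazon.com/neuron) and the image are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: neuron-example
spec:
  containers:
    - name: app
      image: my-inference-image:latest   # hypothetical inference image
      resources:
        limits:
          aws.amazon.com/neuron: 1       # schedules the Pod onto a node with a free Neuron device
```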
This lab shows how to use Inferentia to accelerate deep learning inference workloads on EKS. In this lab we will:
- Compile a pre-trained ResNet-50 model for use with AWS Inferentia
- Upload the compiled model to an S3 bucket for later use
- Create a Karpenter Provisioner to provision Inferentia EC2 instances (a rough sketch follows this list)
- Launch an inference Pod that uses the compiled model to run inference on Inferentia
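The Provisioner itself is created in a later step of the lab; as a rough sketch of the idea, a Provisioner restricted to Inferentia (inf1) instance types might look like the following. The `karpenter.sh/v1alpha5` API version, the `karpenter.k8s.aws/instance-family` requirement key, and the `AWSNodeTemplate` reference are assumptions based on Provisioner-era Karpenter and may differ from the manifest this lab applies:

```yaml
# Sketch of a Karpenter Provisioner restricted to Inferentia (inf1) instances.
# The API version, requirement keys, and AWSNodeTemplate name are assumptions;
# adjust them to match the Karpenter version installed in your cluster.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: inferentia
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["inf1"]               # only launch Inferentia (inf1) instance types
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
  providerRef:
    name: inferentia                 # hypothetical AWSNodeTemplate with subnet/SG selectors
  ttlSecondsAfterEmpty: 30           # remove the node once no Pods need it
```

Karpenter watches for Pods that cannot be scheduled, such as Pods requesting Neuron devices, and launches a matching node on demand, so the inference Pod in the final step can trigger the creation of an Inferentia node.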
Let's get started.