Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances

Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators

AWS Inferentia2 builds on AWS Inferentia1 by delivering up to 4x higher throughput and up to 10x lower latency
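
To orient readers on the pattern the title refers to, here is a minimal sketch (not the post's actual code) of serving a PyTorch model on Inferentia2 behind FastAPI. It assumes a model already compiled ahead of time with torch_neuronx.trace() and saved as a TorchScript artifact; the file name model_neuron.pt, the input shape, and the /predict route are illustrative placeholders.

```python
# Minimal sketch: FastAPI endpoint serving a Neuron-compiled PyTorch model.
# Assumptions: the model was traced with torch_neuronx.trace() and saved via
# torch.jit.save() as "model_neuron.pt"; it accepts a single 1 x N float tensor.
import torch
import torch_neuronx  # must be imported so Neuron ops are available at load time
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Loading the traced TorchScript module places it on an available NeuronCore.
model = torch.jit.load("model_neuron.pt")
model.eval()

class InferenceRequest(BaseModel):
    # Assumed input format: a flat list of floats matching the traced shape.
    inputs: list[float]

@app.post("/predict")
def predict(request: InferenceRequest):
    # Rebuild the batch dimension expected by the traced model (assumed 1 x N).
    x = torch.tensor(request.inputs).unsqueeze(0)
    with torch.no_grad():
        output = model(x)
    return {"output": output.squeeze(0).tolist()}
```

In practice, such an app would typically be launched with multiple server workers (for example via uvicorn or gunicorn), each pinned to its own NeuronCore, so that all cores on the Inf1 or Inf2 instance stay busy; the NEURON_RT_VISIBLE_CORES environment variable is the usual mechanism for that pinning, though the exact process layout here is an assumption rather than the post's configuration.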