Accelerate your generative AI distributed training workloads with the NVIDIA NeMo Framework on Amazon EKS aws.amazon.com Post date July 16, 2024 External Tags Amazon Elastic Kubernetes Service, artificial-intelligence, best-practices, distributed training, generative-ai, Technical How-to
End-to-end LLM training on instance clusters with over 100 nodes using AWS Trainium aws.amazon.com Post date May 29, 2024 External Tags Amazon EC2, AWS Neuron, AWS Trainium, best-practices, distributed training, Neuron, Technical How-to
Amazon SageMaker model parallel library now accelerates PyTorch FSDP workloads by up to 20% aws.amazon.com Post date December 22, 2023 External Tags Amazon SageMaker, Announcements, distributed training, generative-ai
Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium aws.amazon.com Post date October 5, 2023 External Tags artificial-intelligence, AWS Trainium, distributed training, DL Training, generative-ai, Intermediate (200)