Reducing the Size of AI Models
