Reducing the Size of AI Models

Deploying Large Language Models: vLLM and Quantization: Step-by-Step Guide on How to Accelerate…

Quantizing the AI Colossi

Improving LLM Inference Latency on CPUs with Model Quantization

Exploring “Small” Vision-Language Models with TinyGPT-V

ExLlamaV2: The Fastest Library to Run LLMs

Run Llama 2 70B on Your GPU with ExLlamaV2

Tensor Quantization: The Untold Story

Quantize Llama models with GGML and llama.cpp

GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2