Democratizing LLMs: 4-bit Quantization for Efficient LLM Inference

Scaling Down, Scaling Up: Mastering Generative AI with Model Quantization

Tensor Quantization: The Untold Story