Added an LLM quantization chapter, with 9 related papers:
《Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference》
《Mixed Precision Training》
《The case for 4-bit precision: k-bit Inference Scaling Laws》
《SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models》
《LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale》
《ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers》
《SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot》
《GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers》
《LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models》
For details, see:
https://www.huaxiaozhuan.com/--