GPTQ: Accurate post-training compression for generative pretrained transformers E Frantar, S Ashkboos, T Hoefler, D Alistarh arXiv preprint arXiv:2210.17323, 2022 | 141* | 2022 |
SparseGPT: Massive language models can be accurately pruned in one-shot E Frantar, D Alistarh International Conference on Machine Learning, 10323-10337, 2023 | 67* | 2023 |
The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models E Kurtic, D Campos, T Nguyen, E Frantar, M Kurtz, B Fineran, M Goin, ... arXiv preprint arXiv:2203.07259, 2022 | 52 | 2022 |
Optimal brain compression: A framework for accurate post-training quantization and pruning E Frantar, D Alistarh Advances in Neural Information Processing Systems 35, 4475-4488, 2022 | 46 | 2022 |
M-FAC: Efficient matrix-free approximations of second-order information E Frantar, E Kurtic, D Alistarh Advances in Neural Information Processing Systems 34, 14873-14886, 2021 | 35 | 2021 |
On the sample complexity of adversarial multi-source PAC learning N Konstantinov, E Frantar, D Alistarh, C Lampert International Conference on Machine Learning, 5416-5425, 2020 | 23 | 2020 |
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression T Dettmers, R Svirschevski, V Egiazarian, D Kuznedelev, E Frantar, ... arXiv preprint arXiv:2306.03078, 2023 | 18 | 2023 |
SPDY: Accurate pruning with speedup guarantees E Frantar, D Alistarh International Conference on Machine Learning, 6726-6743, 2022 | 17 | 2022 |
ZipLM: Hardware-aware structured pruning of language models E Kurtic, E Frantar, D Alistarh arXiv preprint arXiv:2302.04089, 2023 | 8 | 2023 |
L-GreCo: An efficient and general framework for layerwise-adaptive gradient compression M Alimohammadi, I Markov, E Frantar, D Alistarh arXiv preprint arXiv:2210.17357, 2022 | 4 | 2022 |
oViT: An Accurate Second-Order Pruning Framework for Vision Transformers D Kuznedelev, E Kurtic, E Frantar, D Alistarh arXiv preprint arXiv:2210.09223, 2022 | 2 | 2022 |
Towards End-to-end 4-Bit Inference on Generative Large Language Models S Ashkboos, I Markov, E Frantar, T Zhong, X Wang, J Ren, T Hoefler, ... arXiv preprint arXiv:2310.09259, 2023 | 1 | 2023 |
Sparse Finetuning for Inference Acceleration of Large Language Models E Kurtic, D Kuznedelev, E Frantar, M Goin, D Alistarh arXiv preprint arXiv:2310.06927, 2023 | 1 | 2023 |
Scaling laws for sparsely-connected foundation models E Frantar, C Riquelme, N Houlsby, D Alistarh, U Evci arXiv preprint arXiv:2309.08520, 2023 | 1 | 2023 |
CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models D Kuznedelev, E Kurtic, E Frantar, D Alistarh Thirty-seventh Conference on Neural Information Processing Systems, 2023 | | 2023 |
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models E Frantar, D Alistarh arXiv preprint arXiv:2310.16795, 2023 | | 2023 |
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization D Kuznedelev, E Kurtic, E Iofinova, E Frantar, A Peste, D Alistarh arXiv preprint arXiv:2308.02060, 2023 | | 2023 |
QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models T Pegolotti, E Frantar, D Alistarh, M Püschel arXiv preprint arXiv:2307.03738, 2023 | | 2023 |
JaxPruner: A concise library for sparsity research JH Lee, W Park, N Mitchell, J Pilault, J Obando-Ceron, HB Kim, N Lee, ... arXiv preprint arXiv:2304.14082, 2023 | | 2023 |
Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression D Kuznedelev, S Tabesh, K Noorbakhsh, E Frantar, S Beery, E Kurtic, ... arXiv preprint arXiv:2303.14409, 2023 | | 2023 |