Williams, M., Chrysostomou, G. and Aletras, N. orcid.org/0000-0003-4285-1965 (2024) Self-calibration for language model quantization and pruning. arXiv preprint arXiv:2410.17170. (Submitted)
Abstract
Quantization and pruning are fundamental approaches for model compression, enabling efficient inference for language models. In a post-training setting, state-of-the-art quantization and pruning methods require calibration data, a small set of unlabeled examples. Conventionally, randomly sampled web text is used, aiming to reflect the model training data. However, this poses two key problems: (1) unrepresentative calibration examples can harm model performance, and (2) organizations increasingly avoid releasing model training data. In this paper, we propose self-calibration as a solution. Our approach requires no external data, instead leveraging the model itself to generate synthetic calibration data as a better approximation of the pre-training data distribution. We extensively compare the performance of self-calibration with several baselines, across a variety of models, compression methods, and tasks. Our approach proves consistently competitive in maximizing downstream task performance, frequently even outperforming the use of real data.
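To make the core idea concrete: instead of drawing calibration examples from external web text, the model itself is sampled to produce synthetic sequences, which are then fed through the network to collect the statistics a post-training compression method needs. The sketch below is purely illustrative and is not the paper's implementation; it uses a toy bigram table as a stand-in for the language model (all names here are hypothetical), since the real procedure would sample from the LLM being compressed.

```python
import random

random.seed(0)

# Toy "language model": a bigram weight table over a tiny vocabulary.
# In practice this stand-in would be the actual LLM being compressed.
VOCAB = list(range(8))
BIGRAM = {t: [random.random() for _ in VOCAB] for t in VOCAB}

def sample_next(token):
    """Sample the next token from the toy model's conditional distribution."""
    weights = BIGRAM[token]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for tok, w in zip(VOCAB, weights):
        acc += w
        if acc >= r:
            return tok
    return VOCAB[-1]

def self_calibration_set(n_examples, seq_len, bos=0):
    """Generate synthetic calibration sequences from the model itself,
    rather than sampling real (possibly unrepresentative) web text."""
    data = []
    for _ in range(n_examples):
        seq = [bos]
        for _ in range(seq_len - 1):
            seq.append(sample_next(seq[-1]))
        data.append(seq)
    return data

# A post-training quantization or pruning method would now run these
# sequences through the model to collect activation statistics
# (e.g. per-feature scales or importance scores).
calib = self_calibration_set(n_examples=4, seq_len=16)
print(len(calib), len(calib[0]))
```

The key point the sketch captures is that the calibration set is drawn from the model's own output distribution, so it approximates the pre-training data distribution without requiring access to any released training data.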
Metadata
| Item Type: | Preprint | 
|---|---|
| Authors/Creators: | Williams, M.; Chrysostomou, G.; Aletras, N. | 
| Copyright, Publisher and Additional Information: | © 2024 The Author(s). For reuse permissions, please contact the Author(s). | 
| Dates: | Submitted: 2024 | 
| Institution: | The University of Sheffield | 
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) | 
| Funding Information: | RESPONSIBLE AI UK (grant number EP/Y009800/1) | 
| Depositing User: | Symplectic Sheffield | 
| Date Deposited: | 10 Jan 2025 11:02 | 
| Last Modified: | 10 Jan 2025 11:02 | 
| Status: | Submitted | 
| Identification Number (DOI): | 10.48550/arXiv.2410.17170 | 
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:220835 | 