Youssef, P., Zhao, Z. orcid.org/0000-0002-3060-269X, Braun, D. et al. (2 more authors) (2025) Position: Editing large language models poses serious safety risks. In: Singh, A., Fazel, M., Hsu, D., Lacoste-Julien, S., Berkenkamp, F., Maharaj, T., Wagstaff, K. and Zhu, J., (eds.) Proceedings of the 42nd International Conference on Machine Learning. 42nd International Conference on Machine Learning (PMLR), 13-19 Jul 2025, Vancouver, Canada. Vol. 267. Proceedings of Machine Learning Research (PMLR), pp. 82426-82440. ISSN: 2640-3498. EISSN: 2640-3498.
Abstract
Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note that KEs are widely available, computationally inexpensive, highly performant, and stealthy, which makes them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.
Metadata
| Item Type: | Proceedings Paper |
|---|---|
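| Authors/Creators: | Youssef, P., Zhao, Z. (orcid.org/0000-0002-3060-269X), Braun, D. et al. (2 more authors) |
| Editors: | Singh, A., Fazel, M., Hsu, D., Lacoste-Julien, S., Berkenkamp, F., Maharaj, T., Wagstaff, K. and Zhu, J. |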
| Copyright, Publisher and Additional Information: | © 2025 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Date Deposited: | 17 Jun 2026 22:12 |
| Last Modified: | 17 Jun 2026 22:12 |
| Published Version: | https://proceedings.mlr.press/v267/youssef25a.html |
| Status: | Published |
| Publisher: | Proceedings of Machine Learning Research (PMLR) |
| Refereed: | Yes |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:242207 |
