Position: Editing large language models poses serious safety risks

Youssef, P., Zhao, Z. orcid.org/0000-0002-3060-269X, Braun, D. et al. (2 more authors) (2025) Position: Editing large language models poses serious safety risks. In: Singh, A., Fazel, M., Hsu, D., Lacoste-Julien, S., Berkenkamp, F., Maharaj, T., Wagstaff, K. and Zhu, J., (eds.) Proceedings of the 42nd International Conference on Machine Learning. 42nd International Conference on Machine Learning (PMLR), 13-19 Jul 2025, Vancouver, Canada. Vol. 267. Proceedings of Machine Learning Research (PMLR), pp. 82426-82440. ISSN: 2640-3498. EISSN: 2640-3498.

Abstract

Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note that KEs are widely available, computationally inexpensive, highly performant, and stealthy, which makes them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Youssef, P.; Zhao, Z. (https://orcid.org/0000-0002-3060-269X); Braun, D.; Schlötterer, J.; Seifert, C.
Editors:	Singh, A.; Fazel, M.; Hsu, D.; Lacoste-Julien, S.; Berkenkamp, F.; Maharaj, T.; Wagstaff, K.; Zhu, J.
Copyright, Publisher and Additional Information:	© 2025 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Dates:	Published (online): 13 July 2025; Published: July 2025
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Date Deposited:	17 Jun 2026 22:12
Last Modified:	17 Jun 2026 22:12
Published Version:	https://proceedings.mlr.press/v267/youssef25a.html
Status:	Published
Publisher:	Proceedings of Machine Learning Research (PMLR)
Refereed:	Yes
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:242207

Download

Published Version

Filename: youssef25a.pdf

Licence: CC-BY 4.0

