Zhang, L., Valentino, M. orcid.org/0000-0002-9959-8385 and Freitas, A. (2025) Autoformalization in the wild: assessing LLMs on real-world mathematical definitions. In: Christodoulopoulos, C., Chakraborty, T., Rose, C. and Peng, V., (eds.) Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), 04-09 Nov 2025, Suzhou, China. Association for Computational Linguistics, pp. 1720-1738. ISBN: 9798891763326.
Abstract
Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions: a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs’ performance, including refinement through external feedback from proof assistants, and formal definition grounding, where we augment LLMs’ formalizations through relevant contextual elements from formal mathematical libraries. Our findings reveal that definitions present a greater challenge than existing benchmarks, such as miniF2F. In particular, we found that LLMs still struggle with self-correction and with aligning their formalizations with relevant mathematical libraries. At the same time, structured refinement methods and definition grounding strategies yield notable improvements of up to 16% on self-correction capabilities and 43% on the reduction of undefined errors, highlighting promising directions for enhancing LLM-based autoformalization in real-world scenarios.
Metadata
| Item Type: | Proceedings Paper |
|---|---|
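| Authors/Creators: | Zhang, L.; Valentino, M. (orcid.org/0000-0002-9959-8385); Freitas, A. |
| Editors: | Christodoulopoulos, C.; Chakraborty, T.; Rose, C.; Peng, V. |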
| Copyright, Publisher and Additional Information: | © 2025 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Date Deposited: | 20 Nov 2025 09:35 |
| Last Modified: | 20 Nov 2025 09:49 |
| Status: | Published |
| Publisher: | Association for Computational Linguistics |
| Refereed: | Yes |
| Identification Number: | 10.18653/v1/2025.emnlp-main.90 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:234717 |
Download
Filename: 2025.emnlp-main.90.pdf
Licence: CC-BY 4.0
