SecureMind: A Framework for Benchmarking Large Language Models in Memory Bug Detection and Repair

Wang, H., Jacob, D., Kelly, D. et al. (3 more authors) (2025) SecureMind: A Framework for Benchmarking Large Language Models in Memory Bug Detection and Repair. In: ISMM '25: Proceedings of the 2025 ACM SIGPLAN International Symposium on Memory Management. 2025 ACM SIGPLAN International Symposium on Memory Management (ISMM 2025), 17 Jun 2025, Seoul, South Korea. Association for Computing Machinery, pp. 27-40. ISBN: 979-8-4007-1610-2/25/06

Abstract

Large language models (LLMs) hold great promise for automating software vulnerability detection and repair, but ensuring their correctness remains a challenge. While recent work has developed benchmarks for evaluating LLMs in bug detection and repair, existing studies rely on hand-crafted datasets that quickly become outdated. Moreover, systematic evaluation of advanced reasoning-based LLMs using chain-of-thought prompting for software security is lacking. We introduce SecureMind, an open-source framework for evaluating LLMs in vulnerability detection and repair, focusing on memory-related vulnerabilities. SecureMind provides a user-friendly Python interface for defining test plans, which automates data retrieval, preparation, and benchmarking across a wide range of metrics. Using SecureMind, we assess 10 representative LLMs, including 7 state-of-the-art reasoning models, on 16K test samples spanning 8 Common Weakness Enumeration (CWE) types related to memory safety violations. Our findings highlight the strengths and limitations of current LLMs in handling memory-related vulnerabilities.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Wang, H.; Jacob, D.; Kelly, D.; Elkhatib, Y.; Singer, J.; Wang, Z. (ORCID: https://orcid.org/0000-0001-6157-0662)
Copyright, Publisher and Additional Information:	© 2025 Copyright held by the owner/author(s). This is an open access conference paper under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.
Keywords:	Software bug detection, Bug repair, Large language models
Dates:	Accepted: 3 May 2025; Published: 13 June 2025
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds)
Funding Information:	EPSRC (Engineering and Physical Sciences Research Council), grant number EP/X018202/1; EPSRC (Engineering and Physical Sciences Research Council), grant number EP/X037304/1
Depositing User:	Symplectic Publications
Date Deposited:	16 May 2025 12:50
Last Modified:	12 Aug 2025 10:25
Status:	Published
Publisher:	Association for Computing Machinery
Identification Number:	10.1145/3735950.3735954
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:226674

Download

Published Version

Filename: 3735950.3735954.pdf

Licence: CC-BY 4.0

