Majumdar, S. orcid.org/0000-0003-3935-4087, Deshpande, A., Das, P.P. et al. (1 more author) (2026) Comprehending C codes with LLMs: Effective comment generation through retrieval and reasoning. Pattern Recognition Letters, 199. pp. 295-302. ISSN: 0167-8655
Abstract
Software maintenance requires substantial time for program comprehension. Code comments significantly improve understandability by providing a glass-box view of the code and are thus essential for maintainability. Prior work has analyzed comment attributes, built automated systems to detect irrelevant comments, and applied machine learning to generate meaningful comments. With the rise of large language models, comment generation has accelerated, particularly for Java and Python. In this paper, we present a first-of-its-kind framework for code comment generation in C, a language widely used in low-level tasks. We explore the effectiveness of few-shot learning, retrieval-augmented generation, and code-structure-based context modeling. Our work builds on prior field studies conducted across seven companies in India and the UK, which resulted in a dataset of 20,206 human-annotated C comments rated for usefulness. By 2024, contributions from 40 academic teams and 50 hackathon groups had expanded this dataset to 24,578 comments. We further introduce a reusable evaluation framework involving human experts and large language model evaluators, grounded in eight dimensions derived from four industry case studies. A subset of 11,797 comments has been annotated for the presence or absence of these dimensions and serves as input for both generation and evaluation. Our results show that GPT-4o mini-trained models produce the comments most closely aligned with human-annotated ones, achieving a similarity score of 0.64, followed by Gemini 1.5 at 0.58. As an evaluator, GPT-4.5 achieves the highest alignment with human experts, while Llama-3.1-70b shows the lowest.
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Majumdar, S.; Deshpande, A.; Das, P.P.; et al. (1 more author) |
| Keywords: | Generative AI; Software maintenance; Code comprehension; Comment generation in C; Retrieval augmented generation; LLM critics |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 05 Feb 2026 09:59 |
| Last Modified: | 09 Feb 2026 16:36 |
| Published Version: | https://www.sciencedirect.com/science/article/pii/... |
| Status: | Published |
| Publisher: | Elsevier |
| Identification Number: | 10.1016/j.patrec.2025.10.007 |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:237524 |

CORE (COnnecting REpositories)