LLMs for Code: Overview of the information retrieval in software engineering track at FIRE 2024

Paul, S., Majumdar, S. orcid.org/0000-0003-3935-4087, Shah, R. et al. (9 more authors) (2025) LLMs for Code: Overview of the information retrieval in software engineering track at FIRE 2024. In: Ghosh, K., Mandl, T., Majumder, P. and Ganguly, D., (eds.) Working Notes of FIRE 2024 - Forum for Information Retrieval Evaluation. FIRE 2024 - Forum for Information Retrieval Evaluation, 12-15 Dec 2024, Gandhinagar, India. CEUR Workshop Proceedings, Aachen, Germany, pp. 549-555. ISSN: 1613-0073.

Abstract

The Information Retrieval in Software Engineering (IRSE) track focuses on developing automated methods to evaluate code comments using a machine learning framework. This year, the track featured two tasks: (i) predicting the usefulness of code comments and (ii) estimating code quality. The first task classifies code comments as either useful or not useful. Its dataset comprises 9,048 code-comment pairs sourced from open-source C-based projects on GitHub, along with an additional dataset that participating teams generated using large language models (LLMs). A total of 12 teams from various universities contributed, conducting experiments that were evaluated using both quantitative and qualitative metrics. Notably, while LLM-generated labels introduce bias into the prediction model, they also help reduce overfitting, leading to more generalizable results. The second task, a sub-track on code quality estimation, was introduced this year. Given a problem description and a list of LLM-generated code solutions, the objective is to automatically estimate the functional correctness of each generated solution. For evaluation, each problem-solution pair is ranked by its estimated probability of functional correctness, and the quality of the resulting ranking is reported with standard ranking performance measures.
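The evaluation protocol for the code quality estimation task can be illustrated with a minimal sketch. It assumes each generated solution carries an estimated probability of functional correctness and a binary ground-truth label. The abstract does not name the ranking measures used, so average precision is chosen here purely as one common example; the function names, solution identifiers, and scores are all hypothetical.

```python
from typing import List, Tuple

# (solution_id, estimated_probability_of_correctness, is_functionally_correct)
Candidate = Tuple[str, float, bool]


def rank_by_score(items: List[Candidate]) -> List[Candidate]:
    """Order candidate solutions by estimated probability, highest first."""
    return sorted(items, key=lambda item: item[1], reverse=True)


def average_precision(ranked_labels: List[bool]) -> float:
    """Average precision of a ranked list of binary correctness labels."""
    hits = 0
    precision_sum = 0.0
    for rank, is_correct in enumerate(ranked_labels, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical scores for four generated solutions to one problem.
candidates: List[Candidate] = [
    ("sol_a", 0.20, False),
    ("sol_b", 0.90, True),
    ("sol_c", 0.60, True),
    ("sol_d", 0.75, False),
]

ranked = rank_by_score(candidates)
ranked_ids = [solution_id for solution_id, _, _ in ranked]
ap = average_precision([label for _, _, label in ranked])
print(ranked_ids, round(ap, 3))
```

A ranking measure of this kind rewards an estimator for placing functionally correct solutions above incorrect ones, without requiring the probabilities themselves to be calibrated.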

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Paul, S.; Majumdar, S. (https://orcid.org/0000-0003-3935-4087); Shah, R.; Das, S.; Ghosh, M.; Ganguly, D.; Calikli, G.; Sanyal, D.; Das, P.P.; Clough, P.D.; Bandyopadhyay, A.; Chattopadhyay, S.
Editors:	Ghosh, K.; Mandl, T.; Majumder, P.; Ganguly, D.
Copyright, Publisher and Additional Information:	Copyright © 2024 for the individual papers by the papers' authors. Copyright © 2024 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Keywords:	Large Language Models, Comment Usefulness Prediction, Code Quality Estimation, BERT, GPT-2
Dates:	Published: 12 December 2025
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds)
Date Deposited:	05 Feb 2026 14:45
Last Modified:	06 Feb 2026 16:22
Published Version:	https://ceur-ws.org/Vol-4054/
Status:	Published
Publisher:	CEUR Workshop Proceedings
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:237528
