Paul, S., Majumdar, S. orcid.org/0000-0003-3935-4087, Shah, R. et al. (9 more authors) (2025) Overview of the “Information Retrieval in Software Engineering” (IRSE) track at Forum for Information Retrieval 2024. In: Ganguly, D., Sanyal, D. K., Majumder, P., Majumdar, S. and Gangopadhyay, S., (eds.) FIRE '24: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation. The 16th Annual Meeting of the Forum for Information Retrieval Evaluation, 12-15 Dec 2024, Gandhinagar, India. Association for Computing Machinery, New York, NY, pp. 18-21. ISBN: 979-8-4007-1318-7.
Abstract
The “Information Retrieval in Software Engineering” (IRSE) track aims to devise solutions for the automated evaluation of code comments within a machine learning framework, with labels generated by both humans and large language models. The track offered two tasks this year: i) a comment usefulness prediction task, and ii) a code quality estimation task.
The comment classification task involves classifying comments as either useful or not useful. The dataset comprises 9,048 pairs of code comments and surrounding code snippets drawn from open-source C-based projects on GitHub, together with an additional dataset generated by participating teams using large language models. In total, 12 teams from various universities contributed experiments. These were assessed through quantitative metrics, primarily the F1-score, and qualitative evaluations based on the features developed, the supervised learning models employed, and their respective hyper-parameters. Notably, labels generated by large language models introduce bias into the prediction model but lead to less over-fitted results.
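As an illustrative sketch (not the track's official evaluation script), the primary quantitative metric, the F1-score over the binary useful / not-useful labels, can be computed as:

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, treating 1 as the 'useful' class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if 2 * tp + fp + fn == 0:
        return 0.0  # no positive labels or predictions at all
    return 2 * tp / (2 * tp + fp + fn)

# Toy example with hypothetical labels (1 = useful, 0 = not useful).
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

The labels and values above are made up for illustration; the track's submissions would compute this over the full test split.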
The code quality estimation sub-track was introduced this year. Given a problem description and a list of large language model (LLM)-generated code solutions, the objective is to automatically estimate the functional correctness of each generated solution. For evaluation, the problem-solution pairs are ranked by their estimated probabilities of functional correctness, and the quality of this ranking is reported using standard ranking performance measures.
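Average precision is one such standard ranking measure; a minimal sketch of how a ranking induced by estimated correctness probabilities might be scored against binary ground truth (both inputs below are hypothetical, not drawn from the track's data):

```python
def average_precision(scores, correct):
    """Average precision of the ranking induced by descending scores.

    scores  : estimated probabilities of functional correctness
    correct : 1 if the generated solution is actually correct, else 0
    """
    ranked = sorted(zip(scores, correct), key=lambda x: -x[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy example: three candidate solutions for one problem.
print(average_precision([0.9, 0.8, 0.3], [1, 0, 1]))  # ≈ 0.8333
```

A better estimator pushes correct solutions toward the top of the ranking, raising the score toward 1.0.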
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Copyright, Publisher and Additional Information: | Copyright © 2024 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License. |
| Keywords: | Large Language Models, Comment Usefulness Prediction, Code Quality Estimation |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 05 Feb 2026 15:20 |
| Last Modified: | 05 Feb 2026 15:20 |
| Published Version: | https://dl.acm.org/doi/10.1145/3734947.3735667 |
| Status: | Published |
| Publisher: | Association for Computing Machinery |
| Identification Number: | 10.1145/3734947.3735667 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:237529 |
Download
Filename: Overview of the “Information Retrieval in Software Engineering” (IRSE).pdf
Licence: CC-BY 4.0

CORE (COnnecting REpositories)