Where tests fall short: empirically analyzing oracle gaps in covered code

Maton, M., Kapfhammer, G.M. and McMinn, P. orcid.org/0000-0001-9137-7433 (Accepted: 2025) Where tests fall short: empirically analyzing oracle gaps in covered code. In: Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM 2025). ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 28 Sep - 03 Oct 2025, Honolulu, Hawaii, USA. Institute of Electrical and Electronics Engineers (IEEE). (In Press)

Abstract

Background: Developers often rely on statement coverage to assess test suite quality. However, statement coverage alone may only lead to 10% fault detection, necessitating more rigorous approaches. While mutation testing is effective, its execution and human analysis costs remain high. Identifying covered statements that are not checked by oracles (e.g., assertions) offers a cost-effective alternative; however, the lack of empirical evidence for selecting the appropriate Oracle Gap Calculation Approach (OGCA) prevents developers from making informed choices.

Aims: This knowledge-seeking study compares oracle gap characteristics determined by different OGCAs to assist developers in choosing the most valuable approach for their use cases.

Method: Using mixed-method empirical analysis, we conduct an in-depth evaluation of the oracle gaps produced using three OGCAs: Checked Coverage using a Dynamic Slicer (CCDS), Checked Coverage using an Observational Slicer (CCOS), and Pseudo-Tested Statement Identification (PTSI). Across 30 Java classes from six open-source projects, we report on a quantitative evaluation of gap prominence, distribution, fault detection correlation and execution times, as well as results from a qualitative manual inspection of the statement types found in the oracle gaps.

Results: The qualitative analysis showed data-loading statements, iteration statements and output updates to be most prominent in the oracle gaps. PTSI identified the oracle gaps with the lowest median mutation score (0.32), highlighting areas requiring more fault detection improvement compared to CCDS (0.76) and CCOS (0.50). PTSI also had the shortest median execution time (19.9 seconds), far quicker than both CCDS (273.2 seconds) and CCOS (5957.1 seconds).

Conclusions: PTSI quickly reveals the priority testing areas for improved fault detection, making it an effective OGCA for developers to identify where tests fall short.
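To make the idea of an oracle gap concrete, the following is a minimal toy sketch, not the tooling or procedure used in the paper. It assumes a simple statement-deletion scheme: each statement of a covered function is replaced by a no-op, the test is re-run, and statements whose removal the test does not notice are reported as unchecked. The function `summarize`, the test in `suite_passes`, and the helper names are all hypothetical, and Python is used only for brevity (the study itself analyzes Java classes).

```python
import ast

# Hypothetical code under test: every statement is executed by the test below.
SOURCE = """\
def summarize(values):
    stats = {"total": 0, "count": 0}
    for v in values:
        stats["total"] += v
        stats["count"] += 1
    return stats
"""


def suite_passes(namespace):
    """Hypothetical test suite: covers all statements, asserts only on 'total'."""
    try:
        result = namespace["summarize"]([1, 2, 3])
        assert result["total"] == 6
        return True
    except Exception:
        # Any assertion failure or runtime error counts as a failing suite.
        return False


class DeleteStatement(ast.NodeTransformer):
    """Replace the statement that starts on a given line with `pass`."""

    def __init__(self, lineno):
        self.lineno = lineno

    def visit(self, node):
        is_target = (
            isinstance(node, ast.stmt)
            and not isinstance(node, ast.FunctionDef)
            and node.lineno == self.lineno
        )
        if is_target:
            return ast.copy_location(ast.Pass(), node)
        return self.generic_visit(node)


def pseudo_tested_lines(source):
    """Return the statements whose deletion leaves the test suite passing."""
    tree = ast.parse(source)
    linenos = sorted(
        {
            n.lineno
            for n in ast.walk(tree)
            if isinstance(n, ast.stmt) and not isinstance(n, ast.FunctionDef)
        }
    )
    gaps = []
    for lineno in linenos:
        mutant = DeleteStatement(lineno).visit(ast.parse(source))
        ast.fix_missing_locations(mutant)
        namespace = {}
        exec(compile(mutant, "<mutant>", "exec"), namespace)
        if suite_passes(namespace):
            gaps.append(source.splitlines()[lineno - 1].strip())
    return gaps


if __name__ == "__main__":
    print(pseudo_tested_lines(SOURCE))  # ['stats["count"] += 1']
```

In this toy case the update to `stats["count"]` is covered but never checked by an assertion, so deleting it goes unnoticed, which is the kind of output update the abstract reports as prominent in oracle gaps. A real approach would also need to handle statements whose deletion breaks compilation, multiple tests, and timeouts.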

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Maton, M.; Kapfhammer, G.M.; McMinn, P. (https://orcid.org/0000-0001-9137-7433)
Copyright, Publisher and Additional Information:	© 2025 The Author(s).
Dates:	Accepted: 17 June 2025
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Funding Information:	Funder: Engineering and Physical Sciences Research Council; Grant number: EP/X024539/1
Depositing User:	Symplectic Sheffield
Date Deposited:	07 Aug 2025 15:38
Last Modified:	25 Sep 2025 11:49
Status:	In Press
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Refereed:	Yes
Related URLs:	Conference
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:230086

Download

Accepted Version

Under temporary embargo

Filename: maton2025.pdf
