Maton, M., Kapfhammer, G.M. and McMinn, P. orcid.org/0000-0001-9137-7433 (Accepted: 2025) Where tests fall short: empirically analyzing oracle gaps in covered code. In: Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM 2025). ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 28 Sep - 03 Oct 2025, Honolulu, Hawaii, USA. Institute of Electrical and Electronics Engineers (IEEE) (In Press)
Abstract
Background: Developers often rely on statement coverage to assess test suite quality. However, statement coverage alone may only lead to 10% fault detection, necessitating more rigorous approaches. While mutation testing is effective, its execution and human analysis costs remain high. Identifying covered statements that are not checked by oracles (e.g., assertions) offers a cost-effective alternative; however, the lack of empirical evidence for selecting the appropriate Oracle Gap Calculation Approach (OGCA) prevents developers from making informed choices. Aims: This knowledge-seeking study compares oracle gap characteristics determined by different OGCAs to assist developers in choosing the most valuable approach for their use cases. Method: Using mixed-method empirical analysis, we conduct an in-depth evaluation of the oracle gaps produced using three OGCAs: Checked Coverage using a Dynamic Slicer (CCDS), Checked Coverage using an Observational Slicer (CCOS), and Pseudo-Tested Statement Identification (PTSI). Across 30 Java classes from six open-source projects, we report on a quantitative evaluation of gap prominence, distribution, fault detection correlation and execution times, as well as results from a qualitative manual inspection of the statement types found in the oracle gaps. Results: The qualitative analysis showed data-loading statements, iteration statements and output updates to be most prominent in the oracle gaps. PTSI identified the oracle gaps with the lowest median mutation score (0.32), highlighting areas requiring more fault detection improvement compared to CCDS (0.76) and CCOS (0.50). PTSI also had the shortest median execution time (19.9 seconds), far quicker than both CCDS (273.2 seconds) and CCOS (5957.1 seconds). Conclusions: PTSI quickly reveals the priority testing areas for improved fault detection, making it an effective OGCA for developers to identify where tests fall short.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Maton, M.; Kapfhammer, G.M.; McMinn, P. (orcid.org/0000-0001-9137-7433) |
Copyright, Publisher and Additional Information: | © 2025 The Author(s). |
Dates: | Accepted: 2025 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: ENGINEERING AND PHYSICAL SCIENCE RESEARCH COUNCIL; Grant number: EP/X024539/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 07 Aug 2025 15:38 |
Last Modified: | 07 Aug 2025 15:38 |
Status: | In Press |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230086 |
