Parry, O. orcid.org/0000-0002-0917-1274, Kapfhammer, G.M., Hilton, M. et al. (1 more author) (2023) Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models. Empirical Software Engineering, 28, 72. ISSN 1382-3256
Abstract
A flaky test is a test case whose outcome changes without modification to the code of the test case or the program under test. These tests disrupt continuous integration, cause a loss of developer productivity, and limit the efficiency of testing. Many flaky test detection techniques are rerunning-based, requiring repeated test case executions at considerable time cost, or machine learning-based, which are fast but offer only an approximate solution with variable detection performance. These two extremes leave developers with a stark choice. This paper introduces CANNIER, an approach for reducing the time cost of rerunning-based detection techniques by combining them with machine learning models. The empirical evaluation involving 89,668 test cases from 30 Python projects demonstrates that CANNIER can reduce the time cost of existing rerunning-based techniques by an order of magnitude while maintaining a detection performance that is significantly better than machine learning models alone. Furthermore, the comprehensive study extends existing work on machine learning-based detection and reveals a number of additional findings, including (1) the performance of machine learning models for detecting polluter test cases; (2) the observation that using the mean values of dynamic test case features from repeated measurements can slightly improve the detection performance of machine learning models; and (3) correlations between various test case features and the probability of a test case being flaky.
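The abstract does not give implementation details, but the general idea of combining a machine learning model with rerunning-based detection can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the synthetic features, the threshold values, and the choice of RandomForestClassifier are assumptions made only for the example. The point is that tests whose predicted flakiness probability falls between a lower and an upper confidence threshold are handed to the expensive rerunning-based detector, while the rest are classified by the model alone.

```python
# Hypothetical sketch: use an ML flakiness model to limit which tests are rerun.
# Features, labels, and thresholds below are synthetic/illustrative, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic per-test features (stand-ins for e.g. execution time, coverage, assertions).
n_tests = 1000
X = rng.normal(size=(n_tests, 3))
# Pretend tests with a large first feature tend to be flaky (illustrative labelling).
y = (X[:, 0] > 1.0).astype(int)

# Train a classifier that outputs a flakiness probability for each test case.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = model.predict_proba(X)[:, 1]

LOWER, UPPER = 0.1, 0.7  # illustrative confidence thresholds

def classify(p: float) -> str:
    """Trust the model when it is confident; fall back to rerunning otherwise."""
    if p <= LOWER:
        return "non-flaky (model)"
    if p >= UPPER:
        return "flaky (model)"
    return "ambiguous: send to rerunning-based detector"

labels = [classify(p) for p in proba]
n_rerun = sum(label.startswith("ambiguous") for label in labels)
print(f"Tests requiring reruns: {n_rerun}/{n_tests}")
```

Only the ambiguous fraction of the test suite incurs the cost of repeated executions, which is how a combined approach can reduce rerunning time while keeping detection performance above that of the model alone.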
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Parry, O.; Kapfhammer, G.M.; Hilton, M. and 1 more author |
Copyright, Publisher and Additional Information: | © The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Keywords: | Software testing; Flaky tests; Machine learning |
Dates: | 2023 (Published) |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 04 May 2023 10:22 |
Last Modified: | 04 May 2023 10:22 |
Published Version: | http://dx.doi.org/10.1007/s10664-023-10307-w |
Status: | Published |
Publisher: | Springer Science and Business Media LLC |
Refereed: | Yes |
Identification Number: | 10.1007/s10664-023-10307-w |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:198846 |