Using real faults to evaluate test suite prioritization techniques

regression testing
research methodology
Evaluate test prioritization with real faults!

Gregory M. Kapfhammer



In a previous post, called Regression testing of software is costly — but you can do something about it!, I pointed out that software engineers often write test suites that they will re-run as they modify a program. This valuable — and expensive! — process, called regression testing, helps developers to ensure that they have not introduced new defects as they add features or bug fixes.

Instead of focusing on the practices of software engineers, this post draws attention to the common practices that researchers follow when they assess the effectiveness of prioritization techniques that reorder a test suite. Many research papers, including some of my own like (Lin et al. 2017), seed the program under test with synthetic faults called mutants and then see how quickly different test orderings detect those faults. Since mutation testing tools exist for many programming languages, this approach is appealing to researchers who want to evaluate the effectiveness of a new test prioritizer.
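To make "how quickly different test orderings detect those faults" concrete, here is a minimal Python sketch of one widely used score, the Average Percentage of Faults Detected (APFD). This post does not name a metric, so treating APFD as the measure is my assumption; the function name `apfd`, the test names, and the fault names are all hypothetical and chosen only for illustration.

```python
def apfd(ordering, detects, faults):
    """Compute the Average Percentage of Faults Detected for one ordering.

    ordering: list of test names in the order they would run
    detects:  dict mapping a test name to the set of faults it detects
    faults:   list of all faults (assumption: each is detected by some test)
    """
    n = len(ordering)
    m = len(faults)
    first_positions = []
    for fault in faults:
        # 1-based position of the first test in the ordering that detects the fault
        position = next(
            (i for i, test in enumerate(ordering, start=1)
             if fault in detects.get(test, set())),
            None,
        )
        if position is None:
            raise ValueError(f"no test in the ordering detects {fault!r}")
        first_positions.append(position)
    # Standard APFD formula: 1 - (sum of first positions) / (n * m) + 1 / (2 * n)
    return 1 - sum(first_positions) / (n * m) + 1 / (2 * n)


# Hypothetical data: four tests and two seeded faults
detects = {"t1": set(), "t2": {"f1"}, "t3": set(), "t4": {"f1", "f2"}}
faults = ["f1", "f2"]

original = apfd(["t1", "t2", "t3", "t4"], detects, faults)
reordered = apfd(["t4", "t2", "t1", "t3"], detects, faults)
print(original, reordered)  # 0.375 0.875
```

A higher score means that the ordering reveals faults earlier. Whether the entries in `faults` are mutants or real faults is exactly the choice that this post is about.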

One of my research collaborations led to the recent publication of (Paterson et al. 2018), a paper that calls into question the use of mutants during the experimental evaluation of test suite prioritization methods. Using Defects4J, the database of real faults for Java programs, this paper reports on experiments that investigate how the choice between mutants and real faults influences the experimental study of coverage-based test prioritizers. The paper shows that real faults, in comparison to synthetic mutants, are harder for a reordered test suite to detect. The results also suggest that using mutants leads to an unpredictable scoring of a test suite’s effectiveness. In the context of test prioritization, this paper shows that mutants are not a surrogate for real faults!
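For readers who have not seen a coverage-based test prioritizer, the following Python sketch shows one well-known strategy, often called "additional greedy": it repeatedly picks the test that covers the most not-yet-covered program elements. This is only an illustration under my own assumptions and is not necessarily how the prioritizers studied in the paper are implemented; the function name `additional_greedy` and the coverage data are hypothetical.

```python
def additional_greedy(coverage):
    """Order tests by how much new coverage each one adds.

    coverage: dict mapping a test name to the set of covered elements
              (for example, statement or line identifiers)
    """
    remaining = dict(coverage)
    covered = set()
    ordering = []
    while remaining:
        # Choose the test that adds the most uncovered elements;
        # iterating in sorted order breaks ties alphabetically.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        ordering.append(best)
        covered |= remaining.pop(best)
    return ordering


# Hypothetical coverage data for four tests
coverage = {
    "t1": {1, 2},
    "t2": {1, 2, 3, 4},
    "t3": {5},
    "t4": {3, 4},
}
print(additional_greedy(coverage))  # ['t2', 't3', 't1', 't4']
```

A prioritizer like this one only sees coverage, not faults, which is why the kind of fault used to score its output matters so much.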

Get the Gist!
Further Details

If you want to learn more about these new experimental results, please read (Paterson et al. 2018)! Since I would like to learn about and study other approaches, I hope that you will contact me with your suggestions for how to experimentally assess a test suite prioritization technique.



Lin, Chu-Ti, Kai-Wei Tang, Jiun-Shiang Wang, and Gregory M. Kapfhammer. 2017. “Empirically Evaluating Greedy-Based Test Suite Reduction Methods at Different Levels of Test Suite Complexity.” Science of Computer Programming.
Paterson, David, Gregory M. Kapfhammer, Gordon Fraser, and Phil McMinn. 2018. “Using Controlled Numbers of Real Faults and Mutants to Empirically Evaluate Coverage-Based Test Case Prioritization.” In Proceedings of the 13th International Workshop on Automation of Software Test.