Using real faults to evaluate test suite prioritization techniques

07 December 2018

In a previous post, called Regression testing of software is costly — but you can do something about it!, I pointed out that software engineers often write test suites that they will re-run as they modify a program. This valuable — and expensive! — process, called regression testing, helps developers to ensure that they have not introduced new defects as they add new features or bug fixes.

Instead of focusing on the practices of software engineers, this post draws attention to the common practices that researchers follow when they assess the effectiveness of prioritization techniques that reorder a test suite. Many research papers, including some of my own like (Lin2017), seed the program under test with synthetic faults called mutants and then see how quickly different test orderings detect those faults. Since mutation testing tools exist for many programming languages, this approach is appealing to researchers who want to evaluate the effectiveness of a new test prioritizer.
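To make "how quickly an ordering detects faults" concrete, here is a minimal Python sketch of the Average Percentage of Faults Detected (APFD) score, a metric widely used in the prioritization literature. This post does not say which metric any particular paper uses, so treat APFD as an assumption. The function name, the `detects` mapping, and the choice to count an undetected fault as if it were found one position past the end of the suite are also illustrative.

```python
def apfd(ordering, detects, faults):
    """Return the APFD score of one test ordering (higher is better).

    ordering: list of test names in execution order
    detects:  dict mapping a test name to the set of faults it detects
    faults:   collection of all seeded or real faults under study
    """
    n, m = len(ordering), len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one fault")
    total = 0
    for fault in faults:
        # 1-based position of the first test that detects this fault.
        # Assumption: an undetected fault is scored as position n + 1.
        position = next(
            (i for i, test in enumerate(ordering, start=1)
             if fault in detects.get(test, set())),
            n + 1,
        )
        total += position
    return 1 - total / (n * m) + 1 / (2 * n)


# Hypothetical data: three tests and two faults (mutants or real faults).
detects = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2"}}
faults = ["f1", "f2"]
original = apfd(["t1", "t2", "t3"], detects, faults)
reordered = apfd(["t3", "t2", "t1"], detects, faults)
print(f"original={original:.4f} reordered={reordered:.4f}")
```

With this made-up data, running the fault-revealing test first raises the score, which is the kind of improvement a prioritizer aims for.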

One of my research collaborations led to the recent publication of (Paterson2018), a paper that calls into question the use of mutants during the experimental evaluation of test suite prioritization methods. Using Defects4J, the database of real faults for Java programs, this paper reports on experiments that investigate how the use of mutants and real faults influences the experimental study of coverage-based test prioritizers. The paper shows that real faults, in comparison to synthetic mutants, are harder for a reordered test suite to detect. The results also suggest that the use of mutants leads to an unpredictable scoring of a test suite's effectiveness. In the context of test suite prioritization, this paper's conclusion is that mutants are not a surrogate for real faults!
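For readers unfamiliar with coverage-based prioritizers, the following Python sketch shows one common strategy, often called "additional greedy": repeatedly pick the test that covers the most not-yet-covered code. It is only an illustration of the general idea and not necessarily one of the techniques evaluated in (Paterson2018). The function name and the coverage data are hypothetical.

```python
def prioritize_additional(coverage):
    """Order tests greedily by how much new coverage each one adds.

    coverage: dict mapping a test name to the set of covered items
              (for example, line numbers). Ties are broken by test name.
    """
    remaining = {test: set(items) for test, items in coverage.items()}
    covered = set()
    order = []
    while remaining:
        # sorted() makes tie-breaking deterministic, because max()
        # returns the first maximal element it encounters.
        best = max(sorted(remaining),
                   key=lambda test: len(remaining[test] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Hypothetical coverage data for four tests.
coverage = {
    "testA": {1, 2},
    "testB": {1, 2, 3, 4},
    "testC": {5},
    "testD": {3, 4},
}
order = prioritize_additional(coverage)
print(order)
```

Note that the ordering depends only on coverage, not on which faults the tests actually reveal, which is why the choice between mutants and real faults matters when judging the result.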

If you want to learn more about these results, please read (Paterson2018). Since I would like to learn about other approaches, I hope that you will contact me with your suggestions for how to experimentally assess a test suite prioritization technique.

Enjoyed this post? If so, please read my most recent article, Responsive web testing helps to create a wow-worthy web.
