Is slicing or mutation testing better at automatically identifying weaknesses in your test suite?

post
research paper
software testing
How can we best find the blind spots of our test suites?
Author

Gregory M. Kapfhammer

Published

2025

Introduction

Are your tests as effective as you think they are? While statement coverage is a common metric for test suite quality, it often doesn’t tell the whole story. High coverage scores can create a false sense of security, leaving critical “gaps” in your testing strategy. In the paper (Maton, Kapfhammer, and McMinn 2025), my colleagues and I investigate how to identify an “oracle gap”, which comprises the code that is executed by the tests but not actually checked by any assertions. A key contribution of this paper is a tool called GapGrep that uses three distinct methods to help researchers and practitioners identify these oracle gaps in their test suites. Interested in learning more about this paper? Keep reading!
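
To make the idea concrete, here is a minimal Python sketch of an oracle gap; the function and test are invented for illustration and are not drawn from the paper, which studies Java programs:

```python
# Hypothetical example of an "oracle gap": every statement in
# discount_price is executed by the test (100% statement coverage),
# but the assertion only checks the result's type, so the actual
# computation is never verified.

def discount_price(price, rate):
    """Apply a percentage discount to a price."""
    reduction = price * rate  # executed, yet never checked by a test
    return price - reduction

def test_discount_price():
    result = discount_price(100.0, 0.2)
    assert isinstance(result, float)  # weak oracle: the value is unchecked
```

Even if `price - reduction` were mistakenly written as `price + reduction`, this test would still pass; the statements inside the function sit in the oracle gap.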

Key Contributions

This paper’s research offers a comprehensive empirical comparison of three different Oracle Gap Calculation Approaches (OGCAs) and the gaps that they produce:

  • Checked Coverage using a Dynamic Slicer (CCDS): This technique uses dynamic slicing to identify statements that influence the outcome of a test assertion.
  • Checked Coverage using an Observational Slicer (CCOS): This approach repeatedly deletes lines of code to see if the program’s behavior changes.
  • Pseudo-Tested Statement Identification (PTSI): This method identifies the pseudo-tested statements that can be removed without causing any test to fail.
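
As a rough intuition for the deletion-based techniques, here is a minimal Python sketch of the PTSI idea; the program, one-test suite, and helper names are all invented for illustration, and GapGrep's actual implementation targets Java:

```python
# Minimal sketch of pseudo-tested statement identification (PTSI):
# replace one statement at a time with a no-op, re-run the tests,
# and report the statements whose removal no test notices.

PROGRAM = """\
def double(x):
    y = x * 2
    print('called')
    return y
"""

def suite_passes(program_source):
    """Run a tiny one-test 'suite' against a program given as source."""
    env = {}
    try:
        exec(program_source, env)
        assert env["double"](3) == 6
        return True
    except Exception:
        return False

def pseudo_tested_statements(program_source):
    """Return the statements that can be deleted without failing a test."""
    lines = program_source.splitlines()
    survivors = []
    for i, line in enumerate(lines):
        if not line.startswith("    "):
            continue  # only try to delete statements inside the function
        variant = "\n".join(lines[:i] + ["    pass"] + lines[i + 1:])
        if suite_passes(variant):
            survivors.append(line.strip())
    return survivors
```

In this sketch the `print` call is pseudo-tested: deleting it changes nothing that the suite observes, so it lies in the oracle gap, while deleting the assignment or the `return` makes the test fail.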

This paper’s study gives a quantitative and qualitative analysis of these techniques, helping developers choose the most suitable approach for their needs. Read on to learn more!

Empirical Results

We conducted an empirical study on 30 Java classes from six open-source projects. This paper identifies several interesting results such as the following:

  • PTSI is the most efficient and effective OGCA: This method consistently identified the oracle gaps with the lowest mutation scores, indicating that it is the best at pinpointing areas where the test suite’s fault detection is weak.
  • PTSI is the fastest approach: Identifying oracle-gap statements through pseudo-testedness runs faster than either slicing-based technique, making it a practical choice for developers who need to quickly assess the strength of their test suites.
  • Distinct oracle gaps: The three OGCAs created oracle gaps that were largely distinct, suggesting that they uncover different types of testing weaknesses.
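
As background for the mutation-score comparison above, the following hedged Python illustration (invented names, not the paper's measurement setup) shows why a weak oracle leaves mutants alive and therefore drives down the mutation score of the code in an oracle gap:

```python
# Simplified illustration of mutation scoring: a mutant is "killed"
# when at least one test fails on it. A weak, type-only oracle lets
# the mutant survive, signalling an oracle gap.

def discount_price(price, rate):
    return price - price * rate

def mutant_discount_price(price, rate):
    return price + price * rate  # injected fault: '-' flipped to '+'

def weak_test(fn):
    return isinstance(fn(100.0, 0.2), float)  # value never checked

def strong_test(fn):
    return fn(100.0, 0.2) == 80.0  # checks the computed value

weak_kills_mutant = not weak_test(mutant_discount_price)      # survives
strong_kills_mutant = not strong_test(mutant_discount_price)  # killed
```

A low mutation score over the statements in a gap means most injected faults there go undetected, which is exactly the weakness that the OGCAs aim to expose.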

Future

While our study focused on programs implemented in Java, the concept of oracle gaps is applicable to a wide range of programming languages. Future work will involve expanding our analysis to other languages and exploring how these techniques can be integrated into the software development lifecycle to provide developers with real-time feedback on their tests. Ultimately, we hope that the current and future versions of the GapGrep tool can offer developers an automated approach to identifying how their test cases fall short.

Further Details

My colleagues and I are keen to help developers improve the quality of their software. If you have any questions about this research or want to share your own experiences with test suite analysis, please contact me. If you want to learn more about oracle gaps, you can read (Maton, Kapfhammer, and McMinn 2025). To stay up-to-date on my latest research and blog posts, please subscribe to my mailing list. As always, your feedback is welcome!

References

Maton, Megan, Gregory M. Kapfhammer, and Phil McMinn. 2025. “Where Tests Fall Short: Empirically Analyzing Oracle Gaps in Covered Code.” In Proceedings of the 19th International Symposium on Empirical Software Engineering and Measurement.
