Empirically comparing hazard-guided LLM mutation techniques with existing LLM- and rule-based approaches

empirical study

mutation testing

software testing

Proceedings of the 30th International Conference on Evaluation and Assessment in Software Engineering

Authors

Megan Maton

Gregory M. Kapfhammer

Phil McMinn

Published

2026

Abstract

Mutation testing tools normally rely on rule-based operators that mechanically swap and change source code tokens without understanding the code’s purpose. Recently, Large Language Models (LLMs) have enabled mutant generators to consider more of the code’s context and history. However, current LLM-based mutation testing methods have limited prompts that prevent them from unleashing the full power and creativity of the LLM to produce a diverse set of aggressive, yet realistic and productive, mutants. This paper’s novel approach uses an LLM to interpret a method and re-implement it entirely with mutations guided by hazard analysis, analogous to a programmer misunderstanding a project’s requirements or an aspect of an algorithm’s implementation. To enable the empirical comparison of the new hazard-guided techniques with both prior LLM-based and traditional mutation testing tools, this paper also presents and applies a framework that integrates representative LLM-based methods. Using this framework and 279 bugs in 15 projects from the Defects4J dataset, the results show that the hazard-guided techniques can harness both local and cloud-based LLMs to generate compilable, diverse, and powerful mutants that help test cases to detect unique defects not found by other methods.
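For readers unfamiliar with the rule-based baseline the abstract mentions, the following is a minimal sketch of a token-swapping mutation operator. It is not the paper's framework or any existing tool: the rule table `RULES`, the function names, and the toy `is_adult` subject are all hypothetical, chosen only to illustrate the general idea, and the sketch is written in Python for brevity rather than for the Java projects the study evaluates.

```python
import re

# Hypothetical rule table (an assumption for illustration): each operator
# token is mechanically replaced by another, with no regard for intent.
RULES = {"<=": "<", ">=": ">", "==": "!=", "+": "-"}

# Longer tokens come first so that ">=" is matched before any shorter token.
_PATTERN = re.compile(
    "|".join(re.escape(tok) for tok in sorted(RULES, key=len, reverse=True))
)


def rule_based_mutants(source: str) -> list[str]:
    """Return one mutant per token occurrence that matches a rule."""
    mutants = []
    for match in _PATTERN.finditer(source):
        mutated = source[:match.start()] + RULES[match.group()] + source[match.end():]
        mutants.append(mutated)
    return mutants


# Toy subject under test (hypothetical).
ORIGINAL = "def is_adult(age):\n    return age >= 18\n"


def weak_suite_passes(source: str) -> bool:
    """A deliberately weak test suite that never checks the boundary value."""
    namespace = {}
    exec(source, namespace)  # acceptable here: the source is our own toy string
    is_adult = namespace["is_adult"]
    return is_adult(30) is True and is_adult(5) is False


def strong_suite_passes(source: str) -> bool:
    """The weak suite plus a boundary check at age 18."""
    namespace = {}
    exec(source, namespace)
    is_adult = namespace["is_adult"]
    return weak_suite_passes(source) and is_adult(18) is True


if __name__ == "__main__":
    for mutant in rule_based_mutants(ORIGINAL):
        status = "survived" if weak_suite_passes(mutant) else "killed"
        print(f"weak suite: mutant {status}")
        status = "survived" if strong_suite_passes(mutant) else "killed"
        print(f"strong suite: mutant {status}")
```

In this sketch the single mutant (`>=` replaced by `>`) survives the weak suite and is killed once the boundary check is added, which is the feedback mutation testing provides. A hazard-guided technique as described in the abstract would instead ask an LLM to re-implement the whole method, which this sketch does not attempt.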

Details

Paper
LLM-Mutation/multiplex

Reference

@inproceedings{Maton2026,
 author = {Megan Maton and Gregory M. Kapfhammer and Phil McMinn},
 booktitle = {Proceedings of the 30th International Conference on Evaluation and Assessment in Software Engineering},
 title = {Empirically comparing hazard-guided LLM mutation techniques with existing LLM- and rule-based approaches},
 year = {2026}
}
