As evidenced by Arcuri and Briand’s paper “A Hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering”, the field of search-based software engineering (SBSE) relies on statistical methods to support the empirical comparison of different techniques. Yet the source code for these statistical analyses is often bespoke and is not always made available, which makes it difficult for other researchers to replicate the analyses or learn from the project.
As a means of improving the maturity of the data analysis methods used in the SBSE field, I think it would be useful to have shared repositories of well-documented statistical analysis code and replication data. That is, the SBSE community would advance if its “hitchhikers” had access to “free vehicles” in the form of GitHub repositories containing the data sets and statistical analysis code used for published papers.
To learn more about the benefits of using shared repositories of statistical code in SBSE, you can read the suggestions in Kapfhammer, McMinn, and Wright (2016).
Interested in learning more about this topic? Since this blog post was written, my colleagues, students, and I have published McMinn, Kapfhammer, and Wright (2016).