Contents

  • Introduction
  • Doubling Experiments
  • Database-Specific Results
  • Future Work
  • Conclusion

Can doubling experiments characterize worst-case time complexity?

Categories: post, database testing, performance analysis

Can experiments help us to understand algorithmic complexity?

Author: Gregory M. Kapfhammer

Published: 2016

Introduction

It is essential for software engineers to understand the efficiency of the algorithms in a complex system. Since the worst-case time complexity of an algorithm is often difficult to derive analytically, one effective alternative is to determine it empirically through doubling experiments. This post explores a tool called ExpOse, originally described in (Kinneer et al. 2015a) and (Kinneer et al. 2015b), for conducting these experiments, focusing on the domain of search-based test data generation for relational database schemas (McMinn et al. 2016). Keep reading to learn more about this technique and its broader applicability!

Doubling Experiments

A doubling experiment involves systematically doubling the size of the input to an algorithm and observing the change in its runtime. This method helps to empirically determine the algorithm’s order of growth, often expressed in “big-Oh” notation. By measuring the time needed to run the algorithm on inputs of size n and 2n, we can draw conclusions about its efficiency. Intuitively, if doubling the input size also doubles the execution time, then the algorithm is likely linear in its worst-case time complexity and thus a member of the O(n) class. More generally, if the runtime grows roughly as c * n^b, then the ratio of the runtimes for inputs of size 2n and n approaches 2^b, so the base-two logarithm of the observed ratio estimates the exponent b.
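To make the idea concrete, here is a minimal doubling-experiment sketch in Python. It is not ExpOse itself: the helper names are invented for this post, and Python’s built-in sorted is used as a stand-in for the algorithm under study.

```python
import math
import random
import time

def measure_runtime(algorithm, make_input, size, repeats=3):
    """Return the fastest of several runtime measurements on an input of the given size."""
    best = float("inf")
    for _ in range(repeats):
        data = make_input(size)
        start = time.perf_counter()
        algorithm(data)
        best = min(best, time.perf_counter() - start)
    return best

def doubling_ratio(algorithm, make_input, size):
    """Time the algorithm on inputs of size n and 2n and return the runtime ratio."""
    time_n = measure_runtime(algorithm, make_input, size)
    time_2n = measure_runtime(algorithm, make_input, 2 * size)
    return time_2n / time_n

if __name__ == "__main__":
    # A ratio near 2 suggests O(n) and a ratio near 4 suggests O(n^2); in general,
    # log2(ratio) estimates the exponent b in a runtime of roughly c * n^b.
    ratio = doubling_ratio(sorted, lambda n: [random.random() for _ in range(n)], 200_000)
    print(f"ratio = {ratio:.2f}, estimated exponent = {math.log2(ratio):.2f}")
```

For a linearithmic algorithm like sorting, the ratio settles just above 2, which is why distinguishing O(n) from O(n log n) requires measurements at reasonably large input sizes.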

Here are some of the key aspects of the technique presented in these papers:

  • Doubling Schemas: The tool systematically doubles the size of a relational database schema, including tables, columns, and constraints. This process is non-trivial due to the complex interrelationships within a database schema.

  • Automatic Experimentation: The tool automates the doubling process and measures the runtime for each input size. It uses a convergence algorithm to determine when a stable ratio of runtimes has been achieved, indicating the likely worst-case time complexity; a simplified sketch of this convergence check appears after this list.

  • Empirical Analysis: We applied this tool to various relational database schemas, revealing performance trade-offs in search-based test data generation. The results showed that, in many cases, the time complexity of automatically generating test data was linear or linearithmic, while in others, it was quadratic, cubic, or worse.
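
The convergence check mentioned above can be sketched as follows. This is a hedged illustration of the general idea rather than ExpOse’s actual algorithm: the function names, the tolerance, and the cap on the number of doublings are assumptions introduced for this post.

```python
import math
import random
import time

def infer_order_of_growth(algorithm, make_input, start_size=1_000,
                          tolerance=0.2, max_doublings=8):
    """Keep doubling the input size until the ratio between successive runtimes
    stabilizes, then return the estimated exponent of the order of growth."""
    size = start_size
    previous_time = None
    previous_ratio = None
    for _ in range(max_doublings):
        data = make_input(size)
        start = time.perf_counter()
        algorithm(data)
        elapsed = time.perf_counter() - start
        if previous_time is not None:
            ratio = elapsed / previous_time
            # Convergence: two successive ratios agree within the tolerance.
            if previous_ratio is not None and abs(ratio - previous_ratio) <= tolerance:
                return math.log2(ratio)  # exponent b in a runtime of roughly c * n^b
            previous_ratio = ratio
        previous_time = elapsed
        size *= 2
    # Fall back to the last observed ratio if the experiment never converged.
    return math.log2(previous_ratio) if previous_ratio is not None else float("nan")

# A deliberately quadratic function: counting duplicates with repeated scans.
def count_duplicates(data):
    return sum(data.count(item) for item in data)

if __name__ == "__main__":
    exponent = infer_order_of_growth(
        count_duplicates, lambda n: [random.random() for _ in range(n)], start_size=500)
    print(f"estimated exponent: {exponent:.2f}")  # expected to approach 2.0
```

In practice, the convergence criterion and the number of repetitions per input size matter a great deal; noisy measurements at small input sizes can delay or prevent convergence, which is one reason that tuning these parameters is worthwhile.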

While these papers specifically apply the concept of doubling experiments to the domain of relational database schemas, the technique is general and can be used for other algorithms and domains. By systematically increasing the input size and observing the runtime changes, software engineers can gain valuable insights into the efficiency of various algorithms.

Database-Specific Results

The empirical study conducted using the ExpOse tool focused on nine different database schemas. The experiments revealed that doubling certain schema structures, such as UNIQUE, NOT NULL, and CHECK constraints, often resulted in linear or linearithmic time complexity. However, doubling the number of tables and columns produced less conclusive results, with many experiments indicating quadratic or cubic behavior. More details about the results from these experiments are available in (Kinneer et al. 2015a).
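
To make the notion of doubling a schema structure concrete, here is a simplified, hypothetical sketch of doubling the CHECK constraints of a single table; it is not ExpOse’s actual schema representation or doubling procedure, and every name in it is invented for illustration. Cloning the constrained column along with its constraint keeps the doubled schema satisfiable, which hints at why doubling schemas is non-trivial.

```python
# A simplified, hypothetical schema representation: each table maps to a list
# of column definitions and a list of CHECK constraint expressions.
schema = {
    "products": {
        "columns": ["id INT", "price INT", "stock INT"],
        "checks": ["price > 0", "stock >= 0"],
    }
}

def double_checks(table):
    """Double the number of CHECK constraints by cloning each constrained
    column and repeating its constraint on the clone."""
    new_columns, new_checks = [], []
    for i, check in enumerate(table["checks"]):
        source = check.split()[0]  # the column named in the constraint
        clone = f"{source}_copy{i}"
        definition = next(c for c in table["columns"] if c.startswith(source))
        new_columns.append(definition.replace(source, clone, 1))
        new_checks.append(check.replace(source, clone, 1))
    table["columns"].extend(new_columns)
    table["checks"].extend(new_checks)
    return table

doubled = double_checks(schema["products"])
print(len(doubled["checks"]))  # 4 CHECK constraints after one doubling step
```

Repeating this step before each timing run produces the sequence of ever-larger schemas whose test data generation times are compared in the doubling experiment.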

Future Work

Future research will focus on extending the doubling technique to other types of constraints, such as FOREIGN KEYs, and implementing more realistic ways to double relational schemas. Additionally, automated parameter tuning will be explored to optimize the convergence conditions for different execution environments. We would also like to develop a more general-purpose version of the tool that would work for, say, Python functions or Java methods.

Conclusion

Doubling experiments provide a powerful and general approach to empirically determine the worst-case time complexity of algorithms. By applying the presented technique to search-based test data generation for relational database schemas, my colleagues and I have gained valuable insights into performance trade-offs and efficiency. Importantly, this approach holds promise for broader applications in software engineering, which we hope to explore in future work.

Further Details

Please contact me if you have any insights or experiences related to this topic. To stay informed about new research developments and blog posts, consider subscribing to my mailing list.


References

Kinneer, Cody, Gregory M. Kapfhammer, Chris J. Wright, and Phil McMinn. 2015a. “Automatically Evaluating the Efficiency of Search-Based Test Data Generation for Relational Database Schemas.” In Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering.
———. 2015b. “ExpOse: Inferring Worst-Case Time Complexity by Automatic Empirical Study.” In Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering.
McMinn, Phil, Chris J. Wright, Cody Kinneer, Colton J. McCurdy, Michael Camara, and Gregory M. Kapfhammer. 2016. “SchemaAnalyst: Search-Based Test Data Generation for Relational Database Schemas.” In Proceedings of the 32nd International Conference on Software Maintenance and Evolution.
