Comparison with Gregor
Gregor is the default mutation testing engine of PITest. We have compared Gregor and Descartes over a set of open source projects available from GitHub. We considered the execution time to evaluate a test suite, the number of mutants created by both engines and the results they reported.
In all cases Descartes created fewer mutants and employed much less time to complete the analysis.
Nevertheless, the results given by Descartes are coarse-grained compared to Gregor but more actionable and easier to understand and fix. It is, in general, not a replacement, but a tool to discover the worst tested methods in a project. Also since it takes less time, Descartes can be used more frequently.
The following image shows execution time of Descartes with respect to Gregor in a set of open-source projects:
The following image shows how many mutants both engines create:
The table below summarizes the comparison:
| Descartes | Gregor | |||
|---|---|---|---|---|
| Project | Time | Mutants | Time | Mutants |
| authzforce | 0:08:00 | 626 | 1:23:50 | 7296 |
| aws-sdk-java | 1:32:23 | 161758 | 6:11:22 | 2141689 |
| commons-cli | 0:00:13 | 271 | 0:01:26 | 2560 |
| commons-codec | 0:02:02 | 979 | 0:07:57 | 9233 |
| commons-collections | 0:01:41 | 3558 | 0:05:41 | 20394 |
| commons-io | 0:02:16 | 1164 | 0:12:48 | 8809 |
| commons-lang | 0:02:07 | 3872 | 0:21:02 | 30361 |
| flink-core | 0:14:04 | 4935 | 2:29:45 | 43619 |
| gson | 0:01:08 | 848 | 0:05:34 | 7353 |
| imagej-common | 0:08:07 | 1947 | 0:29:09 | 15592 |
| jaxen | 0:01:31 | 1252 | 0:24:40 | 12210 |
| jfreechart | 0:05:48 | 7210 | 0:41:28 | 89592 |
| jgit | 1:30:08 | 7152 | 16:02:03 | 78316 |
| joda-time | 0:03:39 | 4525 | 0:16:32 | 31233 |
| jopt-simple | 0:00:37 | 412 | 0:01:36 | 2271 |
| jsoup | 0:02:43 | 1566 | 0:12:49 | 14054 |
| sat4j-core | 0:53:09 | 2304 | 10:55:50 | 17163 |
| pdfbox | 0:44:07 | 7559 | 6:20:25 | 79763 |
| scifio | 0:24:14 | 3627 | 3:12:11 | 62768 |
| spoon | 2:24:55 | 4713 | 56:47:57 | 43916 |
| urbanairship | 0:07:25 | 3082 | 0:11:31 | 17345 |
| xwiki-rendering | 0:10:56 | 5534 | 2:07:19 | 112605 |
For a full comparison including how the results from both mutations engine correlate check the experiments repository This repository contains a set of IPython notebooks including:
- a full comparison on a set of real open-source projects
- how the mutation scores computed with both engines are correlated
- the presence of pseudo-tested methods in those projects and
- a statistical proof that these are the worst tested methods on each project.
