The instances are diering in the amount of taxa (134 to 178), the number of input trees (3 to 10) and their average
similarity to each other (below 50% up to 90%).
7.3 Comparison of Algorithms
Our aim was to make a comparison of the dierent algorithms as fair as possible. Hence per instance we ran the pure EA for 500000 iterations and used the consumed CPU time as limit for all others. Following this we did 30 runs per algorithm setting and instance and state following TreeRank scores in Tables 1{4: the best result (best), the mean value (mean) with the corresponding standard deviation (sdv.) as well as the median value (med.). Overall best obtained mean values are printed bold. For the agglomerative clustering and scatter search instances we also give the number of trees and taxa in parentheses, e.g. (3x134)" for 3 trees with 134 taxa each. In case of the articial tree instances the name itself holds this information and also the TreeRank score upper bound used during creation, e.g. 5x150 70" for 5 trees with 150 taxa each and a maximal TreeRank score of 70% w.r.t the initial tree. As lower limit the given upper bound minus 10% was used, i.e. 60% in the previous example. For these instances the TreeRank score of the initial input tree is given in line init tree", too. We further list for each instance the TreeRank score of the best tree of the input collection T (est input") and the aforementioned time limit (CPU-time") in seconds. In the following discussion, all performance dierences (i.e.dierences in TreeRank score mean values) pointed out have been statistically veried by Wilcoxon rank sum tests and error levels are less than 5%.For seven out of the 12 instances the MA yields on average signicantly better solutions than the pure EA. In the remaining ve cases the observed dierences are not signi-cant (although for four the MAs mean values are higher and only for instance M808scatter the EA achieved a slightly better mean). VNS is always signicantly better than the pure EA except for M808scatter, 5x175 80, and 5x175 90,where the latter achieved higher scores. Compared to the MA, the VNS exhibits clearly better results on the real-world instances, but was less eective on the articially generated
ones, for which it achieved a higher mean score in only a single case (5x150 80). We believe the reason for this lies
in the initial solution of the VNS, which is the input tree having the highest TreeRank score. While this best input tree turned out to lie relatively close to high quality consensus trees in case of our real-world instances, there are generally larger dierences in the articial instances. As the VNS is dominated by its strong local search of the embedded VND, which consumes the major part of the CPU-time on larger instances, its diversication abilities are less pronounced than those of the population-based MA.Now we concentrate on the hybrid approaches. The results of HybB, in which VND is used to locally optimize new incumbent solutions within the EA, was disappointing,as its nal solutions are generally inferior to those of the other hybrids and VNS performed in most of the real-world instances better. The more systematic EA/VNS combinations HybS, HybI4, and Hyb I4 are more successful. They consistently achieve the overall best results; at least one of them performed on all instances signicantly better than all other algorithms. The only exception is Onco10, where the VNS yields equally good results. Although there are only few statistically signicant dierences between the sequential
and the intertwined hybrids, the former tends to yield better results for the real-world instances whereas the latter
seems to be better suited for the articial instances. Finally, the intertwined MA/VNS hybrid Hyb I4 shows for many nstances a better performance|in fact sometimes even the best|when compared to the same variant utilizing the pure
EA only (HybI4). Altogether the intertwined variants can be clearly considered the most successful.When comparing the scores of the best input trees with those of the overall best solutions found, it is apparent that our real instances have less room for improvement, mostly below 2%, whereas the articially generated ones allow for about 5% or even more. nother interesting fact is the relation between the TreeRank scores of the articial instances'initial trees and the corresponding best solutions found. As can be observed in Tables 3 and 4, the initial tree is more likely the (nearly) optimal consensus tree regarding the TreeRank score) when the derived input trees are close to the initial tree, hence when the radius of the inner circle in Figure 7 is small. The results of the best solutions also show that only the hybrid algorithms are consistently able to nd consensus trees being better than or equal to the initial trees, whereas the latter happens for instances 5x150 90 and 5x175 90.