4.2 Heapsort and Mergesort. We briefly demonstrate that also for Heapsort and Mergesort, the actual running time varies with the presortedness of the input.
For Heapsort, Figure 11 shows how the number of inversions in the input affects the number of comparisons, the number of element swaps, the number of branch mispredictions, the running time, and the number of L2 data cache misses for input sequences of constant length n = 2·10^6. The number of comparisons and the number of element swaps performed by Heapsort are affected only slightly, while the number of branch mispredictions is affected somewhat more. However, the number of L2 data cache misses is greatly affected, varying by more than a factor of ten. The running time shows a virtually identical behavior, except that the increase is by a factor close to four. This suggests that data cache misses are the dominant factor in the running time of Heapsort on this architecture. We leave open the question of a theoretical analysis of the number of cache misses of Heapsort as a function of Inv.
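To make the measured quantities concrete, the following is a minimal Python sketch (our own illustration, not the paper's instrumentation) of a Heapsort with counters for comparisons and element swaps. The `perturb` generator is a hypothetical way to produce inputs of bounded disorder: each random adjacent transposition adds at most one inversion.

```python
import random

# Illustrative counters (our instrumentation, not the paper's setup).
comparisons = 0
swaps = 0

def sift_down(a, start, end):
    """Restore the max-heap property below `start`, counting
    comparisons and element swaps as we go."""
    global comparisons, swaps
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end:
            comparisons += 1
            if a[child] < a[child + 1]:
                child += 1          # pick the larger child
        comparisons += 1
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            swaps += 1
            root = child
        else:
            return

def heapsort(a):
    global swaps
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):   # build the max-heap
        sift_down(a, start, n - 1)
    for end in range(n - 1, 0, -1):           # repeatedly extract the maximum
        a[0], a[end] = a[end], a[0]
        swaps += 1
        sift_down(a, 0, end - 1)

def perturb(n, k):
    """Hypothetical disorder generator: k random adjacent
    transpositions of a sorted sequence, so Inv is at most k."""
    a = list(range(n))
    for _ in range(k):
        i = random.randrange(n - 1)
        a[i], a[i + 1] = a[i + 1], a[i]
    return a

random.seed(1)
a = perturb(10000, 100)   # almost-sorted input (low Inv)
heapsort(a)
```

Running this with increasing values of `k` reproduces the qualitative picture of Figure 11 for the first two quantities; branch mispredictions and cache misses would require hardware performance counters.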
For Mergesort, we focus on the binary merge process and count the number of times there is an alternation in which of the two input subsequences provides the next output element. It is easy to verify that the number of such alternations is dominated by the running time of the Mergesort algorithm of Moffat [14] based on merging by finger search trees, which was proved to have a running time of O(n log(Inv/n)); i.e., the number of alternations performed by standard Mergesort is O(n log(Inv/n)). The plots in Figure 12 show a very similar behavior for the number of alternations, the number of branch mispredictions, and the running time. The number of alternations is clearly correlated with the number of branch mispredictions, and these appear to be a dominant factor in the running time of Mergesort. The number of data cache misses increases only slightly for large disorder in the input.
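The alternation count described above can be sketched as follows; this is our own illustrative instrumentation of a standard binary merge, not the paper's code.

```python
def merge_count_alternations(left, right):
    """Merge two sorted runs, counting how often the source of the
    next output element switches between the two runs."""
    out = []
    alternations = 0
    i = j = 0
    last = None  # which run supplied the previous output element
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1; src = 0
        else:
            out.append(right[j]); j += 1; src = 1
        if last is not None and src != last:
            alternations += 1
        last = src
    # flushing the remaining tail of one run causes at most one
    # further alternation
    if (i < len(left) and last == 1) or (j < len(right) and last == 0):
        alternations += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out, alternations

# Fully interleaved runs alternate at every output step...
merged, alt = merge_count_alternations([1, 3, 5, 7], [2, 4, 6, 8])
# ...while already-sorted concatenated runs alternate only once.
merged2, alt2 = merge_count_alternations([1, 2, 3, 4], [5, 6, 7, 8])
```

Each alternation corresponds to the outcome of the merge's branch changing, which is consistent with the strong correlation between alternations and branch mispredictions observed in Figure 12.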