Abstract
Sorting can be sped up on parallel computers by dividing
the data and processing the parts in parallel. Merge sort
can be parallelized; however, the conventional algorithm
implemented on distributed-memory computers performs
poorly because the number of active (non-idling) processors
is halved at each merging stage, until only one remains in
the last stage. This paper presents a load-balanced
parallel merge sort algorithm in which all processors participate
in merging throughout the computation. Data are
evenly distributed among all processors, and every processor
works during the merging phase. A significant performance
improvement has been achieved. Our analysis
shows that the upper bound on the speedup of the merge time is
(P