ABSTRACT
A majority of parallel applications executed on HPC clusters
use MPI for communication between processes. Most users
treat MPI as a black box, executing their programs using the
cluster’s default settings. While the default settings perform
adequately for many cases, it is well known that optimizing
the MPI environment can significantly improve application
performance. Although the existing optimization tools are
effective when used by performance experts, they require
deep knowledge of MPI library behavior and the underlying
hardware architecture on which the application will be executed.
Therefore, an easy-to-use tool that provides recommendations
for configuring the MPI environment to optimize
application performance is highly desirable. This paper addresses
this need by presenting an easy-to-use methodology
and tool, named MPI Advisor, that requires just a single execution
of the input application to characterize its predominant
communication behavior and determine the MPI configuration
that may enhance its performance on the target
combination of MPI library and hardware architecture. Currently,
MPI Advisor provides recommendations that address
the four most commonly occurring MPI-related performance
bottlenecks, which are related to the choice of: 1) point-to-point
protocol (eager vs. rendezvous), 2) collective communication
algorithm, 3) MPI tasks-to-cores mapping, and 4)
InfiniBand transport protocol. The performance gains obtained
by implementing the recommended optimizations in
the case studies presented in this paper range from a few percent
to more than 40%. Specifically, using this tool, we were
able to improve the performance of HPCG with MVAPICH2
on four nodes of the Stampede cluster from 6.9 GFLOP/s to
10.1 GFLOP/s, a gain of about 46%. Since the tool provides application-specific
recommendations, it also informs the user about correct usage
of MPI.