an HPC cluster access terminal. In designing the methodology and tool, our goal was to automate the tasks that
require knowledge of MPI library internals and provide the
user with easy-to-implement recommendations that could
enhance communication performance by tuning MPI library
parameters. In general, performance optimization has four
steps: measurement, analysis of performance bottlenecks,
generation of recommendations for optimization, and implementation
of the recommended optimizations. For MPI-related performance, MPI Advisor automates the first three steps of this process, i.e., up to and including the generation of recommendations for optimization. Accordingly, MPI Advisor incorporates:
1) data collection using the existing MPI profiling interface (PMPI) and the MPI Tool Information Interface (MPI_T), as sketched after this paragraph, 2) analysis that translates the collected data into performance metrics that identify specific performance-degradation factors, and 3) recommendations for optimization, which are presented so that users with minimal knowledge of MPI can implement them. We believe
that informing users about correct usage of MPI libraries
will enable them to make efficient use of MPI in the
future. Currently, MPI Advisor provides tuning strategies to address the four most commonly occurring MPI-related performance bottlenecks, which are related to the choice of: 1) point-to-point protocol (eager vs. rendezvous), 2) collective communication algorithm, 3) MPI tasks-to-cores mapping, and 4) InfiniBand transport protocol. Support for
other MPI-related performance issues can and will be added
in the future. The current version of the tool supports both MVAPICH2³ and Intel MPI⁴. Support for other MPI library implementations will be included in future versions.
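To make the data-collection step concrete, the following minimal sketch, written against the MPI-3 interfaces, shows how a profiling library might count point-to-point message volume through the name-shifted PMPI interface and read an integer control variable through MPI_T. It is only an illustration of the two interfaces, not MPI Advisor's implementation, and the name of any library-specific control variable (e.g., an eager-threshold setting) is left to the caller because such names vary across MPI libraries.

/* Illustration of PMPI- and MPI_T-based data collection; not MPI Advisor's
 * implementation.  The wrapper intercepts MPI_Send to accumulate message-size
 * statistics; read_cvar() looks up an integer MPI_T control variable by name
 * and assumes MPI_T_init_thread() has already been called (e.g., from a
 * wrapped MPI_Init). */
#include <mpi.h>
#include <string.h>

static long long sent_messages = 0;   /* number of MPI_Send calls observed */
static long long sent_bytes    = 0;   /* total payload in bytes            */

/* PMPI interposition: the application's calls to MPI_Send land here. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int type_size;
    MPI_Type_size(datatype, &type_size);
    sent_messages += 1;
    sent_bytes    += (long long)count * type_size;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);  /* real send */
}

/* Return the value of an integer-valued control variable, or -1 if the
 * library does not expose a variable with that name. */
int read_cvar(const char *wanted)
{
    int ncvars, value = -1;
    MPI_T_cvar_get_num(&ncvars);
    for (int i = 0; i < ncvars; i++) {
        char name[256], desc[256];
        int name_len = (int)sizeof(name), desc_len = (int)sizeof(desc);
        int verbosity, bind, scope, count;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;
        MPI_T_cvar_handle handle;

        MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                            &enumtype, desc, &desc_len, &bind, &scope);
        if (strcmp(name, wanted) == 0 && dtype == MPI_INT) {
            MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
            MPI_T_cvar_read(handle, &value);
            MPI_T_cvar_handle_free(&handle);
            break;
        }
    }
    return value;
}

Because both mechanisms are standard, measurements of this kind can be gathered in a single run of the unmodified application, which is the property MPI Advisor exploits.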
The innovations and contributions implemented in MPI Advisor are: 1) specification of the measurements required to perform the four MPI optimizations, including the means to acquire this information through standard interfaces such as PMPI and MPI_T, 2) a low-overhead implementation of these measurements that gathers the required data in a single execution of the input application and requires no application instrumentation by the user, and 3) a set of rules that uses the acquired measurements to recommend whichever of the supported optimizations are needed (a hypothetical example of such a rule is sketched after this paragraph).
As the paper demonstrates, MPI Advisor has been used in a production environment with benchmarks, real applications, and currently available high-performance MPI libraries. The performance gains obtained in the case studies presented in this paper range from a few percent to more than 40%.
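As an illustration of what such a rule might look like, consider the eager-vs.-rendezvous strategy: if a large share of point-to-point messages is only slightly larger than the library's current eager threshold, the tool can suggest raising the threshold so that those messages avoid the rendezvous handshake. The sketch below is hypothetical, not MPI Advisor's actual rule set; the 50% cutoff is arbitrary, and the environment variables MV2_IBA_EAGER_THRESHOLD (MVAPICH2) and I_MPI_EAGER_THRESHOLD (Intel MPI) are assumptions about the libraries' tuning knobs.

/* Hypothetical recommendation rule; not MPI Advisor's actual rule set. */
#include <stdio.h>

struct p2p_stats {
    long long total_msgs;        /* point-to-point messages observed         */
    long long just_above_eager;  /* messages with eager < size <= 2 * eager  */
    long long eager_threshold;   /* current eager threshold in bytes (MPI_T) */
};

void recommend_eager_threshold(const struct p2p_stats *s)
{
    if (s->total_msgs == 0)
        return;
    double frac = (double)s->just_above_eager / (double)s->total_msgs;
    if (frac > 0.5) {  /* arbitrary cutoff for this sketch */
        long long suggested = 2 * s->eager_threshold;
        printf("Many messages (%.0f%%) are just above the eager threshold "
               "(%lld bytes). Consider raising it, e.g.:\n"
               "  MVAPICH2:  export MV2_IBA_EAGER_THRESHOLD=%lld\n"
               "  Intel MPI: export I_MPI_EAGER_THRESHOLD=%lld\n",
               100.0 * frac, s->eager_threshold, suggested, suggested);
    }
}

Raising the eager threshold typically trades memory for latency, since the library must provide eager buffers large enough for the affected messages; tradeoffs of this kind are discussed in Section 4.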
This paper is organized as follows: Section 2 provides brief background on the relevant features of the MPI standard and on the hwloc tool, which is used by MPI Advisor. Section 3 describes the MPI Advisor methodology and tool, while Section 4 discusses the four tuning strategies currently supported by MPI Advisor and indicates the performance tradeoffs associated with each, in particular for the MPI libraries available on the Texas Advanced Computing Center’s Stampede cluster. Section 5 demonstrates the efficacy of MPI Advisor by presenting the recommendations produced by each tuning strategy and the results of implementing them. In Section 6 we review related research and, finally, in Section 7 we conclude the paper and discuss future research directions.
³ http://mvapich.cse.ohio-state.edu/
⁴ https://software.intel.com/en-us/intel-mpi-library