The experiments presented in this work were performed on Hopper at the National Energy Research Scientific Computing Center (NERSC) and Intrepid at the Argonne Leadership Computing Facility (ALCF). Hopper is a Cray XE6 with a peak performance of 1.28 petaflops, 153,216 compute cores, 212 TiB of RAM, and 2 PiB of online disk storage.
All experiments on Hopper were carried out using a Lustre scratch file system composed of 26 I/O servers, each of which provided access to 6 Object Storage Targets (OSTs). Unless otherwise noted, we used the default Lustre parameters, which striped each file across two OSTs.
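Applications can override the file system's default striping on a per-file basis. As a minimal, hedged sketch (not the code used in our experiments), the following C fragment passes the standard ROMIO hints striping_factor and striping_unit through an MPI_Info object when a file is created on Lustre; the file path and stripe values are purely illustrative.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Hints recognized by Lustre-aware MPI-IO (ROMIO) implementations.
       These values are illustrative, not our experimental settings. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "8");     /* stripe across 8 OSTs */
    MPI_Info_set(info, "striping_unit", "1048576"); /* 1 MiB stripe size    */

    /* Hypothetical path on a Lustre scratch file system. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/scratch/example.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

Note that striping parameters take effect only when a file is created; outside of an application, the lfs setstripe and lfs getstripe commands serve the same purpose on the command line.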
Intrepid is a Blue Gene/P system with a peak performance of 557 teraflops, 167,936 compute cores, 80 TiB of RAM, and 7.6 PiB of online disk storage.
All experiments on Intrepid were carried out using a GPFS file system composed of 128 file servers and 16 DDN 9900 storage devices. Intrepid also uses 640 I/O nodes to forward I/O operations between the compute cores and the file system. The Intrepid file system was nearly full (95% of capacity) during our evaluation study; we believe that this significantly degraded I/O performance on Intrepid.