The current state of practice in supercomputer resource allocation places jobs from different users on disjoint
nodes both in terms of time and space. While this approach largely guarantees that jobs from different users
do not degrade one another’s performance, it does so at high cost to system throughput and energy efficiency.
This focused study presents job striping, a technique that significantly increases performance over the current
allocation mechanism by colocating pairs of jobs from different users on a shared set of nodes. To evaluate
the potential of job striping in large-scale environments, the experiments are run at the scale of 128 nodes on
the state-of-the-art Gordon supercomputer. Across all pairings of 1024-process NAS Parallel Benchmarks,
job striping increases mean throughput by 26% and mean energy efficiency by 22%. On pairings of the real
applications GTC, LAMMPS, and MILC at equal scale, job striping improves mean throughput by 12%
and mean energy efficiency by 11%. In addition, the study provides a simple set of heuristics for avoiding
low-performing application pairs.
Copyright © 2012 John Wiley & Sons, Ltd.