Section 5 analyzes the performance of the algorithms on three systems: Intrepid, an IBM BG/P at Argonne National Laboratory; Jaguar, a Cray XT4 at Oak Ridge National Laboratory; and Franklin, a Cray XT4 at NERSC. Finally, Section 6 describes our analysis and conclusions.