The paper describes a low-level communication library, implemented
on both BG/Q and P7-IH, that we have adopted as a vehicle
to explore different optimization strategies. This library was used in
all our Graph500 submissions from 2011 to 2014 and in other graph
algorithms [3, 19], on system configurations of up to 98,304 nodes
and 6,291,456 threads.