We have presented a low-latency communication library for large-scale graph analytics and explored different network optimization strategies, including in-network computation for special collective communication patterns. We have analyzed our software framework on two high-performance computing systems, BlueGene/Q and POWER7 IH. We have also presented a graph programming model that exploits message aggregation and active messages to overlap computation with fine-grained communication. We have provided a detailed performance analysis of our communication library and evaluated it using two data-intensive applications. The evaluation has shown significant performance improvements, ranging from 5X to 10X, over equally optimized MPI implementations, while scaling up to 96K processing nodes and 6 million threads. We believe that our framework can also serve as the basis for other, possibly more complex, graph processing algorithms.