We have presented a low-latency communication library for large-scale graph analytics and explored different network optimization strategies, including in-network computation for special collective communication patterns. We have analyzed our software framework on two high-performance computing systems, BlueGene/Q and POWER7 IH. We have also presented a graph programming model that exploits message aggregation and active messages to overlap computation with fine-grained communication. We have provided a detailed performance analysis of our communication library and evaluated it using two data-intensive applications. The evaluation has shown significant performance improvements, ranging from 5X to 10X, over equally optimized MPI implementations, while scaling up to 96K processing nodes and 6 million threads. We believe that our framework can also serve as the basis for other, possibly more complex, graph processing algorithms.