Graph algorithms expose the limitations of conventional architec-tures, in particular when accessing large data sets [4,26]. The mem-ory access patterns are often irregular with little spatial locality and data reuse, resulting in a high cache miss rate. Graph applications usually handle sparse data sets with relatively simple accompany-ing computational operations. Performance is therefore dominated by the data transfers as opposed to the computational capabilities, to sustain a large number of fine-grained, and nearly random remote data accesses across the full memory system and interconnect.