On BG/Q, the Message Unit (MU) [7, 9], shown in Figure 2, bridges the 5D Torus network and the memory subsystem and is designed to provide ultra-low latency and high throughput. It contains injection and reception control logic, which manages message sending and receiving, plus global barrier control logic, which provides the barrier and collective functionality integrated onto the same physical torus network. The MU also supports atomic operations, L2 atomics, and L2 prefetching (i.e., reading messages from main memory and loading them into L2). On the sending side, the injection control logic interprets the message descriptor provided by the software and fetches the message contents from memory to send them into the network. When a message arrives, the reception control logic writes it into the appropriate location in the memory system (if possible, directly into the L2 cache). The hardware provides efficient mechanisms for polling the network device at user level to detect the arrival of new packets. BG/Q’s system software provides highly optimized C inlines, through the System Programming Interface (SPI), to program the MU and Torus interconnect.
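The send and receive paths described above can be illustrated with a small software model. The following is only a sketch under stated assumptions: it does not use the SPI, and every name in it (`msg_descriptor`, `injection_fifo`, `reception_fifo`, `inject`, `mu_step`, `poll_reception`, and the two size constants) is hypothetical. The function `mu_step` stands in for the MU hardware, which in this model reads one descriptor, fetches the payload from memory, and deposits it where user-level code can poll for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_SLOTS   8   /* hypothetical ring size (power of two) */
#define PACKET_BYTES 64  /* hypothetical maximum payload in this sketch */

/* Hypothetical message descriptor: where the payload is and where it goes. */
typedef struct {
    const void *payload;  /* message contents in memory */
    size_t      length;   /* bytes to send, at most PACKET_BYTES */
    uint8_t     dest[5];  /* destination coordinates on the 5D torus */
} msg_descriptor;

/* Ring of descriptors posted by software and consumed by the model MU. */
typedef struct {
    msg_descriptor slots[FIFO_SLOTS];
    unsigned head, tail;  /* head: next to consume, tail: next free slot */
} injection_fifo;

/* Ring of received packets filled by the model MU and drained by polling. */
typedef struct {
    uint8_t  data[FIFO_SLOTS][PACKET_BYTES];
    size_t   length[FIFO_SLOTS];
    unsigned head, tail;
} reception_fifo;

/* Software side: post a descriptor. Returns 0 on success, -1 if the ring is full. */
static int inject(injection_fifo *f, const msg_descriptor *d) {
    if (f->tail - f->head == FIFO_SLOTS) return -1;
    f->slots[f->tail % FIFO_SLOTS] = *d;
    f->tail++;
    return 0;
}

/* Stand-in for the hardware: interpret one descriptor, fetch its payload,
 * and write it into the reception ring. A real network would route the
 * packet by d->dest; this model delivers it locally.
 * Returns 1 if a packet moved, 0 if there was nothing to do or no room. */
static int mu_step(injection_fifo *inj, reception_fifo *rec) {
    if (inj->head == inj->tail) return 0;
    if (rec->tail - rec->head == FIFO_SLOTS) return 0;
    const msg_descriptor *d = &inj->slots[inj->head % FIFO_SLOTS];
    assert(d->length <= PACKET_BYTES);
    unsigned slot = rec->tail % FIFO_SLOTS;
    memcpy(rec->data[slot], d->payload, d->length);
    rec->length[slot] = d->length;
    rec->tail++;
    inj->head++;
    return 1;
}

/* User-level polling: copy the next packet into out and return its length,
 * or return -1 if no packet has arrived or out is too small. */
static long poll_reception(reception_fifo *f, void *out, size_t cap) {
    if (f->head == f->tail) return -1;
    unsigned slot = f->head % FIFO_SLOTS;
    size_t n = f->length[slot];
    if (n > cap) return -1;
    memcpy(out, f->data[slot], n);
    f->head++;
    return (long)n;
}
```

In this model, a sender fills in a `msg_descriptor` and calls `inject`; a receiver calls `poll_reception` in a loop until it returns a non-negative length. Polling a ring in user-accessible memory avoids a system call or interrupt on the receive path, which is the property the paragraph attributes to the hardware.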