determined by the network’s Maximum Transmission Unit (MTU). Each thread atomically increments the Tail pointer to claim a free slot and then writes its message into that slot. When the aggregated message reaches the Window size, the thread that writes the last slot sends the full packet. The Head pointer is advanced when the network adapter signals that the send has completed, either through polling a hardware register or through a network callback. In addition, the runtime can take advantage of optimized atomics where the hardware provides them. For example, we use BG/Q L2 Atomics,² along with a low-level flush operation that ensures all pending stores have reached the L2 cache. This guarantees that the message contents are fully visible to the network adapter when the send is enqueued and the data is transmitted over the communication link.
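The Head/Tail scheme can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the runtime's implementation. A lock-protected counter stands in for a hardware fetch-and-add such as the BG/Q L2 atomics. A `send` callback stands in for the network adapter. The names `AtomicCounter`, `AggregationBuffer`, `post` and `_complete` are hypothetical. The per-window `written` counter is also our addition: it lets the sender detect that every slot in the window holds a finished message, not merely a claimed one.

```python
import threading
import time


class AtomicCounter:
    """Emulates a hardware fetch-and-add (e.g. a BG/Q L2 atomic) with a lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta=1):
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def load(self):
        with self._lock:
            return self._value


class AggregationBuffer:
    """Hypothetical Head/Tail ring that aggregates small messages into packets."""

    def __init__(self, window, num_windows, send):
        self.window = window                    # slots per packet (e.g. MTU // message size)
        self.num_windows = num_windows
        self.capacity = window * num_windows
        self.slots = [None] * self.capacity
        self.tail = AtomicCounter()             # next slot index to claim
        self.head = AtomicCounter()             # oldest slot not yet released by the adapter
        # Assumption: count finished writes per window so the sender never
        # ships a slot that another thread has claimed but not yet filled.
        self.written = [AtomicCounter() for _ in range(num_windows)]
        self.send = send                        # stand-in for the network adapter
        self._done = set()                      # completed windows not yet folded into head
        self._done_lock = threading.Lock()

    def post(self, msg):
        idx = self.tail.fetch_add(1)            # atomically claim a slot
        while idx - self.head.load() >= self.capacity:
            time.sleep(0)                       # ring full: wait for send completions
        self.slots[idx % self.capacity] = msg   # write the message into the slot
        # Assumption: on real hardware a store flush would go here so the
        # adapter sees the message contents before the send is enqueued.
        start = (idx // self.window) * self.window
        w = (start // self.window) % self.num_windows
        if self.written[w].fetch_add(1) == self.window - 1:
            # This thread finished the last write of the window: send the packet.
            packet = [self.slots[i % self.capacity] for i in range(start, start + self.window)]
            self.written[w].fetch_add(-self.window)   # reset the counter for the next round
            self.send(packet, lambda: self._complete(start))

    def _complete(self, start):
        """Completion callback: advance Head over contiguous completed windows."""
        with self._done_lock:
            self._done.add(start)
            while self.head.load() in self._done:
                self._done.remove(self.head.load())
                self.head.fetch_add(self.window)
```

Head advances only over contiguous completed windows, so a slot is never reused while an older packet that contains it is still in flight. In this sketch a window that never fills is never sent; a real runtime would presumably also flush partially filled windows, for example on a timeout.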