Shared coalescing queues scale much better than their private counterparts on large distributed configurations because they require less memory. The difficulty of implementing shared queues efficiently lies in the coordination required among threads so that small messages can be enqueued concurrently. We use atomic operations to coordinate access to the queues. As shown in Figure 7, each coalescing queue is associated with two pointers, Head and Tail, and with a Window that defines the size of the packet to be sent over the network, which is typically