while maintaining equal workloads. To do so, we greedily swap hairs between processors whenever a swap reduces the communication cost, iterating until we reach a local minimum or exceed a maximum number of swaps (see Figure 13, right). This final clustering lets us send less information between processors than if we had simply used all contact pairs, leading to more efficient parallelization. Note that any algorithm could be used to construct the hair adjacency graph, as long as it produces good communication patterns between processors.
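The greedy swapping step can be illustrated with a minimal sketch. It is not our implementation, and it rests on several assumptions made only for illustration: the communication cost is taken to be the number of contact pairs whose two hairs lie on different processors, candidate swaps are enumerated exhaustively, and the names `communication_cost`, `greedy_swap`, and `max_swaps` are hypothetical. Because each move exchanges one hair for another, the number of hairs per processor never changes, which is how the sketch keeps workloads equal.

```python
from itertools import combinations


def communication_cost(assignment, contacts):
    """Assumed cost: number of contact pairs split across two processors."""
    return sum(1 for a, b in contacts if assignment[a] != assignment[b])


def greedy_swap(assignment, contacts, max_swaps=1000):
    """Swap hairs between processors while a swap lowers the assumed cost.

    assignment[h] is the processor that owns hair h. Exchanging two hairs
    leaves every processor's hair count unchanged, so the balance of the
    initial assignment is preserved.
    """
    assignment = list(assignment)  # work on a copy
    cost = communication_cost(assignment, contacts)
    swaps = 0
    improved = True
    while improved and swaps < max_swaps:
        improved = False
        for i, j in combinations(range(len(assignment)), 2):
            if assignment[i] == assignment[j]:
                continue  # same processor: swapping changes nothing
            # Tentatively exchange the two hairs and re-evaluate the cost.
            assignment[i], assignment[j] = assignment[j], assignment[i]
            new_cost = communication_cost(assignment, contacts)
            if new_cost < cost:
                cost = new_cost
                swaps += 1
                improved = True
                if swaps >= max_swaps:
                    break
            else:
                # The swap did not help, so undo it.
                assignment[i], assignment[j] = assignment[j], assignment[i]
    return assignment, cost


# Four hairs on two processors; hairs 0-1 and 2-3 are in contact.
print(greedy_swap([0, 1, 0, 1], [(0, 1), (2, 3)]))  # -> ([1, 1, 0, 0], 0)
```

In this toy example a single swap places each contacting pair on one processor, so no contact information needs to cross processors. Recomputing the full cost for every candidate swap is wasteful; a practical version would presumably evaluate only the contacts incident to the two swapped hairs.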