4.4 AllReduce and Message Size
Figure 5 presents the latency of the AllReduce collective operation as the message size varies. Focusing first on the cost of the Revoked AllReduce operation, one can observe that its duration remains independent of the message size for messages smaller than 1 MB. This behavior is expected, as the Revoked operation is interrupted before the entire communication volume has been exchanged. For larger message sizes, however, the delivery of the Revoke notification may be delayed by the granularity of the ongoing reduction computation. While these computations are in progress, the MPI progress engine handles them with maximum priority and therefore does not process incoming fragments until they complete. As soon as one of these computations completes, the Revoke notification is delivered, the remaining computations on pipelined blocks are discarded, and further data transfers are cancelled. For post-Revoke AllReduce operations, the impact of jitter on performance is visible only for small messages: once the message size exceeds 512 bytes, the initial performance difference is absorbed in the cost of the operation itself. Interestingly, the standard deviation (over 2,000 runs) of both the Revoked and the jitter-disturbed AllReduce operations remains low, of the same magnitude as the natural, failure-free standard deviation of the operation.
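The effect of reduction granularity on Revoke delivery can be illustrated with a toy latency model. The sketch below is a hypothetical simplification, not the actual MPI or ULFM implementation. It assumes that the message is reduced in pipelined blocks of at most `segment_bytes`, that each block's reduction runs to completion without yielding, and that incoming fragments (including the Revoke) are examined only between blocks. The per-byte cost `ns_per_byte` is a made-up parameter, and data transfers are ignored. Under these assumptions, the delivery delay is bounded by the computation time of a single block. That time is negligible when a small message fits in one short block, but it grows with the block size used for large messages.

```python
from dataclasses import dataclass


@dataclass
class RevokeOutcome:
    observed_at_us: float   # time at which the Revoke notification is delivered
    delay_us: float         # observed_at_us minus the notification's arrival time
    blocks_reduced: int     # pipelined blocks whose reduction ran to completion
    blocks_discarded: int   # blocks dropped once the Revoke is delivered


def simulate_revoked_allreduce(msg_bytes: int, segment_bytes: int,
                               ns_per_byte: float,
                               revoke_at_us: float) -> RevokeOutcome:
    """Hypothetical toy model of Revoke delivery during a pipelined reduction.

    Assumptions (not taken from any real MPI implementation):
    - the message is reduced in blocks of at most segment_bytes;
    - each block's reduction is non-preemptible and costs ns_per_byte per byte;
    - incoming fragments, including the Revoke, are examined only between blocks;
    - data transfers are ignored.
    """
    if msg_bytes <= 0 or segment_bytes <= 0:
        raise ValueError("msg_bytes and segment_bytes must be positive")
    n_blocks = (msg_bytes + segment_bytes - 1) // segment_bytes  # ceiling division
    t_us = 0.0
    for i in range(n_blocks):
        if t_us >= revoke_at_us:
            # Block boundary reached after the Revoke arrived: deliver it
            # and discard the blocks that have not been reduced yet.
            return RevokeOutcome(t_us, t_us - revoke_at_us, i, n_blocks - i)
        block = min(segment_bytes, msg_bytes - i * segment_bytes)
        t_us += block * ns_per_byte / 1000.0  # reduce this block to completion
    # The Revoke arrived during (or after) the last block: it is delivered once
    # the progress engine becomes free again, with nothing left to discard.
    observed = max(t_us, revoke_at_us)
    return RevokeOutcome(observed, observed - revoke_at_us, n_blocks, 0)


if __name__ == "__main__":
    # 4 MiB message, 1 MiB blocks, Revoke arriving 100 us into the reduction.
    print(simulate_revoked_allreduce(4 * 1024 * 1024, 1024 * 1024, 0.5, 100.0))
```

With a 4 MiB message split into 1 MiB blocks and a Revoke arriving 100 µs into the operation, the model delivers the notification only when the first block's reduction finishes, about 524 µs in, and discards the three remaining blocks. The non-preemptible block is what bounds the delay: a finer segmentation would make the Revoke visible sooner, at the price of more frequent progress-engine interventions.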