In fact, the real scenario is more complex, because modern MPI implementations are aware of such hierarchies and are tuned to use shared-memory mechanisms for node-local communication. One downside of MPI is that memory consumption for internal buffers can grow quadratically with the number of participating processes: each process sets up buffers in which data is stored (and possibly duplicated) before the actual messages are exchanged, so with one buffer per communication partner, p processes need on the order of p^2 buffers. On 10^6 cores, this amounts to 10^12 memory buffers for a single global communication call (if the MPI implementation is not hierarchy-aware). In hybrid computing, the number of MPI processes is typically much lower than the core count, which saves resources in terms of both memory and data transfers.
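As a rough illustration of the hybrid approach, the following minimal MPI+OpenMP sketch (an assumption for illustration, not taken from the text) starts one MPI process per node and lets OpenMP threads occupy the cores within the node; the MPI-internal buffer footprint then scales with the number of nodes rather than with the number of cores. How the ranks are pinned to nodes depends on the launcher of the particular MPI implementation.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nprocs;

    /* Request a thread level that allows OpenMP threads alongside MPI;
       only the master thread will make MPI calls here. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* One MPI process per node; the node's cores are filled by threads,
       so MPI only has to manage nprocs (= number of nodes) peers. */
    #pragma omp parallel
    {
        #pragma omp single
        printf("rank %d of %d runs with %d OpenMP threads\n",
               rank, nprocs, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

Launching, for example, 8 ranks on 8 nodes with 64 threads each covers 512 cores while MPI sees only 8 processes, so collective operations allocate buffers for 8 peers instead of 512.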