A naive implementation could proceed with the migration in two steps; each processor could gather information about
incoming particles in the first step, followed by an exchange of particles in the second step. The information-gathering step
can either query all neighbors individually, or use a global collective operation. The two-step process would minimize the
volume of data transmitted by limiting it to necessary exchanges only. The disadvantage is that the information-gathering step
will cost either an expensive global operation, or a minimum of $3^d - 1$ non-data operations, one for each neighbor of a block in $d$ dimensions. Given that relatively few particles
move out of a block along any one face/edge/corner in a single timestep, the latency of these extra operations can easily outweigh the bandwidth
cost of the communication itself.
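
The trade-off can be illustrated with a minimal sketch. It enumerates the face/edge/corner neighbors of a block and compares the latency of the query messages against the bandwidth cost of the particle payload under a simple latency-bandwidth (alpha-beta) cost model. The cost model, the function names, and all numeric values (ALPHA, BETA, PARTICLE_BYTES, PARTICLES_PER_NEIGHBOR) are assumptions chosen for illustration; they are not taken from the text or from any measured system.

```python
from itertools import product


def neighbor_offsets(d):
    """Return the offsets of all blocks adjacent to a block in d dimensions.

    Every offset in {-1, 0, 1}^d except the all-zero offset is a neighbor,
    which gives 3**d - 1 neighbors (faces, edges, and corners).
    """
    return [o for o in product((-1, 0, 1), repeat=d) if any(o)]


def message_cost(n_bytes, alpha, beta):
    """Cost in seconds of one message under an alpha-beta model (assumed)."""
    return alpha + beta * n_bytes


# Hypothetical machine and workload parameters, for illustration only.
ALPHA = 1e-6                # per-message latency, seconds
BETA = 1e-9                 # per-byte transfer time, seconds
PARTICLE_BYTES = 64         # size of one particle record
PARTICLES_PER_NEIGHBOR = 5  # particles leaving toward each neighbor

n_neighbors = len(neighbor_offsets(3))

# Information-gathering step: one empty (non-data) message per neighbor.
query_cost = n_neighbors * message_cost(0, ALPHA, BETA)

# Bandwidth-only part of the particle exchange.
payload_cost = n_neighbors * BETA * PARTICLE_BYTES * PARTICLES_PER_NEIGHBOR

print(f"neighbors in 3D: {n_neighbors}")
print(f"query latency:   {query_cost:.2e} s")
print(f"payload transfer: {payload_cost:.2e} s")
```

With these assumed values the query latency exceeds the payload transfer time, which is the situation described above; different hardware parameters or particle counts would shift the balance.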