For the code in key modules, we achieve parallel scaling of 45x,
50x, and 30x for fluid, face, and cloth simulations, respectively.
The modules have a spectrum of task granularity and locking behavior, and all but one are dominated by loop-level parallelism.
Many modules operate on streams of data. In some cases, modules iterate over their data, leading to significant temporal locality. This streaming behavior leads to very high on-die and main
memory bandwidth requirements. Finally, most modules have little
inter-thread communication since they are data-parallel, but a few
require heavy communication between data-parallel operations.