Similarly, when vector math (VM) is combined with other
optimization techniques such as kernel splitting (KS), the
resulting performance does not match what one would expect
from each technique on its own. Scalar math (SM) tends to be
faster in combination with KS, even though the opposite holds
when the techniques are evaluated in isolation. Some of these
results are due to register pressure: as the number of
registers and the amount of local memory used by a kernel
change, so does the number of threads that can run
concurrently.
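The dependence of concurrency on registers and local memory can be illustrated with a simplified occupancy estimate. The sketch below is only an illustration: the function name `concurrent_threads` is hypothetical, the per-compute-unit limits (65,536 registers, 48 KB of local memory, 2,048 threads, 32 work-groups) are assumed example values rather than the hardware used in these experiments, and real devices also round allocations to a granularity that this model ignores. The number of resident work-groups is the minimum over the individual resource limits, so raising per-thread register use or per-group local memory can reduce the number of concurrently running threads.

```python
def concurrent_threads(regs_per_thread, local_mem_per_group, group_size,
                       regs_per_cu=65536, local_mem_per_cu=49152,
                       max_threads_per_cu=2048, max_groups_per_cu=32):
    """Simplified estimate of threads resident on one compute unit.

    All default limits are assumed example values; allocation
    granularity and other hardware details are deliberately ignored.
    """
    # Work-groups that fit in the register file
    by_regs = regs_per_cu // (regs_per_thread * group_size)
    # Work-groups that fit in local memory (no limit if none is used)
    if local_mem_per_group > 0:
        by_local = local_mem_per_cu // local_mem_per_group
    else:
        by_local = max_groups_per_cu
    # Work-groups allowed by the per-compute-unit thread limit
    by_threads = max_threads_per_cu // group_size
    # The scarcest resource determines how many groups are resident
    groups = min(by_regs, by_local, by_threads, max_groups_per_cu)
    return groups * group_size


if __name__ == "__main__":
    # Hypothetical kernels with 256-thread work-groups
    print(concurrent_threads(32, 0, 256))      # baseline
    print(concurrent_threads(64, 0, 256))      # doubled register use
    print(concurrent_threads(32, 16384, 256))  # 16 KB local memory per group
```

Under these assumed limits, doubling per-thread register use from 32 to 64 halves the resident threads from 2,048 to 1,024, and allocating 16 KB of local memory per work-group lowers it to 768, even though neither change alters the arithmetic the kernel performs.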