code being originally designed for optimal CPU performance rather than built from the ground up for GPU execution, and the use of hand-written CUDA kernels with explicit control of shared memory. An existing radiation code may be ported to hand-written kernels, but doing so requires more development effort.