The simplicity of this MMU design shows that address
translation can be implemented on the GPU without exotic
hardware. We find that using this GPU MMU design incurs
modest performance degradation (an average of less than
2% compared to an ideal MMU with an infinite-sized TLB
and minimal-latency page walks) while reducing the
burden on the programmer.