Figure 1 shows an
overview of the GPU MMU evaluated in this paper. This
design uses a TLB per GPU compute unit (CU) and a shared
page walk unit to avoid excessive per-CU hardware. The
shared page walk unit contains a highly-threaded page table
walker and a page walk cache.
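
To make the organization concrete, the following is a minimal functional sketch of the translation path described above: a private TLB per CU, backed by one page walk unit shared by all CUs that holds a page walk cache. It is illustrative only and is not the evaluated hardware. The 4 KiB page size, the four-level radix page table, the LRU replacement policy, the structure sizes, and all names (`LRUCache`, `SharedPageWalkUnit`, `ComputeUnit`, `map_page`) are assumptions introduced for this example. The walker is modeled sequentially, so its multithreading is not captured.

```python
from collections import OrderedDict

PAGE_SHIFT = 12        # assumption: 4 KiB pages
LEVELS = 4             # assumption: four-level radix page table
BITS_PER_LEVEL = 9     # assumption: 512 entries per table node
INDEX_MASK = (1 << BITS_PER_LEVEL) - 1


class LRUCache:
    """Small fully associative cache with LRU replacement (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def map_page(root, vpn, pfn):
    """Install a vpn -> pfn mapping in a page table built from nested dicts."""
    node = root
    for level in range(LEVELS - 1):
        shift = BITS_PER_LEVEL * (LEVELS - 1 - level)
        node = node.setdefault((vpn >> shift) & INDEX_MASK, {})
    node[vpn & INDEX_MASK] = pfn


class SharedPageWalkUnit:
    """One walker plus a page walk cache, shared by every CU."""

    def __init__(self, page_table, pwc_entries=16):
        self.page_table = page_table
        self.pwc = LRUCache(pwc_entries)  # caches non-leaf entries only
        self.memory_accesses = 0          # page table reads that missed the PWC

    def walk(self, vpn):
        node = self.page_table
        for level in range(LEVELS):
            shift = BITS_PER_LEVEL * (LEVELS - 1 - level)
            is_leaf = level == LEVELS - 1
            key = (level, vpn >> shift)   # identifies the entry read at this level
            cached = None if is_leaf else self.pwc.get(key)
            if cached is not None:
                node = cached
                continue
            self.memory_accesses += 1
            node = node[(vpn >> shift) & INDEX_MASK]
            if not is_leaf:
                self.pwc.put(key, node)
        return node  # physical frame number


class ComputeUnit:
    """A CU with a private TLB; misses go to the shared page walk unit."""

    def __init__(self, walk_unit, tlb_entries=64):
        self.tlb = LRUCache(tlb_entries)
        self.walk_unit = walk_unit

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        pfn = self.tlb.get(vpn)
        if pfn is None:  # TLB miss: request a walk from the shared unit
            pfn = self.walk_unit.walk(vpn)
            self.tlb.put(vpn, pfn)
        return (pfn << PAGE_SHIFT) | offset
```

In this sketch, constructing two `ComputeUnit` objects over the same `SharedPageWalkUnit` shows the intended sharing: a walk triggered by one CU fills the page walk cache, so a later miss from another CU to a neighboring page only needs to read the leaf entry.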