Through this data-driven approach we develop a proof-of-concept
GPU MMU design that is fully compatible with
CPU page tables (x86-64 in this work). Figure 1 shows an
overview of the GPU MMU evaluated in this paper. This
design uses a TLB per GPU compute unit (CU) and a shared
page walk unit to avoid excessive per-CU hardware. The
shared page walk unit contains a highly-threaded page table
walker and a page walk cache.
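To make the division of labor concrete, the following is a minimal functional sketch of this organization, written under loudly stated assumptions. It walks a standard x86-64 four-level radix table (PML4, PDPT, PD, PT; 9 index bits per level) using 4 KiB pages only. Every name (`ComputeUnit`, `SharedPageWalkUnit`, `map_page`, `PageFault`) and every size (32 TLB entries per CU, 8 page walk cache entries, LRU replacement) is hypothetical and chosen for illustration, not taken from the evaluated design. The page walk cache here caches pointers to intermediate tables, keyed by the virtual-address bits that select them. This is one common organization, assumed here rather than confirmed by the text. The walker is sequential, so the "highly-threaded" aspect (many outstanding walks at once) is not modeled. The sketch shows only what each structure caches and when memory is touched.

```python
from collections import OrderedDict

PAGE_SHIFT = 12   # 4 KiB base pages (assumption: large pages ignored)
LEVEL_BITS = 9    # x86-64: 512 entries per page-table level
LEVELS = 4        # PML4, PDPT, PD, PT

class PageFault(Exception):
    pass

def level_index(vaddr, level):
    """Index into the table at `level` (0 = PML4 ... 3 = PT)."""
    shift = PAGE_SHIFT + LEVEL_BITS * (LEVELS - 1 - level)
    return (vaddr >> shift) & ((1 << LEVEL_BITS) - 1)

def table_tag(vaddr, level):
    """Virtual-address bits that select the table at `level` (1..3)."""
    return vaddr >> (PAGE_SHIFT + LEVEL_BITS * (LEVELS - level))

def map_page(root, vaddr, pfn):
    """Hypothetical helper: install vaddr -> pfn in a nested-dict page table."""
    table = root
    for level in range(LEVELS - 1):
        table = table.setdefault(level_index(vaddr, level), {})
    table[level_index(vaddr, LEVELS - 1)] = pfn

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)        # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

class SharedPageWalkUnit:
    """One walker plus page walk cache (PWC), shared by all CUs."""
    def __init__(self, root, pwc_entries=8):
        self.root = root
        self.pwc = LRUCache(pwc_entries)
        self.memory_accesses = 0              # page-table reads issued

    def walk(self, vaddr):
        # Resume from the deepest intermediate table found in the PWC.
        table, start = self.root, 0
        for level in range(LEVELS - 1, 0, -1):
            cached = self.pwc.get((level, table_tag(vaddr, level)))
            if cached is not None:
                table, start = cached, level
                break
        for level in range(start, LEVELS):
            self.memory_accesses += 1         # one memory read per level walked
            entry = table.get(level_index(vaddr, level))
            if entry is None:
                raise PageFault(hex(vaddr))
            if level < LEVELS - 1:            # cache pointer to next-level table
                self.pwc.put((level + 1, table_tag(vaddr, level + 1)), entry)
            table = entry
        return table                          # physical frame number

class ComputeUnit:
    """A CU with a private TLB; misses go to the shared walk unit."""
    def __init__(self, walker, tlb_entries=32):
        self.tlb = LRUCache(tlb_entries)
        self.walker = walker

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        pfn = self.tlb.get(vpn)
        if pfn is None:                       # per-CU TLB miss
            pfn = self.walker.walk(vaddr)
            self.tlb.put(vpn, pfn)
        return (pfn << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))

root = {}
map_page(root, 0x7F0000001000, 0x1234)
walker = SharedPageWalkUnit(root)
cus = [ComputeUnit(walker) for _ in range(4)]
print(hex(cus[0].translate(0x7F0000001ABC)))  # prints 0x1234abc
```

In this sketch a cold miss costs four page-table reads. A later miss from a different CU to a neighboring page that shares the same PT costs one read, because the shared PWC already holds the pointer to that PT. This illustrates why sharing one walk unit and its cache across CUs can pay off even though each CU keeps a private TLB.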