To benefit from this bandwidth filtering, Design 1 includes a private per-CU L1 TLB placed after the scratchpad memory access and after the coalescing hardware. Thus, the MMU is accessed only on global memory accesses. Figure 5 shows Design 1 in light gray, and Table 2 details the configuration parameters.
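The effect of this placement can be illustrated with a minimal sketch. It is not the simulator used in the evaluation: the function names, the `(space, address)` encoding of per-lane accesses, and the 128-byte coalescing granularity are all assumptions made for illustration. The sketch drops scratchpad accesses, merges the remaining global addresses into unique line-aligned requests, and treats each surviving request as one L1 TLB lookup.

```python
from typing import List, Tuple

# Assumed coalescing granularity in bytes (hypothetical; not taken from Table 2).
LINE_BYTES = 128

def coalesce(addresses: List[int], line_bytes: int = LINE_BYTES) -> List[int]:
    """Merge per-lane addresses into unique line-aligned requests."""
    return sorted({addr // line_bytes * line_bytes for addr in addresses})

def tlb_lookups(accesses: List[Tuple[str, int]],
                line_bytes: int = LINE_BYTES) -> List[int]:
    """Return the addresses that reach the per-CU L1 TLB for one instruction.

    `accesses` holds one (space, address) pair per lane, where `space` is
    "scratchpad" or "global". Scratchpad accesses are filtered out first,
    then the global accesses are coalesced, so only the coalesced global
    requests require translation.
    """
    global_addrs = [addr for space, addr in accesses if space == "global"]
    return coalesce(global_addrs, line_bytes)

if __name__ == "__main__":
    contiguous = [("global", 0x1000 + 4 * lane) for lane in range(32)]
    scratch = [("scratchpad", 4 * lane) for lane in range(32)]
    strided = [("global", LINE_BYTES * lane) for lane in range(32)]
    for name, accesses in [("contiguous", contiguous),
                           ("scratchpad", scratch),
                           ("strided", strided)]:
        print(name, len(tlb_lookups(accesses)))
```

Under these assumptions, a 32-lane instruction with contiguous 4-byte global accesses produces a single TLB lookup, a scratchpad instruction produces none, and only a fully divergent access pattern produces one lookup per lane.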