3.3 Overlap of data transfers and GPU execution

ScaleGPU can overlap data transfers with GPU execution by running the GPU threads whose data TMH has already delivered from CPU memory to GPU memory, with TMH moving that data according to its spatial and temporal locality. In addition, ScaleGPU can adjust the data transfer size by changing the number of tags associated with a DRAM row, while still fully utilizing the PCIe bandwidth. In contrast, existing GPU architectures must either serialize the transfers and the execution, or disable GPU memory by forwarding every memory access from the GPU to the CPU. Both approaches incur significant performance degradation.
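To make the benefit of overlap concrete, the following is a minimal timing sketch of a generic two-stage pipeline (transfer, then compute) over a single link. It is not ScaleGPU's TMH mechanism. The names `Chunk`, `split_work`, `serialized_time`, and `overlapped_time`, the per-transfer `overhead` parameter, and the assumption that work splits into equal independent chunks are all hypothetical. The chunk count only loosely stands in for the transfer size that ScaleGPU tunes through the number of tags per DRAM row.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    transfer_time: float  # time to move this chunk over PCIe (arbitrary units)
    compute_time: float   # time for the GPU threads that consume this chunk


def split_work(total_transfer: float, total_compute: float, n: int,
               overhead: float = 0.0) -> List[Chunk]:
    """Hypothetical helper: split the work into n equal chunks.

    `overhead` is an assumed fixed cost per transfer, modelling the fact that
    very small transfers use the link less efficiently.
    """
    return [Chunk(total_transfer / n + overhead, total_compute / n) for _ in range(n)]


def serialized_time(chunks: List[Chunk]) -> float:
    """All transfers finish before any execution starts."""
    return sum(c.transfer_time for c in chunks) + sum(c.compute_time for c in chunks)


def overlapped_time(chunks: List[Chunk]) -> float:
    """Chunk i runs as soon as it has arrived and the previous chunk has finished.

    Transfers are issued back-to-back on one link, so chunk i+1 moves while
    chunk i executes.
    """
    transfer_done = 0.0
    compute_done = 0.0
    for c in chunks:
        transfer_done += c.transfer_time
        compute_done = max(compute_done, transfer_done) + c.compute_time
    return compute_done


if __name__ == "__main__":
    for n in (1, 4, 8):
        chunks = split_work(8.0, 12.0, n)
        print(n, serialized_time(chunks), overlapped_time(chunks))
```

With a single chunk the two schedules coincide (20 time units). Splitting the same work into 4 or 8 chunks lets transfers hide behind execution, so the overlapped time drops to 14 and 13 units, approaching the compute time plus one chunk transfer. A nonzero `overhead` shows why transfer size cannot simply be shrunk without bound if the link is to stay fully utilized.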