3.3 Overlap of data transfers and GPU execution
ScaleGPU overlaps data transfers with GPU execution by executing
the GPU threads whose data TMH has already delivered from CPU
memory to GPU memory, following spatial and temporal locality.
In addition, ScaleGPU can adjust the size of each data transfer
by changing the number of tags associated with a DRAM row, while
still fully utilizing the PCIe bandwidth. In contrast, existing GPU
architectures must either serialize the transfers and the execution, or
disable GPU memory by forwarding every memory access from the GPU
to the CPU. Both methods incur significant performance degradation.
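
To illustrate why overlap matters, the following minimal sketch compares a serialized schedule with an overlapped one using a simple two-stage pipeline model. It is not ScaleGPU's mechanism or a measurement: the function names, the parameters, and the assumption of equally sized chunks with fixed per-chunk transfer and execution times are all hypothetical and chosen only for illustration.

```python
def serialized_time(n_chunks, t_transfer, t_compute):
    """Total time when each chunk is transferred and executed with no overlap.

    Assumption (hypothetical model): every chunk costs the same
    t_transfer and t_compute.
    """
    return n_chunks * (t_transfer + t_compute)


def overlapped_time(n_chunks, t_transfer, t_compute):
    """Total time when chunk i executes while chunk i+1 is being transferred.

    Assumption (hypothetical model): one transfer engine and one execution
    engine, so this is a two-stage pipeline with uniform stage times.
    """
    if n_chunks == 0:
        return 0
    # The first transfer and the last execution cannot be hidden;
    # in between, the slower of the two stages sets the pace.
    return t_transfer + (n_chunks - 1) * max(t_transfer, t_compute) + t_compute


if __name__ == "__main__":
    n, tt, tc = 4, 2.0, 3.0  # illustrative values, arbitrary time units
    print("serialized:", serialized_time(n, tt, tc))
    print("overlapped:", overlapped_time(n, tt, tc))
```

Under this model, splitting the same data into more, smaller chunks shrinks the first transfer and last execution that cannot be hidden, which is one reason an adjustable transfer size can help. The model ignores per-transfer overheads, which in practice limit how small a chunk can usefully be.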