To achieve such programming portability, programmers may consider using the zero-copy scheme available in modern GPUs, which forwards all memory requests from GPU cores directly to CPU memory. As a result, the zero-copy scheme bypasses GPU memory entirely. The zero-copy scheme can also overlap thread execution with data transfers. However, it suffers from significant performance degradation when the data is reused (i.e., exhibits spatial or temporal locality) or when memory accesses cannot be coalesced (e.g., indirect memory accesses).
These limitations motivate using GPU memory as a cache of CPU memory, providing both portability and performance by exploiting applications' high access locality and the GPU's high memory bandwidth.
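For concreteness, the following is a minimal sketch of how the zero-copy scheme described above is typically set up with the CUDA runtime API; the kernel, buffer size, and launch configuration are illustrative assumptions rather than part of the original text. The host buffer is allocated as pinned, mapped memory and the kernel accesses it through a device-side alias, so every load and store travels over the interconnect to CPU memory and no GPU (device) memory is allocated.

```cuda
// Minimal sketch of the zero-copy setup, assuming the CUDA runtime API.
// The kernel, buffer size, and launch configuration are illustrative only.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;          // each load/store goes over the interconnect to CPU memory
}

int main(void)
{
    const int n = 1 << 20;
    float *h_buf, *d_alias;

    // Allow host allocations to be mapped into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned, mapped host allocation: the GPU can address this buffer directly,
    // so no cudaMemcpy is issued and no GPU (device) memory is used.
    cudaHostAlloc((void **)&h_buf, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i)
        h_buf[i] = (float)i;

    // Device-side alias of the same host buffer.
    cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0);

    scale<<<(n + 255) / 256, 256>>>(d_alias, n);
    cudaDeviceSynchronize();

    printf("%f\n", h_buf[1]);     // result is visible in host memory without a copy back
    return 0;
}
```

Because every kernel access to the buffer crosses the CPU-GPU interconnect rather than hitting on-board DRAM, such a setup is only competitive when the data is streamed once with coalesced accesses; reused or irregular access patterns run into exactly the limitations noted above, which is what caching the data in GPU memory is meant to address.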