4 EVALUATION
4.1 Methodology
We implement ScaleGPU on top of GPGPU-Sim version 3.1.2 [1]
with the parameters shown in Table 1. In particular, we carefully
model CPU-GPU communication over PCIe because both ScaleGPU
and the zero-copy scheme communicate with the CPU during kernel
execution. We obtain the PCIe parameters from a reference machine
using an NVIDIA Tesla C2050 GPU, and validate the performance
of both the PCIe link and the zero-copy scheme against the reference
machine using various pointer-chasing microbenchmarks. For the
zero-copy scheme, we enable only L2 caches [10]. We model a tag
access as a 64B DRAM access due to the minimal access granularity