Programmer-managed GPU memory is a major challenge in writing GPU applications. Programmers must rewrite and
optimize existing code for each GPU memory size to achieve both portability and performance. Alternatively, they can achieve
portability alone by disabling GPU memory, at the cost of significant performance degradation. In this paper, we propose ScaleGPU, a novel
GPU architecture that enables high-performance, memory-unaware GPU programming. ScaleGPU uses GPU memory as a cache of CPU
memory, giving programmers a programming space as large as CPU memory. ScaleGPU also achieves high performance by
minimizing CPU-GPU data transfers and by exploiting the high bandwidth of GPU memory. Our experiments show that
ScaleGPU can run a GPU application with any GPU memory size while also improving performance significantly. For example, ScaleGPU
improves the performance of the hotspot application by ~48% with the same amount of GPU memory, and reduces its memory
requirement by ~75% while maintaining the target performance.