5 RELATED WORK

Programmers must manually modify existing GPU code to run on smaller GPU memory or disable GPU memory [8]. Compiler-assisted static partitioning [6] can be applied only to static analysis-friendly
workloads. ScaleGPU requires no code modification and can be applied to any GPU workload. Lustig and Martonosi [7] propose programmer-managed overlapping of data transfer and GPU execution for a fixed memory size. ScaleGPU requires no programmer annotation and works with any GPU memory size. To the best of our knowledge, near-future VM support from industry will incur non-trivial translation and page-table overhead to provide fine-grained paging for discrete GPUs [5]. In contrast, ScaleGPU acts as a lightweight VM that enables fine-grained paging without incurring such overheads.