As GPUs' compute capabilities grow, their memory hierarchy
increasingly becomes a bottleneck. Current GPU memory
hierarchies use coarse-grained memory accesses to exploit
spatial locality, maximize peak bandwidth, simplify control,
and reduce cache meta-data storage. These coarse-grained
memory accesses, however, are a poor match for emerging
GPU applications with irregular control flow and memory
access patterns. Meanwhile, the massive multi-threading of
GPUs and the simplicity of their cache hierarchies make
CPU-specific memory system enhancements ineffective for
improving the performance of irregular GPU applications.
We design and evaluate a locality-aware memory hierarchy for
throughput processors, such as GPUs. Our proposed design
retains the advantages of coarse-grained accesses for spatially
and temporally local programs while permitting selective
fine-grained access to memory. Adaptively adjusting the
access granularity reduces memory bandwidth and energy consumption
for data with low spatial/temporal locality, without adding
control overhead or losing prefetching potential for data with high
spatial locality. As such, our locality-aware memory hierarchy
improves GPU performance, energy efficiency, and memory
throughput for a large range of applications.
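
To make the granularity trade-off concrete, the following minimal sketch counts the bytes moved from memory when 32 threads each read one word, once with a regular (consecutive) pattern and once with an irregular (widely strided) pattern. The 128-byte coarse block, 32-byte fine block, 4-byte word, and both access patterns are illustrative assumptions chosen for the arithmetic, not parameters of the proposed design.

```python
WORD = 4      # assumed word size in bytes
COARSE = 128  # assumed coarse-grained access size in bytes
FINE = 32     # assumed fine-grained access size in bytes


def bytes_fetched(addresses, granularity):
    """Bytes moved when every touched block of `granularity` bytes is fetched once."""
    blocks = {addr // granularity for addr in addresses}
    return len(blocks) * granularity


def efficiency(addresses, granularity):
    """Fraction of fetched bytes that the threads actually use."""
    useful = len(set(addresses)) * WORD
    return useful / bytes_fetched(addresses, granularity)


# Regular pattern: 32 threads read consecutive words (high spatial locality).
regular = [tid * WORD for tid in range(32)]

# Irregular pattern (hypothetical): each thread reads a word 1024 bytes apart.
irregular = [tid * 1024 for tid in range(32)]

if __name__ == "__main__":
    for name, pattern in (("regular", regular), ("irregular", irregular)):
        for label, size in (("coarse", COARSE), ("fine", FINE)):
            print(name, label, bytes_fetched(pattern, size), efficiency(pattern, size))
```

Under these assumptions, the regular pattern moves the same 128 bytes at either granularity, whereas the irregular pattern moves 4096 bytes with coarse-grained accesses and 1024 bytes with fine-grained accesses. This is the case in which selecting the finer granularity saves bandwidth, while the regular case loses nothing by staying coarse.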