Graphic Processing Units (GPUs) [1], [2] are specialized massively parallel many-core processors that take advantage of Thread-Level Parallelism (TLP) to handle computations for computer graphics. GPUs dedicate the majority of the silicon area to data processing with only a small portion assigned to data caching and flow control circuitry, which makes them suitable for solving computer-intensive problems, in spite of general purpose CPUs, which embrace sequential data flow structure with the use of Instruction-Level Parallelism (ILP).