A few works in the literature have implemented GPUs in hardware description languages (HDLs) for FPGAs. The FlexGrip project [14] examines the implementation of a soft G80 GPU on a Virtex-6 FPGA. This architecture can run CUDA-compiled objects without hardware recompilation. The results presented in that work indicate a considerable gain, of up to 30x, compared to a MicroBlaze processor. However, the G80 architecture, introduced in 2006, was the first GPU designed for general-purpose computation, and several improvements have since been made in newer GPU architectures [1], [2], such as Fermi, which is the architecture used as the reference in the present work.