the screen and further assembled to develop a complete image or a picture [1]. The GPU graphics pipeline
consists of several stages: vertex operations, primitive assembly, rasterization, fragment operations, and
composition [1]. Building on this pipeline structure, GPUs have become increasingly programmable. General-purpose
GPUs are well suited to compute-intensive, data-parallel workloads. In June 2007 NVIDIA released CUDA, and in December 2008 the
Khronos Group released OpenCL 1.0. In August 2009, AMD launched the ATI Stream SDK v2.0 Beta, which
supported x86 processors. OpenCL is an open standard that targets many kinds of processors [2]. Modern GPUs contain
hundreds of processing units and can achieve up to 1 TFLOPS for single-precision (SP) arithmetic and
over 80 GFLOPS for double-precision (DP) calculations. Recent GPUs optimized for High Performance Computing (HPC)
contain up to 4 GB of on-board memory and provide 100 GB/s of memory bandwidth [3].
Owing to their parallel architecture and high floating-point and memory performance, GPUs are well
suited to many of the same scientific and engineering applications that occupy HPC clusters, leading to their
incorporation as HPC accelerators [3]. They can therefore reduce space, power, and cooling demands, and reduce the
number of operating system images that must be managed relative to traditional CPU-only clusters of similar
aggregate computational capability [3]. The computational capabilities and functionality of GPUs
extend their use to non-graphics computation; a GPU used in this way is known as a General-Purpose
GPU (GPGPU) [4]. Owing to their cost-performance ratio and rapid evolution, GPGPUs are becoming increasingly significant [4].