Conventional image processing techniques do not exploit
available computing resources such as multicore/manycore
processors and therefore become very time-consuming.
GPU-based parallel computing has the potential to process
large image files very quickly. In this work, the impact of
CUDA-accelerated GPU computing on image processing
performance is studied. Image processing and filtering are
implemented in both sequential C and parallel CUDA/C programs.
Six image files with sizes ranging from 512x512 to
16,384x16,384 pixels are considered. CUDA Events are used
to measure the GPU execution time.
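The GPU timing described above follows the standard CUDA Events pattern, sketched below; the kernel name and launch configuration (`filterKernel`, `grid`, `block`) are illustrative placeholders, not necessarily those used in the experiments.

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);                      // timestamp before the kernel
filterKernel<<<grid, block>>>(d_in, d_out, width, height);
cudaEventRecord(stop, 0);                       // timestamp after the kernel
cudaEventSynchronize(stop);                     // wait for the kernel to finish

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);         // elapsed GPU time in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);
```

Because kernel launches are asynchronous, `cudaEventSynchronize` is needed before reading the elapsed time; CPU-side timers alone would only measure the launch call, not the kernel itself.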
According to the experimental results, image processing
and filtering are more efficient when done through CUDA
programming. This is because the per-pixel computations are
performed in parallel, fully exploiting the available
processing resources. For the matrix manipulation on the GPU
in this experiment, a speedup of up to 365x is achieved for a
16,384x16,384-pixel image.
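The per-pixel parallelism behind this speedup can be sketched with a minimal CUDA kernel that assigns one thread per pixel; the point operation shown (pixel inversion) is an illustrative example, not necessarily the filter used in the experiments.

```cuda
__global__ void invertKernel(const unsigned char *in, unsigned char *out,
                             int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {        // guard against partial edge blocks
        int idx = y * width + x;          // one thread handles one pixel
        out[idx] = 255 - in[idx];         // simple independent point operation
    }
}

/* launch: 16x16 threads per block, enough blocks to cover the image */
dim3 block(16, 16);
dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
invertKernel<<<grid, block>>>(d_in, d_out, width, height);
```

Because each output pixel depends only on the corresponding input pixel, all threads run without synchronization, which is what lets the GPU scale to millions of pixels at once.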
It should be noted that there is considerable overhead
caused by the CUDA copy operations between the host and the
device, which may reduce the speedup. It is observed that the
overhead due to the additional CUDA operations is
considerable compared to the actual pixel manipulation. For
large images, the cudaMalloc() time may be neglected, but
the time to copy data between the CPU and the GPU
significantly impacts the overall performance.
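The transfer overhead discussed above comes from the standard allocate/copy/compute/copy-back pattern, sketched below; buffer and kernel names are illustrative. For large images the two cudaMemcpy() calls, not the allocations, dominate the non-kernel time.

```cuda
size_t bytes = (size_t)width * height * sizeof(unsigned char);
unsigned char *d_in, *d_out;

cudaMalloc(&d_in, bytes);     // allocation cost is negligible for large images
cudaMalloc(&d_out, bytes);

cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);   // host -> device copy
filterKernel<<<grid, block>>>(d_in, d_out, width, height);
cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost); // device -> host copy

cudaFree(d_in);
cudaFree(d_out);
```

Timing the two cudaMemcpy() calls separately from the kernel (for example, with additional CUDA Events around each call) makes the split between transfer overhead and actual pixel manipulation explicit.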
Future extensions of this work include considering more
complex filters that may require thread cooperation to handle
data dependencies, in order to further improve CUDA performance.