First, we implemented the Viola-Jones algorithm as a baseline CPU version. This baseline was then ported to the GPU using CUDA, which frees the CPU to perform other tasks. Next, the GPU face detection algorithm was optimized using a grid topology and shared memory. These implementations are compared and the results are presented. Finally, to improve the quality of face detection, we implemented a second approach based on the WaldBoost algorithm.
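For orientation, the core operation of a Viola-Jones detector is evaluating rectangular Haar-like features in constant time using an integral image (summed-area table). The following is a minimal CPU-side sketch of that idea only; it is not the implementation described in this work. The function names `integral_image`, `rect_sum`, and `two_rect_feature` are hypothetical and chosen for illustration.

```python
def integral_image(img):
    """Build a summed-area table padded with a leading row and column of zeros.

    ii[y][x] holds the sum of all pixels img[r][c] with r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half.

    Assumes w is even so both halves have the same width.
    """
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
    ]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))
    print(two_rect_feature(table, 0, 0, 4, 3))
```

Because every rectangle sum costs four table lookups regardless of its size, each detection window can be evaluated independently of the others, which is what makes this kind of detector a natural candidate for a GPU port.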