The OpenGL ES 2.0 API specification for embedded-system graphics requires programmable vertex shaders to process vertex data. To support 3D coordinate transformation and lighting operations, vertex shaders usually contain a single-instruction, multiple-data (SIMD) datapath and a special function unit (SFU). In this paper, we present a new vertex shader processor design in which a recently proposed non-uniform segmentation method is adopted in the special function unit to reduce the sizes of its lookup tables (LUTs). Both fixed-point and floating-point arithmetic are supported to meet differing precision and range requirements. Compared with recent similar implementations, the proposed design achieves satisfactory energy efficiency, measured as performance normalized by power consumption.
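To illustrate why non-uniform segmentation can shrink a lookup table, the following minimal sketch approximates the reciprocal function on [1, 2) with piecewise-linear segments. It assumes a generic recursive-bisection scheme, which is not necessarily the segmentation method adopted in the proposed design; the function names, the error tolerance of 2^-12, and the choice of the reciprocal are all illustrative assumptions.

```python
def max_interp_error(f, lo, hi, samples=64):
    """Largest sampled |f(x) - chord(x)| on [lo, hi], where the chord is the
    straight line joining the segment endpoints."""
    f_lo, f_hi = f(lo), f(hi)
    worst = 0.0
    for i in range(samples + 1):
        x = lo + (hi - lo) * i / samples
        chord = f_lo + (f_hi - f_lo) * (x - lo) / (hi - lo)
        worst = max(worst, abs(f(x) - chord))
    return worst


def segment(f, lo, hi, tol):
    """Recursively bisect [lo, hi] until every piece meets the tolerance.
    Regions where f curves strongly end up with narrower segments."""
    if max_interp_error(f, lo, hi) <= tol:
        return [(lo, hi)]
    mid = (lo + hi) / 2
    return segment(f, lo, mid, tol) + segment(f, mid, hi, tol)


def reciprocal(x):
    return 1.0 / x


TOL = 2.0 ** -12  # assumed accuracy target, chosen only for illustration

# Non-uniform table: one entry per segment.
segments = segment(reciprocal, 1.0, 2.0, TOL)

# A uniform table must use the narrowest required width everywhere.
smallest_width = min(hi - lo for lo, hi in segments)
uniform_entries = round((2.0 - 1.0) / smallest_width)

print("non-uniform entries:", len(segments))
print("uniform entries:    ", uniform_entries)
```

Because the curvature of 1/x falls as x grows, the segments near 2 can be wider than those near 1, so the non-uniform table needs fewer entries than a uniform table meeting the same error bound.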