Properties (1) and (2) apply to gpu, while (3) and (4) apply to its derivative, gpu'. Error functions based on the sum of squares satisfy only properties (1) and (3). Property (2), however, is needed to discourage local extrema and saddle points that lie exactly opposite the training value, and property (4) is required so that training algorithms are actively pushed away from such exactly opposite values. The number of saddle points that are exterior in this sense can be very large (see Kalman & Kwasny, 1991).
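The sketch below illustrates why the missing properties matter. It makes several assumptions that are not in the text above. It assumes a single tanh output unit with targets of -1 or +1. It uses a cross-entropy-style error as a stand-in for an error function whose value and derivative both grow without bound as the output approaches the value opposite the target. This stand-in is not claimed to be the error function proposed by Kalman & Kwasny. All function names are hypothetical. With the sum of squares, the error stays finite at the opposite value, and the gradient with respect to the net input collapses toward zero there, so training stalls. With the stand-in, the error diverges, and the net-input gradient stays near -2, so training keeps pushing the output toward the target.

```python
import math


def sse_error(y, t):
    """Sum-of-squares error for one output: finite even when y = -t."""
    return 0.5 * (t - y) ** 2


def sse_grad_net(net, t):
    """dE/dnet for the sum-of-squares error with y = tanh(net).

    The factor (1 - y**2) from tanh vanishes as y -> -t, so the gradient
    also vanishes at the exactly opposite value.
    """
    y = math.tanh(net)
    return (y - t) * (1.0 - y * y)


def xent_error(y, t):
    """Assumed stand-in error for tanh outputs with t in {-1, 1}.

    Grows without bound as y approaches -t. Terms with zero weight are
    skipped to avoid evaluating log(0).
    """
    e = 0.0
    if t > -1:
        e -= (1 + t) / 2 * math.log((1 + y) / 2)
    if t < 1:
        e -= (1 - t) / 2 * math.log((1 - y) / 2)
    return e


def xent_grad_net(net, t):
    """dE/dnet for the stand-in error with y = tanh(net).

    dE/dy diverges as y -> -t, which cancels the vanishing tanh factor:
    dE/dy * (1 - y**2) simplifies to y - t.
    """
    y = math.tanh(net)
    return y - t


if __name__ == "__main__":
    net, target = -10.0, 1.0  # output saturated opposite the target
    y = math.tanh(net)
    print("sum of squares: E =", sse_error(y, target), "dE/dnet =", sse_grad_net(net, target))
    print("stand-in:       E =", xent_error(y, target), "dE/dnet =", xent_grad_net(net, target))
```

This comparison looks only at a single output unit. How strongly the effect shows up in a full network also depends on the hidden layers and on the training algorithm used.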