Properties (1) and (2) apply to $g_{pu}$, while (3) and (4) apply to its derivative, $g_{pu}'$. For error functions based on the sum of squares, only properties (1) and (3) are satisfied. However, property (2) is needed to discourage local extrema and saddle points that lie exactly opposite to the training value, and property (4) is required to steer training algorithms away from such exactly opposite values. The number of saddle points that are exterior in this sense can be very large (see Kalman & Kwasny, 1991).
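The sketch below illustrates why a sum-of-squares error gives little push away from the exactly opposite value. It makes several assumptions that the text above does not state: a single tanh output unit $y = \tanh(\mathrm{net})$, targets $t = \pm 1$, and the per-unit error $E = (t - y)^2$. Under these assumptions, $E$ and $dE/dy$ stay finite as $y \to -t$, while the gradient with respect to the net input, $dE/d\mathrm{net} = -2(t - y)(1 - y^2)$, shrinks toward zero. Gradient-based training therefore makes little progress out of such points. The function names are hypothetical.

```python
import math


def sse(t: float, y: float) -> float:
    """Squared error for one output unit (assumed form E = (t - y)^2)."""
    return (t - y) ** 2


def sse_grad_y(t: float, y: float) -> float:
    """Derivative dE/dy of E = (t - y)^2; finite for all y in [-1, 1]."""
    return -2.0 * (t - y)


def sse_grad_net(t: float, net: float) -> float:
    """dE/dnet for an assumed tanh output y = tanh(net), via the chain rule.

    tanh'(net) = 1 - y^2, which vanishes as the unit saturates at y = -t,
    so the gradient driving the unit away from the opposite value vanishes too.
    """
    y = math.tanh(net)
    return sse_grad_y(t, y) * (1.0 - y * y)


if __name__ == "__main__":
    t = 1.0  # target; increasingly negative net pushes y toward the opposite value -1
    for net in (-1.0, -3.0, -6.0, -10.0):
        y = math.tanh(net)
        print(
            f"net={net:6.1f}  y={y:+.6f}  E={sse(t, y):.6f}  "
            f"dE/dy={sse_grad_y(t, y):+.6f}  dE/dnet={sse_grad_net(t, net):+.2e}"
        )
```

Running the script shows $E$ approaching the bounded value 4 and $dE/dy$ approaching $-4$, while $dE/d\mathrm{net}$ falls by several orders of magnitude.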