The gradient descent algorithm for training the multilayer
perceptron is often slow, especially as it approaches a
minimum. One reason is that it uses a fixed step size. To
take into account the changing curvature of the error surface,
many optimization algorithms vary the step from iteration to
iteration. To address this problem, an adaptive learning
rate [12] can be applied, which attempts to keep the learning
step as large as possible while keeping learning stable. The
learning rate is thus made responsive to the complexity of the
local error surface. In this approach,
new weights and biases are calculated using the current
learning rate at each epoch. New outputs and errors are then
calculated. As with momentum, if the new error exceeds the
old error by more than a predefined ratio (for example, 1.04),
the new weights and biases are discarded and the learning
rate is decreased. Otherwise, the new weights are kept, and
if the new error is less than the old error, the learning
rate is increased. In this way, the learning rate grows only
as long as learning remains stable.
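To make the procedure concrete, the following is a minimal sketch of this adaptive-learning-rate rule applied to plain gradient descent. The error-ratio threshold of 1.04 follows the description above; the increase factor (1.05), decrease factor (0.7), and the quadratic test function are illustrative assumptions rather than values prescribed by [12].

```python
import numpy as np

def train_adaptive(loss, grad, w, lr=0.01, epochs=100,
                   max_ratio=1.04, lr_inc=1.05, lr_dec=0.7):
    """Gradient descent with the adaptive learning-rate rule described above."""
    old_err = loss(w)
    for _ in range(epochs):
        # Tentative update with the current learning rate.
        w_new = w - lr * grad(w)
        new_err = loss(w_new)
        if new_err > old_err * max_ratio:
            # New error exceeds the old error by more than the allowed ratio:
            # discard the new weights and decrease the learning rate.
            lr *= lr_dec
        else:
            if new_err < old_err:
                # Error decreased: increase the learning rate.
                lr *= lr_inc
            # Keep the new weights.
            w, old_err = w_new, new_err
    return w, lr

# Example: a simple quadratic stands in for the network's error surface.
loss = lambda w: float(np.sum((w - 3.0) ** 2))
grad = lambda w: 2.0 * (w - 3.0)
w_final, lr_final = train_adaptive(loss, grad, np.zeros(2))
```

Keeping the adjustment factors close to 1 prevents the step size from oscillating sharply while still allowing it to grow over flat regions of the error surface and shrink near sharp minima.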