the primary aim of developing an ANN is to generalise the features of the rainfall time series.
If an ANN properly learns the features of the data, then the ANN is said to achieve good
generalisation. Depending on the complexity of the network, however, an ANN may suffer from
either underfitting or overfitting the training data. An ANN that is not sufficiently complex can
fail to fully detect the features in a complicated data set, leading to underfitting. An ANN that
is too complex may fit the noise, not just the features, leading to overfitting.
A popular technique for achieving generalisation is the early stopping method presented by
Sarle [S]. Following this method, the data were split into three sets: a training set, a
monitoring set, and a validation set. The training set was used to train the network, whereas
the monitoring set was used to check the performance of the network at regular intervals during
training. Training was stopped when the error on the monitoring set reached a minimum. The
validation set was used for the final evaluation of network performance.
The 34 storm events of this study were thus divided into three data sets:
• training set: 16 storms with a total of 748 rainfall periods (each of 15 minutes),
• monitoring set: eight storms with a total of 376 rainfall periods, and
• validation set: ten storms with a total of 625 rainfall periods.
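The storm-based partition can be sketched in Python as follows. The split works on whole storm events, so all 15-minute periods of one storm stay in the same set. The `Storm` record, `split_by_storm`, and `total_periods` are hypothetical names. The text does not say which storms went into which set, so the code assumes assignment by list order. The reported set sizes are included as a consistency check: 16 + 8 + 10 = 34 storms and 748 + 376 + 625 = 1749 rainfall periods.

```python
from dataclasses import dataclass


@dataclass
class Storm:
    storm_id: int    # hypothetical identifier for one storm event
    n_periods: int   # number of 15-minute rainfall periods in the event


def split_by_storm(storms, n_train=16, n_monitor=8, n_valid=10):
    """Assign whole storm events to training, monitoring and validation sets.

    Every 15-minute period of a storm stays in the same set. Assignment
    follows the order of `storms`; this ordering rule is an assumption,
    not the selection rule used in the study.
    """
    if n_train + n_monitor + n_valid != len(storms):
        raise ValueError("set sizes must add up to the number of storms")
    train = storms[:n_train]
    monitor = storms[n_train:n_train + n_monitor]
    valid = storms[n_train + n_monitor:]
    return train, monitor, valid


def total_periods(storms):
    """Total number of 15-minute rainfall periods across a list of storms."""
    return sum(s.n_periods for s in storms)


# Set sizes reported in the text: (number of storms, number of rainfall periods)
REPORTED = {"training": (16, 748), "monitoring": (8, 376), "validation": (10, 625)}
N_STORMS = sum(n for n, _ in REPORTED.values())   # 34 storm events in total
N_PERIODS = sum(p for _, p in REPORTED.values())  # 1749 periods of 15 minutes
```

To use the sketch, a list of 34 `Storm` records, for example built from the study's event catalogue, would be passed as `split_by_storm(storms)`.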
The maximum number of training epochs was set at 1000. An epoch was defined as a complete sweep
through the training patterns; the weights of a network were updated after each epoch. Therefore,
a maximum of 1000 epochs means that the weights were updated at most 1000 times.
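The rule "one epoch, one weight update" corresponds to batch (off-line) training. The sketch below illustrates it. For brevity it uses a linear model with squared error instead of the ANN, and the learning rate `lr` is a hypothetical parameter. The gradient is accumulated over every training pattern and applied once, at the end of each sweep. With `max_epochs=1000` the weights are therefore updated at most 1000 times.

```python
import numpy as np


def batch_epoch(w, X, y, lr):
    """One epoch of batch training for a linear model y_hat = X @ w (squared error).

    The gradient is accumulated over every training pattern and the weights are
    updated once, at the end of the sweep, so one epoch = one weight update.
    """
    grad = np.zeros_like(w)
    for x_i, y_i in zip(X, y):      # complete sweep through the training patterns
        err = x_i @ w - y_i
        grad += err * x_i           # accumulate only; no update inside the sweep
    return w - lr * grad / len(X)   # the single weight update for this epoch


def train(w, X, y, lr=0.1, max_epochs=1000):
    """Run at most max_epochs epochs, hence at most max_epochs weight updates."""
    n_updates = 0
    for _ in range(max_epochs):
        w = batch_epoch(w, X, y, lr)
        n_updates += 1
    return w, n_updates
```

Per-pattern (on-line) training would instead update `w` inside the loop over patterns, giving one update per pattern rather than one per epoch.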
During training, the networks were checked every 100 epochs. Training was stopped when
the error on the monitoring data reached its lowest value or when training reached the maximum
epoch, whichever came first. Finally, the networks were evaluated against the validation data.
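The stopping rule can be sketched as a generic training loop. The monitoring error is checked every `check_every` epochs, up to `max_epochs`. The hooks `update_weights` (one epoch, applied in place) and `monitor_error` (error on the monitoring set) are hypothetical and stand in for whatever network and error measure are used. A minimum can only be recognised once a later check shows no improvement. This sketch therefore assumes that training stops at the first check whose monitoring error is not lower than the best so far, and that the weights from the best check are kept.

```python
import copy


def train_with_early_stopping(model, update_weights, monitor_error,
                              max_epochs=1000, check_every=100):
    """Early stopping with a monitoring set, checked every `check_every` epochs.

    Stops at the first check whose monitoring error is not lower than the best
    so far, or at `max_epochs`, whichever comes first. Returns a copy of the
    model from the best check, the epoch of that check, and its error.
    """
    best_model = copy.deepcopy(model)
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        update_weights(model)                 # one epoch = one weight update
        if epoch % check_every != 0:
            continue                          # only check at regular intervals
        err = monitor_error(model)
        if err < best_error:
            best_model, best_error, best_epoch = copy.deepcopy(model), err, epoch
        else:
            break  # error rose or stalled: the minimum was at the previous check
    return best_model, best_epoch, best_error
```

For example, take a toy "model" whose single weight grows by 1 per epoch, with monitoring error `(w - 320) ** 2`. The checks at epochs 100, 200 and 300 improve, but the check at 400 does not. Training therefore stops there and returns the state from epoch 300. The returned model would then be evaluated on the validation set.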
Where appropriate, a sigmoid activation function was adopted for the hidden nodes, whereas
a linear activation function was used for the output nodes. The sigmoid function was used
to introduce nonlinearity into the network. It was not adopted for the output nodes, however,
because it would bound the output between 0.0 and 1.0; this would require scaling the output
variable by a known maximum value. Such scaling was not appropriate for rainfall forecasting
because it was undesirable to set a maximum rainfall value for the data a priori. An identity
(linear) function was therefore used for the output nodes instead.
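A minimal NumPy sketch of this arrangement assumes a network with one hidden layer. The function names, layer sizes, and weight values are hypothetical. Hidden activations pass through the sigmoid, so each lies in (0, 1). The output layer applies the identity function, so a forecast can exceed 1.0 without any prior scaling of the rainfall values.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid; its output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network.

    x: (n_samples, n_inputs); W1: (n_inputs, n_hidden); b1: (n_hidden,);
    W2: (n_hidden, n_outputs); b2: (n_outputs,).
    Hidden nodes use the sigmoid (nonlinearity); output nodes use the
    identity function, so forecasts are not bounded above by 1.0.
    """
    h = sigmoid(x @ W1 + b1)   # hidden activations, each in (0, 1)
    return h @ W2 + b2         # linear output: any real value
```

As a check, set all inputs and hidden-layer weights to zero. Each hidden node then outputs sigmoid(0) = 0.5. With two hidden nodes and output weights of 10.0, the linear output node returns 0.5 × 10 + 0.5 × 10 = 10.0. A sigmoid output node could never exceed 1.0.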