The training process ran for 500 epochs. The training and
validation errors only began to converge at around 300 epochs, which
is relatively slow. This slow convergence may be caused by class
imbalance in the training samples, i.e., the presence of a dominant
class. In our case, the number of water samples (ice concentration
of 0) is about eight times that of the second most common ice
concentration level in the training samples.
Intuitively, this quickly pushes the model toward a “dangerous
local minimum” [36]. At this local minimum, the model labels most
of the input as water and still achieves a low cost. If training is
stopped early, this generally leads to underestimation of the ice
concentration. Escaping this local minimum may take many epochs,
which results in a long training period.
There are several approaches that may be investigated to
resolve this issue, including undersampling the majority [37],
[38], oversampling the minority [37], [38], or using a Bayesian
cross-entropy cost function [36]. Our experiments (not shown
here) show that learning typically converges within 50 epochs
when those methods are used, but none of them yields a model
better than training directly on all the training samples for a
long time. In this paper, we choose to prioritize precision and
accept the long training time.
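As an illustration of one of the resampling remedies mentioned above, random oversampling of the minority classes can be sketched in a few lines of NumPy. This is a generic sketch, not the specific procedure of [37], [38]; the function name and interface are our own:

```python
import numpy as np

def oversample_minority(X, y, rng=None):
    """Randomly duplicate samples of under-represented classes until
    every class matches the size of the largest one (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == c)
        idx.extend(cls_idx)
        if n < target:
            # draw extra copies with replacement to reach the target count
            idx.extend(rng.choice(cls_idx, size=target - n, replace=True))
    idx = np.asarray(idx, dtype=int)
    rng.shuffle(idx)
    return X[idx], y[idx]
```

Note that plain duplication increases training-set size (and thus epoch time) and can encourage overfitting to repeated minority samples, which is consistent with the mixed results we observed.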
A better method for reducing the effect of imbalanced data is
needed to improve both the computational time and the accuracy of
the results.
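The cost-function approach can be illustrated with a generic class-weighted cross-entropy, a simpler stand-in for the Bayesian cross-entropy of [36] rather than its exact formulation; all names below are illustrative. Inverse-frequency weights make errors on rare ice-concentration levels cost more than errors on the dominant water class:

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency weights: rare classes receive larger weights."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    counts[counts == 0] = 1.0  # avoid division by zero for absent classes
    return counts.sum() / (n_classes * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Mean negative log-likelihood, each sample scaled by its class weight.

    probs:   (N, C) predicted class probabilities
    labels:  (N,)  integer class labels
    weights: (C,)  per-class weights, e.g. from class_weights()
    """
    eps = 1e-12  # numerical guard against log(0)
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(weights[labels] * nll))
```

With uniform weights this reduces to the ordinary cross-entropy, so the reweighting can be introduced without changing the rest of the training loop.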