1. Introduction
Learning and generalization in neural networks strongly depend on
network complexity: the architecture, the number and types of
adaptive parameters, the procedure used to initialize those
parameters, and the details of the learning algorithm.
Models that are too complex may
learn the training data perfectly but will not generalize well. It is
commonly believed that the simplest models have the best generalization
capabilities, but proper regularization of the cost function
may ensure good generalization even in overparametrized models
[1]. Finding the global minimum of a complex, nonlinear error function
with many parameters is an NP-hard problem [2]. Constructing an
appropriate architecture and properly initializing the adaptive
parameters should make it possible to find near-optimal solutions to
real-world problems while significantly decreasing learning time.
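The claim that regularization can tame overparametrized models can be illustrated with a minimal sketch. The example below uses L2-regularized (ridge) least squares as the simplest stand-in for adding a weight penalty to a cost function; the data, polynomial degree, and penalty strengths are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """L2-regularized least squares: minimizes ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 8)            # only 8 training points
y = x + 0.1 * rng.standard_normal(8)     # noisy samples of a linear target
X = np.vander(x, 12)                     # degree-11 features: 12 parameters > 8 samples

w_weak = ridge_fit(X, y, 1e-8)           # almost no regularization
w_strong = ridge_fit(X, y, 1.0)          # substantial weight penalty

# The penalty shrinks the weight norm, taming the overparametrized fit.
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

With 12 parameters and only 8 samples the unpenalized normal equations are singular, so the weakly regularized solution has a much larger weight norm; increasing the penalty shrinks the weights, which is the mechanism behind the good generalization of regularized overparametrized models noted above.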