In the previous section we introduced a model of a Neuron, which computes a dot product
followed by a non-linearity, and Neural Networks that arrange neurons into layers. Together,
these choices define the new form of the score function, which extends the simple linear
mapping that we saw in the Linear Classification section. In particular, a
Neural Network performs a sequence of linear mappings with interwoven non-linearities. In
this section we will discuss additional design choices regarding data preprocessing, weight
initialization, and loss functions.
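
As a reminder of what "a sequence of linear mappings with interwoven non-linearities" looks like in practice, here is a minimal sketch of such a score function in plain Python. The helper names (`linear`, `relu`, `score`), the choice of ReLU as the non-linearity, and the hand-picked weights are all assumptions made for illustration, not something fixed by the text above.

```python
def linear(W, b, x):
    """Linear (affine) mapping: returns W x + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    """Elementwise non-linearity max(0, v); ReLU is an assumed choice here."""
    return [max(0.0, vi) for vi in v]


def score(x, layers):
    """Apply each (W, b) layer in turn, with a ReLU between layers.

    No non-linearity is applied after the last layer, so the outputs
    can be read as raw class scores.
    """
    h = x
    for i, (W, b) in enumerate(layers):
        h = linear(W, b, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h


# Hypothetical 2 -> 2 -> 1 network with hand-picked weights.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [0.5]),                     # output layer
]

print(score([2.0, 1.0], layers))  # [1.5]
print(score([1.0, 2.0], layers))  # [1.5]
```

With these particular weights the network computes |x1 - x2| + 0.5, which no single linear mapping can represent. If the ReLU were removed, the two layers would collapse into one linear mapping.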