In a convolutional neural network data and functions have additional structure. The data x1,…,xn are images, sounds, or more in general maps from a lattice1 to one or more real numbers. In particular, since the rest of the practical will focus on computer vision applications, data will be 2D arrays of pixels. Formally, each xi will be a M×N×K real array of M×N pixels and K channels per pixel. Hence the first two dimensions of the array span space, while the last one spans channels. Note that only the input x=x1 of the network is an actual image, while the remaining data are intermediate feature maps.
The second property of a CNN is that the functions fl have a convolutional structure. This means that fl applies to the input map xl an operator that is local and translation invariant. Examples of convolutional operators are applying a bank of linear filters to xl.