Convolution
Convolution is a mathematical operation in which one function is applied repeatedly across the output of another. In this context it means sliding a 'filter' over an image at every possible offset. A filter consists of a layer of connection weights whose input is a small 2D image patch and whose output is a single unit. Since the filter is applied repeatedly, the resulting connectivity looks like a series of overlapping receptive fields, as shown in the 'sparse connectivity' image, which map to a matrix of filter outputs (or several such matrices in the common case of using a bank of several filters).

An important subtlety here is that while there are still a good many connections between the input layer and the filter-output layer, the weights are tied together (as shown in the colored diagram). This means that during backpropagation you only have to adjust a number of parameters equal to a single instance of the filter -- a drastic reduction from the typical FFNN architecture.

Another nuance is that such a filter could sensibly be applied to any spatially organized input, not just a picture. This means we could add another bank of filters directly on top of our first filter bank's output. However, since a filter's output has the same spatial dimensionality as its input, these additional filters would gain us no translation invariance; we would be stuck doing pixel-wise analysis on increasingly abstract features. To solve this problem, we must introduce a new sort of layer: a subsampling layer.
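The two ideas above -- applying the filter at all offsets, and reusing one shared set of weights everywhere -- can be sketched in a few lines of NumPy. This is an illustrative toy (the `conv2d` helper and the 5x5 example image are assumptions, not from the text), not an efficient implementation:

```python
import numpy as np

def conv2d(image, kernel):
    """Apply one filter at every valid offset of a 2D image (no padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # the filter's receptive field at this offset
            out[i, j] = np.sum(patch * kernel)  # the SAME 9 weights are reused at every offset
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                  # a simple 3x3 averaging filter
feature_map = conv2d(image, kernel)
print(feature_map.shape)                        # (3, 3): one output unit per offset
# Weight tying in action: this layer has only 9 trainable parameters,
# versus 25 * 9 = 225 for a fully connected layer with the same output size.
```

Note that the 3x3 output is still spatially organized just like the input, which is why a second filter bank could be stacked directly on top of `feature_map`.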
