The figure shows two layers of a CNN. Layer m-1 contains four feature maps. Hidden layer m contains two feature maps ($h^0$ and $h^1$). Pixels (neuron outputs) in $h^0$ and $h^1$ (outlined as blue and red squares) are computed from the pixels of layer m-1 that fall within their 2x2 receptive field in the layer below (shown as colored rectangles). Notice how the receptive field spans all four input feature maps. The weights $W^0$ and $W^1$ of $h^0$ and $h^1$ are thus 3D weight tensors: the leading dimension indexes the input feature maps, while the other two index the pixel coordinates within the receptive field.
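To make the shapes concrete, here is a minimal NumPy sketch of the layer described above. It rests on several assumptions that are not in the text: the 5x5 input size, the random values, and the helper names `feature_map_pixel` and `feature_maps` are all hypothetical. The sketch also computes only the weighted sum over the receptive field (no bias, no nonlinearity) and does not flip the kernel, so it is a cross-correlation rather than a strict convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in_maps, n_out_maps = 4, 2   # four maps in layer m-1, two maps in layer m
height, width = 5, 5           # assumed input size, chosen only for illustration
field = 2                      # 2x2 receptive field

# Layer m-1: a stack of four feature maps.
x = rng.standard_normal((n_in_maps, height, width))

# One 3D weight tensor per output map, stacked along a new leading axis.
# W[k] has shape (input maps, field rows, field cols) = (4, 2, 2).
W = rng.standard_normal((n_out_maps, n_in_maps, field, field))


def feature_map_pixel(x, W_k, i, j):
    """Weighted sum for one output pixel at (i, j) of one feature map.

    The patch spans every input feature map, so it has the same
    3D shape as the weight tensor W_k.
    """
    fh, fw = W_k.shape[1:]
    patch = x[:, i:i + fh, j:j + fw]
    return float(np.sum(W_k * patch))


def feature_maps(x, W):
    """Apply every weight tensor at every valid position (no padding)."""
    n_out, _, fh, fw = W.shape
    out_h = x.shape[1] - fh + 1
    out_w = x.shape[2] - fw + 1
    h = np.empty((n_out, out_h, out_w))
    for k in range(n_out):
        for i in range(out_h):
            for j in range(out_w):
                h[k, i, j] = feature_map_pixel(x, W[k], i, j)
    return h


h = feature_maps(x, W)
print(W[0].shape)  # shape of a single 3D weight tensor
print(h.shape)     # two output feature maps
```

Each output pixel therefore depends on 4 x 2 x 2 = 16 input pixels, one 2x2 window from each of the four input maps, which is why a single weight tensor needs three dimensions.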