Feature Depth & Universality
While the current model can be trained with up to around eight functional layers, many more may be needed to capture the full complexity of the human ventral stream. Unfortunately, the prospects for such scaling seem bleak, since backpropagation is known to suffer from "vanishing gradients": error signals that shrink rapidly toward zero as they are passed down through the layers. Fortunately, researchers are attacking this problem from several directions (each is illustrated with a short sketch below).

The first comes from neuroscience. Some researchers (e.g. Serre) have mapped their architectures directly onto neural structure, and the mapping suggests that the ventral stream itself may comprise only around eight functional layers, in which case deeper models may simply not be needed.

A second approach is to invoke evolution: perhaps some convolutional layers were learnt on an evolutionary timescale and are task-invariant. Such universal features are not implausible for the early layers, since filters such as Gabor-like edge detectors tend to be learnt across most task settings.

The last, and most promising, approach is to look to the emerging "deep learning" community for insight. So-called deep architectures can have arbitrarily many layers because they are trained with a greedy, layer-by-layer procedure. While the best greedy learning algorithm for convolutional architectures is still unclear (most deep learning to date relies on unsupervised error signals), inroads are being made, and the future for convolutional networks remains bright.
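To make the vanishing-gradient problem concrete, the following NumPy sketch (a toy fully connected stack with arbitrary widths, not the model discussed here) backpropagates a unit-norm error signal through a pile of sigmoid layers and prints its norm at each depth; because the sigmoid derivative never exceeds 0.25, the signal shrinks roughly geometrically as it descends.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth, width = 8, 64
# Modest random weights, scaled so forward activations stay in a sane range.
weights = [rng.normal(0.0, 1.0 / np.sqrt(width), (width, width)) for _ in range(depth)]

# Forward pass, keeping activations for the backward pass.
a = rng.normal(size=width)
activations = [a]
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: start from a unit-norm error signal at the top layer and
# watch its norm decay as it is passed down through the layers.
delta = rng.normal(size=width)
delta /= np.linalg.norm(delta)
for layer in range(depth - 1, -1, -1):
    out = activations[layer + 1]
    delta = weights[layer].T @ (delta * out * (1.0 - out))  # chain rule through the sigmoid
    print(f"layer {layer}: |error| = {np.linalg.norm(delta):.2e}")
```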
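The "universal early features" idea can be illustrated by building a fixed bank of Gabor filters, the orientation- and frequency-tuned edge detectors that tend to emerge in early layers regardless of the task; such a bank could serve as a task-invariant first convolutional layer. The kernel size, wavelengths, and bandwidth parameters below are illustrative assumptions, not values taken from any particular model.

```python
import numpy as np

def gabor_kernel(size=11, wavelength=4.0, theta=0.0, sigma=2.5, gamma=0.5):
    """Real part of a Gabor filter: a Gaussian envelope times a sinusoidal grating."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / wavelength)
    return envelope * carrier

# A small, fixed "first layer": 8 orientations x 2 spatial frequencies.
bank = np.stack([
    gabor_kernel(theta=t, wavelength=w)
    for t in np.linspace(0.0, np.pi, 8, endpoint=False)
    for w in (3.0, 6.0)
])
print(bank.shape)  # (16, 11, 11)
```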
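Finally, a minimal sketch of greedy layer-by-layer pretraining: each layer is fit as a small tied-weight autoencoder on the representation produced by the layers beneath it, then frozen while the next layer is trained on top. This is a generic illustration of the procedure rather than any specific deep-learning recipe; the layer sizes, learning rate, and synthetic data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder_layer(X, hidden, epochs=50, lr=0.1):
    """Tied-weight autoencoder (sigmoid encoder, linear decoder) trained by gradient descent."""
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hidden))
    for _ in range(epochs):
        H = sigmoid(X @ W)        # encode
        R = H @ W.T               # decode with tied weights
        err = R - X               # reconstruction error
        # Gradient of the squared reconstruction error w.r.t. the shared weights:
        # one term through the encoder path, one through the decoder path.
        grad_W = (X.T @ (err @ W * H * (1.0 - H)) + err.T @ H) / n
        W -= lr * grad_W
    return W

X = rng.normal(size=(256, 32))    # stand-in for input data
layer_sizes = [24, 16, 8]
weights, rep = [], X
for h in layer_sizes:
    W = train_autoencoder_layer(rep, h)   # train this layer in isolation
    weights.append(W)                     # then freeze it...
    rep = sigmoid(rep @ W)                # ...and feed its representation upward
print([W.shape for W in weights])
```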