A Convolutional Neural Network (CNN) is a multilayer
learning framework that may consist of an input layer,
a few convolutional layers, and an output layer. The goal
of a CNN is to learn a hierarchy of feature representations.
Response maps in each layer are convolved with a number
of filters and then downsampled by pooling operations.
These pooling operations aggregate the values within a small
region using a downsampling function such as max, min, or
average. Learning in a CNN is based on Stochastic Gradient
Descent (SGD), which involves two main operations: forward
propagation and backpropagation. Please refer to
(LeCun and Bengio 1998) for details.
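
As a minimal illustration of the pooling step described above, the following Python sketch downsamples a single response map over non-overlapping square regions using max, min, or average aggregation. The function name `pool2d`, the 2 × 2 region size, and the toy 4 × 4 input are assumptions made for this example only; they are not taken from our system.

```python
from statistics import mean


def pool2d(response_map, size=2, reduce=max):
    """Aggregate each non-overlapping size x size region into one value.

    `response_map` is a list of equal-length rows. Rows and columns that do
    not fill a complete region are dropped (an assumption of this sketch).
    """
    rows = len(response_map) // size
    cols = len(response_map[0]) // size
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            # Collect the values of the region whose top-left corner is
            # at (r * size, c * size).
            region = [
                response_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(reduce(region))
        pooled.append(pooled_row)
    return pooled


# Toy 4 x 4 response map (illustrative values only).
response_map = [
    [0.0, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0, 11.0],
    [12.0, 13.0, 14.0, 15.0],
]

pooled_max = pool2d(response_map, reduce=max)   # [[5.0, 7.0], [13.0, 15.0]]
pooled_min = pool2d(response_map, reduce=min)   # [[0.0, 2.0], [8.0, 10.0]]
pooled_avg = pool2d(response_map, reduce=mean)  # [[2.5, 4.5], [10.5, 12.5]]
```

Each 2 × 2 region of the input collapses to one value, so the 4 × 4 map becomes a 2 × 2 map regardless of which aggregation function is chosen.
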
We used a seven-layer CNN (including the input layer and
two perceptron layers for regression output). The first convolutional
layer has 32 filters of size 5 × 5, the second convolutional
layer has 32 filters of size 5 × 5, and the third convolutional
layer has 64 filters of size 5 × 5. The first
perceptron layer has 64 regression outputs and the final perceptron
layer has 6 regression outputs. Our system considers
6 grasping type classes.
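
To make the layer configuration concrete, the sketch below traces how the size of the response maps could evolve through the three convolutional layers and the two regression layers. Only the filter counts, the 5 × 5 filter size, and the 64 and 6 output sizes come from the description above. The input shape, the use of unpadded stride-1 convolutions, and the placement of 2 × 2 pooling after every convolutional layer are hypothetical choices made for illustration, since the text does not state them.

```python
# Values stated in the text: (number of filters, filter side length).
CONV_LAYERS = [(32, 5), (32, 5), (64, 5)]
# Values stated in the text: outputs of the two final regression layers.
OUTPUT_LAYER_SIZES = [64, 6]

# Hypothetical values, NOT given in the text.
INPUT_SHAPE = (1, 64, 64)  # (channels, height, width)
POOL_SIZE = 2              # assumed 2 x 2 pooling after each convolution


def trace_shapes(input_shape=INPUT_SHAPE, pool_size=POOL_SIZE):
    """Return a list of (layer name, output shape) pairs."""
    channels, height, width = input_shape
    shapes = [("input", (channels, height, width))]
    for index, (filters, kernel) in enumerate(CONV_LAYERS, start=1):
        # Assumed unpadded, stride-1 convolution: each side shrinks
        # by kernel - 1.
        height, width = height - kernel + 1, width - kernel + 1
        # Assumed non-overlapping pooling: each side is divided by
        # the pool size, discarding any remainder.
        height, width = height // pool_size, width // pool_size
        channels = filters
        shapes.append((f"conv{index}+pool", (channels, height, width)))
    for index, outputs in enumerate(OUTPUT_LAYER_SIZES, start=1):
        shapes.append((f"regression{index}", (outputs,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Under these assumptions the final output has 6 values, matching the 6 grasping type classes; a different input size or pooling scheme changes only the intermediate map sizes.
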