For many data sets, it produces a highly accurate classifier.
The algorithm for inducing a random forest was developed by
Leo Breiman and Adele Cutler [12]. The method combines
Breiman's "bagging" idea with the random selection of
features in order to construct a collection of decision trees
with controlled variation.
Algorithm: Random forest classifier
Input:
1. Training dataset: a set of N training
observations and their associated class values.
Output: A collection of decision trees
Each tree is constructed using the following steps:
1. Let the number of training cases be N, and the
number of variables in the classifier be M.
2. Let m be the number of input variables used to
determine the decision at a node of the tree; m
should be much less than M.
3. Choose a training set for this tree by sampling N
times with replacement from all N available training
cases (i.e. take a bootstrap sample). Use the cases
left out of the sample to estimate the error of the
tree, by predicting their classes.
4. For each node of the tree, randomly choose m
variables on which to base the decision at that node.
Calculate the best split based on these m variables in
the training set.
5. Each tree is fully grown and not pruned (as may be
done in constructing a normal tree classifier).
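Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (bootstrap_sample, gini, best_split) and the toy data are hypothetical, and Gini impurity is assumed as the split criterion because the text does not name one.

```python
import random
from collections import Counter

def bootstrap_sample(n_cases, rng):
    """Step 3: draw n_cases indices with replacement; the rest are held out."""
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    chosen = set(in_bag)
    held_out = [i for i in range(n_cases) if i not in chosen]
    return in_bag, held_out

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Step 4: best (feature, threshold) among feature_ids, or None if no split exists."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send cases to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

# Toy data (hypothetical): N = 4 cases, M = 4 variables, m = 2 tried per node.
rows = [[1, 5, 0, 2], [2, 5, 1, 2], [3, 5, 0, 2], [4, 5, 1, 2]]
labels = ["a", "a", "b", "b"]
rng = random.Random(0)

in_bag, held_out = bootstrap_sample(len(rows), rng)
feature_ids = rng.sample(range(4), 2)  # m variables chosen at random for this node
split = best_split([rows[i] for i in in_bag], [labels[i] for i in in_bag], feature_ids)
```

Growing a full tree would repeat the random choice of m variables and the split search at every node until the nodes can no longer be split (step 5); the held-out indices are the cases used to estimate the error of the tree.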
For prediction, a new sample is pushed down each
tree and assigned the label of the training sample in the
terminal node it reaches. This procedure is repeated for all
trees in the ensemble, and the majority vote of all trees is
reported as the random forest prediction.
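The voting step can be illustrated with a short Python sketch. It assumes each grown tree can be treated as a function from a sample to a class label; the three toy trees, their thresholds, and the name forest_predict are hypothetical stand-ins, not part of the original algorithm description.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Push the sample down every tree and return the most common label."""
    votes = [tree(sample) for tree in trees]
    label, _count = Counter(votes).most_common(1)[0]
    return label

# Hypothetical stand-ins for three fully grown trees over a 3-variable sample.
trees = [
    lambda x: "A" if x[0] > 2.0 else "B",
    lambda x: "A" if x[1] > 0.5 else "B",
    lambda x: "A" if x[2] < 20 else "B",
]

sample = [5.0, 0.1, 10]
prediction = forest_predict(trees, sample)  # votes are A, B, A
```

With an even number of trees a tie is possible; in this sketch a tie goes to the label that was voted first, which is an arbitrary choice rather than something the method prescribes.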