Each layer of the attentional cascade is trained to meet a target expressed in terms of false positive and false negative rates: among the $n$ negative examples declared positive by all of its preceding layers, layer $l$ must reject at least $(1-\gamma_l)\,n$ of them as negative, while preserving its performance on the positives, i.e., its detection rate must remain above $1-\beta_l$.
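To make the criterion concrete, the following is a minimal sketch, in Python with NumPy, of the check a training procedure could run after fitting layer $l$. The function name, the score-versus-threshold convention, and the parameter names `gamma_l` and `beta_l` are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def layer_meets_targets(neg_scores, pos_scores, threshold, gamma_l, beta_l):
    """Check the per-layer training target for one cascade stage (sketch).

    neg_scores : scores of the n negatives that passed all preceding layers
    pos_scores : scores of the positive examples
    threshold  : stage decision threshold (score >= threshold -> "positive")
    gamma_l    : maximum allowed false positive rate for this layer
    beta_l     : maximum allowed false negative rate for this layer
    """
    neg_scores = np.asarray(neg_scores)
    pos_scores = np.asarray(pos_scores)

    # Fraction of the surviving negatives that this layer still lets through.
    false_positive_rate = np.mean(neg_scores >= threshold)
    # Fraction of positives this layer keeps (its detection rate).
    detection_rate = np.mean(pos_scores >= threshold)

    # The layer must reject at least (1 - gamma_l) * n negatives ...
    fp_ok = false_positive_rate <= gamma_l
    # ... while keeping the detection rate above 1 - beta_l.
    det_ok = detection_rate >= 1.0 - beta_l
    return fp_ok and det_ok
```

In a Viola-Jones-style procedure the stage threshold is typically lowered until the detection-rate constraint is satisfied, and only then is the resulting false positive rate compared against $\gamma_l$; either way, the acceptance test reduces to the two comparisons above.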