Each sample $k$ in the training set is a triplet containing (1) the RGB input patch $x_k$, (2) the binary mask $m_k$ corresponding to the input patch (with $m_k^{ij} \in \{\pm 1\}$, where $(i, j)$ corresponds to a pixel location on the input patch), and (3) a label $y_k \in \{\pm 1\}$ which specifies whether the patch contains an object. Specifically, a patch $x_k$ is given label $y_k = 1$ if it satisfies the following constraints: