Such that for each training sample x_i, the function yields f(x_i) ≥ 0 for y_i = +1 and f(x_i) < 0 for y_i = -1. In other words, the training samples of the two classes are separated by the hyperplane f(x) = w^T x + b = 0, where w is the weight vector (normal to the hyperplane), b is the bias or threshold, and x_i is a data point.
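
As an illustration, the decision rule can be sketched in a few lines of Python. This is a minimal sketch, not a trained classifier: the weight vector, bias, and sample points are hypothetical values chosen for the example, and the function names are introduced here only for illustration.

```python
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return f(x) = w^T x + b for a single sample x."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def predict_label(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign +1 when f(x) >= 0 and -1 when f(x) < 0."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def separates(w, b, samples, labels) -> bool:
    """Check whether the hyperplane (w, b) puts every sample on the side of its label."""
    return all(predict_label(w, b, x) == y for x, y in zip(samples, labels))


# Hypothetical toy data: two points per class in two dimensions.
w = [1.0, -1.0]  # assumed weight vector (normal to the hyperplane)
b = 0.0          # assumed bias
samples = [[2.0, 1.0], [3.0, 0.5], [1.0, 2.0], [0.5, 3.0]]
labels = [1, 1, -1, -1]

for x, y in zip(samples, labels):
    print(x, y, decision_value(w, b, x), predict_label(w, b, x))

print("separated:", separates(w, b, samples, labels))
```

With these assumed values the hyperplane is the line x_1 = x_2, and each sample falls on the side that matches its label. A point lying exactly on the hyperplane, where f(x) = 0, is assigned to the +1 class, following the convention f(x_i) ≥ 0 used above.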