The approach to this binary classification problem was to implement several supervised learning algorithms and compare and contrast their results and error properties. The first such algorithm was logistic regression, which is a discriminative learning algorithm directly modeling the conditional probability, p(y|x). The fitting parameters ✓ 2 Rn+1, including the intercept terms are computed via the maximum likelihood estimators and then an optimization algorithm is used to find the optimal ✓. Both the second order Newton’s method and gradient ascent were explored. Newton’s Method was preferred to both stochastic and batch gradient ascent. Even though the implementation of the gradient ascent is simpler and each iteration is cheaper, since it only requires calculating the gradient rather the Hessian, it took far more iterations to converge.
Newton’s Method already had error, as measured by the norm of the gradient, of approximately machine
precision ✏ after 9-11 iterations, in comparison to the hundreds for both gradient ascents. Another challenge
of gradient descent is choosing the proper step length ↵ to expedite convergence. This could be done via
parameter fitting or by choosing an adaptive ↵ via a bisection method. However, a downside to Newton’s
Method is that it is more subject to round-o↵ errors. The Hessian must stay negative semi-definite and not
be poorly-conditioned, since it is being inverted in the algorithm. In order to avoid these poor numerical
properties, the feature data in the design matrix X 2 Rmx(n+1) was normalized for logistic regression, since
the original feature data given had very di↵erent scales of magnitude. Note that this normalization was
not necessary the Gaussian Discriminant Analysis (GDA) algorithm and so the implementation of GDA was
fairly straightforward in such that no modifications needed to be made.
GDA is a generative learning algorithm, which contrary to discriminative algorithms, first builds a model
for p(x|y = 1), the positive class of malignant tumors and also builds a model for p(x|y = 0), the negative
class of benign tumors. It then learns p(y|x) using Bayes’ Rule: