Suppose that the input data are $x_1, \ldots, x_n$ and the corresponding output
sequence is $y_1, \ldots, y_n$, and that we seek the linear function $f(x) = ax + b$
such that the values $y_i$ are as close as possible to $a x_i + b$ for $1 \le i \le n$. This is
achieved by choosing $a$ and $b$ to minimize the total squared error, given by: