Recapping, our goal in training a neural network is to find weights and biases which minimize the quadratic cost function C(w,b). This is a well-posed problem, but it's got a lot of distracting structure as currently posed - the interpretation of w and b as weights and biases, the σ function lurking in the background, the choice of network architecture, MNIST, and so on. It turns out that we can understand a tremendous amount by ignoring most of that structure, and just concentrating on the minimization aspect. So for now we're going to forget all about the specific form of the cost function, the connection to neural networks, and so on. Instead, we're going to imagine that we've simply been given a function of many variables and we want to minimize that function. We're going to develop a technique called gradient descent which can be used to solve such minimization problems. Then we'll come back to the specific function we want to minimize for neural networks.
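
To give a preview of the kind of procedure we'll develop, here's a minimal sketch of the idea applied to a toy function of two variables. The function f(v) = v1² + v2², the starting point, and the step size eta are stand-ins chosen purely for illustration, not the cost function C(w,b) or the learning rate we'll use for the network:

```python
import numpy as np

# A stand-in function of two variables to minimize: f(v) = v1^2 + v2^2.
def f(v):
    return np.sum(v**2)

# Its gradient, known analytically for this toy example: grad f(v) = 2*v.
def grad_f(v):
    return 2 * v

v = np.array([1.5, -2.0])   # arbitrary starting point
eta = 0.1                   # step size, chosen for illustration

# Repeatedly step in the direction opposite the gradient.
# Each step decreases f(v), moving v toward the minimum at (0, 0).
for step in range(100):
    v = v - eta * grad_f(v)

print(v, f(v))  # v ends up very close to (0, 0), where f is minimized
```

The loop above is the whole idea in miniature: repeatedly nudge the variables in whatever direction decreases the function. The rest of this section develops why that works and how to apply it to the many-variable cost function of a neural network.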