Okay, let's suppose we're trying to minimize some function, C(v)C(v). This could be any real-valued function of many variables, v=v1,v2,…v=v1,v2,…. Note that I've replaced the ww and bb notation by vv to emphasize that this could be any function - we're not specifically thinking in the neural networks context any more. To minimize C(v)C(v) it helps to imagine CC as a function of just two variables, which we'll call v1v1 and v2v2: