With our fresh new combined shape and texture model, we have found a nice way to
describe how a face could change not only in shape but also in appearance. Now we
want to find the shape parameters p and appearance parameters λ that bring our model
as close as possible to a given input image I(x). We could calculate the error
between our instantiated model and the given input image in the coordinate frame
of I(x), or map the points back to the base appearance and calculate the difference
there. We are going to use the latter approach, so we want to minimize the
following function: