The action network outputs the control signal, the model network simulates the characteristics of
the controlled object and outputs the new state parameters, and the critic network outputs an estimate of the cost function given by the Bellman equation of optimal control theory.
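
The division of labour among the three networks can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the text: the state and control dimensions, the one-hidden-layer tanh networks with random (untrained) weights, the quadratic utility `utility`, the discount factor `GAMMA`, and the choice to feed the critic the state only. The function `bellman_residual` shows how the three outputs combine in a one-step Bellman relation of the form J(x_t) = U(x_t, u_t) + gamma * J(x_{t+1}).

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, CONTROL_DIM, HIDDEN = 3, 1, 8  # hypothetical sizes
GAMMA = 0.95                              # assumed discount factor


def init_mlp(n_in, n_out):
    """Random weights for a one-hidden-layer network (illustrative, untrained)."""
    return {
        "W1": rng.normal(scale=0.1, size=(HIDDEN, n_in)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(scale=0.1, size=(n_out, HIDDEN)),
        "b2": np.zeros(n_out),
    }


def mlp(params, x):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    h = np.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"]


action_params = init_mlp(STATE_DIM, CONTROL_DIM)
model_params = init_mlp(STATE_DIM + CONTROL_DIM, STATE_DIM)
critic_params = init_mlp(STATE_DIM, 1)


def action_net(x):
    """Action network: state -> control signal."""
    return mlp(action_params, x)


def model_net(x, u):
    """Model network: (state, control) -> predicted next state."""
    return mlp(model_params, np.concatenate([x, u]))


def critic_net(x):
    """Critic network: state -> scalar estimate of the cost function."""
    return mlp(critic_params, x)[0]


def utility(x, u):
    """Assumed quadratic one-step cost; the text does not specify one."""
    return float(x @ x + u @ u)


def bellman_residual(x):
    """J(x_t) - [U(x_t, u_t) + GAMMA * J(x_{t+1})]; zero when the critic is consistent."""
    u = action_net(x)
    x_next = model_net(x, u)
    target = utility(x, u) + GAMMA * critic_net(x_next)
    return critic_net(x) - target


x0 = np.array([1.0, -0.5, 0.2])
print("control:", action_net(x0))
print("next state:", model_net(x0, action_net(x0)))
print("Bellman residual:", bellman_residual(x0))
```

In a scheme of this kind, training the critic would typically drive this residual toward zero, while the action network would be adjusted to reduce the critic's cost estimate; neither update is shown here.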