The Control/Action module and the Critic module are function approximation structures such as neural networks. The parameters of the Critic module are updated based on the Bellman’s principle of optimality, and the objective of updating the parameters of the Control/Action module is to
minimize the outputs of the Critic module.