For comparison purposes, the energy-aware SARSA-AODV protocol is also implemented. In SARSA-AODV, each node learns to tune its RREQ forwarding rate during the route discovery process using the SARSA reinforcement learning (RL) algorithm. The RL model is defined as follows. The state of a mobile node at time step t, s_t, corresponds to its expected residual lifetime (RT) in seconds. The state space is quantized into discrete intervals whose boundaries are chosen experimentally. Each action corresponds to the ratio of received RREQs that the node forwards. Finally, the reward signal is defined so as to improve the fairness of energy consumption among mobile nodes. It is calculated at each node as follows: