The reason the dependence on choice history arises can be explained as follows. Consider an extreme case in which no reward was given in the last $M_r$ trials ($R(t-1) = \cdots = R(t-M_r) = 0$). In this case, a regression model that includes only the reward history predicts that the subject chooses option 1 with a probability of 0.5, i.e., $P(a(t) = 1) = 0.5$. However, this prediction differs from the
actual behavior of the Q-learning model. In the Q-learning model with $\alpha_F < \alpha_L$, the value of the unchosen option remains unchanged (when $\alpha_F = 0$) or decays slowly compared with the value of the chosen option (when $\alpha_F > 0$). The value of the chosen option, in contrast, decays at the faster rate $\alpha_L$ (each unrewarded choice produces a negative prediction error), so the tendency to switch options increases. Thus, the regression coefficients for the choice history, $b_c$, become negative.
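
To make this concrete, the following minimal Python sketch simulates the forgetting Q-learning updates over a run of unrewarded trials. The parameter values ($\alpha_L = 0.4$, $\alpha_F = 0.1$, inverse temperature $\beta = 3$) are illustrative assumptions, not values from this paper. The chosen option's value decays faster than the unchosen option's, so the softmax probability of repeating the same choice falls below the 0.5 predicted by a reward-history-only regression:

```python
import numpy as np

alpha_L, alpha_F, beta = 0.4, 0.1, 3.0  # illustrative values, not from the paper
Q = np.array([0.5, 0.5])                # both options start with equal value
chosen = 0                              # suppose the subject keeps choosing option 1

for t in range(5):                      # a run of unrewarded trials, R = 0
    Q[chosen] += alpha_L * (0.0 - Q[chosen])  # chosen value decays at rate alpha_L
    Q[1 - chosen] *= 1.0 - alpha_F            # unchosen value decays at rate alpha_F
    # softmax probability of repeating the previous choice
    p_repeat = 1.0 / (1.0 + np.exp(-beta * (Q[chosen] - Q[1 - chosen])))
    print(f"trial {t + 1}: Q = {np.round(Q, 3)}, P(repeat) = {p_repeat:.3f}")

# P(repeat) falls below 0.5, i.e., an increasing tendency to switch, which a
# reward-history-only regression cannot capture and which appears as
# negative choice-history coefficients b_c.
```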