the corresponding Q-values continuously change. Thus, P(a(t) = 1)
is distributed discretely.
4.3. The effects of outcome value and learning rate in the standard
Q-learning model
We examined the influence of the outcome value parameter κ
and the learning rate αL in the standard Q-learning model, in which
the forgetting rate is zero (αF = 0). As previously discussed, if the
initial action values are both zero (i.e., Q1(1) = Q2(1) = 0), varying
the inverse temperature β has the same effect on choice as scaling
the outcome R(t) by the same factor. Thus, if the value of the neutral
outcome is set to zero, varying κ is equivalent to varying β by the
same factor. We therefore examined the effects of κ, instead of β,
with the value of the neutral outcome fixed at zero.
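This equivalence is straightforward to verify numerically. The following is a minimal sketch (our illustration, not the paper's code): it simulates standard Q-learning on a two-armed bandit and checks that doubling κ yields exactly the same trial-by-trial choice probabilities as doubling β instead. The reward probabilities (0.7/0.3), the learning rate of 0.3, and the function name choice_probs are hypothetical choices made for this demonstration.

```python
import numpy as np

def choice_probs(beta, kappa, alpha_L=0.3, n_trials=200, seed=0):
    """Simulate standard Q-learning (alpha_F = 0) on a two-armed bandit
    and return the trial-by-trial P(a(t) = 1)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros(2)                       # Q1(1) = Q2(1) = 0
    probs = np.empty(n_trials)
    for t in range(n_trials):
        # Softmax probability of choosing action 1
        probs[t] = 1.0 / (1.0 + np.exp(-beta * (Q[0] - Q[1])))
        a = 0 if rng.random() < probs[t] else 1
        # Hypothetical reward probabilities (0.7 / 0.3); neutral outcome = 0
        r = kappa if rng.random() < (0.7 if a == 0 else 0.3) else 0.0
        Q[a] += alpha_L * (r - Q[a])      # update only the chosen action
    return probs

p_a = choice_probs(beta=1.0, kappa=2.0)   # scale the outcome value by 2
p_b = choice_probs(beta=2.0, kappa=1.0)   # scale beta by 2 instead
assert np.allclose(p_a, p_b)              # identical choice probabilities
```

Because the Q-values are linear in the outcomes when the initial values are zero, scaling κ scales every Q-value, and hence every term βQ in the softmax, by exactly the same factor as scaling β would.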
Fig. 3(A) shows the estimated regression coefficients as κ is
varied. As expected, the outcome value had a monotonic effect
on the regression coefficients over the entire reward history. The
outcome value also had a monotonic effect on the regression
coefficients for choice history: the larger κ was, the stronger the
negative dependence on the choice history.
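For concreteness, the sketch below shows one way such regression coefficients can be obtained; it is our illustration, not the paper's analysis code. It simulates the agent and then fits a logistic regression of the current choice on the signed reward history and the choice history over the last K = 10 trials. The regressor coding (±1 choices, reward signed by the chosen action), the task parameters, and the use of scikit-learn are all assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def simulate(beta=1.0, kappa=1.0, alpha_L=0.3, n_trials=5000, seed=1):
    """Standard Q-learning (alpha_F = 0) on a two-armed bandit."""
    rng = np.random.default_rng(seed)
    Q = np.zeros(2)
    choices = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials)
    for t in range(n_trials):
        p1 = 1.0 / (1.0 + np.exp(-beta * (Q[0] - Q[1])))
        a = 0 if rng.random() < p1 else 1
        r = kappa if rng.random() < (0.7 if a == 0 else 0.3) else 0.0
        Q[a] += alpha_L * (r - Q[a])
        choices[t], rewards[t] = a, r
    return choices, rewards

K = 10                                   # number of past trials as regressors
choices, rewards = simulate()
c = np.where(choices == 0, 1.0, -1.0)    # +1 for action 1, -1 for action 2
# Lagged regressors for lags 1..K: signed reward history and choice history
reward_hist = [c[K - k:-k] * rewards[K - k:-k] for k in range(1, K + 1)]
choice_hist = [c[K - k:-k] for k in range(1, K + 1)]
X = np.column_stack(reward_hist + choice_hist)
y = (choices[K:] == 0).astype(int)       # current choice of action 1
# Large C makes the fit effectively unregularized
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
print(model.coef_[0][:K])                # reward-history coefficients
print(model.coef_[0][K:])                # choice-history coefficients
```

Under this coding, positive reward-history coefficients indicate that rewarded actions tend to be repeated, while negative choice-history coefficients indicate a tendency to switch away from recently chosen actions.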