Kearns and Singh (2002) observed that the reset assumption is not strictly necessary. In a finite state space, any sufficiently long run must, by the pigeonhole principle, visit some state repeatedly, and that state can therefore serve as a kind of post hoc starting state for the analysis. They showed that a PAC result could be derived for trajectory-based learning instead of assuming independent trials.
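The recurrence argument can be made concrete with a small sketch. A trajectory of $T$ steps over $|S|$ states visits some state at least $\lceil T/|S| \rceil$ times, so the most frequently visited state can be used to cut one long run into pseudo-episodes that each begin at the same state. This is only an illustration of that idea under the assumption of a finite, hashable state space; it is not Kearns and Singh's algorithm, and the function name `split_at_recurrent_state` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def split_at_recurrent_state(
    trajectory: Sequence[Hashable],
) -> Tuple[Hashable, List[List[Hashable]]]:
    """Hypothetical helper: choose the most frequently visited state as a
    post hoc 'start' state and split the trajectory into pseudo-episodes,
    each beginning at a visit to that state."""
    if not trajectory:
        raise ValueError("trajectory must be non-empty")

    counts = Counter(trajectory)
    anchor, anchor_visits = counts.most_common(1)[0]

    # Pigeonhole bound: some state is visited at least ceil(T / |S|) times.
    assert anchor_visits >= math.ceil(len(trajectory) / len(counts))

    # Indices at which the anchor state is visited.
    starts = [i for i, s in enumerate(trajectory) if s == anchor]

    # Each segment runs from one anchor visit up to (not including) the next;
    # any prefix before the first anchor visit is discarded.
    segments = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(trajectory)
        segments.append(list(trajectory[start:end]))
    return anchor, segments


if __name__ == "__main__":
    anchor, segments = split_at_recurrent_state(["a", "b", "a", "c", "b", "a"])
    print(anchor, segments)
```

On the example run, state `a` is visited three times, so it becomes the anchor and the run is split into `["a", "b"]`, `["a", "c", "b"]`, and `["a"]`. The segments are not truly independent trials, which is exactly why a trajectory-based analysis, rather than one that assumes resets, is needed.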