Böck, M., & Heitzinger, C. (2022). Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis. SIAM Journal on the Mathematics of Data Science, 4(2), 675–693. https://doi.org/10.1137/20M1364436
In distributional reinforcement learning, the entire distribution of the return is modeled rather than only its expectation. Using categorical distributions as the approximation is a well-known approach in Q-learning, and convergence results have been established in the tabular case. In this work, speedy Q-learning is extended to categorical distributions, a finite-time analysis is performed, and probably approximately correct (PAC) bounds in terms of the Cramér distance are established. It is shown that, also in the distributional case, the new update rule yields faster policy evaluation than the standard Q-learning update, and that the sample complexity is essentially the same as that of the value-based algorithmic counterpart. Thus, without requiring more state-action-reward samples, categorical distributions provide significantly more information about the return. Even though the results do not easily extend to policy control, a slight modification of the update rule yields promising numerical results.
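As a minimal illustrative sketch (not taken from the paper), the Cramér distance in which the PAC bounds are stated can be computed for two categorical distributions defined on a common, equally spaced support; the particular support, number of atoms, and helper function below are assumptions chosen only for illustration.

```python
import numpy as np

def cramer_distance(p, q, delta_z):
    """Cramér (l2) distance between two categorical distributions that
    share the same equally spaced support with atom spacing delta_z.

    p, q: probability vectors over the common support (each summing to 1).
    """
    # The Cramér distance is the L2 distance between the two CDFs integrated
    # over the support; on a uniform grid this reduces to a scaled sum of
    # squared CDF differences.
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    return np.sqrt(delta_z * np.sum(cdf_diff ** 2))

# Illustrative example: two return distributions on 51 atoms over [-10, 10],
# a common choice of support in categorical distributional RL.
support = np.linspace(-10.0, 10.0, 51)
delta_z = support[1] - support[0]
p = np.full(51, 1.0 / 51)        # uniform over the support
q = np.zeros(51); q[25] = 1.0    # point mass at 0
print(cramer_distance(p, q, delta_z))
```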