Kresse, F. G. (2024). Deep off-policy evaluation with autonomous racing cars [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.117422
Transferring robot policies trained with Reinforcement Learning (RL) from simulation to the real world is challenging because policies frequently overfit to the simulator's particularities. This is especially problematic when policy evaluation occurs within the same simulator, as doing so conceals the overfitting and necessitates frequent, costly real-world deployments for accurate performance estimation. Recently, Off-Policy Evaluation (OPE) has shown promise in reducing the need for extensive real-world deployment by providing performance estimates based on approximations of real-world data distributions. However, the effectiveness of OPE methods in real-world robotics has not been extensively investigated; existing benchmarks rely primarily on simulated environments that fail to capture real-world complexity and unpredictability. Addressing this gap, this thesis introduces the first real-world robotics benchmark for OPE methods, built on the affordable and accessible F1TENTH platform. Since several existing OPE methods perform inadequately when applied naively to our F1TENTH environment, we explore specific improvements and benchmark over 20 OPE methods. Among these, we introduce the Termination-aware Per-Decision-Weighted Importance Sampling (TPDWIS) estimator, a novel Importance Sampling (IS) estimator capable of handling trajectories of non-uniform length that significantly outperforms previous IS estimators from the literature on our benchmark. Furthermore, we prove the consistency of this new estimator, indicating that it can be applied to more general environments. Finally, we provide recommendations on the most effective OPE methods to employ under specific constraints. Our novel dataset, benchmark, proposed improvements, and new estimator offer a robust foundation for future research and development in real-world OPE methods.
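For readers unfamiliar with the estimator family that TPDWIS extends, the sketch below computes the standard Per-Decision Weighted Importance Sampling (PDWIS) estimate from logged trajectories, as found in the OPE literature; it is background only and is not the thesis's TPDWIS estimator. The trajectory format, the function name, and the convention of normalizing each timestep's weights over only the trajectories still active at that step are illustrative assumptions; how terminated trajectories enter this normalization is precisely the issue a termination-aware variant must address.

```python
import numpy as np

def pdwis(trajectories, gamma=0.99):
    """Standard Per-Decision Weighted Importance Sampling (PDWIS), a sketch.

    trajectories: list of trajectories, each a list of (pi_e, pi_b, r)
    tuples, where pi_e and pi_b are the evaluation- and behavior-policy
    probabilities of the logged action and r is the observed reward.
    Trajectories may have different lengths; here, a trajectory simply
    stops contributing after it ends (one possible convention, assumed
    for illustration).
    """
    horizon = max(len(traj) for traj in trajectories)
    estimate = 0.0
    # Running cumulative importance ratio rho_{0:t} for each trajectory.
    rho = np.ones(len(trajectories))
    for t in range(horizon):
        num, den = 0.0, 0.0
        for i, traj in enumerate(trajectories):
            if t < len(traj):  # only trajectories still active at step t
                pi_e, pi_b, r = traj[t]
                rho[i] *= pi_e / pi_b
                num += rho[i] * r
                den += rho[i]
        if den > 0:
            # Weighted (self-normalized) per-decision term for step t.
            estimate += (gamma ** t) * num / den
    return estimate

# Example: two logged trajectories of non-uniform length.
logs = [
    [(0.9, 0.5, 1.0), (0.8, 0.5, 1.0), (0.7, 0.5, 0.0)],
    [(0.2, 0.5, 1.0)],
]
print(pdwis(logs, gamma=0.99))
```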