<div class="csl-bib-body">
<div class="csl-entry">Kresse, F. G. (2024). <i>Deep off-policy evaluation with autonomous racing cars</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.117422</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2024.117422
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/201148
-
dc.description.abstract
Transferring robot policies trained with Reinforcement Learning (RL) from simulation to the real world is challenging due to frequent overfitting to the simulator's particularities. This is especially problematic when policy evaluation occurs within the same simulator, as it conceals overfitting, necessitating frequent and costly real-world deployments for accurate performance estimation. Recently, Off-Policy Evaluation (OPE) has shown promise in reducing the need for extensive real-world deployment by providing performance estimates based on approximations of real-world data distributions. However, the effectiveness of OPE methods in real-world robotics has not been extensively investigated; existing benchmarks rely primarily on simulated environments that fail to capture real-world complexity and unpredictability. Addressing this gap, this thesis introduces the first real-world robotics benchmark for OPE methods, utilizing the affordable and accessible F1TENTH platform. Because some existing OPE methods perform inadequately when applied naively to our investigated F1TENTH environment, we explore specific improvements and benchmark over 20 OPE methods. Among these, we introduce the Termination-aware Per-Decision-Weighted Importance Sampling (TPDWIS) estimator, a novel Importance Sampling (IS) estimator capable of handling trajectories with non-uniform lengths, which significantly outperforms previous IS estimators from the literature on our benchmark. Furthermore, we prove the consistency of this new estimator, indicating that it can be applied to more general environments. Finally, we provide recommendations on the most effective OPE methods to employ under specific constraints. Our novel dataset, benchmark, proposed improvements, and new estimator offer a robust foundation for future research and development in real-world OPE methods.
en
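For context, the estimator family the abstract refers to can be sketched as follows. This is a minimal illustration of standard Per-Decision Weighted Importance Sampling (PDWIS), the baseline that the thesis's TPDWIS estimator extends to non-uniform trajectory lengths; the function name, trajectory format, and uniform-horizon assumption are illustrative and not taken from the thesis itself.

```python
# A minimal sketch of Per-Decision Weighted Importance Sampling (PDWIS).
# Assumes all trajectories share a common horizon; handling non-uniform
# lengths is exactly what the thesis's TPDWIS estimator adds.
import numpy as np

def pdwis(behavior_probs, target_probs, rewards, gamma=1.0):
    """Estimate the target policy's expected return from behavior-policy data.

    All inputs are arrays of shape (n_trajectories, horizon):
    behavior_probs[i, t] = pi_b(a_t | s_t), target_probs[i, t] = pi_e(a_t | s_t),
    rewards[i, t] = reward at step t of trajectory i.
    """
    # Per-step importance ratios rho_{i,t} = pi_e(a_t|s_t) / pi_b(a_t|s_t)
    rhos = target_probs / behavior_probs
    # Cumulative products w_{i,t} = prod_{k<=t} rho_{i,k}
    weights = np.cumprod(rhos, axis=1)
    # Normalize weights across trajectories per time step
    # (the "weighted" part of PDWIS, which reduces variance)
    norm = weights.sum(axis=0, keepdims=True)
    discounts = gamma ** np.arange(rewards.shape[1])
    # Sum the discounted, weight-normalized rewards over steps and trajectories
    return float((weights / norm * discounts * rewards).sum())
```

As a sanity check, when the target and behavior policies coincide, all ratios equal one and the estimate reduces to the mean discounted return of the dataset.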
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
reinforcement learning
en
dc.subject
autonomous driving
en
dc.subject
off-policy evaluation
en
dc.subject
importance sampling
en
dc.title
Deep off-policy evaluation with autonomous racing cars
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2024.117422
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Fabian Georg Kresse
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Berducci, Luigi
-
tuw.publication.orgunit
E191 - Institut für Computer Engineering
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC17320234
-
dc.description.numberOfPages
95
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.assistant.staffStatus
staff
-
tuw.advisor.orcid
0000-0001-5715-2142
-
tuw.assistant.orcid
0000-0002-3497-6007
-
item.languageiso639-1
en
-
item.openairetype
master thesis
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.grantfulltext
open
-
item.cerifentitytype
Publications
-
item.fulltext
with Fulltext
-
item.mimetype
application/pdf
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
E376 - Institut für Automatisierungs- und Regelungstechnik
-
crisitem.author.parentorg
E350 - Fakultät für Elektrotechnik und Informationstechnik