<div class="csl-bib-body">
<div class="csl-entry">Kresse, F. G. (2024). <i>Deep off-policy evaluation with autonomous racing cars</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.117422</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2024.117422
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/201148
-
dc.description.abstract
Transferring robot policies trained with Reinforcement Learning (RL) from simulation to the real world is challenging due to frequent overfitting to the simulator's particularities. This is especially problematic when policy evaluation occurs within the same simulator, as it conceals overfitting, necessitating frequent and costly real-world deployments for accurate performance estimation. Recently, Off-Policy Evaluation (OPE) has shown promise in reducing the need for extensive real-world deployment by providing performance estimates based on approximations of real-world data distributions. However, the effectiveness of OPE methods in real-world robotics has not been extensively investigated; existing benchmarks rely primarily on simulated environments that fail to capture real-world complexity and unpredictability. Addressing this gap, this thesis introduces the first real-world robotics benchmark for OPE methods, utilizing the affordable and accessible F1TENTH platform. Because some existing OPE methods perform inadequately when applied naively to our investigated F1TENTH environment, we explore specific improvements and benchmark over 20 OPE methods. Among these, we introduce the Termination-aware Per-Decision-Weighted Importance Sampling (TPDWIS) estimator, a novel Importance Sampling (IS) estimator capable of handling trajectories with non-uniform lengths, which significantly outperforms previous IS estimators from the literature on our benchmark. Furthermore, we prove the consistency of this new estimator, indicating that it can be applied to more general environments. Finally, we provide recommendations on the most effective OPE methods to employ under specific constraints. Our novel dataset, benchmark, proposed improvements, and new estimator offer a robust foundation for future research and development in real-world OPE methods.
en
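For context, the estimator family the abstract refers to can be sketched as follows. This is a minimal illustration of standard Per-Decision Weighted Importance Sampling (PDWIS), the baseline that the thesis's TPDWIS estimator extends to non-uniform trajectory lengths; the function name, trajectory format, and uniform-horizon assumption are illustrative and not taken from the thesis itself.

```python
# A minimal sketch of Per-Decision Weighted Importance Sampling (PDWIS).
# Assumes all trajectories share a common horizon; handling non-uniform
# lengths is exactly what the thesis's TPDWIS estimator adds.
import numpy as np

def pdwis(behavior_probs, target_probs, rewards, gamma=1.0):
    """Estimate the target policy's expected return from behavior-policy data.

    All inputs are arrays of shape (n_trajectories, horizon):
    behavior_probs[i, t] = pi_b(a_t | s_t), target_probs[i, t] = pi_e(a_t | s_t),
    rewards[i, t] = reward at step t of trajectory i.
    """
    # Per-step importance ratios rho_{i,t} = pi_e(a_t|s_t) / pi_b(a_t|s_t)
    rhos = target_probs / behavior_probs
    # Cumulative products w_{i,t} = prod_{k<=t} rho_{i,k}
    weights = np.cumprod(rhos, axis=1)
    # Normalize weights across trajectories per time step
    # (the "weighted" part of PDWIS, which reduces variance)
    norm = weights.sum(axis=0, keepdims=True)
    discounts = gamma ** np.arange(rewards.shape[1])
    # Sum the discounted, weight-normalized rewards over steps and trajectories
    return float((weights / norm * discounts * rewards).sum())
```

As a sanity check, when the target and behavior policies coincide, all ratios equal one and the estimate reduces to the mean discounted return of the dataset.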
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
reinforcement learning
en
dc.subject
autonomous driving
en
dc.subject
off-policy evaluation
en
dc.subject
importance sampling
en
dc.title
Deep off-policy evaluation with autonomous racing cars
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2024.117422
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Fabian Georg Kresse
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Berducci, Luigi
-
tuw.publication.orgunit
E191 - Institut für Computer Engineering
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC17320234
-
dc.description.numberOfPages
95
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.assistant.staffStatus
staff
-
tuw.advisor.orcid
0000-0001-5715-2142
-
tuw.assistant.orcid
0000-0002-3497-6007
-
item.languageiso639-1
en
-
item.openairetype
master thesis
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.grantfulltext
open
-
item.cerifentitytype
Publications
-
item.fulltext
with Fulltext
-
item.mimetype
application/pdf
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
E376 - Institut für Automatisierungs- und Regelungstechnik
-
crisitem.author.parentorg
E350 - Fakultät für Elektrotechnik und Informationstechnik