On Normative Reinforcement Learning via Safe Reinforcement Learning

Neufeld, Emery A.; Bartocci, Ezio; Ciabattoni, Agata

doi:10.1007/978-3-031-21203-1_5

DC Field

Value

Language

dc.contributor.author

Neufeld, Emery A.

dc.contributor.author

Bartocci, Ezio

dc.contributor.author

Ciabattoni, Agata

dc.date.accessioned

2022-12-20T14:09:18Z

dc.date.available

2022-12-20T14:09:18Z

dc.date.issued

2022

dc.identifier.citation

Neufeld, E. A., Bartocci, E., & Ciabattoni, A. (2022). On Normative Reinforcement Learning via Safe Reinforcement Learning. In PRIMA 2022: Principles and Practice of Multi-Agent Systems - Proceedings (pp. 72–89). https://doi.org/10.1007/978-3-031-21203-1_5

dc.identifier.uri

http://hdl.handle.net/20.500.12708/136993

dc.description.abstract

Reinforcement learning (RL) has proven a successful technique for teaching autonomous agents goal-directed behaviour. As RL agents further integrate with our society, they must learn to comply with ethical, social, or legal norms. Defeasible deontic logics are natural formal frameworks to specify and reason about such norms in a transparent way. However, their effective and efficient integration in RL agents remains an open problem. On the other hand, linear temporal logic (LTL) has been successfully employed to synthesize RL policies satisfying, e.g., safety requirements. In this paper, we investigate the extent to which the established machinery for safe reinforcement learning can be leveraged for directing normative behaviour for RL agents. We analyze some of the difficulties that arise from attempting to represent norms with LTL, provide an algorithm for synthesizing LTL specifications from certain normative systems, and analyze its power and limits with a case study.

dc.description.sponsorship

WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds

dc.language.iso

dc.relation.ispartofseries

Lecture Notes in Computer Science

dc.subject

Safe Reinforcement Learning

dc.subject

temporal logic

dc.subject

deontic logic

dc.title

On Normative Reinforcement Learning via Safe Reinforcement Learning

dc.type

Inproceedings

dc.type

Conference paper

dc.relation.isbn

978-3-031-21203-1

dc.relation.doi

10.1007/978-3-031-21203-1_5

dc.description.startpage

dc.description.endpage

dc.relation.grantno

MA16-028

dcterms.dateSubmitted

2022

dc.type.category

Full-Paper Contribution

tuw.booktitle

PRIMA 2022: Principles and Practice of Multi-Agent Systems - Proceedings

tuw.container.volume

13753

tuw.book.ispartofseries

Lecture Notes in Computer Science (LNCS)

tuw.project.title

Werkzeuge für logisches Schließen in der Deontischen Logik und Anwendungen auf heilige indische Schriften

tuw.researchTopic.id

tuw.researchTopic.name

Computer Science Foundations

tuw.researchTopic.value

100

tuw.linking

https://www.researchgate.net/publication/365341772_On_Normative_Reinforcement_Learning_via_Safe_Reinforcement_Learning

tuw.publication.orgunit

E192-05 - Forschungsbereich Theory and Logic

tuw.publication.orgunit

E191-01 - Forschungsbereich Cyber-Physical Systems

tuw.publication.orgunit

E194-04 - Forschungsbereich Data Science

tuw.publisher.doi

10.1007/978-3-031-21203-1_5

dc.description.numberOfPages

tuw.author.orcid

0000-0001-5998-3273

tuw.author.orcid

0000-0002-8004-6601

tuw.author.orcid

0000-0001-6947-8772

tuw.event.name

The 24th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA 2022)

tuw.event.startdate

16-11-2022

tuw.event.enddate

18-11-2022

tuw.event.online

Hybrid

tuw.event.type

Event for scientific audience

tuw.event.place

Valencia

tuw.event.country

tuw.event.presenter

Ciabattoni, Agata

wb.sciencebranch

Computer Science

wb.sciencebranch.oefos

1020

wb.sciencebranch.value

100

item.languageiso639-1

item.grantfulltext

restricted

item.fulltext

no Fulltext

item.openairetype

conference paper

item.openairecristype

http://purl.org/coar/resource_type/c_5794

item.cerifentitytype

Publications

crisitem.project.funder

WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds

crisitem.project.grantno

MA16-028

crisitem.author.dept

E192-05 - Forschungsbereich Theory and Logic

crisitem.author.dept

E191-01 - Forschungsbereich Cyber-Physical Systems

crisitem.author.dept

E192-05 - Forschungsbereich Theory and Logic

crisitem.author.orcid

0000-0002-8004-6601

crisitem.author.parentorg

E192 - Institut für Logic and Computation

crisitem.author.parentorg

E191 - Institut für Computer Engineering

crisitem.author.parentorg

E192 - Institut für Logic and Computation

Appears in Collections:

Conference Paper
