Neufeld, E. A. (2023). Norm compliance for reinforcement learning agents [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.112881
With the impending advent of AI technologies that are deeply embedded in daily life -- such as autonomous vehicles, elder care robots, and robot nannies -- comes a natural apprehension over whether they can integrate smoothly with human society. From these concerns arises a question: can we impose norms -- be they ethical, legal, or social -- on these technologies while preserving their effectiveness? This proves a difficult question to answer in the presence of machine learning technologies, which are notoriously opaque and unpredictable. Reinforcement learning (RL) is a powerful machine learning technique geared toward teaching autonomous agents goal-directed behaviour in stochastic environments through a utility function. RL agents have proven capable of exhibiting complex behaviours on par with or beyond the abilities of expert human agents, and have also been a subject of interest for machine ethicists; it has been conjectured by many that RL might deliver a positive answer to the above question. Indeed, there are already many attempts to implement an ``ethical agent'' with RL. However, these attempts largely ignore the complexities and idiosyncrasies of normative reasoning. Normative reasoning is the purview of the diverse field of Deontic Logic -- the logic of obligations and related notions -- which has yet to receive a meaningful place in the literature on ``ethical'' RL agents. In the following work, we will explore how RL can fall short of the goal of producing an ethical (or rather, normatively compliant) agent; this includes even more powerful developments like safe RL under linear temporal logic (LTL) constraints, due to the limits of LTL as a logic for normative reasoning. Even so, we provide a method for synthesizing LTL specifications that reflect the constraints deducible from certain normative systems. We will then present an alternative framework for imposing normative constraints by altering the internal processes of an RL agent to ensure behaviour that complies (as much as possible) with a normative system. To actuate this process, we propose a module called the Normative Supervisor, which facilitates the translation of data from the agent and a normative system into a defeasible deontic logic, leveraging a theorem prover to provide recommendations and judgements to the agent. This allows us to present Online Compliance Checking (OCC) and Norm-Guided Reinforcement Learning (NGRL) for eliciting normatively compliant behaviour from an RL agent. OCC filters out of the agent's arsenal, in each state, those actions that do not comply with the normative system, preventing the agent from taking actions that violate it; when no compliant actions exist, a ``lesser evil'' solution is presented. In NGRL, the agent is trained with two objectives: its original task and a normative objective, expressed through a utility function that punishes the agent when it transgresses the normative system. We show through a thorough series of experiments on RL agents playing simple computer games -- constrained by the wide variety of normative systems that we present -- that these techniques are effective, albeit flawed, and best utilized in tandem.
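To make the two mechanisms described above concrete, the following is a minimal, hypothetical sketch in Python. It is not the dissertation's implementation: the actual Normative Supervisor translates agent data into a defeasible deontic logic and queries a theorem prover, whereas here a plain Python predicate (`is_compliant`) stands in for that reasoner, and all function and variable names are illustrative assumptions. The first function mimics OCC-style action filtering; the second mimics the NGRL-style penalty on the normative objective.

```python
# Illustrative sketch only. The real system reasons over a defeasible deontic
# logic via a theorem prover; a boolean predicate is used here as a stand-in.
import random
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def compliant_actions(state: State,
                      actions: Sequence[Action],
                      is_compliant: Callable[[State, Action], bool]) -> list:
    """OCC-style filtering: keep only actions permitted in this state.

    If no action complies, fall back to the full action set as a crude
    stand-in for a 'lesser evil' choice (the actual method ranks violations
    rather than ignoring them).
    """
    allowed = [a for a in actions if is_compliant(state, a)]
    return allowed if allowed else list(actions)


def shaped_reward(task_reward: float, violated: bool, penalty: float = 1.0) -> float:
    """NGRL-style shaping: combine the task objective with a normative
    objective that punishes transgressions of the normative system."""
    return task_reward - (penalty if violated else 0.0)


# Toy usage with a single hypothetical norm: "never move left".
if __name__ == "__main__":
    actions = ["left", "right", "up", "down"]
    is_compliant = lambda state, action: action != "left"

    state = (0, 0)
    legal = compliant_actions(state, actions, is_compliant)   # OCC filter
    action = random.choice(legal)
    reward = shaped_reward(task_reward=1.0,
                           violated=not is_compliant(state, action))  # NGRL penalty
    print(action, reward)
```

In the dissertation the two techniques are evaluated both separately and in tandem; the sketch is only meant to show where each one intervenes: OCC at action selection, NGRL in the reward signal used during training.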