Enabling joint attention in human-robot interaction through gaze

Koller, Michael

doi:10.34726/hss.2023.112882

Record link:

https://doi.org/10.34726/hss.2023.112882
http://hdl.handle.net/20.500.12708/177521

Title:

Enabling joint attention in human-robot interaction through gaze

Citation:

Koller, M. (2023). Enabling joint attention in human-robot interaction through gaze [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.112882

reposiTUm DOI:

10.34726/hss.2023.112882

CatalogPlus:

AC16864324

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Koller, Michael

Advisor:

Vincze, Markus

Co-advisor:

Weiss, Astrid

Referee:

Belpaeme, Tony

Organisational Unit:

E376 - Institut für Automatisierungs- und Regelungstechnik

Date (published):

2023

Number of Pages:

140

Keywords:

Human; Robot; Interaction; Gaze

Abstract:

Humans employ gaze to coordinate their actions in joint attention scenarios. Collaborative service robots must leverage this communicative modality to interact fluently with humans during joint action tasks. They must signal their own attentional focus, thus communicating their intentions and goals, and process social cues relating to the collaborator’s attentional focus and goal.

This opens up a multifaceted problem space, and psychological research has revealed several distinct constituent components of joint attention. Various behavioral and cognitive aspects are categorized on different temporal resolution levels, from short-term behaviors such as gaze aversion up to long-term cognitive capabilities such as Theory of Mind. For these phenomena, different scientific and technological research approaches are applicable. The respective findings must be integrated to arrive at a working implementation on a robotic system.

This thesis presents findings for multiple aspects of joint attention in human-robot interaction. It discusses their interrelation with respect to temporal resolution, conceptual challenges, mappings between human and technological cognitive capabilities, and how to leverage technological approaches to emulate human joint attention capabilities.

Empirical human-robot interaction research is performed to determine whether deviations from human-inspired gaze parameters are viable for robots without deteriorating the interaction with a human. A subsequent review of psychological research informs the design of a stochastic gaze controller derived from human-human interaction data during successful joint action tasks.

Through a novel algorithmic approach, plan recognition from video data in manipulation-heavy task domains is made possible while relying only on standard robotic systems such as object detection and classical planning. A virtual reality simulator and dataset provide samples of complex, long time-horizon object manipulation tasks with detailed annotation, including multiple image sequences, object poses, and logical predicates.

Our novel algorithmic approach to an expanded setting of the assistive multi-armed bandit problem improves human-robot team performance when the human acts according to an empirically documented systematic irrational bias.

We discuss the interrelation of the different contributions and propose methods for their integration. Throughout the thesis, we show that ongoing concerns about the robotic research setting and the use-case scenario cannot be disregarded. Design assumptions and interaction aspects outside of the given research setting must be critically evaluated in order to emulate the breadth and depth of human social interactions.

Additional information:

Title differs according to the author's translation

License:

In Copyright

Appears in Collections:

Thesis