Human Robot Integration; Robot Behavior Generation; Deep Learning; Diffusion Models
en
Abstract:
In natural social interactions, two agents exhibit smooth and diverse behaviors that align with each other's intent in real time. Achieving this level of expressiveness in human–robot interaction (HRI) requires a robot to go beyond simple reactive behaviors and instead anticipate the rich distribution of possible human actions, enabling responses that are diverse, human-like, and socially aligned. This thesis bridges the gap between complex generative modeling and deployment on real robots by integrating visual perception, context-aware motion generation, and execution on physical hardware into a single coherent system. At the core of the system lies a latent diffusion framework designed for the joint generation of two-person social interactions. Given past context and a high-level interaction description, the model generates potential future motions for both agents in an interdependent manner. By operating in a temporally coherent latent space, the framework produces smooth, aligned motion segments while significantly reducing computational overhead, which makes live interaction feasible. To achieve real-time generation, the model is integrated into a continuous streaming pipeline that combines chunked diffusion inference with real-time SMPL-X pose estimation from a single RGB-D camera, eliminating the need for restrictive motion capture systems and enabling continuous prediction from live human input. The framework is demonstrated both in simulation and in real-world experiments with the Tiago++ and Unitree G1 robots, with the generated reactor motion retargeted online to each platform's embodiment. Ultimately, this thesis provides a robust solution for diverse and responsive motion generation, advancing the development of socially aware robots capable of engaging with humans naturally and adaptively under realistic conditions.
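The streaming idea described in the abstract (generate a chunk of future frames from past context, execute only its beginning, then re-plan with fresh observations) can be illustrated with a minimal Python sketch. It is not the thesis's model. `toy_denoiser` is a hypothetical oracle standing in for the learned latent diffusion network. Frames are generic vectors rather than encoded two-person SMPL-X latents. The sampler is plain DDIM with η = 0 and a linear noise schedule. All function names and parameters (`ctx_len`, `horizon`, `stride`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of chunked, sliding-window diffusion inference.
# Not the thesis's implementation; all names and parameters are illustrative.


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def toy_denoiser(x_t, t, alpha_bar, context, horizon):
    """Hypothetical stand-in for a learned noise predictor eps(x_t, t, context).

    It assumes the clean chunk is a constant-velocity extrapolation of the
    context and returns the exact noise implied by x_t under that target.
    """
    velocity = context[-1] - context[-2]
    steps = np.arange(1, horizon + 1)[:, None]
    target = context[-1] + steps * velocity
    return (x_t - np.sqrt(alpha_bar[t]) * target) / np.sqrt(1.0 - alpha_bar[t])


def sample_chunk(context, horizon, alpha_bar, rng):
    """Deterministic DDIM (eta = 0) reverse process over one future chunk."""
    x = rng.standard_normal((horizon, context.shape[1]))  # start from pure noise
    for t in reversed(range(len(alpha_bar))):
        eps = toy_denoiser(x, t, alpha_bar, context, horizon)
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0  # alpha_bar before step 0 is 1
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps
    return x


def stream(history, num_chunks, ctx_len=8, horizon=16, stride=4, seed=0):
    """Generate a chunk from the latest context, commit its first `stride` frames, repeat."""
    rng = np.random.default_rng(seed)
    alpha_bar = make_schedule()
    history = list(history)  # list of per-frame vectors
    for _ in range(num_chunks):
        context = np.asarray(history[-ctx_len:])
        chunk = sample_chunk(context, horizon, alpha_bar, rng)
        history.extend(chunk[:stride])  # only the earliest frames are executed
    return np.asarray(history)


# Usage: a linear 2-D trajectory is continued frame by frame.
past = np.array([[float(i), 2.0 * i] for i in range(8)])
print(stream(past, num_chunks=3).shape)  # (20, 2)
```

Committing only `stride` of the `horizon` generated frames before re-planning trades extra diffusion calls for faster reaction to new input. In a real system, the human's part of each new context frame would come from the live pose estimate rather than from the model's own output.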
Additional information:
Thesis not yet received by the library; data not verified.