Design and Evaluation of Different Selection Metaphors for a Dissimilar Co-Embodied Avatar in Virtual Reality DIPLOMARBEIT zur Erlangung des akademischen Grades Diplom-Ingenieur im Rahmen des Studiums Visual Computing eingereicht von Gabriel Ratschiller, BSc Matrikelnummer 11778247 an der Fakultät für Informatik der Technischen Universität Wien Betreuung: Univ.Prof. Mag.rer.nat. Dr.techn. Hannes Kaufmann Mitwirkung: Univ.Ass. Hugo Brument, PhD Wien, 21. November 2024 Gabriel Ratschiller Hannes Kaufmann Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.at Design and Evaluation of Different Selection Metaphors for a Dissimilar Co-Embodied Avatar in Virtual Reality DIPLOMA THESIS submitted in partial fulfillment of the requirements for the degree of Diplom-Ingenieur in Visual Computing by Gabriel Ratschiller, BSc Registration Number 11778247 to the Faculty of Informatics at the TU Wien Advisor: Univ.Prof. Mag.rer.nat. Dr.techn. Hannes Kaufmann Assistance: Univ.Ass. Hugo Brument, PhD Vienna, November 21, 2024 Gabriel Ratschiller Hannes Kaufmann Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.at Erklärung zur Verfassung der Arbeit Gabriel Ratschiller, BSc Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwen- deten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe. Wien, 21. November 2024 Gabriel Ratschiller v Danksagung Ich möchte diese Gelegenheit nutzen, um mich bei allen, die mich während meines Studi- ums und insbesondere während meiner Diplomarbeit unterstützt haben, zu bedanken. Mein Dank gilt zunächst meinen überaus engagierten Betreuern, Hannes Kaufmann und Hugo Brument, die mich während der gesamten Zeit der Diplomarbeit mit wertvollem Input unterstützt haben. Der unkomplizierte E-Mail-Austausch und die regelmäßigen Meetings waren eine enorme Hilfe bei der Durchführung der Arbeit. Mein besonderer Dank gilt meinen wunderbaren Eltern, die nicht nur mein Studium finanziell unterstützt haben, sondern mir auch stets das Gefühl gegeben haben, das Richtige zu tun, und nie an meinem Erfolg gezweifelt haben. Des Weiteren danke ich meinem Bruder, der sich nie gescheut hat, mir die eine oder andere (mehr oder weniger) wertvolle Lebensweisheit mit auf den Weg zu geben. Ein besonderer Dank gilt auch meiner Freundin, die mich vom ersten Tag meiner Di- plomarbeit an unterstützt und an mich geglaubt hat. Zum Schluss möchte ich mich auch bei allen TeilnehmerInnen der Nutzerstudie bedanken. Diese Arbeit ist meinen beiden verstorbenen Omas gewidmet. vii Acknowledgements I would like to take this opportunity to thank everyone who has supported me during my studies and especially during my diploma thesis. First of all, I would like to thank my extremely dedicated supervisors, Hannes Kaufmann and Hugo Brument, who provided me with valuable input throughout the entire time I was writing my thesis. The uncomplicated exchange of emails and regular meetings were an enormous help in completing the thesis. My special thanks go to my wonderful parents, who not only supported my studies financially but also always made me feel that I was doing the right thing and never doubted my success. 
I would also like to thank my brother, who has never shied away from giving me some (more or less) valuable life wisdom along the way. Special thanks also go to my girlfriend, who supported and believed in me from day one of my diploma thesis. Finally, I would like to thank all the participants of the user study. This thesis is dedicated to my two late grandmothers. ix Kurzfassung Die virtuelle Realität (VR) ermöglicht immersive Erfahrungen, bei denen die Nutzer über Avatare mit computergenerierten Umgebungen interagieren. Diese Interaktion basiert auf dem “Embodiment“, dem psychologischen Gefühl, in einem virtuellen Körper zu “stecken“. Dieses “Gefühl der Verkörperung“ (Sense of Embodiment, SoE) trägt wesentlich dazu bei, dass die Nutzer die VR-Umgebung als realistisch empfinden und erleben. Herkömm- liche VR-Schnittstellen stützen sich jedoch häufig auf menschenbezogene Auswahl- und Manipulationsmetaphern (z. B. Hand oder Cursor), die sich möglicherweise nicht gut auf nicht-menschliche Avatare übertragen lassen, das sind Avatare, die sich (manchmal stark) von der menschlichen Anatomie unterscheiden. Vor allem “Co-Embodiment“-Szenarien, bei denen sich mehrere Nutzer die Kontrolle über einen einzigen Avatar teilen, erfordern geeignete Interaktionsmetaphern, da sich herkömmliche Metaphern oft als unzureichend erweisen. Diese Arbeit beschreibt den Entwurf, die Implementierung und die Evaluierung von Auswahl- und Manipulationsmetaphern in VR für einen nicht-menschlichen, ko-verkörperten Avatar. Eine Interaktionsplattform wurde innerhalb einer bestehenden Multi-User-VR- Umgebung mit der Unity3D Software entwickelt. Diese Plattform ermöglicht es den Nutzern, die Kontrolle über einen nicht-menschlichen Avatar zu teilen und mit virtuellen Objekten in der Umgebung zu interagieren. Es wurden verschiedene Interaktionsme- taphern entwickelt, die gut zu den Fähigkeiten des nicht-menschlichen Avatars passen. Insbesondere wird in dieser Arbeit untersucht, wie sich verschiedene Auswahl- und Interak- tionsmetaphern auf die Benutzererfahrung, das SoE und die Ko-Präsenz in VR auswirken. Darüber hinaus wurden die implementierten Interaktionsmetaphern in einer Benutzer- studie evaluiert, um zu verstehen, welche Metaphern geeignet sind und für zukünftige Entwicklungen der gemeinsamen Steuerung von ko-verkörperten, nicht-menschlichen Avataren in VR in Betracht gezogen werden können. Die Ergebnisse zeigten, dass es in der Tat Unterschiede zwischen den Interaktionsmetaphern in Bezug auf die Leistung, die Benutzererfahrung und das SoE gibt. Sie zeigten auch, dass die entwickelten Interakti- onsmetaphern für den nicht-menschlichen Avatar geeignet sind und eine Grundlage für zukünftige Weiterentwicklungen bilden. xi Abstract Virtual Reality (VR) enables immersive experiences in which users interact with computer- generated environments through avatars. This interaction is based on embodiment, the psychological feeling of “being in“ a virtual body. The user’s perception of realism and experience in the VR environment is strongly influenced by the “Sense of Embodiment“ (SoE). However, traditional VR interfaces often rely on human-centric selection and manipulation metaphors (e.g., hand or raycast) that may not translate well to dissimilar avatars, i.e., avatars that differ (sometimes greatly) from human anatomy. In particular, co-embodied scenarios, where multiple users share control of a single avatar, require appropriate selection and interaction metaphors, as traditional metaphors often prove inadequate. 
This thesis describes the design, implementation, and evaluation of selection and manipu- lation metaphors in VR for a dissimilar co-embodied avatar. An interaction platform was developed within an existing multi-user VR environment built using Unity3D software. This platform allows users to share control of a dissimilar avatar and interact with virtual objects in the environment. Several interaction metaphors have been developed that fit well with the capabilities of the dissimilar avatar. In particular, this thesis investigates how different interaction metaphors affect user experience, SoE, and co-presence in VR. Furthermore, the implemented interaction metaphors were evaluated in a user study in order to understand which metaphors are suitable and can be considered for future developments of shared control of co-embodied dissimilar avatars in VR. The results showed that there are indeed differences between the interaction metaphors in terms of performance, sense of agency, and SoE. They also showed that the developed interaction metaphors are suitable for the dissimilar avatar and form a basis for further development in the future. xiii Contents Kurzfassung xi Abstract xiii Contents xv 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Aim of the Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.4 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Related Work 5 2.1 Virtual Reality and Avatars . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Body perception and Sense of Embodiment . . . . . . . . . . . . . . . 6 2.3 Dissimilar avatars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.4 Co-Embodiment in Virtual Reality . . . . . . . . . . . . . . . . . . . . 12 2.5 Inverse Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.6 Selection Metaphors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3 Avatar Design and Implementation 21 3.1 Overview and Requirements . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2 Virtual Reality Environment . . . . . . . . . . . . . . . . . . . . . . . 22 3.3 Selection Metaphors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4 FABRIK IK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.5 Evaluation Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4 Evaluation 43 4.1 Study Design and Hypotheses . . . . . . . . . . . . . . . . . . . . . . . 43 4.2 Technical Setup and Equipment . . . . . . . . . . . . . . . . . . . . . . 45 4.3 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.5 Data Collection and Analysis . . . . . . . . . . . . . . . . . . . . . . . 49 4.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 xv 5 Discussion 63 5.1 User Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.2 Sense of Embodiment and Agency . . . . . . . . . . . . . . . . . . . . 64 5.3 User preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.5 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6 Conclusion 69 6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . 69 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 List of Figures 71 List of Tables 73 Bibliography 75 Appendix 80 CHAPTER 1 Introduction Virtual Reality (VR) is a tool that allows users to immerse themselves in a computer- generated world and interact with the environment using a head-mounted display (HMD) and hand-held VR controllers. VR hardware and devices have grown more accessible to consumers in recent years and are widely used in both the commercial and consumer sectors. As VR hardware becomes more widely available, its use has expanded to include healthcare, education and entertainment [5]. One of the key success factors of VR is its ability to create experiences that cannot be created with traditional 2D screens. By combining the virtual and real worlds, users can immerse themselves in a fantastic experience and escape the physics and conventions of the real world. Instead of only viewing a virtual environment, VR creates the impression that one is actually a part of it. A key factor that defines whether a user is genuinely “immersed“ in the virtual world is their “Sense of Presence“ (SoP) [8]. SoP is defined as the user’s subjective feeling of “being there“ and is influenced by both user characteristics (e.g., previous VR experience, concentration, tendency to motion sickness, and expectations of the VR experience) and media characteristics (e.g., audiovisual presentation of the content, response time, or size of the visual field). The stronger the SoP, the more the user feels part of the virtual world. To give the user the feeling of being in and interacting with the virtual world, the user is often placed inside a virtual representation by replacing parts of their body with a virtual body. This virtual body, also known as an “avatar“, serves as a visual orientation aid for users who are wearing an HMD and cannot see their real body. This virtual representation therefore assists the user in interacting with the virtual environment and enables them to execute their own movements precisely. The user’s movements are mapped to the limbs of the virtual avatar and animated based on the user’s motions. The better the translation from the user’s motions to the animated virtual limbs is, i.e., the more realistic the virtual movements are, the more the user feels moving in the virtual environment and the stronger the immersion. This feeling of “being in“ the virtual body 1 1. Introduction is known as embodiment [26]. Every VR experience heavily relies on this “Sense of Embodiment“ (SoE), since a poor SoE due to inadequate animations and representations of the user’s movements in VR can ruin the illusion of presence in the virtual world. This can negatively affect the VR experience and even cause physical symptoms such as cybersickness, dizziness, or nausea [40, 10, 15, 27]. Traditionally, VR avatars represent the user’s physical body and allow the user to control them based on familiar movements. But this technology goes even beyond this paradigm. Avatars can take on completely different virtual forms from the user’s physical form. The spectrum ranges from subtle variations such as altered skin tones or facial expressions to more extreme modifications that transform the user into an animal, a fantasy creature, or beings with unusual body parts [7]. In addition to their unusual appearance, avatars can also differ from human avatars in how the user controls them and which virtual body parts are controlled with which user interactions. 
Such dissimilar avatars have been studied more and more frequently recently as they can be relevant for use in therapeutic, social, or educational applications [31, 49]. Another emerging topic is co-embodiment, which refers to a single virtual avatar controlled by multiple users or agents simultaneously [16]. Both users’ movements can be averaged and mapped to the avatar, or each user can control a part of the avatar individually. This collaborative and shared VR experience can be used in areas such as rehabilitation, skills training and support for disabled users [20]. Studies show that this form of VR can be particularly beneficial in a learner-teacher environment, where a teacher controls the same avatar as a student, creating a shared learning experience [29]. 1.1 Motivation With the ability to simulate all avatars in virtual worlds, creating experiences that deeply immerse the user and promote a high SoE is challenging. Avatars must have appropriate proportions, reflect the user’s movements well, and not make unexpected movements; otherwise, the VR experience could be significantly compromised. For this reason, much research has focused on SoE, particularly with human-like avatars [21, 31]. In recent years, however, a new area of research has emerged that focuses on dissimilar avatars. This field complements the research on human-like avatars with new ways of integrating non-human avatars into the use of VR. When designing interactions with objects and environments for such (human-like or dissimilar) avatars, there are challenges to overcome to make the experience as user- friendly as possible and to ensure a strong SoE. A key challenge lies in the design of selection metaphors, i.e., the methods used for object selection and interaction in VR. Research shows that a well-designed selection metaphor can significantly impact SoE and usability [12, 22]. When co-embodied dissimilar avatars are involved, the requirement for intuitive interaction metaphors that generate a strong SoE becomes even more relevant. Careful thought must 2 1.2. Aim of the Work be given to how co-embodied avatars interact with their surroundings in co-embodiment settings and how to design such co-embodiment scenarios. As the user suddenly no longer has their familiar physical form, novel selection metaphors need to be developed that use the virtual avatar’s physical capabilities to enable intuitive interactions. Traditional VR interfaces typically rely on human-centric hand or cursor metaphors for selection [2, 33]. While proven useful for human-like avatars, they may not work well with dissimilar avatars with different physical capabilities. Therefore, there is a need for intuitive selection metaphors in co-embodied scenarios with dissimilar avatars, especially because of the unusual freedom of movement and the potential for mutual influence between the users. This lack of intuitive selection methods can hinder the user experience and limit the possibility of co-embodied interaction with dissimilar avatars in VR, thus reducing the user’s SoE. Co-embodiment research is still in its early stages, especially when it comes to dissimilar avatars. Although co-embodiment studies have been conducted with human-like avatars, the study of dissimilar avatars is a novel area of investigation that introduces new design considerations and opportunities. We want to address this lack of research in our thesis. 
1.2 Aim of the Work This thesis investigates the design and evaluation of suitable selection and manipulation metaphors for a dissimilar avatar in a co-embodied interaction in VR. To achieve this, an experimental platform is developed within an existing multi-user VR framework with a dissimilar avatar with four arms and two heads. This allows users to share control of a dissimilar avatar and interact with virtual objects in the environment. We develop two interaction metaphors specifically tailored to the capabilities of the dissimilar avatar, whose movements are synchronized over the network to ensure a real-time collaborative interaction setting for both users. Further, we assess how these interaction metaphors can influence the user’s agency and SoE. To evaluate the effectiveness of these metaphors, we develop a series of interaction tasks in the VR environment. These tasks aim to assess the usability, efficacy, and user experience of each interaction metaphor. The goal is to evaluate how well the metaphors assist users in completing tasks that require them to select and manipulate virtual objects while co-embodied in the dissimilar avatar. Finally, a user study is conducted to collect objective and subjective data composed of performance metrics (such as task completion time), user preferences for the proposed selection metaphors, and user experience (such as sense of agency or SoE). By examining this data, we want to determine the most practical and efficient techniques for selecting co-embodied objects with dissimilar avatars and investigate the variables affecting SoE in this particular VR interaction setting. 3 1. Introduction 1.3 Methodology First, a structured literature review of existing research on selection and interaction metaphors in VR is conducted. The review focuses explicitly on metaphors relevant to dissimilar co-embodied avatars. Further, literature on the current state of the art about the influence of various selection metaphors on user agency and interaction within VR environments featuring co-embodied avatars is examined. In addition, the latest research in the fields of VR and SoE, dissimilar avatars, co-embodiment, and inverse kinematics is presented and discussed. Using the Unity3D engine, an experimental platform is designed and developed in which users co-embody and control a dissimilar avatar. To interact with objects in the VR environment with this avatar, two interaction metaphors for selecting and manipulating objects are designed and developed. These interaction metaphors are adapted to the conditions of the dissimilar avatar and exploit the strengths of the avatar’s morphology. Users then perform different tasks using different interaction metaphors. In addition, the movements when performing the tasks with the interaction metaphors are synchronized. To achieve this, both users are connected via the network and can control and interact with the avatar in real-time. The usability and user experience of the developed selection metaphors are evaluated with a user study, collecting objective and subjective data. The user study consists of measured user performance data when performing a selection task as well as questionnaires. The user performs a series of tasks in different body configurations, providing feedback on the sense of agency and the preferred interaction metaphor. 
The collected data on user performance (e.g., task completion time), as well as the subjective data gathered in the questionnaires, are then analyzed to identify the most usable and preferred selection metaphor for dissimilar co-embodied avatars. 1.4 Structure of the Thesis This thesis is structured as follows: Chapter 2 provides a comprehensive overview of the existing literature on the topics covered in this thesis. First, a general overview of VR and the SoE is given, followed by the state of the art on dissimilar avatars as well as co-embodiment in VR. Furthermore, a discussion about Inverse Kinematics (IK) is given before moving on to literature on selection metaphors for VR interaction. Chapter 3 describes the design and implementation of the avatar and the VR environment, including the developed interaction metaphors. It goes into detail about implementing the interaction system and explains how the interaction with the VR environment works. Finally, the design and implementation of the user study tasks are explained. In chapter 4, the user study conducted is described, including the design of the user study, the experimental procedures, and the data collection methods. The evaluated results of the data analysis are summarized. Chapter 5 discusses the user study results and highlights the limitations of the experimental platform described. Finally, chapter 6 summarizes the thesis and provides an outlook for future work. 4 CHAPTER 2 Related Work This chapter provides an overview of the state of the art of the concepts and technologies relevant to this thesis. First, we introduce VR and the role of avatars (section 2.1), then discuss the concept of body perception and SoE, and related influencing factors such as self-location, body ownership, and agency (section 2.2). We then present current research on dissimilar avatars and the idea behind exploring them (section 2.3), followed by an overview of co-embodiment in VR (section 2.4). In the section on Inverse Kinematics (IK) (section 2.5), we give an overview of the IK algorithm, which has also proven useful for animating the limbs of a dissimilar avatar. Finally, an overview of VR selection metaphors is given (section 2.6), with a focus on metaphors relevant for controlling dissimilar avatars. 2.1 Virtual Reality and Avatars VR technology allows users to enjoy interactive experiences and become completely immersed in a fictional world. Recent significant advances in the technology and the development of more powerful and affordable hardware have made it available to a wider range of consumers. Users can interact with virtual worlds and have fictional experiences that feel real thanks to wearable controllers and head-mounted displays (HMDs). Various industries, including gaming, education, training, entertainment, and even healthcare, are using VR applications [5]. Virtual representations of people are often one of the many building blocks of a VR experience. These representations can exist independently of the user in the virtual world, or they can be directly controlled by the user. In this case, it is called an “avatar“. An avatar is their virtual representation in the digital world. Avatars are an integral part of VR experiences and can be anything from realistic human-like figures to non-human beings and fantastical creatures [31, 23, 30]. They are versatile and allow users to express 5 2. 
Related Work their identity (reflecting their personal style if customized), interact with other users in virtual environments, and, most importantly, enhance immersion. 2.2 Body perception and Sense of Embodiment 2.2.1 Body perception An important concept in any VR experience is the concept of body perception. This describes the way people perceive and experience their own virtual bodies [38]. Depending on the characteristics of the virtual avatar and the expectations a person has when interacting with the virtual version of themselves, this feeling can be greatly affected by the use of VR technology and can vary in intensity [48]. Neyret et al. [38] investigated how people perceive and evaluate the shape of an avatar based on their own body size compared to a body based on their own body shape estimations and their ideal body in a VR environment. These three distinct avatars were made by the researchers, and the participants were required to observe and assess them once in the first-person perspective and once in the third-person perspective. The results of the study show that there is a difference in body perception depending on the setting the users are immersed in. For example, female participants in the study rated their real bodies as more attractive when it was viewed from the third-person perspective. Another finding is that viewing one’s own body from the third-person perspective helped to reduce body dissatisfaction. Another study looked at body weight perception in relation to self-perception and the perception of others [48]. Participants were asked to estimate the body weight of a virtual avatar once when they themselves embodied a photorealistic avatar and performed movements in front of a mirror and once when they watched an avatar controlled by another person perform the movements. It turned out that participants underestimated the weight of the virtual avatar when they controlled it themselves, compared to when they only observed another avatar. These results suggested that one’s own body perception in VR depends on how the virtual avatar is presented and how personalized it is. Furthermore, one’s own body perception also influences body perception in the virtual world. However, body perception can also be manipulated by giving people the feeling that an object that does not actually belong to their body is perceived as their own body. In the famous rubber hand illusion experiment, Botvinick et al. [6] showed that participants felt a sense of ownership towards a fake rubber hand after stimulating it visually and tactilely in synchrony with their own hidden hand. This shows that a high level of ownership can be an important factor in the success of virtual simulations and experiences. 2.2.2 Sense of Embodiment The “Sense of Embodiment“ (SoE), which refers to the feeling users have when using VR applications that the virtual body is their own and that they are in control of it, plays 6 2.2. Body perception and Sense of Embodiment an important role in the VR experience and is related to the topic of body perception in VR. Kilteni et al. [26] tried to generalize the definition of this concept of embodiment and described it as a combination of three sub-components: the sense of self-location (SL), the sense of agency (AG), and the sense of body ownership (BO). Self-location represents the feeling of “being in“ the virtual body, while agency refers to the feeling of being able to control and direct the actions of the virtual body. 
Finally, body ownership describes the feeling that an artificial limb or an entire body belongs to the person immersed in VR. As users often experience SoE unconsciously to a certain extent when being embodied in a virtual avatar, these three components of embodiment play a crucial role in the overall VR experience and immersion. There are several approaches to measuring SoE in users [26]. Measures for addressing the SoE can be:
• Questionnaires (for all three sub-components).
• Physiological response to threat, which can be measured using the “Skin Conductance Response” test, i.e., a short-term change in the electrical conductance of the skin in response to a threat (SL and BO).
• Estimation of body position (SL).
• Estimation of body parts’ size (BO).
• Proprioceptive estimations, by letting users assess their ability to move during appropriate movement tasks or through visual feedback, e.g., assess what they see in a mirror (BO).
Various factors can influence the SoE and its sub-components. The sense of self-location can be impaired if users have the feeling that they are no longer in the virtual body and have out-of-body experiences (OBE) [26]. Studies have also shown that self-location suffers when users look at the avatar from a third-person perspective rather than a first-person perspective [11]. The strength of the sense of agency, on the other hand, depends largely on how quickly the visual feedback is displayed after the execution of a movement. If there is a time delay between the user’s actions and the visual feedback, the sense of agency suffers and, as a result, the SoE [11]. The agency is not only influenced by the synchronicity between the user’s actions and the resulting visualization but can also be influenced by the embodiment of the tools that the user controls [26]. The sense of body ownership is influenced by a combination of bottom-up and top-down influences. Bottom-up information refers to visuo-tactile stimuli, i.e., stimuli that are transmitted from the sensory organs to the nervous system and brain. Top-down refers to the processing of sensory stimuli, e.g., seeing a virtual avatar and judging whether it represents one’s own body or not [26]. The SoE and the factors that influence it have been well studied.
Figure 2.1: The three body representations to test body ownership (first-person perspective): (a) no body visibility, (b) low body visibility, (c) medium body visibility [34].
Porssut et al. [40] investigated how reaching the articular limits of a virtual arm can negatively affect a person’s SoE in VR. The participants were asked to hold a physical cylinder mapped to a virtual cylinder and had to perform reaching movements with varying degrees of distortion between their real and virtual arms. The experimenter measured their perception of the distortion and their SoE while performing the tasks. The results showed that negative distortions (which hinder movement by mapping the virtual hand behind the real hand position) are more easily detected than positive distortions (where the virtual hand is ahead of the real hand, helping to reach objects). Further, reaching the articular limit reduced the participants’ SoE. The authors stated that this is because reaching the articular limit makes users more aware of a discrepancy between their real and virtual arm movements, which leads to a lower detection threshold for movement distortions and a reduced SoE.
A further study examined the impact of different virtual hand representations on the factors user performance, sense of agency, and sense of ownership when interacting with virtual objects and obstacles within a VR environment [33]. Participants had to complete a task that involved selecting and positioning a cube in a virtual environment having different virtual hand representations (sphere, controller, hand) and with different obstacle conditions. The authors found that the hand representation led to the strongest sense of ownership. The controller representation, on the other hand, performed best for precise positioning tasks, leading to good user performance. The results of the mentioned studies demonstrated that a realistic and comprehensible depiction and animation of the avatar in VR has a significant influence on the degree of immersion of the user. Furthermore, the realism of the virtual avatar’s representation is of greater importance than the number of visible limbs, as found out by Lugrin et al. [34]. They stated that the number of visible parts of an avatar’s body has a negligible effect on virtual body ownership, immersion, or performance, and that “sensorial immersion and a well-calibrated motion control“ [34] are more important for a strong immersion. 8 2.3. Dissimilar avatars (a) Human avatar. (b) Robot avatar. (c) Block-like avatar. Figure 2.2: Types of virtual avatars. (a) and (b) are anthropomorphic avatars, (c) is a non-anthropomorphic (dissimilar) avatar [24]. In the study, there were three body conditions tested: no visible body, low visible body (hands and forearms), and medium visible body (head, torso, arms) (see Figure 2.1). In the “No Body“ condition, virtual controllers were shown in the absence of a visible avatar body. In the “Low“ and “Medium“ conditions, they gradually increased the number of visible body parts of the avatar, with the “Medium“ condition having a similarity to a human body of about 50%. However, it should be noted that the “No Body“ condition also showed virtual controllers to assist the user in performing the tasks. Therefore, they noted, it may not have accurately tested the complete absence of a virtual body. 2.3 Dissimilar avatars Most studies on VR, and on SoE in particular, usually explored anthropomorphic avatars or avatars with a human representation and rarely non-human (dissimilar) representation. This is understandable, as studies have shown that users prefer anthropomorphic represen- tations of themselves in VR to non-anthropomorphic ones [24]. It makes little difference whether the virtual avatar looks like a human or only has human-like characteristics (e.g., a robot with the same number of limbs and body proportions as a human); these avatars are preferred by users over block-like avatars (Figure 2.2). Furthermore, research has indicated that the discrepancy between the physical appearance and the virtual avatar can impact self-perception, user experience, and task performance [24, 42]. However, in VR, there are no limits to the appearance of an avatar. Therefore, in recent years there has been an increasing amount of research exploring the potential of dissimilar avatars. The use of such avatars in VR raises a number of questions about the user’s SoE and the impact on the user experience and agency. Several studies have already explored avatars with structural differences. One of the early studies in 2013 investigated body ownership and control of a humanoid avatar that has a tail [45] (see Figure 2.3a). 
Participants were divided into two groups, where the first group could control the tail with hip movements, while the other group had a tail that could not be controlled and moved randomly. Participants with the controllable tail felt a greater sense of ownership and control over the virtual body with the extra appendage. 9 2. Related Work Even though it is not a normal part of the human body, they were also able to learn to control the avatar’s tail. Participants were also more anxious and tried to avoid danger to the avatar’s body and tail if they had a greater sense of control over the avatar. (a) Human avatar with an additional tail [45]. (b) Six digit virtual hand avatar [21]. (c) Full-body (FB) animal avatar [31]. Figure 2.3: Different avatar representation reported in the literature. Another study came to a similar conclusion, in which Hoyet et al. [21] investigated how users perceived and accepted controlling a virtual avatar with structural differences, specifically a six-digit virtual hand (see Figure 2.3b). According to the authors, partici- pants felt a great sense of agency and ownership over the entire six-digit virtual hand, and they were more receptive to the extra animated finger as a component of the hand if it was animated than they were to its rigid, not-animated state. In addition to avatars with additional limbs, avatars that deviate completely from a human-like representation, such as animals or mythical creatures, were also studied [31, 23, 30, 49] (see Figure 2.3c). Players reported high enjoyment and SoP when embodying animal avatars in the VR games and appreciated the unique abilities and perspectives offered by the dissimilar avatars, which allowed them to do things they cannot do with human-like avatars. In addition, the differences between full-body (the player’s posture is mapped onto the 10 2.3. Dissimilar avatars Figure 2.4: Proposed categorization system for dissimilar avatars applied to a virtual hand [7]. entire virtual body) and half-body (the player’s lower body is mapped onto all the limbs of an animal) control modes and third-person control modes of the animal-like avatar were investigated, with full-body and half-body control modes being effective in creating a sense of virtual body ownership for animal avatars and outperforming third-person control modes. Xu et al. [49] have shown that in addition to the high levels of SoE experienced by research participants when embodying animals, virtual animal embodiment can even evoke human empathy for the animals. To foster empathy, the researchers created a system called “iStrayPaws“ that simulated the lives of stray animals using a Virtual Reality Perspective-Taking (VRPT)-based methodology. Participants had to find shelter, food, and escape from mistreatment while being embodied in different stray animals. The results showed that using such a VRPT system can significantly increase empathy for stray animals compared to a narrative-based task. With such a wide variety of dissimilar avatar types, it makes sense to categorize these avatars. A framework for classifying dissimilar avatars has recently been published by Cheymol et al. [7]. They categorized them into three main groups: structural, volumetric, and superficial aspect dissimilarities. Figure 2.4 gives an overview of the proposed classification. Dissimilar avatar types that differ from the real human body in terms of skeletal structure or morphology are termed “structural dissimilar“. 
These differences include, for example, changes in the number of limbs. The term “volume dissimilarity“ describes avatar types with different body sizes and proportions. Finally, avatars with dissimilar surface characteristics, such as unusual skin color, texture, or material, are referred to as “superficial aspect dissimilar“. Based on this categorization, dissimilar avatars can be better studied, understood, and compared. If an avatar falls into one or more of these categories, this can produce a stronger or weaker SoE in the user, which must be taken into account when designing dissimilar avatars [7]. 11 2. Related Work (a) One-for-one. (b) One-for-all. (c) Re-embodiment. (d) Co-embodiment. Figure 2.5: Social presence configurations of agents [35]. 2.4 Co-Embodiment in Virtual Reality The concept of co-embodiment refers to the feeling of sharing a virtual body with one (or more) other users and controlling it together. Co-embodiment was first studied in experiments utilizing two voice assistants and their physical embodiment as a single entity, represented by the same car [35]. In their user enactment study, the authors examined how people perceive and respond to different configurations of social presence. In particular, they examined how the participants respond to virtual agents that are based on human models (with each agent bound to a single body), those that are designed as a universal system (where a single agent controls multiple bodies), those that make use of re-embodying (where an agent can move its social presence from one body to another), and those that are able to co-embody (where an agent can join another agent within a single body). Figure 2.5 illustrates the four configurations of social presence. Although the participants in the study were not physically co-embodied in a virtual agent, the study revealed some new insights and challenges regarding this new topic and laid the foundation for further co-embodiment research. In a later study, Fribourg et al. [16] began to investigate how the users’ agency changes in a co-embodied scenario. The users shared control over a virtual avatar with varying degrees of control (full, partial, or no control) based on a weighted average of both users’ movements. The participants had to use the virtual arm to complete a number of grabbing tasks. Additionally, they were asked to report their sense of control of the arm while performing the tasks. The results demonstrated that while participants are able to estimate their true level of control, they frequently overestimate their sense of agency when they are able to predict the movements of the avatar. These findings suggested that even partial co-embodiment can create a sense of shared agency and ownership of the virtual body. These findings were supported by other studies [50, 20], including Hapuarachchi et al. [20], who investigated a co-embodied scenario where each user controlled a different arm and analyzed their SoE. The study used a shared avatar where one participant controlled the left side and the other controlled the right side, both in the first-person perspective, and participants had to reach and touch targets that appeared randomly. The experiment was conducted under three conditions. Same target (a shared target), different targets: visible (each participant had a different target), and different targets: invisible (as in visible, but the partner’s target was hidden). 12 2.4. 
Co-Embodiment in Virtual Reality Figure 2.6: Virtual avatar controlled based on the weighted average of the teacher’s and learner’s movements in a co-embodiment scenario. [29]. The results suggested that the SoE towards the own arm was higher than for the arm controlled by the partner. In the case where the partner’s target was visible, this visual information can help to improve the SoE towards the arm controlled by the other user. This was because visibility helps to predict the partner’s actions and improve the SoE of the uncontrolled arm. Kodama et al. [28] went one step further and investigated the extent to which co- embodiment is suitable as a method for improving the sense of agency and the acquisition of motor skills. They used an avatar whose movement was controlled by a weighted average of the movements of multiple users (see Figure 2.6). The percentage that determines how much each user’s movements contribute to the avatar’s final virtual movement is referred to as “weight“. The idea was based on the fact that students can learn motor skills by observing and imitating their teachers’ movements. During the experiment, two conditions were tested: a static and an adjusted weight control. In the static weight control condition, the weights were set to a fixed value of 50% throughout, while in the adjusted weight control condition, the weights were initially set to 50% and then adjusted based on the learner’s performance. The results showed that although the weight adjustment method prevented a drop in learner performance, it also led to lower learning efficiency after the virtual co-embodiment ended. In another study, Kodama et al. [29] investigated how learning efficiency changed when the learner was located in different embodiment scenarios: virtual co-embodiment with the teacher, sharing the teacher’s first-person perspective, and learning alone. As a result, the efficiency of learning motor skills was improved by learning in the virtual co-embodiment scenario with a teacher compared to learning alone or by sharing the teacher’s first-person perspective. 13 2. Related Work 2.5 Inverse Kinematics In order to provide the user with the highest possible SoE in the application, avatars must be animated according to the movements of the user. This requires a simple but robust algorithm that maps the positions of the real limbs as accurately as possible to the limb positions of the virtual avatar. One algorithm to achieve this is called Inverse Kinematics (IK). In this algorithm, an articulated chain represents the users’ body, where each joint in the articulated chain represents a joint of the real body. In IK, each joint angle of an articulated chain is determined to achieve the desired position and orientation of the end effector. Only the position and orientation of the root joint and the end effector are known, and the remaining joint angles are calculated based on certain constraints. This makes it possible to procedurally animate an articulated chain in order to imitate realistic movements. A variety of IK methods are described in the literature, each with their own advantages and disadvantages. Aristidou et al. [3] described a number of frequently used IK techniques like Jacobian-based solvers, Newton-based solvers, Cyclic Coordinate Descent (CCD) solvers, Sequential Monte Carlo method (SMCM) or the Triangulation algorithm. Jacobian-based solvers. The Jacobian matrix is used to approximate the IK problem linearly. 
While this family of solvers produces smooth postures, they suffer from computational complexity and singularity problems. There are several more computationally efficient adaptations of this algorithm, but for real-time applications, this approach may be too computationally expensive [14].
Newton-based solvers. Newton’s methods approach the problem by formulating it as a minimization problem. Newton’s methods can be complex to implement and computationally expensive per iteration, but they can also be very effective. This approach results in a smooth motion with no sudden changes [17].
Cyclic Coordinate Descent (CCD) solvers. A heuristic iterative method that is known for its computational efficiency and ability to solve IK problems without complex matrix manipulation. Because it is designed for serial chains, CCD can be difficult to adapt to multiple end-effectors, and it can produce unrealistic animation and abrupt motion even with additional constraints [25].
Sequential Monte Carlo Method (SMCM) and particle filtering solvers. Statistical methods that represent the articulated chain as a collection of particles. Each particle has 3 degrees of freedom (DoF), and they are connected to each other by length constraints. To reconstruct the DoF of the final joints, particle positions and length constraints are used, both of which are calculated based on customizable constraints that are dynamically adjusted by various preconditions and parameters. These methods avoid the problem of matrix singularity but can be computationally intensive [9].
Triangulation algorithm. The triangulation algorithm uses the cosine rule to determine each joint angle. It starts at the root joint and moves outwards towards the end effector. The final motion can look unnatural, but achieving the goal only takes one single iteration. It is also limited to solving problems with a single end effector [37].
Figure 2.7: Full iteration of the FABRIK algorithm consisting of a forward iteration (a)-(d) and a backward iteration (e)-(f) [3].
In their work, Aristidou et al. [3] proposed a simple and lightweight algorithm to tackle the IK problem, called Forward And Backward Reaching Inverse Kinematics (FABRIK). It is an iterative method that computes the joint positions by iterating forward and backward through the chain using the previously computed joint positions. Since the problem can be reduced to computing a point on a line, it is a fast algorithm optimized for animating articulated chains in real-time. The algorithm starts by calculating the distances d_i between the joints to check if the target position t can be reached. If the target cannot be reached, the algorithm constructs a line pointing to the target and terminates; otherwise, a full two-stage iteration is employed. In the first stage, the position of the end effector p_n is set to the position of the target point (p'_n = t). Then the point p_{n-1} is recalculated based on the line passing through p_{n-1} and p'_n, taking into account the length constraints. This process is repeated for each remaining joint up to the root joint. As the position of the root joint may now deviate from its original position, in the second stage, the root joint must be moved back to its starting position, and the positioning process of all joints is repeated, this time backward to the end effector.
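To make these two passes concrete, the following is a compact C# sketch of a single-chain FABRIK solver. It uses Unity's Vector3 type for brevity; the class name, the tolerance and iteration parameters, and the absence of joint constraints are illustrative simplifications and do not reflect the actual implementation used for the avatar in chapter 3.

```csharp
using UnityEngine;

// Minimal single-chain FABRIK solver following the two-stage iteration described above.
public static class FabrikSolver
{
    // joints: positions from the root (index 0) to the end effector (index n-1).
    // The positions are modified in place so that the end effector approaches the target.
    public static void Solve(Vector3[] joints, Vector3 target,
                             float tolerance = 0.001f, int maxIterations = 20)
    {
        int n = joints.Length;
        var lengths = new float[n - 1];          // segment lengths d_i
        float totalLength = 0f;
        for (int i = 0; i < n - 1; i++)
        {
            lengths[i] = Vector3.Distance(joints[i], joints[i + 1]);
            totalLength += lengths[i];
        }

        Vector3 root = joints[0];

        // Target unreachable: stretch the chain along the line towards the target.
        if (Vector3.Distance(root, target) > totalLength)
        {
            for (int i = 0; i < n - 1; i++)
            {
                float r = Vector3.Distance(joints[i], target);
                joints[i + 1] = Vector3.Lerp(joints[i], target, lengths[i] / r);
            }
            return;
        }

        // Target reachable: alternate forward and backward passes.
        for (int iter = 0; iter < maxIterations; iter++)
        {
            if (Vector3.Distance(joints[n - 1], target) < tolerance)
                break;

            // Forward pass: place the end effector on the target and work back
            // towards the root, restoring each segment length.
            joints[n - 1] = target;
            for (int i = n - 2; i >= 0; i--)
            {
                Vector3 dir = (joints[i] - joints[i + 1]).normalized;
                joints[i] = joints[i + 1] + dir * lengths[i];
            }

            // Backward pass: put the root back in place and work towards the end effector.
            joints[0] = root;
            for (int i = 0; i < n - 1; i++)
            {
                Vector3 dir = (joints[i + 1] - joints[i]).normalized;
                joints[i + 1] = joints[i] + dir * lengths[i];
            }
        }
    }
}
```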
In this way, the position of the end effector moves closer to the target position with each iteration, and the algorithm stops when a certain threshold is reached. Figure 2.7 shows a full iteration of the FABRIK algorithm with a single target and four joints. This algorithm can also calculate articulated chains with more than one end effector.
The counterpart to IK is forward kinematics (FK), which calculates the position and orientation of the end effector of an articulated chain from the known angles of each individual chain joint. While IK is a complex problem, especially when calculating anatomically and analytically correct models, FK is relatively simple and inexpensive to calculate. However, FK is only suitable for certain problems because, as mentioned above, it requires already-known joint angles, which are usually not known in advance for a real-time animation algorithm.
2.6 Selection Metaphors
When users need to interact with the virtual environment through their avatars, the right choice of 3D interaction metaphors can greatly impact the user experience [22]. It is therefore important to carefully design these techniques to improve the user experience and SoE. Typically, 3D interaction is divided into four parts:
• Navigation: the movement of the user in the VR environment
• Selection: the action of pointing to an object and its validation
• Manipulation: the changing of the state of a previously selected object (usually position and rotation, but can also be size, color, etc.)
• System control: the interaction/dialogue between the user and the application through menus or functions
Each of these four parts describes a different area of interaction that the user can control and interact with in VR, whereby only certain parts are covered depending on the interaction metaphor. In this thesis, we focus on the selection and manipulation parts and compare two metaphors to assess the SoE and agency of the dissimilar avatar.
When designing selection and manipulation metaphors, a common strategy for improving user performance in terms of task completion times and error rates is the application of Fitts’ law [1, 36]. It states that the expected time required to acquire a target object is a function of the ratio between the distance to the target and its width. The most commonly used equation for Fitts’ law expresses it as a relationship between the width W of the target object, the distance A, and some regression coefficients that take into account the reaction time required by the user to locate the target and the performance of the task. Equation 2.1 shows the relationship between these factors.

T = a + b · log2((A + W) / W)    (2.1)

The logarithmic term log2((A + W) / W) is the index of difficulty (ID) and is given in bits, a unit of information. This is because the given equation is derived from an equation from information theory that models the transmission of information [36]. The amount of information transferred while performing a pointing task (in bits) can be considered as the difficulty of the task, where T is the movement time required to reach the target. A higher ID means a longer predicted movement time and therefore a more difficult task. Several factors can influence the ID, as studies have shown [19]. Thus, the angle of the movement has a significant influence on performance, as well as the dimension of the target size along the primary axis of movement, while the other two dimensions tend to have less influence.
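As a purely illustrative calculation (the target sizes and coefficients here are hypothetical and not taken from the user study described later), a target of width W = 0.1 m at a distance of A = 0.7 m has an index of difficulty of

ID = log2((A + W) / W) = log2((0.7 + 0.1) / 0.1) = log2(8) = 3 bits.

With regression coefficients of, for example, a = 0.3 s and b = 0.2 s/bit, Equation 2.1 then predicts a movement time of T = 0.3 + 0.2 · 3 = 0.9 s. Halving the target width to W = 0.05 m raises the ID to log2(15) ≈ 3.9 bits and lengthens the predicted movement time accordingly.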
This principle can be applied to the design of selection metaphors in VR interfaces to make target selection easier and more efficient. Studies have shown that the type of selection metaphor used can significantly impact user performance [1]. Depending on the characteristics of the selection metaphor (whether it is a free-space interaction, how good the visual feedback is, whether there are occlusions during the selection, etc.), they can influence the ID. An efficient selection metaphor combines both precision of target movement and ease and naturalness of interaction.
Various 3D object selection and manipulation techniques exist in VR. Depending on the type of interaction, they are divided into exocentric metaphors (where the user interacts from the outside, i.e., from a third-person perspective), such as World-in-Miniature or automatic scaling, and egocentric metaphors (where the user interacts from the inside, i.e., from a first-person perspective), such as virtual hand or ray casting [1]. In the following subsections, we discuss two selection and manipulation metaphors that are useful for the dissimilar avatar in the co-embodied setup.
2.6.1 Go-Go Interaction
The Go-Go interaction technique is a selection and manipulation metaphor for the interaction with near as well as distant objects in VR and was first introduced in 1996 by Poupyrev et al. [41]. The authors described the technique as a non-linear mapping between the user’s real arm movements and the movements of the virtual arm. At close range, the virtual arm precisely follows the user’s arm movements. As soon as the user moves the arm away from the body, the virtual arm grows non-linearly in order to reach distant objects without physically moving. This technique enables precise interaction and manipulation at close range as well as reaching distant objects.
A useful measure to categorize interaction techniques is the Control-Display (CD) ratio. It describes the ratio between the movement of the input device (control) and the resulting movement of the virtual object (display) [1]. If the ratio is 1:1, the virtual movement corresponds exactly to the physical movement. A ratio other than 1 means an “amplification effect”, where a small physical movement results in a larger virtual movement (CD ratio < 1) or a large physical movement results in a smaller virtual movement (CD ratio > 1).
The Go-Go interaction technique uses a dynamic CD ratio that depends on the distance of the virtual hand from the user’s body. Specifically, it uses a non-linear mapping function to both reach distant objects and work accurately at close range (Figure 2.8).
Figure 2.8: The mapping function F used in the Go-Go Interaction technique [41].
The threshold value D defines the point at which the function switches from linear to non-linear mapping. With linear mapping, the movements of the virtual hand follow the movements of the real hand. When the user extends the arm beyond the threshold, the virtual arm grows non-linearly. The threshold value is normally set at approximately 2/3 of the user’s arm length. The function F is defined as follows:

R_v = F(R_r) = R_r                      if R_r < D
               R_r + k(R_r − D)²        otherwise    (2.2)

Here R_v is the length of the virtual arm and is calculated using the function F(R_r), where R_r is the length of the real arm (i.e., the length of the vector pointing from the origin, located at the user’s body, to the user’s hand). If the user’s real arm is not stretched beyond the threshold (R_r < D), the virtual arm length R_v equals the real arm length R_r. Otherwise, the virtual arm length is calculated based on the real hand position and an “amplification factor” k(R_r − D)², where k is a coefficient that defines how much the virtual arm grows or shrinks.
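A minimal Unity C# sketch of this mapping, applied once per frame to a single hand, is shown below. The component and field names (GoGoHand, chestOrigin, etc.) and the parameter values are illustrative assumptions and do not correspond to the scripts of the platform described in chapter 3.

```csharp
using UnityEngine;

// Sketch of the Go-Go mapping: the virtual hand follows the real hand at close range
// and is extended non-linearly once the arm is stretched beyond the threshold D.
public class GoGoHand : MonoBehaviour
{
    public Transform chestOrigin;    // approximate origin of the arm (user's body)
    public Transform realHand;       // tracked controller position
    public Transform virtualHand;    // avatar hand driven by the mapping

    public float thresholdD = 0.45f; // roughly 2/3 of the arm length, in metres
    public float k = 0.6f;           // amplification coefficient

    void LateUpdate()
    {
        Vector3 offset = realHand.position - chestOrigin.position;
        float rReal = offset.magnitude;            // R_r
        float rVirtual = rReal;                    // linear region: R_v = R_r
        if (rReal >= thresholdD)                   // non-linear region
        {
            float d = rReal - thresholdD;
            rVirtual = rReal + k * d * d;          // R_v = R_r + k (R_r - D)^2
        }

        // Place the virtual hand along the same direction, at the remapped length.
        virtualHand.position = chestOrigin.position + offset.normalized * rVirtual;
        virtualHand.rotation = realHand.rotation;  // orientation is passed through unchanged
    }
}
```

Because only the length of the offset is remapped while its direction is kept, the virtual hand always stays on the line from the body through the real hand, which keeps the amplified reach predictable for the user.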
The Go-Go interaction technique, with its core principle of non-linear mapping and intuitive interaction, has influenced the development of various VR interaction techniques. Many modern VR systems and applications incorporate elements of non-linear mapping and virtual hand extensions to make user interaction intuitive and efficient [1]. For example, Auteri et al. [4] combined the Go-Go technique with PRISM, a velocity-based scaling technique to improve accuracy, to achieve precise object manipulation in 3D. Their user study showed that the hybrid Go-Go + PRISM interface led to an almost 2:1 improvement in accuracy over Go-Go alone in a task where a virtual object had to be aligned within a distant target.
2.6.2 Gaze-and-Pinch
Another interaction metaphor is the “Gaze-and-Pinch” technique described by Pfeuffer et al. [39]. This interaction metaphor is often used for the selection and manipulation of distant objects that are out of the user’s reach. In VR interactions, the user’s eyes often naturally point at the interaction targets, and therefore this method takes advantage of this natural behavior. When using “Gaze-and-Pinch”, the user’s gaze is tracked by eye-tracking cameras in the HMD, and the objects the user is looking at are selected. In addition, hand gestures (e.g., pinching the thumb and index finger together) are used to confirm the selection and manipulate the target object (see Figure 2.9).
Figure 2.9: The “Gaze-and-Pinch” interaction with one or two hands: look at an object, pinch to select it, manipulate it with hand gestures [39].
The authors conducted an informal user evaluation of this technique by creating an experimental UI system that allowed users to interact with various use cases showcasing different “Gaze-and-Pinch” variations. Users emphasized the potential of this new type of interaction and described the “Gaze-and-Pinch” technique as easy to use. Since a handheld controller is not necessary for this interaction method, it can be used in various applications. It does, however, require modern hardware and a reliable hand and eye tracking system.
CHAPTER 3
Avatar Design and Implementation
In this chapter, we present the design and implementation of the experimental platform, including the virtual environment and the interaction metaphors explicitly developed for the co-embodied, dissimilar avatar setup. Section 3.1 gives an overview of the requirements for the application, and section 3.2 explains the VR project created with Unity3D, the virtual environment developed, as well as the networked dissimilar avatar. Section 3.3 explains the implemented interaction metaphors. Section 3.5 then discusses the evaluation tasks implemented to evaluate the interaction metaphors.
3.1 Overview and Requirements
As explained above, an experimental platform is developed with the Unity3D software on top of an existing basic VR environment. To enable different VR HMDs to work with the application, we use the OpenXR plugin in Unity. The idea of the experimental platform is to allow several users to share control of a dissimilar avatar and to interact with virtual objects using specifically designed interaction metaphors (see Figure 3.1).
Therefore, the experimental platform is divided into multiple parts that have to be carefully designed and implemented: • Two interaction metaphors have to be developed for the dissimilar avatar for enabling interaction with virtual objects • The dissimilar avatar has to be animated according to the interaction metaphors • The dissimilar avatar has to be networked to support multi-user interaction • Evaluation tasks must be implemented to test and evaluate these interaction metaphors 21 3. Avatar Design and Implementation Figure 3.1: Left - Overview of the project’s architecture, including the co-embodied dissimilar avatar. Right - An example body configuration of the two users showing the limbs they control. We cover each of these parts in detail in the following subsections. For now, we briefly describe the requirements for each of the parts. Interaction metaphors. We implement two different interaction metaphors used for selecting and manipulating objects in VR. These two metaphors should be intuitive to understand and easy to control for the users. Additionally, they should work in a one-player setup as well as in a networked multiplayer setup. Each of the two interaction metaphors comes with its own requirements, discussed in the following section 3.3. Dissimilar avatar. The dissimilar avatar should be designed and adapted to work with two users simultaneously. It should benefit from shared control, and the two interaction metaphors should work well with the avatar. The avatar should be animated based on the interaction metaphor used and evoke a strong SoE in the users. Networked interaction. The selected and manipulated virtual objects, as well as the avatar’s animations, should be networked and displayed synchronously for both users in order to enable joint interaction within the VR environment. The interaction of a single player should still be possible even if no second user is connected. Evaluation tasks. The developed experimental platform should be evaluated in a user study, and for this, evaluation tasks should be implemented. These evaluation tasks should assess the user experience and agency of the users when interacting in the VR environment with the interaction metaphors. 3.2 Virtual Reality Environment This section gives an overview of the structure and components of the VR platform, describes how the user’s movements are networked, how the dissimilar avatar is animated, and deals with the implementation of the interaction metaphors and interaction tasks. 22 3.2. Virtual Reality Environment Figure 3.2: The interaction test scene in which the user can grab cabbages using the interaction metaphors (left - in third-person perspective, right - first-person perspective). 3.2.1 3D scene setup The components of the application are developed in a 3D scene within the Unity3D game engine. The virtual environment consists of an outdoor scene with objects that the participant can interact with. The main components of the virtual scene are the dissimilar avatar, a mirror, and three objects (cabbages). The mirror serves as a visual reference for the user to understand the physique of their virtual avatar and its movements when using the interaction metaphors. Figure 3.2 shows the general setup of the scene. As the avatar in this study is stationary and does not need to be able to move, no visible spatial boundaries are defined for the playing field. In the user study, however, a fence is added to help participants improve their spatial perception during interaction. 
The objects are there so that the interaction metaphors can be used to grab and move the objects. 3.2.2 Networking In order for two users to control the dissimilar avatar at the same time, their movements need to be synchronized over the network. The Photon Unity Networking 2 (PUN2) library 1, which is available in the Unity Asset Store, is used for this purpose. PUN uses the concept of rooms to create and manage multiplayer games. When a player joins a room, they can see and interact with all other players currently connected to the same room. If a player tries to join a room but no room exists yet, the server will create a room and give that player the master client role (a master client is unique to each room; it can perform special actions, such as starting the game or kicking other players out). All subsequent players who want to join are then client players. In our application, a connection between the players is established when the application is started using the “PhotonManager“ GameObject. It has a script attached in which the current player joins a random open room. As there can only be one room in our concept, it is not necessary to create a separate room with specific settings, but joining a random 1https://www.photonengine.com/pun 23 3. Avatar Design and Implementation room is perfectly fine. If this is the first player to connect to the application, there is no room yet, and a new one will be created. The second player to connect to the application will join this room as a client player. The order in which users connect to the server determines the numbering of players and the initial body configuration (i.e., which user controls which eye and arm), but this configuration can be changed anytime during the simulation. When a player joins the game by entering the room, a new “NetworkPlayer“ prefabricated component (a prefab, i.e., a blueprint of a GameObject that can be created any number of times and always has the same scripts and properties) is instantiated with the PhotonView.cs and the NetworkPlayer.cs scripts attached. See Figure 3.3 for the room creation and networking functionality. The NetworkPlayer.cs script contains the logic to record the movements of the current user and pass them on to the network. To achieve this, the script takes the local movements of the “Player“ GameObject, which is directly controlled by the HMD and controllers, and sends them to the remote instance of the application. These movements are then transferred to a local GameObject called “GhostPlayer“ within the application instance of the other user, whose movements, in turn, control the limbs of the avatar of the corresponding player in the local instance. In the local instance of a player, the movements of the “Player“ GameObject are recorded, and, in addition to being sent over the network using the network player script, the avatar’s limbs are moved accordingly. An overview of these movement transmissions is shown in Figure 3.4. 3.2.3 Dissimilar Avatar This subsection describes the dissimilar avatar, which can be embodied by two users simultaneously and allows users to interact non-isomorphically with the virtual environ- ment through novel interaction metaphors. This means that a total of two HMDs and two pairs of controllers must be tracked and that it must be possible to assign each of these tracked devices to a limb of the avatar. The avatar used for this purpose consists of four arms and two eyes. 
The avatar resembles an upright slug with two tentacles protruding from each side of the torso and two eye-stalks protruding from the upper part of the torso. The avatar also has a tail that emerges from the back of the torso. The left image in Figure 3.5 shows the avatar in its default pose. The avatar is rigged in order to animate it according to the user’s movements. A detailed explanation of how the animation works with the help of the IK algorithm can be found in section 3.4. Each of the tentacles is controlled by nine bones, with one bone at the end of the articulated chain serving as an anchor and not moving. The eyestalks are controlled by nine bones as well, with all bones movable. In addition, the avatar has bones in the torso and tail to animate the rest of the body. The image on the right of Figure 3.5 shows the avatar rigged with its bones. As mentioned above, the avatar can be controlled by two users simultaneously, with one user controlling the left or right eye and the upper or lower arms. The second user then controls the other limbs. These body configurations can be adjusted as required. In order 24 3.2. Virtual Reality Environment Figure 3.3: Overview of the room creation and networking logic when starting the application. 25 3. Avatar Design and Implementation Figure 3.4: Overview of the logic of the synchronized and networked movements of the players that drive the dissimilar avatar limbs. to control the limbs of the avatar, a structure is required that defines which parts of the avatar are controlled by which player. For this purpose we introduce the two classes PlayerStruct.cs and PlayerArmStruct.cs. These classes bundle all information about a player and its assignment to the avatar limbs, and these structures are used for movement mappings throughout the application. Listing 3.1 and listing 3.2 show the structure of these classes. 1 public class PlayerStruct 2 { 3 public InteractionMetaphor InteractionMetaphor { get; set; } 4 public Transform EyeTarget { get; set; } 5 public PlayerArmStruct LeftArm { get; set; } 6 public PlayerArmStruct RightArm { get; set; } 7 } Listing 3.1: The PlayerStruct class containing information about the player controlling one part of the avatar. The PlayerStruct.cs class contains the following information: the interaction metaphor the player is currently using; the eye target Transform, i.e., which of the two eyes the player is controlling; and a “PlayerArmStruct“ for the left and right arm. The PlayerArmStruct.cs class contains all essential information about the control of the tentacles, e.g., whether it is a left or right arm; the virtual hand Transform, which is used to store the position of the virtual (Go-Go) hand; a Transform containing the position of the real hand; a Transform containing the position and rotation of the tentacle 26 3.3. Selection Metaphors Figure 3.5: Left: Shaded model of the dissimilar avatar. Right: Transparent avatar with its rig and bones structure. tip (necessary to influence the tentacle tip according to the rotation of the real hand); a Transform for the arm target, i.e., the object that controls the effective position and rotation of the virtual tentacles based on the real or virtual hand movement; and a field containing the FABRIK IK code for this arm to achieve different behavior of the IK algorithm depending on the interaction metaphor. 
1 public class PlayerArmStruct 2 { 3 public bool Left { get; set; } 4 public Transform VirtualHand { get; set; } 5 public Transform RealHand { get; set; } 6 public Transform TentacleTip { get; set; } 7 public Transform ArmTarget { get; set; } 8 public FastIKFabrikStretch FabrikStretch { get; set; } 9 } Listing 3.2: The PlayerArmStruct class containing information about one arm that the player controls. 3.3 Selection Metaphors The core of this work is the implementation of suitable selection and interaction metaphors for the dissimilar avatar. In the following subsections, two interaction metaphors are discussed, which are later evaluated in a user study. These metaphors are suit- able for manipulating distant objects and can be used by two users simultaneously in a co-embodiment setup. To be able to assign the interaction metaphors to the user, we create the InteractionMetaphorSelector GameObject, which contains the script 27 3. Avatar Design and Implementation InteractionMetaphorSelector.cs and a PhotonView.cs script. The PhotonView.cs script is used to synchronize the changes that one user makes to their instance of the program in the InteractionMetaphorSelector GameObject with the other user’s program over the network. The InteractionMetaphorSelector.cs script has two drop-down menus, one for Player One (i.e., the master player) and one for Player Two (i.e., the client player), with different interaction metaphors to choose from (more details on the individual metaphors in the following subsections). In this way, the connected users can be assigned the desired metaphor, and their arms (depending on which one they control) will adopt the behavior required for the interaction metaphor. The selectable metaphors are “Default“, “GoGoInteraction“, “GazeAndManipulate“, and “GoGoPlusGazeAndManipulate“. With the default interaction, the movements of the user’s arms are mapped 1:1 to the avatar’s arms. The physical limitations of the avatar are respected, i.e., the user stretches their arm out very far, and the virtual arm is moved only as far as possible and will not stretch. This interaction metaphor can lead to a lower SoE, as the user observes a different behavior of the virtual arms than he would expect based on the movements of his own hands. The other three interaction metaphors are discussed in detail in the following subsections. 3.3.1 Go-Go Interaction The first of the two interaction metaphors that we implement and with which we eval- uate the usability of the dissimilar avatar in a co-embodied scenario is the so-called Go-Go interaction technique [41]. This is a non-linear scaling of the virtual arm or a non-linear mapping between the user’s movements and the resulting movement of the virtual arm. This makes it easier to reach distant objects in the virtual environ- ment. To achieve this mapping, a non-linear mapping function is used (see Equation 2.2). The MonoBehaviour script GoGoInteraction.cs is located on the character’s GameOb- ject so that the dissimilar avatar can grab objects using the Go-Go interaction technique. The parameters required by the script are the user’s arm length, a coefficient K, and two pivot Transforms, one for the left and one for the right arm. The arm length is required to determine the threshold value D of the mapping function. This value separates the linear from the non-linear mapping. 
The literature suggests that it should be set to 2/3 of the user's arm length to achieve the best possible Go-Go behavior (a compromise between good reachability of distant objects and no excessive arm distortion). The coefficient K determines how much the non-linear part of the mapping function increases, i.e., how much the arm grows when the user holds it further away from the body. Finally, the two pivot Transforms are needed to determine the grabbing direction and the length of the vector R⃗r pointing from the origin to the user's hand. The pivot points act as the origin in a user-centered coordinate system, with the origin at the user's chest. In our case, the pivots are located approximately at the position of the shoulders to obtain a suitable vector R⃗r, as this vector should represent the real arm as accurately as possible. If the pivot points were located exactly in the center of the avatar, this would result in an unnatural grab direction.

To calculate the Go-Go extension effect for a given arm and its corresponding pivot, we first need to determine the direction and distance in which the real arm is moving. This is done by subtracting the pivot position from the real hand position to obtain the vector handPivot. To normalize handPivot, it is divided by Rr, which is the length of the vector pointing from the pivot position to the hand position (see listing 1, lines 4 to 6). The equations used are the following:

handPivot = hand − pivot   (3.1)

Rr = √((hand.x − pivot.x)² + (hand.y − pivot.y)² + (hand.z − pivot.z)²)   (3.2)

handPivotNorm = handPivot / Rr   (3.3)

After calculating these values, we need to check whether the length Rr is above the threshold D (non-linear part of the mapping function) or below it (linear part of the mapping function). If the value is lower, the position of the virtual hand is simply mapped to the position of the real hand. If the value is higher, we need to calculate how far the virtual arm should extend. This is done by calculating the new length R′r (according to Equation 2.2) using the coefficient K. The virtual hand position is then calculated from the pivot, the normalized grabbing direction, and the calculated length. Figure 3.6 shows how the different extensions of the arm affect the position of the virtual arm.

3.3.2 Gaze-and-Manipulate

The second interaction metaphor is the so-called "Gaze-and-Manipulate" interaction technique. This metaphor is inspired by the "Gaze-and-Pinch" interaction technique [39]. The idea behind this interaction metaphor is to track the user's gaze and to trigger the selection and manipulation of objects by performing certain hand or finger movements. Often a "pinch" motion with the thumb and index finger is performed to manipulate objects. Since in our setup we cannot directly track the user's eye movements, we use the orientation of the head to check whether objects are being looked at. We also use the grip buttons on the VR controllers to select and manipulate objects instead of pinching motions with the fingers.

Similar to the GoGoInteraction.cs script, the GazeAndManipulate.cs script is located on the character's GameObject so that the dissimilar avatar can use this interaction metaphor to gaze at and retrieve objects. In our implementation, the objects that the user is looking at (i.e., turning their head at) are highlighted with a highlight

(a) No extension of the real arm: linear part of the mapping function.
(b) Slight extension of the real arm: still in the linear part of the mapping function. (c) Large extension of the real arm: non-linear part of the mapping function. Arm stretches accordingly. Figure 3.6: Go-Go interaction with different arm extensions. Left: real arm extension movement. Middle: movement of the virtual avatars’ arm. Right: First-person perspective of the virtual arm. The “wristband“ objects represent the positions of the real hands. material. Once the object is highlighted, it can be retrieved to the hand by pressing either the left or the right grip button (depending on the arm in which the user wants to retrieve the object), and once in reach, grabbed as usual. The parameters for the GazeAndManipulate.cs script are the smooth time and the gaze ray distance. The smooth time defines how fast the object will reach the hand, where a smaller value will reach the hand faster. The gaze ray distance parameter defines how far apart the individual rays are that are used to check whether an object is in the users’ field of view. Starting from the center of the screen, a total of nine rays are shot into the scene in a 3x3 grid (see Figure 3.7) to check whether an object is currently being viewed. If an object is hit by one of the rays, it is considered to be viewed and is highlighted. The larger the gaze ray distance parameter is, the less closely you have to look at objects to highlight them. The smaller the value, the closer the rays are to each other and the 30 3.3. Selection Metaphors closer and straighter you have to look at an object in order to interact with it. If the value is too high, it can happen that a distant object has space between the rays without being hit by the rays. Therefore, a compromise must be found between user-friendliness and guaranteeing the selection of objects by finding a suitable value. Figure 3.7: The nine rays that check for gazed objects. The gaze ray distance (GRD) defines the spacing between the individual rays. (Slightly tilted view to be able to see the rays.) When the user looks at an object and presses the grip button, the object is pulled towards the virtual avatar’s hand. To avoid lowering the SoE (as the object would just fly around), we have added an arm animation that grabs the distant object. There is a halo object for each of the avatar’s four tentacles, with a slightly transparent material. These halos have the Go-Go stretch logic implemented, meaning they can reach the distant object by stretching the mesh. During the retrieving animation, the halo always stays attached to the object, making it look as if a ghost tentacle is pulling the object towards the avatar’s hand. Once the object (and the halo arm) are close enough to the avatar’s hand, the halo mesh is turned off. Figure 3.8 shows what this animation looks like. 3.3.3 Go-Go plus Gaze-and-Manipulate In addition to the two interaction metaphors described above, we develop a third one. This is a combination of the two metaphors “Go-Go Interaction“ and “GazeAndManipulate“. In this metaphor, it is possible to grab objects with a stretched arm as in Go-Go, but also to benefit from the gaze mechanism. For example, an object can be brought to the stretched arm (which has been stretched by the Go-Go logic) using the gaze mechanism 31 3. Avatar Design and Implementation Figure 3.8: Animation of the arm halo retrieving the distant object. Once the object is close enough, the halo mesh is turned off. and then grabbed and manipulated as usual using Go-Go. 
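To make the gaze test used by Gaze-and-Manipulate (and by this combined variant) more concrete, the following C# sketch casts the 3x3 grid of rays from the head-mounted camera and returns the first grabbable object that is hit. The class and field names, the layer mask, and the way the rays are offset by the gaze ray distance are simplifying assumptions rather than the actual GazeAndManipulate.cs implementation.

using UnityEngine;

// Sketch of the 3x3 gaze-ray test, assuming the main camera represents the user's head.
// Field names and the grabbable layer are illustrative, not the thesis implementation.
public class GazeRaySketch : MonoBehaviour
{
    public Camera head;                   // HMD camera
    public float gazeRayDistance = 0.07f; // spacing between neighboring rays (GRD)
    public float maxDistance = 10f;       // how far into the scene the rays are cast
    public LayerMask grabbableMask;       // layer of the grabbable objects

    // Returns the first grabbable hit by any of the nine rays, or null if none is gazed at.
    public Transform FindGazedObject()
    {
        for (int row = -1; row <= 1; row++)
        {
            for (int col = -1; col <= 1; col++)
            {
                // Offset the ray origin on a 3x3 grid around the view center.
                Vector3 origin = head.transform.position
                               + head.transform.right * (col * gazeRayDistance)
                               + head.transform.up * (row * gazeRayDistance);

                if (Physics.Raycast(origin, head.transform.forward, out RaycastHit hit,
                                    maxDistance, grabbableMask))
                {
                    return hit.transform; // considered "viewed": highlight and allow retrieval
                }
            }
        }
        return null;
    }
}

In the combined metaphor, an object highlighted by this test can be retrieved (e.g., smoothed towards the hand with Vector3.SmoothDamp and the smooth time parameter) while the arm itself remains stretched by the Go-Go logic.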
We did not evaluate this metaphor in the user study because we explicitly wanted to analyze the effects of the two metaphors on the dissimilar co-embodied avatar individually and not the interaction between the two metaphors. However, combinations of metaphors can be further explored in future studies. 3.4 FABRIK IK A modified version of the FABRIK IK [3] (see section 2.5) algorithm is used to generate the character’s movements. The data from the tracked HMDs and controllers is transferred to the IK end effectors, which drive the articulated chains of the tentacles and eye-stalks. We adapt the FABRIK IK algorithm to take into account the movements of the arms when using the Go-Go interaction metaphor and adjust the joint constraints accordingly. Once the position and rotation of the virtual arm are determined using the calculations described in the interaction metaphor subsections above, the FABRIK IK code must take these results into account. The default behavior of the FABRIK IK logic is to rotate the individual bones according to the joints to reach the end effector. The length of the articulated chain, and therefore the length of the tentacles, is limited to the sum of all individual bone lengths. To achieve the desired Go-Go behavior, these lengths must be increased accordingly. We, therefore, adapt the FABRIK IK code so that, in addition to the default target (the user’s hand), it can also follow the virtual hand target (which corresponds to the newly calculated position and rotation). For this purpose, when defining the target (i.e., the end-effector in the articulated chain), it is checked whether the current arm is a virtual arm (in the case of arm halos of the Gaze-and- Manipulate interaction (see subsection 3.3.2 for a detailed explanation of the arm halo logic)), or whether the arm uses the Go-Go interaction. In the case of the virtual arm, the “RetrievingTarget“ (i.e., the object that the virtual halo arm follows) is set as the end-effector. In the case of the Go-Go interaction, the end effector is the Go-Go target, i.e., the position of the virtual hand target. If none of the cases apply, the default target is used as the end effector, i.e., the position of the user’s real hand. The check of these 32 3.4. FABRIK IK three cases is shown in listing 2. After the end effector has been defined, it is checked whether the current arm is an arm halo or a Go-Go-driven arm, and in both cases the “StretchIK“ code is executed. In the other cases, the “DefaultIK“ code is executed, which, as described above, targets the user’s real hand and does not stretch the bones. The “StretchIK“ logic uses a stretch algorithm to extend the arm beyond the standard bone lengths by calculating new bone lengths. The distance between the position of the end-effector and the position of the last bone in the articulated chain (that would be the tip of the tentacle) is calculated. Furthermore, the distance between the end-effector and the root bone (that is the first bone in the articulated chain and is approximately at shoulder level) is calculated. Using these two values, a new length is calculated for each bone (see Equations 3.4 to 3.6). 
LRemaining = √((target.x − tip.x)² + (target.y − tip.y)² + (target.z − tip.z)²)   (3.4)

LTotal = √((target.x − root.x)² + (target.y − root.y)² + (target.z − root.z)²)   (3.5)

LDesired = (LBones(i) / LComplete) · LRemaining   (3.6)

where

LComplete = Σ_{i=1}^{n} LBones(i)   (3.7)

The next step is to check whether the total length (from the root bone to the end-effector) is greater or less than the sum of the current bone lengths. Based on this, the newly calculated length is either added to or subtracted from the current bone length to create a "stretch" or "shrink" effect on the tentacle mesh. The new lengths of the bones are added together and saved as the new total length (see Equation 3.7). Once the length of each bone is determined, the position of each bone after the root bone is set. This is done by multiplying the new bone length by the grabbing direction and adding it to the position of the previous bone (see Equation 3.8):

BonePos(i) = BonePos(i−1) + (direction · LBones(i−1))   (3.8)

Refer to listing 3 for the complete "StretchIK" logic.

To give the user visual feedback about their real arm position, we have introduced a torus (or "wristband") for each arm that indicates where the user's hand is (see Figure 3.6). Especially in interactions that use the "stretch" logic, such as the Go-Go interaction (where the user suddenly no longer sees the virtual hand where their real hand is), it can be helpful for the user to have a visual indication of where their real hand is currently located.

3.5 Evaluation Task

In order to evaluate the SoE and the effectiveness of the developed interaction metaphors (details in chapter 4), we had to design an evaluation task that the users would have to perform in the user study. The requirements for these evaluation tasks were as follows:

1. For both interaction metaphors, all directions in the front half of the user's field of view must be covered.
2. The order of the trials must be randomized to minimize learning or predictable situations.
3. Intuitive grabbing situations must be implemented that direct the user's full attention to the interaction and do not distract from the task at hand.

Based on these requirements, an evaluation task system is developed that generates new random positions each time the user study system is started and spawns the objects at these positions. The number of objects that appear and the positions are freely configurable, but we choose the following setting for our user study: 40 grabbing actions, so-called 'sub-trials', should be performed for each body configuration. A sub-trial is the most atomic unit of this evaluation task system. It defines where the grabbable object spawns and which interaction metaphor the user must use to grab and collect this object. Further, it stores several pieces of information that are needed for the user study evaluation (e.g., the time it took the user to grab the object or how strongly the user felt a sense of agency). More specifically, a sub-trial contains the following information: • the spawning position (the position the grabbable object spawns in the scene) • the spawning angle (the angle as seen from the player's forward vector at which the grabbable object spawns) • the spawning distance (the distance from the player at which the grabbable objects spawn) • the time needed to grab the object (the time between the appearance of the grabbable object and its collection; measured for later evaluation) 34 3.5.
Evaluation Task • the arm to be used (whether the player should use the right or left arm for grabbing the object) • the agency question answer (the answer given by the user to the agency question of the 7-point Likert scale) • the interaction metaphor (which interaction metaphor the user should use to grab the object) The grabbable objects can appear in various positions, based on the configured angles and distances. We choose five angles and two distances for our user study. As we do not want the user to grab objects that are behind his body (see requirement 1), we choose the following angles: -90°, -45°, 0°, 45°, and 90°. This covers the user’s front field of view (180° in total). For the distances we choose a “medium“ distance (1 meter), and a “long“ distance (2 meters). For each angle, there are two possible spawning positions, so in total there are 10 spawning positions with this number of angles and distances. Figure 3.9 shows the possible spawning position. In addition, each spawning position has a small random vertical offset to prevent the user from being able to anticipate the exact grabbing position. Listing 4 shows the code that generates these positions based on the number of angles and distances specified. First, a new list is initialized, containing a Vector3 value to store the final position of the grabbable, an int value to store the spawn angle in degrees, and a string value to indicate whether it is the medium or long distance. The position is used to instantiate a new grabbable and place it in the scene, and the angle and distance are stored in the sub-trial object for later evaluation. We iterate over all given angles and convert them from degrees to radians using following equation: angleRad = angleDeg · π/180; (3.9) Then we iterate over all given distances and calculate the height of the grabbable using a fixed height value multiplied by a random factor. The final position of the grabbable is calculated using the converted angle in radians and the distance. Finally, the calculated position, angle, and distance are stored in the newly created list, which is returned after all angles and distances have been used to calculate the possible positions. In order to evaluate two interaction metaphors for both user arms, each sub-trial also contains information about which metaphor the user must use to grab the object, as well as which arm the user must use. For each position, both interaction metaphors have to be covered (as well as both arms), making a total of 40 sub-trials. Two sub-trials form a so-called ‘trial’, i.e., a collection of two sub-trials that have the same spawning position and require the same arm for grabbing the object. The difference between the sub-trials is that the first sub-trial requires the first interaction metaphor to grab the object, and the second requires the second metaphor. A trial also contains a 35 3. Avatar Design and Implementation Figure 3.9: The possible spawning positions of the objects for five angles and two distances (pink = medium distance, blue =long distance). unique name for later evaluation. To meet the second requirement, the order of the two sub-trials within a trial is randomized, so that for some positions the first metaphor is the Go-Go Interaction and for others the first interaction metaphor is Gaze-and-Manipulate. The order of the trials is also randomized to prevent the user from anticipating the positions. 
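Returning to the position generation described above, the following minimal sketch illustrates how the spawning positions could be computed, under the assumption that the player's forward vector defines the 0° direction. The method signature, the tuple layout, the distance labels, and the additive height jitter are illustrative assumptions and do not reproduce Listing 4.

using System.Collections.Generic;
using UnityEngine;

// Sketch of the spawn-position generation. Each entry bundles the final position, the
// spawn angle in degrees, and a distance label ("Medium"/"Long"), as described above.
public class SpawnPositionSketch
{
    public List<(Vector3 position, int angleDeg, string distanceLabel)> GeneratePositions(
        Transform player, int[] anglesDeg, float[] distances, float baseHeight, float heightJitter)
    {
        var result = new List<(Vector3, int, string)>();

        foreach (int angleDeg in anglesDeg)
        {
            // Convert the spawn angle from degrees to radians (Equation 3.9).
            float angleRad = angleDeg * Mathf.PI / 180f;

            for (int i = 0; i < distances.Length; i++)
            {
                // Small random vertical offset so the exact grab height cannot be anticipated.
                float height = baseHeight + Random.Range(-heightJitter, heightJitter);

                // Place the grabbable on a circle around the player, relative to its forward axis.
                Vector3 localOffset = new Vector3(
                    Mathf.Sin(angleRad) * distances[i],
                    height,
                    Mathf.Cos(angleRad) * distances[i]);

                Vector3 position = player.position + player.rotation * localOffset;
                string label = (i == 0) ? "Medium" : "Long";

                result.Add((position, angleDeg, label));
            }
        }
        return result;
    }
}

With the configuration used in the user study (angles of -90°, -45°, 0°, 45°, and 90°; distances of 1 meter and 2 meters), such a routine yields the ten spawning positions shown in Figure 3.9.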
For each sub-trial, a UI shows which interaction metaphor and which arm the user must use to grab the object (see Figure 3.10). If the user tries to grab the object with the wrong arm, it will not be possible to grab the object to avoid wrong data in the user study. After each sub-trial (after the user collects the object), a UI is displayed presenting the question “I felt that I could control the virtual arm as if it were my own arm“. A slider is displayed below the question, enabling the user to select a value on a Likert scale (see Figure 3.11) [32]. The scale contains seven values: (1) strongly disagree, (2) disagree, (3) 36 3.5. Evaluation Task Figure 3.10: The UI during each sub-trial to indicate the interaction metaphor and the arm to use. somewhat disagree, (4) neither agree nor disagree, (5) somewhat agree, (6) agree, and (7) strongly agree. The next subsection 3.5.1 explains how the user selects the desired value on the slider and confirms their choice. The answer is then saved in the “Agency question answer“ field of the sub-trial object. Once the user has: 1. grabbed an object with the first interaction metaphor 2. answered the agency question on the Likert scale UI 3. used the second interaction metaphor to grab the object on the same position 4. answered the second agency question a UI appears asking the user for the preferred metaphor for that position and the arm used (see Figure 3.12). This preference information is stored in the trial object in addition to the two sub-trials and will be used later for evaluation. Listing 5 shows the code for setting the preferred metaphor, waiting for user input, and saving the answer to the question in the test data object in the “PreferredInteractionMetaphor“ field. As we are 37 3. Avatar Design and Implementation Figure 3.11: The UI after each sub-trial with a 7-point Likert scale slider. randomly assigning the two metaphors to the two sub-trials, we need to check the current order of the metaphors; otherwise, we would save the wrong data. We then call the function SpawnNextSubTrial() to start the next trial, if there are any left. After a total of 20 trials, another UI appears informing the user that all trials for the current body configuration have been completed and that they can put the HMD down. Figure 3.13 shows the full procedure of the evaluation system for one body configuration. 3.5.1 UI Interaction In order to use the evaluation tasks in a user study, it must be possible to interact with the user interface using the VR controllers in order to answer the questions asked after each sub-trial. The requirement is that it should be possible to answer the questions about the agency and the preference of the interaction metaphors described above without mouse or keyboard input. The reason for this is that users should not put the HMD down to use the mouse and keyboard during the experiments, as this could destroy the immersion. Therefore, a simple and intuitive UI interaction logic was added. We use Unity’s new input system for this as it is versatile in the development of input logic. Additionally, controller inputs from various VR headsets can be used thanks to the OpenXR package. To select the desired value in the Likert slider UI panel or the preferred interaction metaphor in the metaphor selection UI panel, the joystick of the 38 3.5. Evaluation Task Figure 3.12: The UI after both sub-trials with a choice between the two interaction metaphors to indicate the preference. right controller can be used. 
The value of the x-axis of the joystick is being read (this corresponds to a left-right movement of the joystick) and converted to a boolean output. The following listing 6 shows the code for reading the joystick input. As we do not want to read out the input with every update frame but only with every change of direction of the joystick position, we must save the current joystick value (position) and compare it each time so that the selection logic (i.e., a value in the slider is selected or a preferred metaphor is selected) is not triggered with every frame. The selection logic is therefore only triggered in these cases: • Joystick idle position → Joystick moved to the right • Joystick idle position → Joystick moved to the left • Joystick left position → Joystick moved to the right • Joystick right position → Joystick moved to the left Once the desired value has been selected in the Likert slider UI, the selection can be confirmed with the primary button on the right controller, i.e., the “A“ button. Apart from selecting the desired values on the UI panels and confirming with the primary button, the user does not need any other UI interaction methods. 39 3. Avatar Design and Implementation 3.5.2 CSV Export In order to be able to use the data recorded during the trials for later evaluation, it must be converted into a meaningful format and exported. For this purpose, we use the UserStudyExporter.cs script, which converts the trial data into a .csv structure and exports it to a .csv file. Once all trials have been completed, the script automatically creates a file with the name “userstudydata_ddMMyy_HHmmss.csv“ (where the second part is the current date and time) inside the “Exports“ folder. The following data from the trials is saved in the file: • BodyConfig. The body configuration with which the user completes the trial (values: LeftEye-TopBody, RightEye-TopBody, LeftEye-BottomBody or RightEye- BottomBody). • Trial_Name The name that identifies the trial. The name is always as follows: “Trial_X“, where X is a number from 1 to 20 (since there are 20 trials). • SubTrial_Number. The number of the sub-trial (values: 1, 2). After two rows with the sub-trial data for the current trial object, a third line with SubTrial_Number 0 is added. In this third line, the preferred interaction metaphor selected by the user is stored. • Metaphor. The metaphor in use for the current sub-trial. Contains the value “—“ if it is the “‘preference“ row (values: GoGoInteraction, GazeAndManipulate). • Arm. The arm in use to grab the objects for the current sub-trial (values: Left, Right). • Angle. The angle at which the object for this sub-trial spawns (values: -90, -45, 0, 45, 90). • Distance. The distance at which the object for this sub-trial spawns (values: Medium, Long). • Time. The time it takes the user to grab this object. Contains the value “-1“ if it is the “‘preference“ row. • Answer. The answers given by the user during the execution of the task. For the sub-trial rows, this field stores the value of the 7-point Likert scale from 0 to 6 (see Figure 3.11 for the corresponding UI). For the “‘preference“ row, this value stores the preferred interaction metaphor (see Figure 3.12 for the corresponding UI). 40 3.5. Evaluation Task Figure 3.13: One full iteration of the evaluation system for one body configuration. Blue boxes: system actions. Green boxes: user actions. 
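As a concluding illustration of the export format described in section 3.5.2, the following sketch assembles the rows of the .csv file. The SubTrialData and TrialData classes, the comma delimiter, and the values written into the Arm, Angle, and Distance columns of the preference row are simplified assumptions rather than the UserStudyExporter.cs code.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Sketch of the CSV export. Only the column layout, the preference-row placeholders,
// and the file name follow the description above; everything else is illustrative.
public class SubTrialData
{
    public string Metaphor;
    public string Arm;
    public int Angle;
    public string Distance;
    public float Time;
    public int Answer;
}

public class TrialData
{
    public string Name;                // e.g. "Trial_7"
    public SubTrialData First, Second; // the two sub-trials of this trial
    public string PreferredMetaphor;   // answer to the preference question
}

public static class CsvExportSketch
{
    public static void Export(string bodyConfig, IEnumerable<TrialData> trials)
    {
        var sb = new StringBuilder();
        sb.AppendLine("BodyConfig,Trial_Name,SubTrial_Number,Metaphor,Arm,Angle,Distance,Time,Answer");

        foreach (var t in trials)
        {
            AppendSubTrial(sb, bodyConfig, t.Name, 1, t.First);
            AppendSubTrial(sb, bodyConfig, t.Name, 2, t.Second);

            // Third row (SubTrial_Number 0) stores the preferred metaphor, with "—" and -1
            // as placeholders for the Metaphor and Time columns.
            sb.AppendLine(string.Join(",", bodyConfig, t.Name, 0, "—", t.First.Arm,
                                      t.First.Angle, t.First.Distance, -1, t.PreferredMetaphor));
        }

        string fileName = "userstudydata_" + DateTime.Now.ToString("ddMMyy_HHmmss") + ".csv";
        Directory.CreateDirectory("Exports");
        File.WriteAllText(Path.Combine("Exports", fileName), sb.ToString());
    }

    static void AppendSubTrial(StringBuilder sb, string config, string trial, int number, SubTrialData s)
    {
        sb.AppendLine(string.Join(",", config, trial, number, s.Metaphor, s.Arm,
                                  s.Angle, s.Distance, s.Time, s.Answer));
    }
}

Writing one aggregated row (SubTrial_Number 0) per trial keeps the per-trial preference answer next to the two sub-trial rows it refers to.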
41 CHAPTER 4 Evaluation In order to evaluate the interaction metaphors designed for the dissimilar avatar, a user study was conducted with 20 participants (16 male, 3 female, 1 non-binary). The main goal of this study was to evaluate the usability and intuitiveness of the metaphors in the context of co-embodiment and to assess the SoE of the users while controlling the dissimilar avatar. In this study, the application was tested with one person at a time to focus on the interaction metaphors. Co-embodiment with two users interacting simultaneously was not the scope of this study and will be conducted in future works. 4.1 Study Design and Hypotheses 4.1.1 Design In the user study, the two interaction metaphors were tested in four different blocks of body configurations (i.e., which body part the user will control, including which eye and which pairs of arms (see Figure 3.1 for an example body configuration)). Each block consisted of 10 grabbing positions for both arms, in which both interaction metaphors had to be used (see Figure 3.9 for the setup of the grabbing position). In total, there were 40 grabbing to perform, which were randomized within a block. The blocks themselves were counterbalanced in the user study using a Balanced Latin Square design (see Table 4.1). In the user study, we had the following independent variables: • Body Configuration: LeftEye-TopBody, RightEye-TopBody, LeftEye-BottomBody, RightEye-BottomBody • Angles of grabbable spawning positions: -90°, -45°, 0°, 45° and 90° • Distances of grabbable spawning positions: medium (1 meter), long (2 meters) 43 4. Evaluation Participant # Block 1 Block 2 Block 3 Block 4 1 LeftEye TopBody RightEye TopBody RightEye BottomBody LeftEye BottomBody 2 RightEye TopBody LeftEye BottomBody LeftEye TopBody RightEye BottomBody 3 LeftEye BottomBody RightEye BottomBody RightEye TopBody LeftEye TopBody 4 RightEye BottomBody LeftEye TopBody LeftEye BottomBody RightEye TopBody Table 4.1: Order of the body configurations defined by a Balanced Latin Square design. For the fifth participant, the order of participant #1 was used again, and so on. • Interaction metaphor: Go-Go Interaction, Gaze-and-Manipulate Interaction • Arm: Left Arm, Right Arm For the spawning position of the grabbable, we chose a height of 1.2 meters (with a 0.2 random offset); for the Go-Go interaction, we chose an arms length of 0.75 meters; and for the Go-Go coefficient K, we chose the value 6. Regarding the parameters of the Gaze-and-Manipulate interaction, we chose the smooth time to be 0.05 sec and the gaze ray distance to be 0.07 meters. 4.1.2 Research Questions and Hypotheses The main aim of this work was to investigate how useful the developed interaction metaphors are for the dissimilar avatar and how strong the user’s sense of agency was through the use of the metaphors. Our main research questions are: • How do different interaction metaphors influence the user’s SoE and immersion in VR with a dissimilar avatar? • Which interaction techniques do users prefer for interacting with objects while being embodied in the dissimilar avatar? We hypothesized that in the selection tasks, the objects with an angle closer to zero (i.e., more frontal to the user) are easier to grab than the objects that are further to the side of the user. We also hypothesize that objects that are further away from the user’s body are faster to grab with the Gaze-and-Manipulate interaction metaphor than with the Go-Go interaction metaphor. 
Accordingly, we believe that the nearer objects will be grabbed faster with the Go-Go interaction. Therefore, we want to test the following hypotheses on “User Performance“: 44 4.2. Technical Setup and Equipment H1.1: Objects that are close to the avatar can be grabbed faster with Go-Go interaction than with Gaze-and-Manipulate. H1.2: Objects that are further away can be grabbed faster with Gaze-and-Manipulate than with Go-Go interaction. H1.3: Objects located at positions with a smaller angle relative to the player axis can be grabbed faster than objects located at a larger angle. A central component in the design of the interaction metaphors was to convey a high SoE and sense of agency. We hypothesize that SoE and agency would be affected differently depending on the trial configuration or body configuration. We test the following hypotheses on “Sense of Embodiment and Agency“: H2.1: The sense of agency will be influenced by the trial configuration (metaphor, distance). H2.2: The sense of agency will be influenced by the body configuration. H2.3: The sense of self-location will be influenced by the body configuration. We want to find out which interaction metaphor is more user-friendly and preferred by the users. Since the Go-Go interaction is intuitive to use and we assume that the Gaze-and-Manipulate interaction is more complicated to use, as well as the “teleportation“ of objects may seem unnatural to the user, our final hypothesis on “User Preference“ is as follows: H3: Users will favor Go-Go interaction over Gaze-and-Manipulate interaction. 4.2 Technical Setup and Equipment For the user study, a virtual scene was created in Unity3D that contained a defined area surrounded by a fence. This fence was used to make it clear to the user that they did not have to move and that the objects would not appear outside this boundary. In addition, this fence was incorporated as a tool to increase the user’s immersion in the scene and allow them to better perceive distance in space. Figure 4.1 shows the technical setup and the first-person perspective of the user performing a grabbing task. The user study was carried out at the VR Lab at TU Wien and for some participants at the authors’ home. The main play area for the user study was an area measuring approximately 2.5 meters by 1.5 meters. The study was conducted with an Oculus Quest 2 HMD and two Oculus Quest 2 Touch controllers, with the HMD connected to a desktop computer via cable. The user study was conducted on a PC with Intel Core i9 (3.5GHz) CPU, 32GB RAM, NVIDIA GeForce RTX 3090 GPU in the VR Lab at TU Wien. At the authors’ home, a PC with AMD Ryzen 5 (3.6GHz) CPU, 32GB RAM, and NVIDIA GeForce RTX 2060 GPU was used. 45 4. Evaluation Figure 4.1: Left: User wearing a HMD and controllers performing a grabbing task. Right: First-person perspective of the VR environment. 4.3 Participants Twenty participants aged between 21 and 40 were recruited for the study. All participants had normal or corrected-to-normal vision, and all but one were right-handed. Of these participants, 10 reported having experience with VR using an HMD (7 of those regularly (experience level 6), 3 of those reported experience level 5), 4 reported moderate experience (level 2-4), and 6 had little to no experience with VR (2 answered experience level 0, 4 experience level 1). The VR experience levels are shown in Figure 4.2a. 
Regarding video game experience (Figure 4.2b), 7 participants reported a level of 6 (regularly), 5 reported a level of 5, while 3 reported a level of 3, and 3 reported experience level 0 (never). Experience levels 1 and 4 were each reported once. 4.4 Procedure During the study, a second instance of the application was launched that connected to the first instance and took control of the limbs that the user was not controlling. Although these limbs were not moved, their presence simulated a co-embodied scenario. Throughout the evaluation, participants were closely supervised by an experimenter 46 4.4. Procedure (a) VR with HMD experience. (b) Video games experience. Figure 4.2: Participants’ experience levels with VR with an HMD and video games. seated nearby. The experimenter first provided instructions on the VR controls and then offered assistance during each trial, if needed. The user study was divided into several parts: 1. Before the first block of trials: A preliminary questionnaire was used to collect demographic data from the participants. 2. During the four blocks of trials: Participants were asked to perform tasks in a body configuration in the VR application. 3. After each of the four blocks of trials: Participants were asked to complete a post-block questionnaire to collect data about their perceived SoE. 4. After the last block of trials: A post-experiment questionnaire was used to collect data about the participants’ VR experience. Parts 2 and 3 were repeated four times, with a different body configuration chosen for each block in order to test all possible configurations. Between parts 2 and 3, participants were given some time to recover from the VR use while completing the post-block questionnaire. Figure 4.3 shows a flowchart of the different parts of the user study and the approximate time allocated to each part. To reduce the possible influence of learning effects on the measurement data, we used a Balanced Latin Square design to counterbalance the order of the body configurations, as shown in Table 4.1. This design ensured that each configuration was presented in every possible order, thus preventing systematic bias in our results. 47 4. Evaluation Figure 4.3: Procedure of the study with the approximate time allocated to each part. As the user study tasks described above could be carried out in a stationary position and the participants did not have to move around in the room, the user study did not require a lot of space. It was sufficient if the participants had enough space to fully extend their arms and possibly lean forward or to the side. This allowed the study to be conducted in spaces with limited room, such as the authors’ home or the VR lab at TU Wien. Despite the different locations, the same procedure was followed in order to obtain usable and comparable data. After being welcomed and told how much space they would need to perform the tasks to avoid any risk of injury or damage to the equipment, they were asked to complete the preliminary demographic questionnaire. The HMD and controller with the necessary controls were then explained to the participants, and the HMD was handed out. The test environment was started to give the participants the opportunity to get used to the VR experience. In order to familiarize the participants with the upcoming tasks and interaction possibilities, they were given the opportunity to try out both metaphors (Go-Go interaction, Gaze-and-Manipulate) by grabbing dummy objects placed in the scene. 
A mirror was also placed in the virtual scene so that participants could see the dissimilar avatar they were controlling and explore the capabilities and physical limitations of the avatar. This “warm-up round“ was carried out until they felt ready to start the user study. At the start of the user study, the initial body configuration for the virtual avatar was set, and the first of four blocks of trials began. Each trial consisted of a grabbing task (first randomized interaction metaphor), followed by a Likert-type scale question to capture the user’s sense of agency, followed by the second grabbing task (second interaction metaphor), followed by another Likert-type scale question, and finally a choice between the two interaction metaphors as to which one the user preferred for that cabbage 48 4.5. Data Collection and Analysis position and arm. This was repeated 20 times, for a total of 40 grabbing tasks (40 agency questions and 20 preferred metaphor questions). After one body configuration, participants were given a post-block questionnaire about the VR experience. After a short break, the described procedure was repeated three more times to collect data for all four body configurations. At the end of the experiment, the participants were given a post-experiment questionnaire in which they answered statements about the VR experience and the interaction metaphors and could optionally fill in a free-form text field for additional comments and remarks. 4.5 Data Collection and Analysis We collected both objective and subjective data in the user study. During the trials, objective data was collected in the form of task completion time. The post-block questionnaires and the post-experiment questionnaire collected subjective data in the form of 7-point Likert-type scale statements and open-ended questions. In the objective evaluation, we compared the interaction metaphors in terms of the time required to complete the tasks. In the subjective evaluation, we analyzed the answers to the Likert- type scale statements and open-ended questions about agency, SoE, and workload. The results of the objective and subjective evaluation are presented in the following section 4.6 and discussed in chapter 5. 4.5.1 Demographics In a preliminary questionnaire, participants were asked to provide demographic informa- tion such as gender and age and to rate their experience with VR headsets and video games on a scale of 0 to 6, with 0 (never) being the lowest and 6 (regularly) being the highest. They were also asked to rate their general well-being before the start of the user study on a scale of 0 to 10 (0 being how they felt when they entered, 10 being that they would like to stop the experiment). 4.5.2 Objective Data In the objective evaluation, the times measured during the execution of the evaluation tasks were analyzed. For each sub-trial (see section 3.5 for more details), the time it takes a user to grab and collect the grabbable object was measured. The time started when the grabbable object spawned and the user received information on the UI about which interaction metaphor and which arm to use to grab the object. The time was stopped as soon as the user had placed the object in the basket. The data was used to find out which interaction metaphor was used, which arm was used, and which grabbable object positions took users longer to collect the objects and in which setting they were faster. This information was then used to determine which interaction metaphor is objectively more efficient. 49 4. 
Evaluation ID Statement OW1 It felt like the virtual body was my body. OW2 It felt like the virtual body parts were my body parts. OW3 The virtual body felt like a human body. OW4 It felt like the virtual body belonged to me. AG1 The movements of the virtual body felt like they were my movements. AG2 I felt like I was controlling the movements of the virtual body. AG3 I felt like I was causing the movements of the virtual body. AG4 The movements of the virtual body were in sync with my own movements. CH1 I felt like the form or appearance of my own body had changed. CH2 I felt like the weight of my own body had changed. CH3 I felt like the size (height) of my own body had changed. CH4 I felt like the width of my own body had changed. Table 4.2: First set of statements about ownership (OW), agency (AG) and change (CH) of the post-trial questionnaire. The participants answered on a 7-point Likert-type scale indicating the extent to which the statement applied to them during the trial (1=strongly disagree, 7=strongly agree). Statements taken from [43]. 4.5.3 Subjective Data In addition to the preliminary questionnaire, participants were given post-block question- naires and one post-experiment questionnaire to complete. These questionnaires included questions about the VR experience, agency and SoE, and workload. Participants were asked to complete a post-block questionnaire after each of the four blocks of the body configurations. After the last run, a short post-experiment was given to the partici- pants to complete. The preliminary, post-block, and post-experiment questionnaires can be found in Appendix G. The post-experiment questionnaire used a mixed-methods approach, including both Likert-type scale questions and open-ended questions with free-text responses, to gather subjective feedback from participants. Our user study focused primarily on the collection of subjective data in the form of 7- point Likert-type statements (where the user indicated the extent to which the statement applied to them during the trial) and in the form of preference questions. The subjective data was divided into three different categories, depending on when they were collected during the user study: post-trial, post-block and post-experiment data. Post-trial data includes the 7-point Likert scale agency statement data after each grabbing sub-trial and the preferred metaphor question after each trial. Post-block data, on the other hand, refers to the embodiment questionnaires after each of the four body configuration blocks. Finally, post-experiment data includes 7-point Likert scale statements about the VR experience as well as a free-form text field for comments. In the analysis, these three types of data categories were distinguished and analyzed individually. For the post-block data analysis, two sets of statements were evaluated after each of 50 4.5. Data Collection and Analysis ID Statement SL1 I felt as if my body was located in the center of the virtual body. SL2 I felt as if my body was located to the left of the virtual body. SL3 I felt as if my body was located to the right of the virtual body. SL4 I felt as if my head was located in the center of the virtual body. SL5 I felt as if my head was located to the left of the virtual body. SL6 I felt as if my head was located to the right of the virtual body. SL7 I felt as if my arms were where I saw the upper arms of the virtual body to be. SL8 I felt as if my arms were where I saw the lower arms of the virtual body to be. 
SL9 I felt as if my arms were to the left from where I saw the arms of the virtual body to be. SL10 I felt as if my arms were to the right from where I saw the arms of the virtual body to be. SL11 It was easy to grab the cabbages. Table 4.3: Second set of statements about self-location (SL) of the post-trial questionnaire. The participants answered on a 7-point Likert-type scale indicating the extent to which the statement applied to them during the trial (1=strongly disagree, 7=strongly agree). Custom statements by the authors. the four body configurations. The first set consisted of statements about the sense of ownership, agency, and change experienced by the participants while performing the tasks. The second set consisted of statements about self-location. Table 4.2 shows the statements from the first set, and Table 4.3 shows the statements from the second set.

To be able to make a meaningful statement about each individual factor ("ownership", "agency", "change"), the results have to be aggregated and grouped accordingly. This was done by calculating a score for each factor, which is the mean of the scores of its four statements. Specifically, this means that the scores for statements OW1 to OW4 were added together and divided by four to give an overall score for the factor "ownership". The same was done for the factors "agency" and "change" (see Equations 4.1, 4.2 and 4.3) [43]. We then compared the results of the individual factors between the different body configurations.

Scoring "Ownership" = (OW1 + OW2 + OW3 + OW4) / 4   (4.1)

Scoring "Agency" = (AG1 + AG2 + AG3 + AG4) / 4   (4.2)

Scoring "Change" = (CH1 + CH2 + CH3 + CH4) / 4   (4.3)

ID Statement P1 I liked my virtual body. P2 My virtual body was disturbing. P3 It was easy to interact with the Go-Go interaction metaphor. P4 It was easy to interact with the GazeAndManipulate interaction metaphor. Table 4.4: Statements of the post-experiment questionnaire about the VR experience and interaction metaphor usability. The participants answered on a 7-point Likert-type scale indicating the extent to which the statement applied to them during the trial (1=strongly disagree, 7=strongly agree). Custom statements by the authors.

For the post-experiment data analysis, statements about the VR experience and the usability of the two interaction metaphors, as well as additional comments, were analyzed. The statements are listed in Table 4.4.

4.5.4 Statistics

During the user study, we recorded a total of 3200 sub-trials (1600 trials) from 20 users. Some of the sub-trials took an unreasonably long time, especially the first few, because the participants often did not look in all directions to find the object and therefore took longer to find it. Therefore, we used the "Interquartile Range" (IQR) method to detect and remove outliers from the data. To do this, we calculated the IQR for each independent variable and removed all data points that were more than 1.5 * IQR below the 25th percentile (Q1) or above the 75th percentile (Q3). This method ensured that extreme outliers in the data points did not skew the overall result in one direction or the other. 196 outliers were identified and discarded, resulting in a total of 3004 sub-trials (1502 trials).

Regarding statistical analyses, we proceeded with the following pipeline. For normally distributed metrics, assessed using the Shapiro-Wilk test, we used an analysis of variance (ANOVA) with repeated-measures factors.
Regarding statistical analyses, we proceeded with the following pipeline. For normally distributed metrics, assessed using the Shapiro-Wilk test, we ran analyses of variance (ANOVA) with repeated-measures factors. Greenhouse-Geisser adjustments were applied to the degrees of freedom when the sphericity assumption was violated. For metrics that deviated from a normal distribution, we used the non-parametric Aligned Rank Transform (ART) test [47]. The post-hoc analysis involved pairwise t-tests with Bonferroni corrections for the normally distributed dependent variables or the multifactor contrast test procedure presented in [13] for the non-normally distributed ones. Tests are reported as significant for p-values lower than 0.05.

4.6 Results

In this section, we present the results of the data evaluation. We first present the results from the objective data evaluation, followed by the data from the subjective data evaluation (divided into post-trial, post-block, and post-experiment data).

4.6.1 Objective Data

Task Completion Time - Figure 4.4 shows how long users took on average to grab and collect objects for the independent variables "distance" and "angle". We compared the four independent variables "body configuration", "arm", "distance", and "angle". The results indicated an effect of the selection technique on completion time (F(1,3000) = 33.64, p < 0.001), where participants were faster with Go-Go (4.82±2.05) than with Gaze-and-Manipulate (5.22±2.31). For the independent variables "body configuration" and "arm", no significant differences in task completion time were observed. However, for the independent variable "distance" we could see that a smaller distance leads to shorter reaching times for both metaphors (F(1,3000) = 116.51, p < 0.001). Participants were faster to grab closer cabbages (4.68±2.12) than farther ones (5.37±2.21). Similarly, a significant difference in task completion time was found based on the angle of the object (F(4,3000) = 180.04, p < 0.001). Cabbages were grabbed faster when they were in front of the users at angle 0° (3.79±1.64), followed by -45° (4.71±1.96) and 45° (4.90±2.18), followed by 90° (5.79±2.14) and -90° (5.99±2.25).

Figure 4.4: Task completion times for both interaction metaphors for the independent variables "distance" (left) and "angle" (right).

4.6.2 Subjective Data

Post-Trial Data

Agency - After each sub-trial, we collected and analyzed the responses to the 7-point Likert statement. This statement referred to the experienced agency during the grabbing task. The results can be observed in Figure 4.5. All values on the 7-point Likert scale were given at least once, with most responses in the upper range. The plot shows that 50% of the data lay between agency scores 5 and 7. We found a significant effect of the selection technique on the agency after grabbing a cabbage (F(1,3000) = 108.50, p < 0.001), where participants felt a higher sense of agency with Go-Go (4.81±1.05) than with Gaze-and-Manipulate (4.49±0.97). Moreover, we noticed an effect of the independent variable "angle" on agency after each trial (F(4,3000) = 17.71, p < 0.001), where users felt a higher agency for cabbages in front of them at angle 0° (4.83±0.91), followed by cabbages at -45° (4.72±0.98) and 45° (4.70±1.03), followed by cabbages at -90° (4.50±1.06) and 90° (4.50±1.10).

Figure 4.5: Scores for the agency statement displayed after each sub-trial for the independent variables "metaphors" (top left), "arm" (top right), "angle" (bottom left), and "distance" (bottom right).

Selection Metaphor Preference - The data about the users' preferences are shown in Figure 4.6. These data refer to the question about the preferred interaction metaphor, asked after each trial.
For the independent variables "distance" and "angle", a clear preference for the Go-Go interaction was seen. When the objects were positioned further away from the user, this preference was somewhat weaker but still clearly in favor of the Go-Go interaction. For the independent variable "angle", this preference for Go-Go was markedly stronger for the smaller angles than for the extreme angles (-90° and 90°). For objects positioned at extreme angles, there was only a small preference for Go-Go. When the objects were positioned directly in front of the user, Go-Go was preferred almost twice as often as Gaze. The analysis of the independent variable "body configuration and angle" confirmed that the Go-Go interaction metaphor was generally preferred, but two outliers are visible in the plot: for the extreme angle of -90°, the Gaze-and-Manipulate metaphor was preferred for the body configuration "Left Lower", and likewise for the angle 90° with the body configuration "Right Upper". Otherwise, a similar picture emerged, i.e., the Go-Go interaction was favored more strongly for the smaller angles than for the larger angles.

Figure 4.6: Preference scores for the independent variables "distance" (top left), "angle" (top right) and "body configuration and angle" (bottom).

Post-Block Data

This section presents the results of the post-block data analysis, which consisted of the summed scores of the factors of the first block of statements (ownership, agency, and change statements) and the individual scores of each statement of the second block (self-location statements) from the post-block questionnaire (refer to Appendix G for the complete questionnaire). The scores are shown in Figures 4.7, 4.8, and 4.9.

Ownership - The statements on body ownership were rated similarly for all body configurations (F(3,57) = 2.08, p = 0.13), with a median between 4 and 4.5 for all four configurations. For the "Left Upper" and "Right Upper" configurations, the sense of ownership was rated at least once with a score of 1, with "Left Upper" also receiving a score of 7 at least once (Figure 4.7a).

Agency - We found a significant effect of "body configuration" on the agency scores (F(3,57) = 3.86, p < 0.01). The median for all configurations was between 5.3 and 5.8 (Figure 4.7b), and post-hoc tests showed that participants reported the lowest agency with the "Right Lower" configuration (5.55±0.77), while the highest sense of agency was experienced with the "Right Upper" configuration (5.77±0.81).

Change - We found no significant effect of "body configuration" on the change scores (F(3,57) = 2.11, p = 0.10). In addition, each body configuration received a score of 1 at least once, and the sense of change was not rated 7 for any configuration (Figure 4.7c).

Figure 4.7: Boxplots of the ownership (a), agency (b) and change (c) statement scores (OW1-OW4, AG1-AG4, CH1-CH4) from the Virtual Embodiment Questionnaire [43].

SL1 - For this statement, we found no significant differences between the individual body configurations (F(3,57) = 0.86, p = 0.46). The median of all four configurations was 4.5 or 5 (Figure 4.8a).

SL2 - For this statement, we found statistically significant differences between the different body configurations (F(3,57) = 6.30, p < 0.001). Post-hoc tests showed that the scores were higher for "Left Lower" (4.25±2.12) and "Left Upper" (3.80±1.76) compared to "Right Upper" (2.40±1.39) and "Right Lower" (2.55±1.50). The analysis clearly shows that the participants rated this statement higher when they embodied the left eye than when they embodied the right eye (Figure 4.8b).
SL3 - We also found statistically significant differences between the different body configurations (F(3,57) = 6.30, p < 0.001). Post-hoc tests showed that the scores were higher for "Right Lower" (3.65±1.98) and "Right Upper" (3.70±1.83) compared to "Left Upper" (2.35±1.13) and "Left Lower" (2.10±1.20). The analysis showed that the participants rated this statement higher if they embodied the right eye compared to the left eye (Figure 4.8c).

SL4 - We found no significant differences (F(3,57) = 0.33, p = 0.80) between the body configurations for the statement that users located their heads in the center of the virtual body (Figure 4.8d).

SL5 - Similar to SL2 and SL3, we also found statistically significant differences between the body configurations (F(3,57) = 9.94, p < 0.001). Post-hoc tests showed that "Left Lower" (4.55±2.03) had higher scores than "Right Lower" (2.55±1.50) and "Right Upper" (2.05±0.88), and "Left Upper" (3.50±1.76) than "Right Upper" (Figure 4.8e).

SL6 - We found a significant effect of body configurations (F(3,57) = 8.49, p < 0.001). Post-hoc tests showed that scores for "Right Lower" (3.85±1.89) and "Right Upper" (3.85±1.78) were higher than for "Left Lower" (2.00±1.02) and "Left Upper" (2.55±1.39) (Figure 4.8f).

Figure 4.8: Boxplots of the self-location statement scores (SL1-SL6) grouped by body configuration.

SL7 - This statement referred to the location of the arms and whether the participant saw them where the upper arms of the virtual body were or where the lower arms of the virtual body were. Again, we found a statistically significant difference between the different body configurations (F(3,57) = 5.69, p < 0.001). This statement was rated higher when participants controlled the upper arms (5.20±1.28) compared to the lower arms (3.95±1.78) (Figure 4.9a).

SL8 - The same applied here as for SL7, with the difference that the statement referred to the location of the lower arms. We found a significant effect of body configurations (F(3,57) = 4.87, p < 0.001). The scores were higher if the participants performed the grabbing tasks with the lower arms (3.85±1.85) than with the upper arms (2.55±1.46) (Figure 4.9b).

SL9 - For this statement, we did not find a significant effect of body configurations (F(3,57) = 2.35, p = 0.08). The statement referred to the arms being to the left of where participants saw the arms of the virtual body (Figure 4.9c).

SL10 - We could not find any significant differences for this statement (F(3,57) = 0.13, p = 0.93) (Figure 4.9d).

SL11 - We also found no differences between the individual body configurations for this statement. The statement that it was easy to grab the cabbages was consistently rated highly, with a median of 6 for all body configurations (Figure 4.9e).

Figure 4.9: Boxplots of the self-location statement scores (SL7-SL11).

Post-Experiment Data

This section summarizes the results of the post-experiment data analysis, which consisted of the scores of the statements of the post-experiment questionnaire (the full questionnaire
can be found in Appendix G) and a summary of the additional comments made by the participants. The results are shown in Figure 4.10.

Figure 4.10: Scores for the post-experiment questionnaire statements.

Post-Experiment Questions

Participants largely reported that they liked the virtual avatar (P1) (4.55±1.73), but some found it disturbing (P2) (3.15±1.54), possibly due to its inhuman appearance. When asked if it was easy to interact with the Go-Go interaction (P3), the majority gave a very high score (6.00±0.97). However, when asked about the ease of interacting with the Gaze interaction metaphor (P4), participants tended to give a lower score (4.80±1.57).

Participant Feedback

Some participants stated that they found the Go-Go interaction intuitive and comfortable. In addition, some stated that they preferred the Go-Go interaction for interacting with close objects and the Gaze interaction for interacting with distant objects. In terms of the Gaze interaction, some participants commented that it would be more convenient if the same button was used for retrieving and grabbing an object rather than two separate buttons. One participant said that a crosshair would be helpful for aiming at objects and that gaze would work better if eye movements were tracked by the HMD. Regarding the avatar, it was noted that the shoulder height of the avatar should be adjusted to the real height of the participant in order to improve the agency. It was also noted that a third option in the user interface for selecting the preferred metaphor would be good, as sometimes both interaction metaphors felt the same in terms of agency and participants were forced to choose one of them.

CHAPTER 5
Discussion

In this chapter, we summarize and discuss the results of the user study. We test the proposed hypotheses by interpreting the collected data on task performance, user experience, and SoE. In Section 5.4, we draw conclusions about the usability of the dissimilar avatar in a co-embodied scenario and the effectiveness of the two interaction metaphors, and in Section 5.5 we point out the limitations of our work.

5.1 User Performance

5.1.1 H1.1

The results showed that the time taken by users to grab the objects with the two interaction metaphors differed only slightly for the medium and long distances (see Figure 4.4). The median for all positions and both interaction metaphors was between 4 and 5.3 seconds. Nevertheless, it could be seen that for both the long and medium distances, the time taken by the users to grab the object with the Go-Go interaction was slightly shorter. This may be due to the fact that the Go-Go interaction does not require any additional button clicks, whereas the Gaze-and-Manipulate interaction requires a button click to retrieve the object to the hand. Although the literature suggested that gaze-based interaction was faster than hand-based selection with a mouse [44] and provided more subjective immersion than mouse-based interaction [18], we did not observe such tendencies. However, studies have shown that the Go-Go interaction (in combination with the PRISM technique) was faster when interacting with objects at close range (objects 0.6 meters away), whereas gaze-based interactions were faster when interacting with objects at a longer distance (objects 3+ meters away) [46]. This was partly consistent with our results, as our long-distance setup used a distance of 2 meters, which may have been too short for efficient gaze interactions.
We therefore hypothesize that we obtained these results because the objects were too close to exploit the full potential of the Gaze-and-Manipulate interaction. In summary, our first hypothesis (H1.1) is validated; objects that are close to the avatar can be grabbed faster with the Go-Go interaction than with Gaze-and-Manipulate.

5.1.2 H1.2

The results also showed that task completion times for the Go-Go interaction were slightly shorter than for the Gaze-and-Manipulate interaction for both the medium and long positions (see Figure 4.4). This could be due to the fact that in the Gaze-and-Manipulate interaction, the user first has to aim at the object before they can retrieve it, which can take a short moment if the object is not directly being looked at. In summary, these results do not support our second hypothesis (H1.2). Objects that are further away cannot be grabbed faster with Gaze-and-Manipulate than with the Go-Go interaction; instead, they can be grabbed slightly faster with the Go-Go interaction.

5.1.3 H1.3

The results confirmed that the time it took participants to grab and collect objects depended on the position at which the object was spawned. While there was no significant difference in task completion time between a positive angle and its mirrored negative value, there was indeed a difference in time between smaller and larger angles (regardless of whether the angle was positive or negative) (see Figure 4.4). The measured time was smallest for the 0° angle, followed by the 45°/-45° angles. The longest time was measured for the 90°/-90° angles. As Fitts' law states, the time required to reach an object depends on the distance to and the width of the target, in addition to the reaction time required by the user to localize the target (see Section 2.6 for a detailed explanation). We assume that users could locate objects in front of them more quickly than objects to the side, as they did not have to "search" for the object. Therefore, the total time required to complete the task was shorter. Thus, our third hypothesis (H1.3) is validated; objects located at positions with a smaller angle relative to the player axis can be grabbed faster than objects located at a larger angle.
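For reference, a common Shannon formulation of Fitts' law (cf. [36] and Section 2.6) expresses the predicted movement time MT as a function of the target distance D and target width W; the constants a and b are empirically fitted and were not estimated in this study:

MT = a + b \log_2\left(\frac{D}{W} + 1\right)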
5.2 Sense of Embodiment and Agency

5.2.1 H2.1

We tested whether the different trial configurations (metaphor, arm, angle, distance) had an effect on the sense of agency. We found no significant effects for the independent variables "arm" and "distance" (see Figure 4.5). For the variable "metaphor", however, the results showed that Go-Go conveyed a stronger sense of agency than Gaze-and-Manipulate (see Figure 4.5). The same was true for the variable "angle": the smaller the angle, the greater the sense of agency. Participants rated agency lowest at the extreme angles (-90°/90°) (see Figure 4.5). We believe that Go-Go created a stronger sense of agency because the movements of the avatar and objects in this metaphor were completely controlled by the user, as opposed to Gaze, where sometimes a "virtual arm halo" moved the object. Since studies have shown that the sense of agency is based on, among other things, efferent motor signals, reafferent feedback signals, and action intentions, which were not fully present in the case of the arm halo because the user did not actively control it, agency may have suffered [10].

Regarding the independent variable "angle", we hypothesize that the smaller angles had higher agency scores because it felt more realistic for users to grab the objects from the front rather than from the side. The fact that the dissimilar avatar did not turn to the side as soon as the user turned around may have given users a strange sense of embodiment, which affected agency. This could also be seen from the fact that the task completion time was shorter for the smaller angles, which meant that it was easier for users to grab the object overall. We conclude that our hypothesis (H2.1) is partially validated, as only certain trial configurations significantly influence the sense of agency.

5.2.2 H2.2

Furthermore, we tested whether the sense of agency was influenced by the body configuration. The results showed that there was a significant effect of the independent variable "body configuration" on the sense of agency (see Figure 4.7b). This could be due to the fact that the upper arms were more in line with the actual shoulders, resulting in a more human-like embodiment. For the lower arms, the sense of agency may have suffered because the virtual arms were positioned too far down the body, resulting in too much difference between the positions of the real and virtual shoulders. Although the body configuration influenced the users' agency, there were no significant results in terms of task performance. On average, users were similarly fast at collecting the objects with both the upper and the lower arms. This could be due to the fact that users were very familiar with the movements of the virtual avatar due to the high agency, and therefore the body configuration did not hinder them much in performing the movements, resulting in similarly good performance for all configurations. This leads to the conclusion that our hypothesis (H2.2), according to which the body configuration influences agency, is validated.

5.2.3 H2.3

Regarding our third hypothesis (H2.3), we found an influence of the independent variable "body configuration" on the sense of self-location (see Figures 4.8b, 4.8c, 4.8e, 4.8f, 4.9a, and 4.9b). The statements about the position of the arms and head were rated higher when the participants were in the body configurations mentioned in the statements. For example, SL2 ("I felt as if my body was located to the left of the virtual body." [43]) was rated higher for the body configurations "Left Upper" and "Left Lower" than for "Right Upper" and "Right Lower". This showed that users perceived the body configurations they embodied exactly as we intended and that the dissimilar avatar provided a consistent sense of self-location to the users. Therefore, our hypothesis (H2.3) is validated; the body configuration influences the sense of self-location.

5.3 User Preferences

5.3.1 H3

Finally, the results showed that there was a clear overall preference for one interaction metaphor. At both medium and long distances, participants preferred the Go-Go interaction over the Gaze-and-Manipulate interaction (see Figure 4.6). The same applied to the different angles of the positions at which the objects were spawned. While a clear preference for the Go-Go interaction could be observed at the smaller angles (at the 0° angle the number of responses in favor of the Go-Go interaction was almost twice as high as for the Gaze-and-Manipulate interaction), the larger angles (90°/-90°) no longer showed such a clear preference. In addition, when we looked at the factor "body configuration and angle", we could see that for the extreme angles there was even a preference for the Gaze-and-Manipulate interaction over the Go-Go interaction (see Figure 4.6). One reason for this could be that the Go-Go stretch logic behaved differently depending on the direction in which the arm was stretched. Because the pivot point of the stretch logic was fixed, the virtual arm was stretched more when the real arm was moved forward than when it was moved sideways. It is therefore possible that objects spawned at a larger angle could not be reached with Go-Go as easily, and without the user having to lean sideways, as with the Gaze-and-Manipulate interaction. In summary, our hypothesis (H3) is validated by the results; objects located at positions with a smaller angle relative to the player axis are preferentially grabbed with Go-Go rather than with Gaze-and-Manipulate.
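One possible direction for the "adaptive pivot" idea raised here (and taken up again in Section 5.4) would be to let the Go-Go pivot follow the current reach direction instead of staying at a fixed point on the torso. The following C# sketch is not part of the implemented platform; it only illustrates the idea under our own assumptions, and the field names (shoulder, realHand, backOffset) are hypothetical.

using UnityEngine;

// Hypothetical sketch: keeps the Go-Go pivot at a constant offset behind the
// shoulder, opposite to the current reach direction.
public class AdaptiveGoGoPivot : MonoBehaviour
{
    public Transform shoulder;       // anchor of the controlled tentacle (assumption)
    public Transform realHand;       // tracked controller position (assumption)
    public float backOffset = 0.25f; // distance of the pivot behind the shoulder, in meters

    public Vector3 ComputePivot()
    {
        // Horizontal direction from the shoulder towards the real hand.
        Vector3 reachDir = realHand.position - shoulder.position;
        reachDir.y = 0f;
        if (reachDir.sqrMagnitude < 1e-6f)
            return shoulder.position - shoulder.forward * backOffset;
        reachDir.Normalize();

        // Place the pivot behind the shoulder along the opposite reach direction, so the
        // hand-to-pivot distance depends mainly on arm extension rather than reach direction.
        return shoulder.position - reachDir * backOffset;
    }
}

With such a pivot, the real-hand-to-pivot distance (and thus the amount of stretch) would depend mainly on how far the arm is extended, not on whether the user reaches forward or to the side.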
5.4 Discussion

To summarize, the developed interaction metaphors were an effective complement to the dissimilar avatar and, according to the results of the user study, clearly promoted an SoE. In line with Xu et al. [49], who found that embodying and controlling a dissimilar avatar, such as a stray animal, elicited a strong SoE in users, we found that our dissimilar avatar also conveyed a decent SoE. It turned out that the Go-Go interaction was overall slightly more efficient and user-friendly than the Gaze-and-Manipulate interaction metaphor. Especially for close objects, Go-Go was an effective way to use the physical characteristics of the dissimilar avatar and to enrich the interaction with it. Gaze-and-Manipulate, on the other hand, had its strengths when interacting with distant objects, which was consistent with the results of previous work [46]. Therefore, we recommend considering the Go-Go interaction metaphor whenever interaction with close objects is required and the avatar allows a non-linear distortion of the arms. In the course of the user study, we made interesting discoveries that could be useful for the future development of the interaction with the dissimilar avatar, such as that the avatar should possibly rotate with the user to achieve a higher sense of agency (H2.1), or a Go-Go interaction logic with adaptive pivots (H3). In addition, when designing a co-embodied, dissimilar avatar, attention should be paid to where the arms are positioned on the virtual body. In the case of our dissimilar avatar, users perceived a stronger SoE with the upper arms, which can inform the design of future dissimilar avatars. The shoulder height of the avatar should be adjusted to the real height of the participant in order to improve the SoE and agency. Overall, we believe that with this work and the results obtained, we have created an important basis for further research on this topic.

5.5 Limitations

While the developed interaction metaphors aim to provide an efficient way to interact with (distant) VR objects, there are also limitations in the described setting. A major limitation is that the avatar does not move or react to the user's walking movements. A system would need to be developed that effectively transfers both users' walking movements to the avatar without severely compromising the SoE or even creating an "out-of-body" experience. Although the existing VR environment provides a basic solution for navigation, it would need to be reconciled with the two interaction metaphors in a future iteration to realize the full potential of co-embodiment.

Furthermore, in our evaluation task, we only evaluated how efficient the interaction metaphors for the dissimilar avatar are for grabbing objects located in front of the user. We only tested a maximum field of view of 180°. However, it would be interesting to find out how the usability and SoE of users would behave with a 360° setup, i.e., if objects spawned not only in front of the user but also behind them, forcing them to turn around.

Another limitation concerns the Gaze-and-Manipulate interaction metaphor. The selection of objects was found to be difficult and imprecise by some participants in the user study. In our implementation, we use the direction in which the user moves their head to determine the direction of gaze, which only works well if the user aligns their head precisely with the object and rotates their head accordingly. However, this can lead to neck pain during prolonged interaction if objects are positioned very far to the side of the user. One solution would be to track the much more accurate and natural eye movements instead of the head movement and select objects based on the actual direction of gaze. In this way, the selection of objects would be much easier for the user.
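To make this limitation concrete, the head-based gaze direction described above can be thought of as a ray cast from the HMD camera along its forward vector. The platform itself casts nine parallel rays (see Figure 3.7), so the single-ray C# sketch below is only a simplified illustration, and the field names (hmdCamera, grabbableLayer, maxGazeDistance) are our own assumptions.

using UnityEngine;

// Simplified single-ray version of head-based gaze selection; the actual
// implementation uses nine parallel rays (see Figure 3.7).
public class HeadGazeSelector : MonoBehaviour
{
    public Camera hmdCamera;           // main VR camera, i.e., the tracked head pose
    public float maxGazeDistance = 10f;
    public LayerMask grabbableLayer;   // layer containing the grabbable objects

    public GameObject GetGazedObject()
    {
        // The gaze direction is derived from the head orientation only.
        Ray gazeRay = new Ray(hmdCamera.transform.position, hmdCamera.transform.forward);
        if (Physics.Raycast(gazeRay, out RaycastHit hit, maxGazeDistance, grabbableLayer))
            return hit.collider.gameObject;
        return null; // nothing is currently being looked at
    }
}

Because the ray direction comes solely from the head pose, an object is only hit when the head itself points at it, which is exactly why eye tracking would make the selection more comfortable.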
As there is no standard embodiment questionnaire for the use of co-embodied dissimilar avatars, we had to use a questionnaire developed for human avatars for the post-experiment questionnaires. This is a limitation insofar as the embodiment of a dissimilar avatar in a co-embodied scenario is a novel sensation, and the questions of the Virtual Embodiment Questionnaire [43] may not optimally capture the perceived agency and SoE.

CHAPTER 6
Conclusion

6.1 Summary

In this thesis, we have presented an experimental platform where different interaction metaphors can be used to enable co-embodied experiences with a dissimilar avatar in VR. This area of research is still largely unexplored and offers exciting opportunities in the design of new ways of interacting with avatars, as well as the possibility of creating novel experiences with co-embodied, dissimilar avatars that can be used, for example, in collaborative learning environments or therapeutic applications. For the development, we used the Unity3D engine and the Photon Unity Networking 2 (PUN2) library 1 for real-time synchronization of both users' movements. Two users can launch the application and simultaneously control a dissimilar avatar in the form of an upright-standing slug. The dissimilar avatar has four tentacles and two eye stalks. The body configuration, i.e., which user controls which limbs, can be set freely. Furthermore, the users can use two different selection and manipulation metaphors to grab, view, and move objects in the virtual environment. The use of a non-linear mapping technique in the form of the Go-Go interaction technique and a gaze-based interaction metaphor in the form of the Gaze-and-Manipulate interaction technique allows the users to interact with distant virtual objects and manipulate them in an efficient and intuitive way. By conducting a user study, the proposed experimental platform was evaluated in terms of user experience, task performance, and SoE. By comparing the two developed interaction metaphors, the usability and effectiveness of each technique were evaluated, as well as the co-embodiment experience when using a dissimilar avatar. The results showed that both tested interaction metaphors evoked a high sense of agency among the users.
Furthermore, the usability of both techniques was tested and resulted in consistently positive feedback from the participants. There was a slight preference 1https://www.photonengine.com/pun 69 6. Conclusion for the Go-Go interaction, and it was shown that both interaction techniques have the potential to enrich user interaction with a dissimilar avatar and that especially the Go-Go interaction metaphor has a great potential for co-embodied interaction. 6.2 Future Work The experimental platform presented here can be further developed in the future. One possible improvement would be to investigate situations in which two users simultaneously control the dissimilar avatar and interact with the environment. This was not investigated in the user study conducted; instead, only a second instance of the application was launched, and the limbs of the second instance were set to a static position. Although this gave the participants a feeling of co-embodiment, it would be even more immersive if these limbs also moved as a result of user interaction. As described in the “Limitations“ section 5.5, the Gaze-and-Manipulate selection logic relies on the user’s head movements. However, it would be possible to determine the direction of gaze much more accurately with real eye movement detection. This is another possible improvement for future updates of the experimental platform. This would require appropriate hardware, namely HMDs that can record and interpret eye movements, such as the “HTC Vive Pro Eye“ 2 or the “Meta Quest Pro“ 3 headset. Overall, we have created a foundation for future development of interaction in a co- embodied environment with a dissimilar avatar. The avatar can be made even more user-friendly, and the topic of co-embodiment for a dissimilar avatar can be further investigated by developing additional interaction metaphors and working on the co- navigation of the avatar. 2https://www.vive.com/sea/product/vive-pro-eye/overview/ 3https://www.meta.com/at/en/quest/quest-pro/ 70 List of Figures 2.1 The three body representations to test body ownership (first-person perspec- tive) [34]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Types of virtual avatars. (a) and (b) are anthropomorphic avatars, (c) is a non-anthropomorphic (dissimilar) avatar [24]. . . . . . . . . . . . . . . . . 9 2.3 Different avatar representation reported in the literature. . . . . . . . . . 10 2.4 Proposed categorization system for dissimilar avatars applied to a virtual hand [7]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.5 Social presence configurations of agents [35]. . . . . . . . . . . . . . . . . . 12 2.6 Virtual avatar controlled based on the weighted average of the teacher’s and learner’s movements in a co-embodiment scenario. [29]. . . . . . . . . . . 13 2.7 Full iteration of the FABRIK algorithm consisting of a forward iteration (a)-(d) and a backward iteration (e)-(f) [3]. . . . . . . . . . . . . . . . . . 15 2.8 The mapping function F used in the Go-Go Interaction technique [41]. . . 18 2.9 The “Gaze-and-Pinch“ interaction with one or two hands: look at an object, pinch to select it, manipulate it with hand gestures [39]. . . . . . . . . . . 19 3.1 Left - Overview of the project’s architecture, including the co-embodied dissimilar avatar. Right - An example body configuration of the two users showing the limbs they control. . . . . . . . . . . . . . . . . . . . . . . . . 
22 3.2 The interaction test scene in which the user can grab cabbages using the interaction metaphors (left - in third-person perspective, right - first-person perspective). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3 Overview of the room creation and networking logic when starting the appli- cation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.4 Overview of the logic of the synchronized and networked movements of the players that drive the dissimilar avatar limbs. . . . . . . . . . . . . . . . . 26 3.5 Left: Shaded model of the dissimilar avatar. Right: Transparent avatar with its rig and bones structure. . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.6 Go-Go interaction with different arm extensions. Left: real arm extension movement. Middle: movement of the virtual avatars’ arm. Right: First- person perspective of the virtual arm. The “wristband“ objects represent the positions of the real hands. . . . . . . . . . . . . . . . . . . . . . . . . . . 30 71 3.7 The nine rays that check for gazed objects. The gaze ray distance (GRD) defines the spacing between the individual rays. (Slightly tilted view to be able to see the rays.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.8 Animation of the arm halo retrieving the distant object. Once the object is close enough, the halo mesh is turned off. . . . . . . . . . . . . . . . . . . 32 3.9 The possible spawning positions of the objects for five angles and two distances (pink = medium distance, blue =long distance). . . . . . . . . . . . . . . . 36 3.10 The UI during each sub-trial to indicate the interaction metaphor and the arm to use. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.11 The UI after each sub-trial with a 7-point Likert scale slider. . . . . . . . 38 3.12 The UI after both sub-trials with a choice between the two interaction metaphors to indicate the preference. . . . . . . . . . . . . . . . . . . . . . 39 3.13 One full iteration of the evaluation system for one body configuration. Blue boxes: system actions. Green boxes: user actions. . . . . . . . . . . . . . . 41 4.1 Left: User wearing a HMD and controllers performing a grabbing task. Right: First-person perspective of the VR environment. . . . . . . . . . . . . . . 46 4.2 Participants’ experience levels with VR with an HMD and video games. . 47 4.3 Procedure of the study with the approximate time allocated to each part. 48 4.4 Task completion times for both interaction metaphors for the independent variables “distance“ (left) and “angle“ (right). . . . . . . . . . . . . . . . . 53 4.5 Scores for the agency statement displayed after each sub-trial for the inde- pendent variables “metaphors“ (top left), “arm“ (top right), “angle“ (bottom left), and “distance“ (bottom right). . . . . . . . . . . . . . . . . . . . . . 54 4.6 Preference scores for the independent variables “distance“ (top left), “angle“ (top right) and “body configuration and angle“ (bottom). . . . . . . . . . 56 4.7 Boxplots of the ownership statements (OW1-OW4), agency statements (AG1- AG4) and change statements (CH1-CH4) scores from the Virtual Embodiment Questionnaire [43]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.8 Boxplots of the self-location statements scores (SL1-SL6) grouped by body configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.9 Boxplots of the self-location statements scores (SL7-SL11). . . . . . . . . 
59 4.10 Scores for the post-experiment questionnaire statements. . . . . . . . . . . 61 72 List of Tables 4.1 Order of the body configurations defined by a Balanced Latin Square design. For the fifth participant, the order of participant #1 was used again, and so on. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 4.2 First set of statements about ownership (OW), agency (AG) and change (CH) of the post-trial questionnaire. The participants answered on a 7-point Likert-type scale indicating the extent to which the statement applied to them during the trial (1=strongly disagree, 7=strongly agree). Statements taken from [43]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.3 Second set of statements about self-location (SL) of the post-trial questionnaire. The participants answered on a 7-point Likert-type scale indicating the extent to which the statement applied to them during the trial (1=strongly disagree, 7=strongly agree). Custom statements by the authors. . . . . . . . . . . . 51 4.4 Statements of the post-experiment questionnaire about the VR experience and interaction metaphor usability. The participants answered on a 7-point Likert-type scale indicating the extent to which the statement applied to them during the trial (1=strongly disagree, 7=strongly agree). Custom statements by the authors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 73 Bibliography [1] Ferran Argelaguet and Carlos Andujar. A survey of 3d object selection techniques for virtual environments. Computers & Graphics, 37(3):121–136, 2013. [2] Ferran Argelaguet, Ludovic Hoyet, Michaël Trico, and Anatole Lécuyer. The role of interaction in virtual embodiment: Effects of the virtual hand representation. In 2016 IEEE virtual reality (VR), pages 3–10. IEEE, 2016. [3] Andreas Aristidou and Joan Lasenby. Fabrik: A fast, iterative solver for the inverse kinematics problem. Graphical Models, 73(5):243–260, 2011. [4] Chris Auteri, Mark Guerra, and Scott Frees. Increasing precision for extended reach 3d manipulation. International Journal of Virtual Reality, 12(1):66–73, 2013. [5] Samantha Bond, Deepika R Laddu, Cemal Ozemek, Carl J Lavie, and Ross Arena. Exergaming and virtual reality for health: implications for cardiac rehabilitation. Current Problems in Cardiology, 46(3):100472, 2021. [6] Matthew Botvinick and Jonathan Cohen. Rubber hands ‘feel’touch that eyes see. Nature, 391(6669):756–756, 1998. [7] Antonin Cheymol, Anatole Lécuyer, Jean-Marie Normand, Ferran Argelaguet, et al. Beyond my real body: Characterization, impacts, applications and perspectives of “dissimilar” avatars in virtual reality. IEEE Transactions on Visualization and Computer Graphics, 2023. [8] Carlos Coelho, Jennifer Tichon, Trevor J Hine, Guy Wallis, and Giuseppe Riva. Media presence and inner presence: the sense of presence in virtual reality technologies. From communication to presence: Cognition, emotions and culture towards the ultimate communicative experience, 11:25–45, 2006. [9] Nicolas Courty and Elise Arnaud. Inverse kinematics using sequential monte carlo methods. In International Conference on Articulated Motion and Deformable Objects, pages 1–10. Springer, 2008. [10] Nicole David, Albert Newen, and Kai Vogeley. The “sense of agency” and its underlying cognitive and neural mechanisms. Consciousness and cognition, 17(2):523– 534, 2008. 75 [11] Diane Dewez, Rebecca Fribourg, Ferran Argelaguet, Ludovic Hoyet, Daniel Mestre, Mel Slater, and Anatole Lécuyer. 
Influence of personality traits and body awareness on the sense of embodiment in virtual reality. In 2019 IEEE international symposium on mixed and augmented reality (ISMAR), pages 123–134. IEEE, 2019. [12] Diane Dewez, Ludovic Hoyet, Anatole Lécuyer, and Ferran Argelaguet Sanz. Towards “avatar-friendly” 3d manipulation techniques: Bridging the gap between sense of embodiment and interaction in virtual reality. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–14, 2021. [13] Lisa A Elkin, Matthew Kay, James J Higgins, and Jacob O Wobbrock. An aligned rank transform procedure for multifactor contrast tests. In The 34th annual ACM symposium on user interface software and technology, pages 754–768, 2021. [14] Guoxin Fang, Yingjun Tian, Zhi-Xin Yang, Jo MP Geraedts, and Charlie CL Wang. Efficient jacobian-based inverse kinematics with sim-to-real transfer of soft robots by learning. IEEE/ASME Transactions on Mechatronics, 27(6):5296–5306, 2022. [15] Chlöé Farrer, M Bouchereau, Marc Jeannerod, and Nicolas Franck. Effect of distorted visual feedback on the sense of agency. Behavioural neurology, 19(1-2):53–57, 2008. [16] Rebecca Fribourg, Nami Ogawa, Ludovic Hoyet, Ferran Argelaguet, Takuji Narumi, Michitaka Hirose, and Anatole Lécuyer. Virtual co-embodiment: evaluation of the sense of agency while sharing the control of a virtual body among two individuals. IEEE Transactions on Visualization and Computer Graphics, 27(10):4023–4038, 2020. [17] Andrew Goldenberg, Beno Benhabib, and Robert Fenton. A complete general- ized solution to the inverse kinematics of robots. IEEE Journal on Robotics and Automation, 1(1):14–20, 1985. [18] Teresia Gowases, Roman Bednarik, and Markku Tukiainen. Gaze vs. mouse in games: The effects on user experience. In Proceedings of the International Conference on Advanced Learning Technologies, Open Contents & Standards (ICCE), pages 773–777, 2008. [19] Tovi Grossman and Ravin Balakrishnan. Pointing at trivariate targets in 3d envi- ronments. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 447–454, 2004. [20] Harin Hapuarachchi and Michiteru Kitazaki. Knowing the intention behind limb movements of a partner increases embodiment towards the limb of joint avatar. Scientific Reports, 12(1):11453, 2022. [21] Ludovic Hoyet, Ferran Argelaguet, Corentin Nicole, and Anatole Lécuyer. “wow! i have six fingers!”: Would you accept structural changes of your hand in vr? Frontiers in Robotics and AI, 3:27, 2016. 76 [22] Hamid Hrimech, Leila Alem, and Frederic Merienne. How 3d interaction metaphors affect user experience in collaborative virtual environment. Advances in Human- Computer Interaction, 2011(1):172318, 2011. [23] Yu Jiang, Zhipeng Li, Mufei He, David Lindlbauer, and Yukang Yan. Handavatar: Embodying non-humanoid virtual avatars through hands. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–17, 2023. [24] Dominic Kao. The effects of anthropomorphic avatars vs. non-anthropomorphic avatars in a jumping game. In Proceedings of the 14th international conference on the foundations of digital games, pages 1–5, 2019. [25] Ben Kenwright. Inverse kinematics–cyclic coordinate descent (ccd). Journal of Graphics Tools, 16(4):177–217, 2012. [26] Konstantina Kilteni, Raphaela Groten, and Mel Slater. The sense of embodiment in virtual reality. Presence: Teleoperators and Virtual Environments, 21(4):373–387, 2012. 
[27] Chang-Seop Kim, Myeongul Jung, So-Yeon Kim, Kwanguk Kim, et al. Controlling the sense of embodiment for virtual avatar applications: methods and empirical study. JMIR Serious Games, 8(3):e21879, 2020. [28] Daiki Kodama, Takato Mizuho, Yuji Hatada, Takuji Narumi, and Michitaka Hirose. Effect of weight adjustment in virtual co-embodiment during collaborative training. In Proceedings of the Augmented Humans International Conference 2023, pages 86–97, 2023. [29] Daiki Kodama, Takato Mizuho, Yuji Hatada, Takuji Narumi, and Michitaka Hirose. Effects of collaborative training using virtual co-embodiment on motor skill learning. IEEE Transactions on Visualization and Computer Graphics, 29(5):2304–2314, 2023. [30] Andrey Krekhov, Sebastian Cmentowski, Katharina Emmerich, and Jens Krüger. Beyond human: Animals as an escape from stereotype avatars in virtual reality games. In Proceedings of the annual symposium on computer-human interaction in play, pages 439–451, 2019. [31] Andrey Krekhov, Sebastian Cmentowski, and Jens Krüger. Vr animals: Surreal body ownership in virtual reality games. In Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play Companion Extended Abstracts, pages 503–511, 2018. [32] R. Likert. A Technique for the Measurement of Attitudes. Number Nr. 136-165 in A Technique for the Measurement of Attitudes. Columbia university, 1932. [33] Christos Lougiakis, Akrivi Katifori, Maria Roussou, and Ioannis-Panagiotis Ioannidis. Effects of virtual hand representation on interaction and embodiment in hmd-based 77 virtual environments using controllers. In 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 510–518. IEEE, 2020. [34] Jean-Luc Lugrin, Maximilian Ertl, Philipp Krop, Richard Klüpfel, Sebastian Stier- storfer, Bianka Weisz, Maximilian Rück, Johann Schmitt, Nina Schmidt, and Marc Erich Latoschik. Any “body” there? avatar visibility effects in a virtual reality game. In 2018 IEEE conference on virtual reality and 3D user interfaces (VR), pages 17–24. IEEE, 2018. [35] Michal Luria, Samantha Reig, Xiang Zhi Tan, Aaron Steinfeld, Jodi Forlizzi, and John Zimmerman. Re-embodiment and co-embodiment: Exploration of social presence for robots and conversational agents. In Proceedings of the 2019 on Designing Interactive Systems Conference, pages 633–644, 2019. [36] I Scott MacKenzie. Fitts’ law as a research and design tool in human-computer interaction. Human-computer interaction, 7(1):91–139, 1992. [37] Ramakrishnan Mukundan. A robust inverse kinematics algorithm for animating a joint chain. International Journal of Computer Applications in Technology, 34(4):303– 308, 2009. [38] Solène Neyret, Anna I Bellido Rivas, Xavi Navarro, and Mel Slater. Which body would you like to have? the impact of embodied perspective on body perception and body evaluation in immersive virtual reality. Frontiers in Robotics and AI, 7:492886, 2020. [39] Ken Pfeuffer, Benedikt Mayer, Diako Mardanbegi, and Hans Gellersen. Gaze+ pinch interaction in virtual reality. In Proceedings of the 5th symposium on spatial user interaction, pages 99–108, 2017. [40] Thibault Porssut, Olaf Blanke, Bruno Herbelin, and Ronan Boulic. Reaching articular limits can negatively impact embodiment in virtual reality. Plos one, 17(3):e0255554, 2022. [41] Ivan Poupyrev, Mark Billinghurst, Suzanne Weghorst, and Tadao Ichikawa. The go-go interaction technique: non-linear mapping for direct manipulation in vr. 
In Proceedings of the 9th annual ACM symposium on User interface software and technology, pages 79–80, 1996. [42] Anna Samira Praetorius and Daniel Görlich. How avatars influence user behavior: A review on the proteus effect in virtual environments and video games. In Proceedings of the 15th International Conference on the Foundations of Digital Games, pages 1–9, 2020. [43] Daniel Roth and Marc Erich Latoschik. Construction of the virtual embodiment questionnaire (veq). IEEE Transactions on Visualization and Computer Graphics, 26(12):3546–3556, 2020. 78 [44] Linda E Sibert and Robert JK Jacob. Evaluation of eye gaze interaction. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems, pages 281–288, 2000. [45] William Steptoe, Anthony Steed, and Mel Slater. Human tails: ownership and control of extended humanoid avatars. IEEE transactions on visualization and computer graphics, 19(4):583–590, 2013. [46] Matthias Weise, Raphael Zender, and Ulrike Lucke. How can i grab that? solving issues of interaction in vr by choosing suitable selection and manipulation techniques. i-com, 19(2):67–85, 2020. [47] Jacob O Wobbrock, Leah Findlater, Darren Gergle, and James J Higgins. The aligned rank transform for nonparametric factorial analyses using only anova procedures. In Proc. of the ACM SIGCHI conference on human factors in computing systems, pages 143–146, 2011. [48] Erik Wolf, Nathalie Merdan, Nina Dölinger, David Mal, Carolin Wienrich, Mario Botsch, and Marc Erich Latoschik. The embodiment of photorealistic avatars influences female body weight perception in virtual reality. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR), pages 65–74. IEEE, 2021. [49] Yao Xu, Ding Ding, Yongxin Chen, Zhuying Li, and Xiangyu Xu. istraypaws: Immersing in a stray animal’s world through first-person vr to bridge human-animal empathy. In 30th ACM Symposium on Virtual Reality Software and Technology, pages 1–11, 2024. [50] Laura Zapparoli, Eraldo Paulesu, Marika Mariano, Alessia Ravani, and Lucia M Sacheli. The sense of agency in joint actions: A theory-driven meta-analysis. Cortex, 148:99–120, 2022. 79 Appendix Appendix A 1 private void MapHandPosition(Vector3 pivot, PlayerArmStruct arm) 2 { 3 // calculate the direction and distance between real arm position and pivot 4 Vector3 handPivot = arm.RealHand.position - pivot; 5 float R_r = Vector3.Distance(arm.RealHand.position, pivot); 6 Vector3 handPivot_norm = handPivot/R_r; 7 8 if (R_r >= _D) 9 { 10 // non-linear part of the mapping function 11 float R_r_ = R_r + _coeffK * Mathf.Pow(R_r - _D, 2); 12 13 // set the new position 14 arm.VirtualHand.position = pivot + handPivot_norm * R_r_; 15 } 16 else 17 { 18 // lienar part of the mapping function 19 arm.VirtualHand.position = arm.RealHand.position; 20 } 21 22 // set the new rotation 23 arm.VirtualHand.rotation = arm.TentacleTip.rotation; 24 } Listing 1: The MapHandPosition function that maps the real hand position to the Go-Go hand position based on a non-linear mapping function. 
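Reading the two branches of Listing 1 together, the mapping it implements corresponds to the non-linear Go-Go function of Poupyrev et al. [41] (cf. Figure 2.8). Written out, with R_r the real hand-to-pivot distance, R_v the resulting virtual hand-to-pivot distance, D the linear threshold (_D in the code), and k the non-linearity coefficient (_coeffK):

R_v = \begin{cases} R_r & \text{if } R_r < D \\ R_r + k\,(R_r - D)^2 & \text{if } R_r \geq D \end{cases}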
80 Appendix B 1 Vector3 targetPosition; 2 Quaternion targetRotation = GetRotationRootSpace(Target); 3 4 // check which target has to be used based on the selected interaction metaphor 5 if (_isVirtualArm && RetrievingTarget != null) 6 { 7 // this is the character halo representation 8 targetPosition = GetPositionRootSpace(RetrievingTarget); 9 } 10 else if (_characterController.ArmUsesGoGoInteraction(GoGoTarget)) 11 { 12 // follow the go-go target (stretch) 13 targetPosition = GetPositionRootSpace(GoGoTarget); 14 } 15 else 16 { 17 // default interaction (no stretch) 18 targetPosition = GetPositionRootSpace(Target); 19 } Listing 2: Check for end-effector targets in the FABRIK IK logic. Appendix C 1 private void StretchIK(Vector3 targetPos) 2 { 3 var direction = (targetPos - Positions[0]).normalized; 4 5 // calculate ’tentacle tip/end-effector’ and ’root bone/end-effector’- distances 6 float remainingLen = (targetPos - Positions[Positions.Length - 1]). magnitude; 7 float totLength = (targetPos - Positions[0]).sqrMagnitude; 8 9 for (int i = 0; i < BonesLength.Length; i++) 10 { 11 // calculate the desired length of the current bone 12 float desiredLength = (BonesLength[i] / CompleteLength) * remainingLen; 13 14 // check if the arm should stretch or shrink 15 bool stretch = totLength > CompleteLength * CompleteLength; 16 if (stretch) 17 BonesLength[i] += desiredLength; // Stretch 18 else 19 BonesLength[i] -= desiredLength; // Shrink 20 } 21 22 // update complete length variable 23 CompleteLength = BonesLength.Sum(l => l); 24 25 //set everything after root 26 for (int i = 1; i < Positions.Length; i++) 27 Positions[i] = Positions[i - 1] + direction * BonesLength[i - 1]; 28 } Listing 3: The logic that stretches or shrinks the bones. Appendix D 1 private List<(Vector3, int, string)> CalculatePositions() 2 { 3 List<(Vector3, int, string)> pos = new List<(Vector3, int, string)>(); 4 System.Random rnd = new System.Random(); 5 6 foreach (float angleDeg in _anglesInDeg) 7 { 8 float angleRad = angleDeg * Mathf.PI / 180f; 9 10 foreach (float dist in _distances) 11 { 12 float yPos = _height + ((float)rnd.NextDouble() * (2 * _randomHeightOffset) - _randomHeightOffset); 13 pos.Add((new Vector3(dist * -Mathf.Cos(angleRad), yPos, dist * -Mathf. Sin(angleRad)), (int)angleDeg, dist.Equals(_distanceMedium) ? "Medium" : "Long")); 14 } 15 } 16 return pos; 17 } Listing 4: Calculating grabbable positions based on the angles and distances chosen. Appendix E 1 private IEnumerator ShowPreferredMetaphorPanel() 2 { 3 // ask question which metaphor was better 4 _textManager.SetPreferredTechniquePanel(UserStudyData[_currentTrial. TrialName].SubtrialOne.Metaphor, UserStudyData[_currentTrial.TrialName]. SubtrialTwo.Metaphor); 5 6 while(!_textManager.PreferredValueSelected) 7 yield return new WaitForEndOfFrame(); 8 9 UserStudyData[_currentTrial.TrialName].PreferredInteractionMetaphor = 10 _textManager.GetPreferredMetaphorValue() == 1 11 ? UserStudyData[_currentTrial.TrialName].SubtrialOne.Metaphor 12 : UserStudyData[_currentTrial.TrialName].SubtrialTwo.Metaphor; 13 14 SpawnNextSubTrial(); 15 } Listing 5: The ShowPreferredMetaphorPanel function to obtain the user’s preference. 
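Since ShowPreferredMetaphorPanel in Listing 5 returns an IEnumerator, it runs as a Unity coroutine. In the surrounding evaluation logic it would be started roughly as follows; the exact call site is not shown in the appendix and is an assumption:

// started from the evaluation controller once both sub-trials are finished
StartCoroutine(ShowPreferredMetaphorPanel());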
Appendix F 1 private bool GetOneShotDirectionValueFromJoystick(InputAction.CallbackContext context, bool left, ref bool joystickPerformed, float threshold) 2 { 3 float xValue = context.ReadValue().x; 4 if (joystickPerformed && Mathf.Abs(xValue) < 0.01f) 5 { 6 // reset joystickPerformed when near the center 7 joystickPerformed = false; 8 return false; 9 } 10 11 if (left && xValue < -threshold && !joystickPerformed) 12 { 13 joystickPerformed = true; 14 return true; 15 } 16 17 if (!left && xValue > threshold && !joystickPerformed) 18 { 19 joystickPerformed = true; 20 return true; 21 } 22 23 return false; 24 } Listing 6: Reading the joystick input and converting it to a “one shot“ boolean value. Appendix G * Erforderlich Dissimilar Avatar Selection Metaphor user study form Demographics Der Wert muss eine Zahl sein. UserID * 1. Woman Man Non-binary Prefer not to disclose Prefer to self describe What is your gender? * 2. Please self describe your gender3. Geben Sie eine Zahl größer als 17 ein. Age * 4. Left Right Both What is your dominant hand * 5. What is your arm length (filled by experimenter) * 6. Please answer the following questions * 7. 0 Never 1 2 3 4 Die Zahl muss zwischen 0 und 10 liegen On a scale of 0–10, 0 being how you felt coming in, 10 is that you want to stop the experiment, where are you now? * 8. Have you experienced virtual reality with a head mounted display? Have you experienced videogames? Condition 1 Left Right Eye Condition tested (filled by experimenter) * 9. Upper Lower Body Condition tested (filled by experimenter) * 10. Die Zahl muss zwischen 0 und 10 liegen On a scale of 0–10, 0 being how you felt coming in, 10 is that you want to stop the experiment, where are you now? * 11. Please read each statement and answer on a 1 to 7 scale indicating how much each statement applied to you during the experiment. There are no right or wrong answers. Please answer spontaneously and intuitively. Scale example: 1–strongly disagree, 4–neither agree nor disagree, 7–strongly agree. * 12. Strongly disagree Disagree Somewhat disagree Neither agree nor disagree Somewhat a It felt like the virtual body was my body. It felt like the virtual body parts were my body parts. The virtual body felt like a human body. It felt like the virtual body belonged to me. The movements of the virtual body felt like they were my movements. I felt like I was controlling the movements of the virtual body. I felt like I was causing the movements of the virtual body. The movements of the virtual body were in sync with my own movements I felt like the form or appearance of my own body had changed. I felt like the weight of my own body had changed. I felt like the size (height) of my own body had changed. I felt like the width of my own body had changed. Please read each statement and answer on a 1 to 7 scale indicating how much each statement applied to you during the experiment. There are no right or wrong answers. Please answer spontaneously and intuitively. Scale example: 1–strongly disagree, 4–neither agree nor disagree, 7–strongly agree. * 13. Strongly disagree Disagree Somewhat disagree Neither agree nor disagree Somewhat a I felt as if my body was located in the center of the virtual body. I felt as if my body was located to the left of the virtual body. I felt as if my body was located to the right of the virtual body. I felt as if my head was located in the center of the virtual body. I felt as if my head was located to the left of the virtual body. 
I felt as if I my head located to the right of the virtual body. I felt as if my arms were where I saw the upper arms of the virtual body to be. I felt as if my arms were where I saw the lower arms of the virtual body to be. I felt as if my arms were to the left from where I saw the arms of the virtual body to be. I felt as if my arms were to the right from where I saw the arms of the virtual body to be. It was easy to grab the cabbages. Dieser Inhalt wurde von Microsoft weder erstellt noch gebilligt. Die von Ihnen übermittelten Daten werden an den Formulareigentümer gesendet. Microsoft Forms Post experiment questionnaire Die Zahl muss zwischen 0 und 10 liegen On a scale of 0–10, 0 being how you felt coming in, 10 is that you want to stop the experiment, where are you now? * 29. Please answer the following questions *30. Strongly disagree Disagree Somewhat disgree Neither agree or disagree Somewhat a Do you have any additional comments?31. I liked my virtual body. My virtual body was disturbing. It was easy to interact with the Gogo technique. It was easy to interact with the Gaze and Pinch technique.