Tennis Motion Learning in Virtual Reality Rule-based motion analysis and feedback administration for the forehand topspin DIPLOMARBEIT zur Erlangung des akademischen Grades Diplom-Ingenieurin im Rahmen des Studiums Visual Computing eingereicht von Anna Sebernegg, BSc Matrikelnummer 01526184 an der Fakultät für Informatik der Technischen Universität Wien Betreuung: Dr. Peter Kán Wien, 5. März 2025 Anna Sebernegg Peter Kán Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.at Tennis Motion Learning in Virtual Reality Rule-based motion analysis and feedback administration for the forehand topspin DIPLOMA THESIS submitted in partial fulfillment of the requirements for the degree of Diplom-Ingenieurin in Visual Computing by Anna Sebernegg, BSc Registration Number 01526184 to the Faculty of Informatics at the TU Wien Advisor: Dr. Peter Kán Vienna, March 5, 2025 Anna Sebernegg Peter Kán Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.at Erklärung zur Verfassung der Arbeit Anna Sebernegg, BSc Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwen- deten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe. Ich erkläre weiters, dass ich mich generativer KI-Tools lediglich als Hilfsmittel bedient habe und in der vorliegenden Arbeit mein gestalterischer Einfluss überwiegt. Im Anhang „Übersicht verwendeter Hilfsmittel“ habe ich alle generativen KI-Tools gelistet, die verwendet wurden, und angegeben, wo und wie sie verwendet wurden. Für Textpassagen, die ohne substantielle Änderungen übernommen wurden, haben ich jeweils die von mir formulierten Eingaben (Prompts) und die verwendete IT- Anwendung mit ihrem Produktnamen und Versionsnummer/Datum angegeben. Wien, 5. März 2025 Anna Sebernegg v Danksagung Zuallererst möchte ich mich herzlich bei meinem Betreuer Peter Kán für seine wertvolle Un- terstützung, seine Geduld und das konstruktive Feedback während der Forschungsprojekte und insbesondere im Rahmen dieser Masterarbeit bedanken. Für sein außergewöhnliches Engagement und seine herausragende Betreuung bin ich sehr dankbar. Ein herzlicher Dank gilt auch dem gesamten Team von VR Motion Learning GmbH & Co KG, die mir das Thema der Bewegungsanalyse nähergebracht und mich während des gesamten Projekts unterstützt haben. Ich schätze die tolle Zusammenarbeit sehr. Ich möchte mich bei allen Tennisexperten bedanken, die dieses Projekt mit ihrer Zeit und Expertise unterstützt haben. Ihr Wissen und Feedback waren wesentlich für diese Masterarbeit. Ganz besonders bedanken möchte ich mich bei Dieter Mocker und Peter Hatina (Smashing Suns) sowie Peter Lehrner (House of Tennis) für ihre engagierte Unterstützung und wertvollen Beiträge. Vielen Dank! Ein großes Dankeschön an alle Teilnehmenden der Nutzerstudie, deren Feedback maß- geblich zur Evaluation beigetragen hat. Ein besonderer Dank gilt denjenigen, die an der Pilotstudie teilgenommen und damit bei der Verbesserung des Studiendesigns mitgewirkt haben. Mein persönlicher Dank gilt meiner Familie und meinen Freunden. 
Besonders dankbar bin ich meinen Eltern für ihre unermüdliche Unterstützung in all meinen Lebensphasen; meinem Partner, der mir stets zur Seite steht; und meinem Opa, dessen Anerkennung und Zuspruch mich auf meinem Weg bestärkt. In liebevoller Erinnerung an Ingeborg Sebernegg, die stets ihr Lachen, ihre Wärme und ihre Herzlichkeit geteilt hat und ein Licht auf meinem Weg bleiben wird. vii Acknowledgements First and foremost, I wish to thank my supervisor, Peter Kán, for his unwavering support, patience, and invaluable feedback throughout my research journey and academic studies. I am incredibly grateful for his dedicated and exceptional mentorship during this thesis. Additionally, I am deeply thankful to the team at VR Motion Learning GmbH & Co KG, who introduced me to the topic of human motion analysis and supported me throughout this project. I truly appreciated the wonderful time and the opportunity to work with them. I would also like to thank all the tennis coaches and players who supported this master’s thesis by sharing their time and expertise. Their knowledge and feedback were central to this work. In particular, I would like to thank Dieter Mocker and Peter Hatina from Smashing Suns, as well as Peter Lehrner from House of Tennis, for their invaluable contributions and support—thank you! Many thanks to all the participants for joining the user study and providing the feedback integral to my evaluation. A special thanks to those who participated in the initial pilot study to refine the study design. My personal thanks go to my family and friends. I am especially grateful to my parents for their endless support through every phase of my life; to my partner, who is always there for me, and to my grandpa, whose appreciation encourages me throughout my journey. In loving memory of Ingeborg Sebernegg, who always spread joy, heartfelt kindness, and love—and who will remain a guiding light on my path. ix Kurzfassung Bei korrekter Ausführung hat Tennis, als internationale Sportart, positive Auswirkungen auf die Gesundheit. Allerdings erhöht eine übermäßige, fehlerhafte Durchführung der Tennistechnik das Risiko von Verletzungen wie beispielsweise einem Tennisarm. Um dem vorzubeugen, sind effektive Trainingsroutinen, Anleitungen zur richtigen Ausführung tennisspezifischer Bewegungen und eine frühzeitige Fehlerkorrektur notwendig. Diese Maßnahmen dienen nicht nur der Verletzungsprävention, sondern tragen auch zur Leis- tungssteigerung bei. Regelmäßiges Tennistraining erfordert jedoch Ambition und wird durch begrenzte Ressourcen wie verfügbare Tennisplätze, qualifizierte Trainer, sowie die verbundenen finanziellen und zeitlichen Mittel erschwert. Daraus ergibt sich ein Bedarf an zugänglichen Trainingslösungen, die kurze, regelmäßige Übungseinheiten ab- seits des Tennisplatzes ermöglichen, motorisches Lernen unterstützen und klassische Trainingsmethoden sinnvoll ergänzen. In dieser Arbeit stellen wir eine ergänzende Methode zum Tennistrainings vor, die ein selbständiges Üben eines korrekten Vorhandschlags in der virtuellen Realität (VR) er- möglicht. Unsere Methode liefert unmittelbar nach jedem Vorhandschlag multimodales Feedback zur Tennistechnik mittels automatisierter, expertenbasierter Bewegungsanalyse. Dabei wird die erfasste Bewegung anhand definierter Bewegungsmerkmale in einzelne Phasen unterteilt und etablierte Trainingsregeln aus dem traditionellen Tennis-Coaching ausgewertet. 
Basierend auf dieser Analyse werden im Anschluss an den Schlag audi- tives und visuelles Feedback zu einer Trainingsregel präsentiert. Das Feedback liefert zeitnahe Fehlerkorrekturen und adressiert Verbesserungen. Eine visuelle Wiedergabe des Bewegungsablaufs ergänzt das Feedback, um Selbstbeobachtung und Reflexion zu fördern. Wir haben unser VR-basiertes Tennistraining in einer Nutzerstudie mittels Pretest- Posttest-Design (N = 26) evaluiert. Der Vergleich der Leistungsmetriken vor und nach einer 10-minütigen VR-Trainingseinheit zeigt signifikante Verbesserungen in der Ten- nistechnik, was auf einen möglichen kurzfristigen Lerneffekt hindeutet. Zudem hatten Teilnehmenden ein größeres Vertrauen in ihr eigenes Können nach dem VR-Training und ihre Motivation zum Tennisspielen hat sich deutlich gesteigert. Eine qualitative Analyse hebt sowohl die Stärken als auch die Schwächen unserer Methode hervor. Die Teilnehmenden äußerten die Überzeugung, dass unser VR-Tennistraining eine sinnvolle Ergänzung zum traditionellen Coaching darstellen kann. xi Abstract Tennis, as an international sport, has health benefits when performed correctly. However, the overuse of poor technique raises the risk of injuries, such as tennis elbow. Appropriate training routines, instructions on proper technique, and early error correction are crucial steps to injury prevention and beneficial to performance. However, regular tennis training takes effort and motivation and is impeded by limited resources such as available tennis courts, professional coaches, and the associated financial and time commitments. These challenges raise interest in accessible solutions that facilitate regular, short practice sessions, aid motor learning, and effectively complement traditional coaching methods. This thesis presents a novel method for self-training a correct tennis forehand technique through target practice in virtual reality (VR). Our method provides immediate post- action multimodal feedback on a user’s technique by utilizing partial motion tracking of the VR headset and controllers combined with automated, expert-driven motion analysis. Each captured forehand stroke is segmented into individual phases based on defined motion features. Specific aspects of tennis technique associated with these phases are analyzed using coaching rules established in traditional coaching. After each shot, users receive auditory and visual feedback, concentrating on one coaching rule at a time. This feedback delivers timely corrections on identified mistakes and positively reinforces technical improvements. A motion replay accompanies the provided information to facilitate self-monitoring and reflection. Using a within-group pretest-posttest design, we evaluated our VR forehand tennis training in a supervised user study with 26 participants. A comparison of performance metrics measured before and after a 10-minute VR training session revealed significant improvements in participants’ technique, indicating a potential short-term learning effect. Questionnaire responses demonstrate significant improvements in participants’ motivation to play tennis and confidence in their tennis technique from the pre-test to the post-test. A qualitative analysis further highlights both the strengths and limitations of our method. Participants expressed the belief that our approach to VR tennis training can complement traditional coaching. xiii Contents Kurzfassung xi Abstract xiii Contents xv 1 Introduction 1 1.1 Motivation &Problem Statement . . . . . . . . . . . . 
. . . . . . . . . 2 1.2 Aim of the Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Design Requirements . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Methodological Approach . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.1 Design-Oriented Research Questions . . . . . . . . . . . . . . . 5 1.3.2 Evaluation-Oriented Research Questions . . . . . . . . . . . . . 6 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2 Background &Related Work 9 2.1 Kinematics of Human Motion . . . . . . . . . . . . . . . . . . . . . . . 9 2.2 Motor Skills &Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.1 Motion Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3 Motion Capturing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3.1 Placement of Sensors and Sources . . . . . . . . . . . . . . . . 13 2.3.2 Motion Capturing Techniques . . . . . . . . . . . . . . . . . . . 14 2.4 Mocap Data, Motion Features, and Feature Extraction . . . . . . . . . 15 2.4.1 Cardinal Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.4.2 Human Motion Features . . . . . . . . . . . . . . . . . . . . . . 16 2.5 Motion Segmentation into Phases . . . . . . . . . . . . . . . . . . . . . 19 2.6 Analysis of Human Motion . . . . . . . . . . . . . . . . . . . . . . . . 20 2.6.1 Qualitative vs Quantitative Approach . . . . . . . . . . . . . . 20 2.6.2 Quantitative Motion Assessment . . . . . . . . . . . . . . . . . 21 2.6.3 Expert-driven vs Data-driven Methods . . . . . . . . . . . . . . 24 2.7 Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.7.1 Feedback Source . . . . . . . . . . . . . . . . . . . . . . . . . . 28 xv 2.7.2 Feedback Administration . . . . . . . . . . . . . . . . . . . . . 28 2.8 Motor Learning Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.8.1 Motor Learning in Tennis . . . . . . . . . . . . . . . . . . . . . 33 2.8.2 Tennis Training in Virtual Reality . . . . . . . . . . . . . . . . 34 3 Design Guidelines and Considerations 37 3.1 System Setup, Tracking, and Mocap Data . . . . . . . . . . . . . . . . 38 3.2 User-Centered Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.3 Optimizations and System Robustness . . . . . . . . . . . . . . . . . . 41 3.4 Assessment, Coaching, and Feedback Design . . . . . . . . . . . . . . . 42 4 Methodology for Tennis Forehand Motion Learning in VR 47 4.1 Tennis ESports as Foundation . . . . . . . . . . . . . . . . . . . . . . . 49 4.2 Design of the Motor Learning System . . . . . . . . . . . . . . . . . . 50 4.2.1 Partial Mocap . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.2.2 Tennis Forehand Motion Analysis Method . . . . . . . . . . . . 52 4.2.3 Tennis Forehand Motion Phase Definition . . . . . . . . . . . . 54 4.2.4 Tennis Forehand Coaching Rule Definition . . . . . . . . . . . . 58 4.2.5 Feedback Design . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5 Implementation 65 5.1 Upper Body Motion Capturing &Feature Extraction . . . . . . . . . . 66 5.1.1 Anthropometric Feature Estimation . . . . . . . . . . . . . . . 66 5.1.2 Critical Points and Curvature . . . . . . . . . . . . . . . . . . . 67 5.2 Upper Body Motion Phase Segmentation . . . . . . . . . . . . . . . . 69 5.3 Upper Body Motion Analysis Via Coaching Rules . . . . . . . . . . . . 72 5.3.1 Distance-based Implementation . . . 
. . . . . . . . . . . . . . . 73 5.4 Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.5 Multimodal Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.5.1 Motion Replay . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.5.2 Verbal Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.5.3 Interactive UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.5.4 Color Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 6 Evaluation &Results 83 6.1 Iterative Expert Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 83 6.2 User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 6.2.1 Pilot Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 6.2.2 User Study Protocol . . . . . . . . . . . . . . . . . . . . . . . . 85 6.2.3 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 6.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 6.3.1 Quantitative Analysis . . . . . . . . . . . . . . . . . . . . . . . 87 6.3.2 Qualitative Analysis . . . . . . . . . . . . . . . . . . . . . . . . 95 6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 7 Conclusion &Future Work 109 7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 7.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 7.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 7.3.1 Potential Future Improvements and Features . . . . . . . . . . 111 7.3.2 Additional Experiments . . . . . . . . . . . . . . . . . . . . . . 112 A Appendix: User Study 113 A.1 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 A.2 Experimental Procedure and Verbal Instructions . . . . . . . . . . . . 115 Overview of Generative AI Tools Used 119 List of Figures 121 List of Tables 125 List of Algorithms 127 Acronyms 129 Bibliography 131 CHAPTER 1 Introduction Recent advances in motion capture technology and head-mounted displays (HMDs) have laid the foundation for immersive virtual training environments that enable practicing motor skills in controlled and engaging settings [WHP+15, CCL+19, LWMK20]. Virtual reality (VR) enables the simulation of realistic training scenarios, such as varying incoming ball trajectories in tennis, which can be reproduced consistently to help train specific strategies and techniques [LNBRF21, MSL19]. Furthermore, because the information presented is simulated and can be manipulated, mixed and virtual reality can deliver feedback mechanisms that are otherwise impossible in real-world settings [MKM+22, SRRW13, GK22]. Feedback administration via multiple sensory modalities—such as visual, haptic, and temporal cues—is possible and can guide training, as presented by Wu et al. [WPNK21]. Supplemental computer-aided training methods are applicable in many fields, including sports [GK22, CCL+19, WPNK21], rehabilitation [BDZC19, LVX+20], and emergency response training [KRK23, DMSD22]. As a result, research is conducted on motor learning systems across various domains that help users learn movement patterns through practice [CCL+19] and automatic feedback [DMSD22]. Motor learning systems are also of interest to tennis because they have the potential to provide an accessible alternative to on-court tennis training and could complement traditional coaching. 
While methods tailored to tennis have been proposed [SMS+18, LNBRF21, MKM+22, MAKD17, OIMK18, KGSK24], they have drawbacks that reduce their affordability, effectiveness, and applicability. The proposed methods either do not simulate ball-racket interactions, rely on stationary hardware setups, or are limited in the feedback and guidance they provide. We present a novel method for practicing the modern tennis forehand in VR, utilizing automated motion analysis for immediate post-action multimodal feedback. The VR-based tennis training is structured as variable target practice [Dou05], where the action goal is to strike the incoming ball using a proper modern forehand, imparting topspin, and hitting the marked target on the other side of the net. We present an expert-driven motion analysis method for informative technique assessment. The captured forehand stroke is first segmented into individual phases—ready position, backswing, forward swing, contact point, and follow-through—based on defined motion features. Our method then evaluates each phase according to a set of coaching rules drawn from traditional tennis coaching. After each completed shot, our motor learning system selects a relevant coaching rule based on the analysis results and delivers corresponding verbal and visual feedback, accompanied by a motion replay. This feedback is designed to provide timely corrections on identified mistakes and positively reinforce improvements. Our approach overcomes the limitations of previous methods by utilizing a minimal portable hardware setup and integrating our tennis training methodology into an existing virtual tennis environment that simulates ball physics and enables racket-ball interaction. The resulting VR-based motor learning system runs on a Meta Quest 2 VR headset with two controllers, one of which is mounted on a racket handle, and uses the device's inside-out tracking to capture the headset's and controllers' motion. The exclusive use of the Meta Quest 2 for motion capture poses technical challenges due to the limited number of tracked joints. However, it enables the minimal hardware setup required to provide an accessible alternative to on-court training and existing self-training setups.

1.1 Motivation & Problem Statement

Maintaining regular tennis practice requires considerable effort and motivation [CR07] and is hindered by limited resources such as qualified coaches [Fed24, p. 49], vacant tennis sites [Fed24, p. 48], and the associated financial and time commitments (including travel times, scheduling of training hours, membership fees, and equipment costs). Especially in countries with a cold season, such as Austria, a shortage of indoor tennis courts can make regular on-court training difficult year-round. As a result, there is growing interest in tools that offer more flexible, practical, and affordable tennis training to complement traditional practice methods. In related work, virtual reality has been applied to tennis self-training, providing a potentially viable alternative to training on the tennis court that combines various training equipment in one solution [SMS+18, LWMK20, JR20, LNBRF21, MKM+22, NSBB23]. However, most methods use simplified ball physics and do not integrate automated motion analysis, which limits their ability to provide guidance and feedback. Unguided self-training for tennis—whether in VR or on a real court—has several challenges that limit its practicality and effectiveness.
Without tools for self-monitoring, progress is hard to gauge, and the absence of external encouragement or a competitive element can lead to disengagement [YWW+20, NMT+18]. Moreover, without guidance, tennis players may adopt improper techniques; without feedback, errors may go unnoticed and become ingrained in muscle memory over time. Improper technique not only hinders performance but can also contribute to injury [RG01]. For example, the follow-through phase of a tennis forehand stroke plays a key role in injury prevention by guiding the player to decelerate the motion gradually. When this phase is executed incorrectly—stopping the stroke abruptly through sheer muscle strength—unnecessary stress is placed on muscles and joints [KM02, KE04]. Over time, repeated use of poor technique can increase the risk of chronic injuries, particularly in the upper extremities, such as tennis elbow and shoulder injuries [DBW+15]. Therefore, appropriate training routines, early error correction, and instructions on proper technique are crucial elements of performance and injury prevention in tennis training. While virtual training environments have several advantages, they alone cannot address these limitations without built-in mechanisms for automated technique evaluation and feedback administration. Combining motion capture (mocap) technologies, which track and record a user's movements, with motion analysis methods enables the automated assessment of tennis techniques. Tailored feedback, progress tracking, and guided instructions can then be provided in VR based on the results and identified errors. We believe that such VR-based motor learning systems have the potential to encourage frequent, short tennis practice and help users refine their physical skills, deepen their knowledge, and reinforce coach-taught techniques in muscle memory.

1.2 Aim of the Work

The goal of this thesis is to design, implement, and evaluate a fully automated motion learning methodology for training a correct tennis forehand technique in virtual reality. We aim to provide immediate post-action multimodal feedback on a user's technique based on partial motion capture via the Meta Quest 2 and an expert-driven motion analysis method. The feedback should highlight areas for improvement, reinforce proper movement patterns, and provide timely corrections to facilitate motor learning. In the future, we envision a tennis motor learning system that can complement traditional tennis training methods. As a step toward realizing this vision, we focus solely on the tennis forehand stroke (specifically, the modern forehand topspin with an eastern grip) to ensure a feasible project scope.

1.2.1 Design Requirements

Our objective is to provide a complementary way of learning and practicing a tennis forehand in virtual reality that addresses barriers to regular training in existing methods. We want to utilize the concept of self-training as it offers autonomy and convenience, allowing tennis players to decide when, where, and how to train—independent of others. However, since the lack of guidance and feedback risks improper technique and injury, our main objective in developing this methodology is to reinforce proper movement patterns and provide timely corrections during training via automated motion analysis and tailored feedback.
The resulting motor learning system should provide an accessible and flexible approach to short tennis training sessions, with the aspiration to facilitate motor learning and encourage frequent practice through an enjoyable and motivating training experience. The applied motion analysis should provide a detailed and informative assessment that offers helpful insights into a user's technique. The administered feedback ought to be comprehensible and contain actionable instructions that aid users in improving their performance without causing feedback overload or discouragement. The feedback design should accommodate sensory impairments and learning preferences, as well as provide means for self-assessment and progress monitoring. Taught techniques and provided feedback must not contradict established biomechanical foundations and domain expertise. These requirements lead to our prerequisites of closely aligning our teaching methodology with established coaching principles and feedback strategies, and of providing a deterministic motion assessment to increase its testability, comparability, and, hopefully, understandability, as well as user acceptance and trust. By minimizing hardware requirements and non-training-related tasks (e.g., setup and calibration), we hope to increase the accessibility and convenience of short training sessions. The portable hardware of the Meta Quest helps eliminate the need for a dedicated training space. Additionally, as we utilize VR as the training environment, skill transfer to the real world is necessary to successfully improve tennis skills for real-life tennis scenarios, not just in VR. While skill transferability is not evaluated within the scope of this master's thesis, precautions that may positively influence it should be taken, such as eliminating mobility barriers and providing specialized equipment like a racket handle to facilitate natural tennis movements.

1.3 Methodological Approach

To achieve the objectives outlined above, a central focus of this thesis is developing an automated, expert-driven motion analysis method based on the partial motion capture of the Meta Quest 2, as it is integral to the motor learning system's functionality and feedback administration. Coaching rules formulated by domain experts define the fundamental guidelines for a proper tennis forehand technique and form the basis for our motion analysis. These coaching rules are already established in traditional coaching and are provided to us by tennis trainers or sourced from existing literature. We aspire to integrate them into our movement analysis framework to ensure our feedback is consistent with established principles of proper tennis forehand technique. This endeavor aims to create a training methodology that can complement traditional coaching, allowing self-training at home to reinforce the techniques taught by coaches while helping players avoid bad habits by monitoring and guiding their training. Therefore, part of this work's scope addresses evaluating applicable coaching rules for the tennis forehand under the limitation of having only partial mocap data available. As coaching rules concentrate on technical aspects during specific phases of the forehand stroke, an essential part of this thesis is the automated segmentation of the tennis forehand into upper-body motion phases based on defined features. Another important element of our VR-based tennis training is presenting the motion analysis results to the user.
A crucial aspect is the feedback design to assist the user in motor learning. The following methodological steps were performed in this master's thesis to achieve our outlined objectives:

1. Literature research on motor learning systems and coaching principles to identify key considerations and general guidelines for aiding our design process and formulating the teaching methodology. This step also includes gathering knowledge on feedback strategies applicable in VR.

2. Design of a VR-based methodology for training the tennis forehand topspin. This process incorporates the definition of coaching rules and upper-body motion phases of a tennis forehand stroke applicable to our partial motion capture. It also includes the feedback design.

3. Implementation of our motor learning methodology and its integration into an existing virtual tennis training environment, realizing our design. The resulting motor learning system comprises the following modules:
a) Motion capturing and recording
b) Motion feature extraction
c) Motion segmentation into individual phases
d) Coaching rule evaluation
e) Selection of the most relevant coaching rule
f) Feedback administration

4. Evaluation of our designed methodology for VR-based tennis forehand training and the corresponding motor learning system as part of a user study to gain insights into its effects, applicability, and potential benefits and limitations. The user study follows a within-group pretest-posttest design to compare participants' responses and measurements taken before and after the intervention.

We have two distinct sets of research questions: design-oriented research questions, denoted as RQD, that guide the design and implementation of the VR motor learning system, and evaluation-oriented research questions, denoted as RQE, formulated after implementation for evaluation purposes.

1.3.1 Design-Oriented Research Questions

During the design of our methodology, we investigated the following research questions:

RQD1: Motion Phase Segmentation. How can the partial mocap data provided by the Meta Quest 2 be segmented into upper-body tennis motion phases (ready position, backswing, forward swing, contact point, and follow-through), while ensuring that the calculation time of the segmentation allows immediate post-action feedback?

RQD2: Coaching Rule Evaluation. Given the limitations of partial motion capture, what types of coaching rules can be evaluated automatically, and what considerations and adjustments are needed in their selection and evaluation to ensure a valid analysis?

RQD3: Recommendation and Feedback. How can the motion analysis results be presented to users in a way that is clear, transparent, and helpful, allowing them to identify the actions necessary to improve performance and adherence to the coaching rules?

1.3.2 Evaluation-Oriented Research Questions

Our user study and hypotheses for quantitative evaluation are designed and formulated according to the following questions:

RQE1: Feedback Helpfulness & Preference. To what extent do participants perceive our seven feedback modalities—motion replay, color coding, auditory feedback, haptic feedback on hit, performance scores, textual feedback with illustrations, and a detailed list of coaching rules—as helpful? Is there a clear preference among these modalities, and what factors influence participants' preferences?

RQE2: User Experience and Judgment.
How do participants experience our VR training regarding enjoyment, motivation, learning value, usability, accessibility, and overall helpfulness? What specific aspects do users find beneficial or challenging? Do participants believe our VR training can effectively complement traditional tennis training?

RQE3: Adherence to Coaching Rules and Learning Effect. Over a short training session, does our VR tennis training enhance users' confidence in performing tennis-related tasks, improve adherence to coaching rules, and yield measurable performance gains, as assessed by pre- and post-test results? Additionally, does the training lead to a perceived learning effect and a subjective sense of progress or skill improvement?

RQE4: Trust and Alignment. Do participants trust the underlying swing analysis of the VR training and the resulting feedback and scores? Do participants subjectively perceive the motion analysis and feedback as accurate and aligned with their own perception?

1.4 Contributions

This thesis contributes to the field of motor learning systems by informing the design of future systems through our findings from the user study and by providing a comprehensive overview of the fundamental properties and design considerations for developing effective systems. While existing literature discusses various guidelines, these are fragmented across studies and reports, each focusing on specific aspects of system design or particular applications. This work combines and extends these insights, offering a detailed collection of design guidelines that serves as a more complete and practical resource for future motor learning system design. Moreover, the main contributions to VR motor learning and immersive sports applications are:

VR-based tennis training: A methodology for forehand topspin training in VR with motion analysis and multimodal feedback on a user's technique. Our tennis training methodology is integrated into the Tennis Esports application and available on the Meta Quest Store (DLC "TechZone Forehand Topspin", https://www.meta.com/experiences/techzone-forehand-topspin/584962687016217/, accessed 04/25/2025). The published version includes improvements based on the user study results and is extended to the One-Handed Backhand Topspin.

Rule-based motion analysis method for the tennis forehand: Our expert-driven approach demonstrates how traditional coaching principles can be applied in computer-aided motor learning. Specifically, it presents how the concept of coaching rules can be utilized for automated motion analysis and feedback generation based on partial motion capture from the Meta Quest.

Prioritization Process for Feedback Administration: We provide an algorithm for diagnosing the most relevant coaching rule and type of feedback for feedback administration based on a decision tree. Our approach may be applicable to other motor learning systems and can be improved upon in future work.

User Study Results: The results suggest a potential motivational and short-term learning effect of our VR-based tennis training, supported by increased participant confidence in their technique. Insights from the qualitative analysis highlight the strengths and weaknesses of our approach, which may inform the design of future VR-based motor learning systems that effectively complement traditional training.
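To make the general shape of such a rule-based analysis and prioritization step more tangible, the following minimal Python sketch shows one way a set of coaching rules could be evaluated on a segmented stroke and a single rule selected for feedback. It is an illustration under simplifying assumptions: the class and function names are ours, and a plain priority ordering stands in for the decision tree actually used in the implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class CoachingRule:
    name: str                        # e.g., "contact point in front of the body" (illustrative)
    phase: str                       # stroke phase the rule refers to, e.g., "contact_point"
    priority: int                    # lower value = more fundamental rule
    check: Callable[[dict], bool]    # returns True if the rule is satisfied for the given features

def select_feedback(rules: Sequence[CoachingRule],
                    stroke_features: dict) -> Tuple[CoachingRule, bool]:
    """Evaluate every rule on one segmented stroke and pick a single rule to report:
    the most fundamental violated rule for corrective feedback, or, if all rules
    pass, the most fundamental rule for positive reinforcement."""
    results = [(rule, rule.check(stroke_features)) for rule in rules]
    violated = [rule for rule, passed in results if not passed]
    if violated:
        return min(violated, key=lambda r: r.priority), False
    return min(rules, key=lambda r: r.priority), True
```

Restricting the output to one rule per stroke reflects the design goal, stated earlier, of providing focused feedback without overloading the user.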
1.5 Outline

The remainder of this thesis is structured as follows: Chapter 2 outlines the fundamentals of motor learning systems, covering key concepts in kinematics, motion capture, and motion analysis techniques, along with possible feedback strategies. This chapter also reviews related work, primarily focusing on tennis training. Chapter 3 outlines general guidelines and important considerations for designing motor learning systems. Derived from these principles, Chapter 4 presents the design concept of our motion learning methodology for tennis forehand training in VR. Here, we look into our rule-based motion analysis method and address the limitations and implications of the partial motion capture of the Meta Quest 2, which tracks only three body joints. We define the upper-body motion phases—applied to segment each captured forehand stroke—and discuss the applicable coaching rules. The chapter concludes with the design and main components of our feedback administration. Chapter 5 explains the technical aspects and implementation of our proposed motor learning system based on the design and definitions presented in the previous chapter. The evaluation of our method and its results are described in Chapter 6. In this chapter, the user study's design, the participants' demographic data, and the quantitative and qualitative outcomes of the study are reported. Additionally, it includes a discussion of the advantages and limitations of our tennis training approach and its implementation. Finally, Chapter 7 concludes the thesis by summarizing our findings, identifying limitations, and suggesting future work.

CHAPTER 2
Background & Related Work

Virtual motor learning applications enable the practice and learning of motion skills without the physical presence of a professional trainer or physiologist by providing a virtual training environment for practicing. Advanced motor learning applications go even further and analyze the user's movements to give feedback on their performance. Motor learning systems are applicable in many areas, from supporting the rehabilitation process to enhancing the performance of professional athletes in competitive situations to preparing individuals for challenging scenarios such as firefighting. While these different applications require domain-specific considerations in the design process of motor learning systems, they share fundamental commonalities. This chapter addresses the fundamentals of motor learning systems and examines the universal principles that underlie their design. It further reviews related work to identify and evaluate past approaches to the problem. Overall, it provides the background knowledge necessary to understand the design decisions behind our method for learning a tennis forehand motion in VR.

2.1 Kinematics of Human Motion

"Kinematics is used to describe motion without regard to the force producing motion." [Zat98] (Zatsiorsky, 1998, p. 3)

Kinematics is fundamental to the description and analysis of human motion and is applied extensively in this thesis. Forces generate motion, which manifests in space and time as changes in an object's position and orientation [McG13]. A simple form of motion is the displacement of an object from its initial point to another. Kinematics describes motion by its spatial and temporal components without considering the forces that cause the motion [Zat98, MA10, HKD15]. Kinematic characteristics for describing motion include position, rotation, displacement, speed, velocity, angular velocity, acceleration, and trajectory, which are also critical quantitative measurements for the numerical analysis of motion [McG13].
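As a hedged illustration of how several of these kinematic characteristics can be estimated from sampled position data, the short Python sketch below derives displacement, velocity, speed, and acceleration of a tracked point by finite differences. The 90 Hz sampling rate and the function name are assumptions made for this example, not properties of any particular capture system.

```python
import numpy as np

def kinematics_from_trajectory(positions: np.ndarray, fps: float = 90.0):
    """Estimate kinematic quantities from a sampled 3D trajectory.

    positions: array of shape (n_frames, 3) holding x, y, z coordinates in metres.
    fps:       sampling rate of the capture system (assumed to be 90 Hz here).
    """
    dt = 1.0 / fps
    displacement = np.diff(positions, axis=0)       # metres moved per frame (vector)
    velocity = displacement / dt                    # m/s, a vector quantity
    speed = np.linalg.norm(velocity, axis=1)        # m/s, the scalar magnitude of velocity
    acceleration = np.diff(velocity, axis=0) / dt   # m/s^2, change of velocity over time
    return displacement, velocity, speed, acceleration
```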
Figure 2.1: The first two sketches illustrate the properties of linear motion in its two types: (1) rectilinear motion, which follows a straight path, and (2) curvilinear motion, which follows a curved path. In both motions, points p and q travel the same distance at the same speed; the lines connecting p and q stay parallel and remain the same length throughout the entire motion. The third sketch (3) illustrates the properties of angular motion. The points p and q rotate simultaneously through the same angle around the center (axis of rotation). As point p is farther away from the center, it must cover a longer distance in the same time than q, which is closer. The lines connecting p and q remain the same length but do not stay parallel [HKD15, McG13].

Human motion refers to the movement of the human body or its many parts—the body's joints and segments. Due to the interplay of motions across multiple body parts, human movement patterns can become intricate [KM02]. A whole research field is dedicated to analyzing human gait, which encompasses only one of many possible categories of human movement patterns [McG13, HKD15]. When analyzing complex human motions, it is beneficial to break them into smaller, more manageable parts and examine them separately for simplicity. Human motion can be categorized into linear motion, a translation in a specific direction, and angular motion, a rotation about a single axis. The combination of both is referred to as general motion [HKD15].

Linear Motion. As described by McGinnis [McG13] and Hamill et al. [HKD15], linear motion is a translation along a path in which all points on a moving object move the same distance in the same direction simultaneously. During linear motion, two arbitrary points on the moving object always move along parallel lines, keeping the same distance from each other and moving in the same direction at all times, as depicted in Figure 2.1. Linear motion can only occur if the orientation of the moving object does not change. Examples of linear motion are free fall, the path covered when running, or the trajectory of the racket top during a tennis stroke. Linear motion can further be divided into uniform linear motion, where velocity remains constant throughout (i.e., a uniform speed with no acceleration or deceleration), and non-uniform linear motion, where velocity changes due to the acceleration or deceleration of the object.

Angular Motion. As described by McGinnis [McG13], angular motion is a rotation in which all points on the moving object move around the same central point or axis in a circular path, as shown in Figure 2.1. During angular motion, specific points on the object must move varying distances in the same amount of time due to the changing orientation of the moving object. Points farther away from the axis of rotation need to travel at a higher speed than those closer.
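This relationship between angular and linear motion can be summarized as v = ωr: a point at distance r from the axis of rotation, rotating with angular velocity ω, moves with linear speed v. The toy calculation below uses made-up numbers purely to illustrate the statement above.

```python
import math

omega = math.pi            # angular velocity in rad/s (half a revolution per second, made up)
r_near, r_far = 0.2, 1.9   # distances from the axis of rotation in metres (illustrative values)

v_near = omega * r_near    # linear speed of the point close to the axis
v_far = omega * r_far      # linear speed of the point far from the axis

# Both points sweep the same angle per unit time, yet the farther point moves
# almost ten times faster along its circular path.
print(f"near: {v_near:.2f} m/s, far: {v_far:.2f} m/s")
```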
The following example, as described by Hamill et al. [HKD15], further illustrates this phenomenon: imagine a gymnast executing a swing around a high bar, maintaining a fully extended posture throughout the revolution. The gymnast's feet must cover a much greater distance around the high bar than the hands, which are clasped around the bar and, therefore, nearer to the axis of rotation. In this example, the high bar is the axis of rotation—a fixed external axis. Angular motion can also occur around an axis within the body, such as an axis running through a joint (for example, when bending our knees, where the legs move about the knee joint) or an axis through the center of gravity when performing a somersault. The points of an object undergoing angular motion themselves trace curvilinear paths.

General Motion. As described by McGinnis [McG13], general motion is a combination of linear (translation) and angular (rotation) motion. General motion is common in complex human movement patterns. As human segments (e.g., the forearm, thigh, or torso) rotate around joints, they undergo angular motion. When multiple segments move in sequence, their combined angular motions can produce linear motion. For example, when throwing a ball, the arm segments rotate around their respective joints, resulting in a linear hand motion when releasing the ball. Another example of general motion is walking.

2.2 Motor Skills & Style

Motor skills – "activities or tasks that require voluntary control over movements of the joints and body segments to achieve a goal." [MA10] (Magill and Anderson, 2010, p. 3)

As described by Magill and Anderson, human motion encompasses all movements of the human body, including both voluntary actions and involuntary reflexes. Motor skills or actions, on the other hand, specifically denote learned motor abilities that are voluntarily executed to achieve a specific goal—an action goal. An example of an action goal is to reach the finish line in a marathon, where running is the motor skill. A specific movement pattern can accomplish different goals, while at the same time, many movement patterns can achieve the same goal. Although the action goal of a tennis forehand topspin is always the same—to hit the shot and hopefully score a point—the actual movement and swing pattern can vary significantly due to various aspects, such as environmental factors and a person's unique movement style. Performing a motor skill effectively and efficiently requires adapting movement patterns to environmental factors and individual needs. This point is relevant for teaching motor skills and for the study of motor learning—the acquisition and enhancement of motor skills [MA10]. To optimize the performance or outcome of a motor skill, trainers and trainees must figure out effective movement patterns without risking injuries. One essential part of this is to account for motion style.

2.2.1 Motion Style

"Style aspects of movement are personal differences, idiosyncrasies, or actions related to a specific performer." [KM02] (Knudson and Morrison, 2002, p. 80)

Motion style affects how motor skills are taught and how motions are analyzed and compared. Each person has a personalized way of moving and a unique body language due to various factors such as individual capabilities and experiences. Movements also depend on one's energy levels, mood, and state of mind on any given day. Müller and Röder [MR08] refer to these personalized aspects of motion, or their emotional expressiveness, as motion style. In their paper, they provide an example of different walking motions: walking can be performed in various ways, such as tiptoeing, marching, or running.
Individual characteristics such as injuries might lead to limping. Different moods can also influence the performed motion: a walk can seem cheerful with lively and animated gestures, sad with slouched shoulders and slower, more subdued movements, or furious with angry stomps. Such movement style aspects are specific enough to identify individuals by their gait [LE04]. Changes in movement patterns and capabilities occur naturally throughout our lifespan [KM02]: growth spurts in children may amplify strength and coordination, while movement changes also occur in older adults due to various factors such as joint stiffness, reduced muscle mass, and decreased flexibility, which can impact mobility and balance. However, movement patterns and capabilities depend on more than age. Lifestyle choices, physical activity levels, physical capacities (strength, endurance, flexibility, balance), genetics (gender, anthropometrics), and overall health, to name a few, also significantly influence these aspects [KM02]. As emphasized by Knudson and Morrison, motion style is an essential topic for motor skill teaching and human movement analysis. Physical educators, like tennis coaches, must individualize teaching and feedback according to motion style and individual capabilities. To be effective, the design of practices and training plans should consider factors such as previous injuries, experience, and physiology [KM02]. While style aspects can be utilized to recognize or track individuals automatically, they can present complications for other tasks, such as comparing two movements or classifying action patterns. For some applications, it may therefore be useful to separate the content and style of a motion. For example, the different variations of the walking action described above can all be classified as locomotion regardless of style.

2.3 Motion Capturing

"Motion capture is the process of recording a live motion event and translating it into usable mathematical terms by tracking a number of key points in space over time and combining them to obtain a single three-dimensional (3D) representation of the performance." [Men11] (Menache, 2011, p. 2)
The primary differences between mocap systems are their principal capturing techniques, such as optical and non-optical techniques, and sensor placement relative to the motion source [Men11]. Both characteristics are used to categorize mocap systems, although many more aspects and capabilities may be relevant to consumers. Notable factors influencing systems’ ease of use and results include accuracy, frame rate, real-time processing capabilities, setup time, potential manual calibration and post-processing steps, cost, portability, space limitations, and sensitivity to environmental factors such as light [FMS+22]. 2.3.1 Placement of Sensors and Sources Mocap systems consist of two main components, referred to by Menache [Men11] as sensors and sources, terms we also adopt, and can be classified according to their placement. Mocap data sources are features and signals measured to collect motion data. Internal sources are on or within the moving body, including inertia, wearable markers, and natural facial or anatomical landmarks. External sources are found in the surrounding environment, including stationary visual reference points and signals like electromagnetic fields. On the other hand, sensors are devices that collect motion data by measuring sources. Cameras, for instance, are sensors that track visual features. Other examples are Inertial Measurement Units (IMU), such as accelerometers that measure acceleration and gyroscopes that measure orientation and angular velocity. Internal sensors are worn on the body, while external sensors are installed in the environment. As mentioned above, mocap methods can be classified based on the placement of the sensors and sources—namely, outside-in, inside-out, and inside-in systems [Men11], which we describe below. Figure 2.2 illustrates the differences in sensor and source placement for each method. 13 2. Background & Related Work 1) Outside-in 2) Inside-out 3) Inside-in SensorsSources Figure 2.2: Illustration of the sensors and sources placement in outside-in, inside-out, and inside-in motion capture setups. Outside-in. Tracking with outside-in systems uses external sensors that observe the subject’s movements from the outside. The sensors are placed stationary around the perimeter of the tracking volume and view inward toward the moving subject [Men11]. Hence, outside-in systems track from the outside-in. Example: markerless systems based on stationary video cameras such as SIMI Motion1. Inside-out. These systems use internal sensors placed on the moving subject to track outward and collect data from external sources [Men11]. Here, the sensors move with the subject inside the tracking volume, while the sources are located outside—hence, these systems track from the inside out. Example: Meta Quest2 (formerly known as Oculus Quest). Inside-in. These methods rely solely on internal sensors and data sources on the moving subject [Men11]. They are independent of external sources. The sensors are body- worn and track inward to measure sources located on the moving subject. Example: IMUs, gloves, and inertial suits. 2.3.2 Motion Capturing Techniques Technologies for mocap include acoustic, electromagnetic, inertial, optical, mechanical, and time-of-flight systems [Men11, Nog12]—a blend of these principal techniques leads to hybrid solutions like VR setups. 
Based on the mocap data sources, capturing techniques can be further divided into marker-based methods, in which markers such as reflectors or LEDs are placed on the moving body and serve as sources, and markerless methods [Nog12]. Menache [Men11] and Nogueira [Nog12] provide a more comprehensive overview of the different technologies and their advantages and disadvantages. Venek et al. 1Simi Reality Motion Systems GmbH, Germany. https://www.simi.com 2Reality Labs, Meta Platforms, Inc., United States. https://www.meta.com/quest/ 14 2.4. Mocap Data, Motion Features, and Feature Extraction [VeaKSS22] contribute a scoping review offering insight into sensor technologies applied to evaluate human movement quality in recreational and professional sports. 2.4 Mocap Data, Motion Features, and Feature Extraction Motion capture systems produce high-dimensional time series data. They typically record motion at regular intervals with high frame rates, creating a sequence of frames representing the motion over time. Each frame captures a snapshot of the positions and orientations of multiple joints and body segments, resulting in numerous variables per time point and, consequently, high dimensionality and complexity of raw mocap data [Val16]. The complexity and potential redundancy of raw mocap data pose a challenge for subsequent analysis and make direct processing computationally expensive [LVX+20]. Information in the raw mocap data can even be counterproductive and misleading for certain tasks and must be omitted [Val16]. Thus, introducing a higher-level motion rep- resentation tailored to the subsequent computational task and reducing dimensionality is desirable. Consequently, dimensionality reduction techniques, which exclude unimportant information from the data, and feature engineering, where the most relevant information and higher-level features are derived from raw mocap data, are commonly employed as preprocessing steps before further analysis [Val16, LVX+20]. Human motion features encompass temporal, spatial, kinematic, kinetic, and anthropometric properties and can be categorized as such. Before delving deeper into specific motion features, it is helpful to establish a standardized frame of reference that helps to categorize and describe human movements and is also used to derive motion features: the cardinal planes. 2.4.1 Cardinal Planes “Rotations occur around specific axes of rotation and within specific planes of movement.” [McG13](McGinnis, 2013, p. 180) Movements can be described and categorized with the help of axes. Yaw, pitch, and roll, for example, describe rotations around the respective principal axes of a rigid body, x, y, and z. Any rigid body has three principal axes, which are orthogonal and run through the body’s center of gravity. The principal axes have corresponding planes perpendicular to them that bisect the rigid body into halves. In human anatomy, the principal planes intersecting in the center of gravity are called cardinal planes and provide a frame of reference to locate anatomical structures and to describe relative movements of joints and limbs [McG13]. The three cardinal planes and their respective principal axes are illustrated in Figure 2.3 and are described by McGinnis [McG13] as follows: The cardinal—or principal— transverse plane transects the human body horizontally and divides it into its upper (superior) and lower (inferior) parts. Its corresponding principle axis is the longitudinal axis that runs vertically through the center of gravity. 
The cardinal frontal plane splits the body vertically into its front (anterior) and back (posterior) regions. Its perpendicular axis is the anteroposterior axis. The principal sagittal plane divides the body into equal left and right halves. Medial describes motion toward the sagittal plane along a mediolateral axis; the counterpart is lateral motion away from the plane.

Figure 2.3: Cardinal planes and their respective principal axes (sagittal plane with the mediolateral axis, frontal plane with the anteroposterior axis, and transverse plane with the longitudinal axis).

An infinite number of axes and planes parallel to the cardinal planes pass through the body and its joints [McG13]; these are called noncardinal planes [HKD15]. Any plane parallel to the cardinal transverse plane is called a transverse plane. The same naming principle applies to the other two cardinal planes and their parallel noncardinal equivalents [Zat02].

"Movement is said to occur in a specific plane if it is actually along that plane or parallel to it." [HKD15] (Hamill et al., 2015, p. 18)

Movements originating from rotations about an axis always lie in its corresponding perpendicular plane and can be categorized by it. A pirouette is an example of a transverse-plane movement involving the whole body, and flexion and extension are examples of joint actions [McG13]. Although complex human motions are usually not plane-specific, the plane in which a movement primarily occurs provides the best viewpoint for video recording or 2D motion analysis [McG13]. For example, the principal components of gait can be viewed and analyzed best in the sagittal plane, even though not all joint actions involved in gait occur in the sagittal plane. Sometimes, the cardinal planes do not provide a suitable vantage point for recording and planar analysis, but another, diagonal plane might; McGinnis [McG13] provides the example of a golf swing occurring in a diagonal plane. The works of McGinnis [McG13] and Hamill et al. [HKD15] provide a more detailed discourse on this topic, including the terminology of relative limb movements, such as abduction and flexion, and illustrations of various joint actions in the cardinal planes.

2.4.2 Human Motion Features

Human motion features describe characteristics of human motion and serve as an abstraction of raw mocap data [Val16]. These include lower-level features such as joint angles and trajectories, which are easily extractable from ordinary mocap data, but also entail more abstract and interpretable higher-level qualities such as rhythm, smoothness, and gestures. There are various ways to categorize motion features. For instance, Valčík [Val16] proposes a categorization into subject features (i.e., anthropometric properties), pose features (characteristics of a single pose or frame), transition features (transformations and differences between poses), and action features (features of a whole semantic action). Oshita et al. [OIMK18] use a similar classification. Tao et al. [TPD+16] differentiate between low- and high-level features [JGZ+13, TPD+16]. One can also differentiate between quantitative and qualitative features [MRC05] or classify features based on their reference system, e.g., global and local coordinate systems [ASL+19, Val16]. A more detailed summary of motion features can be found in our state-of-the-art report [SKK20].
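To illustrate the difference between a pose feature and a transition feature in the categorization above, the following sketch computes one of each from 3D joint positions. The joint names and the assumption that positions are given as NumPy arrays are ours, for illustration only.

```python
import numpy as np

def elbow_angle(shoulder: np.ndarray, elbow: np.ndarray, wrist: np.ndarray) -> float:
    """Pose feature: the angle (in degrees) at the elbow joint within a single frame."""
    upper, lower = shoulder - elbow, wrist - elbow
    cosine = np.dot(upper, lower) / (np.linalg.norm(upper) * np.linalg.norm(lower))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

def wrist_displacement(wrist_previous: np.ndarray, wrist_current: np.ndarray) -> float:
    """Transition feature: how far the wrist moved between two consecutive frames."""
    return float(np.linalg.norm(wrist_current - wrist_previous))
```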
Below, we provide a more detailed description of anthropometric features, geometric properties, and boolean relations, as we believe these terms are less commonly used than others, such as kinematic features, but are important in our implementation.

2.4.2.1 Anthropometric Features

Anthropometry is the measurement of the human body (derived from the Greek word anthropos [ἄνθρωπος]: human, and the suffix -metry, from -metria [-μετρία], denoting the process of measuring) [ADDR04]. Anthropometric features are subject-related, quantitative measurements that describe body dimensions [Val16]. Examples include a person’s height and width, weight, body proportions such as the length of certain bones, and the range of motion of specific joints (i.e., joint rotation limits). Some anthropometric information related to the human bone structure can be extracted or estimated from 3D skeleton mocap data; others, such as weight, need to be measured separately if they are relevant for motion analysis. Subject-specific features are relevant when individuality and motion style are to be taken into account. They play an essential role in sports-related biomechanical analysis, as the ideal form in sports and exercises varies depending on individual body properties [KM02], as explained in Section 2.2. To further illustrate this point, consider a tennis-related example we encountered during our implementation: The optimal contact point for a tennis forehand in many situations is said to be around waist height. However, “waist height” varies depending not only on a player’s current posture but also on their height and the proportions between their upper and lower body. In order to provide feedback on the contact point—as we do not track the waist or pelvis—our method requires estimating anthropometric characteristics related to height and body proportions. Anthropometric features can also be misleading in applications where only the similarity between two movements is required. In this case, it is necessary to normalize the 3D skeleton data, as addressed by Vox and Wallhoff [VW18].

2.4.2.2 Geometric Properties

Points and lines in skeletal data models represent body joints and segments. These are geometric primitives from which geometric properties (regarding, e.g., size, shape, or posture) can be measured and derived. Examples of geometric properties are joint parameters such as relative positions and rotations [ZLX17]. Simple joint parameters and bone lengths are usually directly tracked during mocap and can be used as features. Other geometric features concern spatial relationships [FMS+22], such as distance measures between non-adjacent, arbitrary joints or joint angles, which refer to the angles formed between bones at a joint. Distances can also be measured from a plane (e.g., a cardinal plane or a plane depicting the floor) to a joint, referred to as joint-plane distances by Valčík [Val16]. Geometric features are used in various works on motion analysis [ZLX17, YGVG12, TF18, ACF+18].
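To make these quantities more concrete, the following minimal sketch (our own illustration, not the implementation used in this thesis) shows how a joint angle and a joint-plane distance could be computed from 3D joint positions; the joint names, the example coordinates, and the floor-plane definition are assumptions for this example only.

import numpy as np

def joint_angle(a, b, c):
    # Angle (in degrees) formed at joint b by the segments b->a and b->c.
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def joint_plane_distance(joint, plane_point, plane_normal):
    # Signed distance of a joint position to a plane given by a point and a normal.
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    return float(np.dot(np.asarray(joint, dtype=float) - np.asarray(plane_point, dtype=float), n))

# Example frame: a few joint positions in meters (y is assumed to be the vertical axis).
frame = {"shoulder_r": (0.25, 1.45, 0.0), "elbow_r": (0.30, 1.20, 0.10), "wrist_r": (0.28, 0.95, 0.25)}
elbow_angle = joint_angle(frame["shoulder_r"], frame["elbow_r"], frame["wrist_r"])
wrist_height = joint_plane_distance(frame["wrist_r"], plane_point=(0, 0, 0), plane_normal=(0, 1, 0))
print(f"elbow angle: {elbow_angle:.1f} deg, wrist-floor distance: {wrist_height:.2f} m")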
Particular forms of geometric features are geometric relations, which can be expressed as boolean statements. These are explained below.

2.4.2.3 Boolean Relational Features

Relations provide information about relationships between objects. Mathematical examples include inequality (e.g., “X is greater than Y”) and geometric relations, which describe spatial relationships between geometric primitives, such as orthogonality, parallelism, and coplanarity. In motion analysis, this concept is applied in the form of boolean relational features that denote, as boolean expressions, whether or not a specified relationship holds [MRC05, MR08, TCKL13]. These qualitative features provide a semantic representation of action characteristics and are suitable for logical similarity detection [MRC05]. Examples of boolean relational features are illustrated in Figure 2.4.

Figure 2.4: Illustration of boolean relational features (e.g., “Right arm bending?”, “Jumping?”, “Right foot behind the left?”, each evaluating to true or false).

Müller et al. [MRC05] utilize a boolean geometric feature set for content-based motion indexing and retrieval by encoding whether or not characteristic geometric relations of motion actions—such as “Is one foot in front of the other?”, which is a relation typically arising in walking—occur in a motion. Müller and Röder [MR08] provide the example of kicking motions: while various kicking motions differ in direct numerical comparison, they share common logical features, such as the chronological order of knee bending/stretching and foot raising/lowering events. These patterns can be effectively represented as boolean relations, as demonstrated by the authors.
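As a small illustration (a sketch of our own, with assumed joint names and an assumed tolerance value), such a relation can be encoded as a boolean function over joint positions, for instance the walking-related relation mentioned above:

import numpy as np

def foot_in_front_of_other(frame, forward_axis=(0.0, 0.0, 1.0), tolerance=0.05):
    # Boolean relational feature: is the right foot in front of the left foot
    # along the given forward direction (with a small tolerance in meters)?
    forward = np.asarray(forward_axis, dtype=float)
    forward = forward / np.linalg.norm(forward)
    right = np.dot(np.asarray(frame["ankle_r"], dtype=float), forward)
    left = np.dot(np.asarray(frame["ankle_l"], dtype=float), forward)
    return bool(right - left > tolerance)

# Example pose: the right ankle is clearly ahead of the left ankle along +z.
pose = {"ankle_l": (0.10, 0.08, 0.00), "ankle_r": (-0.12, 0.09, 0.35)}
print(foot_in_front_of_other(pose))  # True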
2.5 Motion Segmentation into Phases

“Motion segmentation is the process of identifying the temporal extents of movements of interest, breaking a continuous sequence of movement data into smaller components, termed movement primitives [1], and identifying the segment points, the starting and ending time instants of each movement primitive.” [LKK16](Lin et al., 2016, p. 1)

Motion segmentation—also known as Phase Analysis [Lee02]—of time series refers to the process of identifying the temporal extents, i.e., the start and end frames, that encapsulate a particular motion of interest [LKK16]. It is an optional pre-processing step to motion analysis and serves to break down complex movements into smaller units that can then be analyzed separately. In its simplest form, segmentation is used to remove irrelevant frames at the beginning and end of a recording. A more complex use case is to split a recording into smaller components, such as separating individual repetitions of an exercise, identifying distinct activities within a single recording, or isolating relevant movement patterns within an action [FMS+22]. A concrete example of the latter is the segmentation of a complex skill, such as a tennis forehand stroke, into phases and sub-phases. We divide the upper body movement of a forehand swing into the ready position, the backswing, the forward swing, the contact point, and the follow-through. In this thesis, we use the term ‘motion phases’ to refer to the units into which a skill or complex movement is broken down. In the literature, distinct movement phases or phases of a skill are sometimes also referred to as movement primitives [LKK16], critical features, or functional parts of a skill [Lee02]. Main phases that apply to skills in general have been identified, for example, preparation, action, and follow-through, as described by Lee et al. [Lee02]. However, how detailed motion segmentation needs to be and how phases are defined for a particular analysis depends, among other things, on the analyzed skill, its domain, and the objective of the analysis.

Automated motion segmentation is based on the inherent motion characteristics and patterns of mocap data. There are various segmentation approaches that depend on the respective application and requirements. A common approach, also applied in our implementation, is segmentation based on detecting critical points, such as zero crossings or peak values of certain mocap data [FMS+22, KJR13]. To illustrate, let us consider the flight phase of a vertical jump, which we define as the time in which both feet are in the air. The flight phase, therefore, begins as soon as the altitude of both feet is greater than zero and ends in the frame before the altitude of both feet reaches zero again. In this example, the altitude or vertical position of the feet joints can be used as a parameter for segmenting the flight phase. Other approaches to segmentation include pattern-matching algorithms, such as the combination of Dynamic Time Warping (DTW) with similarity measures between body joints to detect predefined reference positions, as well as machine learning approaches [FMS+22, KP20].
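Returning to the vertical-jump example above, a minimal sketch of such critical-point segmentation could look as follows (our own illustration; the joint names, the vertical axis, and the ground threshold are assumptions):

def flight_phase(frames, ground_threshold=0.02):
    # Return (start_index, end_index) of the first flight phase, i.e., the frames
    # in which both feet are above the ground threshold (in meters), or None.
    airborne = [f["ankle_l"][1] > ground_threshold and f["ankle_r"][1] > ground_threshold
                for f in frames]  # index 1 = y, assumed to be the vertical axis
    start = None
    for i, in_air in enumerate(airborne):
        if in_air and start is None:
            start = i                # first frame with both feet in the air
        elif not in_air and start is not None:
            return start, i - 1      # last frame before ground contact again
    return (start, len(frames) - 1) if start is not None else None

# Tiny synthetic recording: feet on the ground, briefly airborne, then grounded again.
recording = [{"ankle_l": (0, 0.01, 0), "ankle_r": (0, 0.01, 0)}] * 3 \
          + [{"ankle_l": (0, 0.15, 0), "ankle_r": (0, 0.14, 0)}] * 4 \
          + [{"ankle_l": (0, 0.01, 0), "ankle_r": (0, 0.01, 0)}] * 3
print(flight_phase(recording))  # (3, 6)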
2.6 Analysis of Human Motion

Human motion analysis is a broad field that encompasses many techniques and applications to understand, teach, and optimize human movement. One area is predictive analysis, in which simulations of human movements are systematically analyzed to explore hypothetical questions and scenarios—such as how specific techniques affect performance—by creating a controlled environment where various theoretical changes can be applied to the model. These simulations can help build a better understanding of biomechanics and contribute to performance optimization in sports [Lee02, SMS+18]. In contrast to predictive analysis, other areas of human motion analysis mostly revolve around observing and analyzing actual movements. Human activity recognition is a notable example, which aims to identify actions and gestures within recorded motion capture data or video footage [FMS+22]. Other examples are methods for detecting motion errors—such as technical flaws—and for assessing the quality of motion. These methods are prominent in physical therapy and sports science, where they serve as a basis for interventions and feedback [VeaKSS22, FMS+22, RF20]. Lastly, motion analysis plays a role in content-based motion retrieval, where recorded or synthesized motion clips in a database are filtered based on a search query or an example motion [MRC05]. This section focuses on motion error detection and motion quality assessment.

2.6.1 Qualitative vs Quantitative Approach

In the literature, a distinction is made between qualitative and quantitative approaches for analyzing actual movements [Lee02, KM02]. The traditional approach in coaching and physiotherapy is qualitative motion analysis, as humans can carry it out without additional aids such as motion recording or other computer-aided methods [Lee02]. Knudson and Morrison define qualitative analysis “as the systematic observation and introspective judgment of the quality of human movement for the purpose of providing the most appropriate intervention to improve performance” [KM02](Knudson and Morrison, 2002, p. 4, as cited in Knudson and Morrison, 1996, p. 17). This evaluation approach is characterized by its subjective nature, relying on the analyst’s interpretation and judgment of motion quality rather than precise measurements. However, it follows systematic methods to identify strengths and weaknesses in movement execution and to draw conclusions about motion quality. Furthermore, it requires broad, interdisciplinary knowledge of and experience with the performed action and its underlying biomechanical principles on the part of the human analyst, as the core method of qualitative analysis involves comparing the observed performance (or motion phases thereof) to an ideal model of good form—often a mental or conceptual model. The aim is to identify discrepancies or errors in the execution of the movement before making a diagnosis, which can lead to interventions and feedback [KM02]. In their book “Qualitative analysis of human movement”, Knudson and Morrison provide more insight into this topic and describe their four-task model of qualitative analysis (depicted in Figure 2.5), which involves preparation, observation, evaluation with diagnosis, and finally, intervention; our quantitative methodology follows the same structure.

Figure 2.5: The four-task model of qualitative motion analysis, as presented by Knudson and Morrison [KM02]: Preparation (gathering task-specific knowledge: movement goal, training strategies, information about the trainee, etc.), Observation (systematically gathering, organizing, and interpreting sensory information about the performed activity), Evaluation & Diagnosis (evaluating the performed activity, followed by diagnosis to prioritize intervention), and Intervention (guidance and administration of feedback through an appropriate feedback strategy).

In contrast, quantitative analysis is based on quantified data and relies on the collection of numerical data and precise measurements of movement characteristics such as joint parameters [Lee02, KM02]. Quantified motion data has been made more accessible by the development of mocap technologies, and as they become more affordable and convenient, quantitative analysis is becoming more feasible [Lee02]. While quantitative analysis is generally considered objective, there can be some subjectivity in decision-making processes, for instance, due to the placement of measuring devices or sensors, as mentioned by Knudson and Morrison [KM02], or due to the manual data collection and labeling process, as noted by Venek et al. [VeaKSS22]. Subjectivity may also be involved in several steps of automated quantitative analysis methods. Key examples include the manual selection of features and thresholds, the definition of optimal movement or performance standards such as coaching rules, and the process of labeling data.

2.6.2 Quantitative Motion Assessment

Quantitative assessment of human motion quality “aims at quantifying the motion quality from a functional point of view by assessing its deviation from an established model.” [TPD+16](Tao et al., 2016, p. 136)

In this thesis, we will collectively refer to evaluating human motion quality in terms of outcome and technique using computational techniques on numerical mocap data as “quantitative motion assessment”. Many other terms are used in the literature, but their gist is mostly the same, although approaches to this task vary [FMS+22, TPD+16, VeaKSS22]. How motion quality is quantified is domain- and application-specific; various aspects such as efficiency, accuracy, agility, balance, replication, rule compliance, and correctness can be taken into consideration for this purpose. What all approaches have in common, however, is the need for a ground truth to establish the ‘perfect’ quality of movement to strive for [VeaKSS22, SPB+20, GDF23].
For example, the ground truth could be either an ‘optimal’ execution performed by an expert—serving as an ideal template to replicate—or a set of defined rules that must be adhered to. Data-driven and expert-driven realizations are discussed further in Section 2.6.3. Within the context of motor learning systems, the results of quantitative motion assessment are utilized to generate feedback and monitor performance improvements. As discussed in Chapter 3, the assessment must provide sufficient relevant insights to support the feedback that the user needs to receive while using the system. The following three assessment approaches are commonly found in motor learning systems: the correctness assessment of performed actions, an overall performance score that evaluates how well the movement was executed, and the identification of specific error patterns in the motion.

Action Correctness Classification (Binary Classifier). A binary classifier can evaluate whether a movement was performed correctly by categorizing it into one of two classes: correct or incorrect [LVX+20]; or, in the case of motion abnormality detection, as normal or abnormal [TPD+16]. The complexity of this approach lies in the definition of correct versus incorrect motions, as—depending on the movement intricacy—the action may have a wide range of possible motion errors [TPD+16], while correct movements may be performed with individual motion styles that do not lead to fundamentally wrong or problematic movement patterns in terms of injuries [KM02]. The challenge is to cover this wide range of correct and incorrect movement patterns with the classifier. Some approaches do not require any prior knowledge of possible motion errors [TPD+16], for example, data-driven approaches that evaluate the degree of deviation of the performed motion from a pre-recorded ‘correct’ reference motion (or from a more complex model derived from a collection of motions) and then determine whether the extent of the deviation falls within an acceptable range. However, these approaches still need to capture a wide variety of correct motions. While binary classifiers provide assessments of action correctness that are straightforward to interpret, their single outputs lack nuance and do not supply detailed insights into what was done incorrectly and how the motion can be improved. Furthermore, as Liao et al. [LVX+20] point out, binary classifiers for action correctness are limited in their ability to capture continuous variations in movement quality. A movement is labeled as either correct or incorrect, with no middle ground. Consequently, any gradual performance improvement is not reflected in the output.

Overall Performance Score. Another way to evaluate action correctness is by assessing an overall performance score that captures not only correct and incorrect motions but also nuances in between. The resulting performance score quantifies motion quality by mapping the continuum between a correct and incorrect motion to a numerical range (typically a continuous range of [0,1] or [0,100]) or ordinal values (e.g., poor, moderate, ideal—or for an error pattern: inadequate, within the desirable range, excessive [KM02]) [FMS+22, LVX+20]. To illustrate the former, let us consider a scoring function that rates archery shots based on their outcome—or, more precisely, their accuracy. The function maps the distance from the arrow to the bullseye to a performance score ranging from 0.0 to 1.0, whereby a perfect shot into the bullseye is mapped to 1.0, while 0.0 represents a missed shot that went off the target. Scores between 0.0 and 1.0 reflect varying levels of accuracy—neither perfect nor terrible—with closer shots yielding higher scores; this scoring function thus represents a smooth progression from poor to ideal performance.
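A minimal sketch of such a score function (our own illustration; the target radius and the linear fall-off are assumptions, not values taken from the literature):

def accuracy_score(distance_to_bullseye, target_radius=0.61):
    # Map the distance (in meters) between arrow and bullseye to a score in [0.0, 1.0].
    # 1.0 is a perfect hit, 0.0 a shot that missed the target entirely.
    if distance_to_bullseye >= target_radius:
        return 0.0
    return 1.0 - distance_to_bullseye / target_radius

for d in (0.0, 0.15, 0.30, 0.61):
    print(f"distance {d:.2f} m -> score {accuracy_score(d):.2f}")
# distance 0.00 m -> score 1.00, ..., distance 0.61 m -> score 0.00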
As illustrated above, score functions can be used to map outcome measures or other metrics, such as the deviation from a desired template, to a more interpretable and comparable form that summarizes performance quality and can be presented to the users [LVX+20, HKB17, TPD+16]. Rank-based classifiers are another approach; they rank the performance as ordinal values with richer detail than just correct vs. incorrect [FMS+22]. Performance scores provide a more comprehensive assessment of an individual’s movement quality than a binary classifier and enable progress tracking. However, overall performance scores also do not give information about motion errors, their extent, or their causes, which is necessary to provide corrective feedback [HGH+18, HKB17, FMS+22].

Error Pattern Classification. In order to provide corrective feedback, it is not sufficient to just determine that a motion error occurred; the type of error must also be classified [HKB17, ZLER14]. An example of corrective feedback in many ball-related sports is “Watch the ball!” or a similar cue, as ball focus is necessary for timely and proper reactions to an incoming ball and for aiming [KM02, IAR+23]. To provide this kind of feedback in an automated system, the system needs to first recognize whether the user focused on the ball during the action, as well as determine if a detected error pattern is severe enough to warrant feedback. For our example, the respective error pattern that the system must detect could be labeled “Insufficient ball focus”. What ‘insufficient’ means in this context must be defined or learned. Multiple error patterns can occur simultaneously in a motion [TAHK12], and they might or might not be independent (e.g., follow-up errors). Approaches to error pattern classification categorize a motion into one error category—usually the most probable or severe error—or into several categories [FMS+22]. Hülsmann et al. [HGH+18], for example, use a binary classifier for each specified error pattern that distinguishes between ‘error occurs’ and ‘error does not occur’. Taylor et al. [TAHK12] use a multi-label classification as well. Both papers present data-driven approaches, but classification can also be expert-driven and based on a set of predefined rules, as presented by Zhao et al. [ZLER14] and De Kok et al. [DKHH+15].

In-depth Motion Assessment. The above-described assessments can be combined to provide a more in-depth motion assessment [FMS+22, TPD+16]. An example is a multi-label classification of error patterns and the presentation of a performance score for each of them. In the rule-based approaches presented by Hülsmann et al. [HFS+16] and De Kok et al. [DKHH+15], explicit quantitative error values are provided for error patterns. However, quantitative motion assessment is not limited to these methods, and more detailed biomechanical insights and performance-related metrics can be presented to the user.
One possibility is to provide multiple motor performance metrics to gain a deeper understanding of progress and assessment of learning. Therefore, one relevant problem of motion analysis, especially in motor learning systems, is how to measure specific aspects of motor performance. Magill and Anderson [MA10] classify motor performance measures into two main groups: performance outcome measures, which are outcome-related, and performance production measures, which are related to the motion itself. In their book, the authors give many examples of performance measures, some of which are paraphrased in Table 2.1 and extended with tennis-related illustrations.

Table 2.1: Performance outcome and production measures in the tennis context, based on examples provided by Magill and Anderson [MA10].

Outcome Measures (Examples):
Reaction time: how long it takes a tennis player to initiate the swing after the ball is shot
Number of (un-)successful attempts: how often a tennis player missed the ball or shot outside a target
Number of trials: how many repetitions were necessary until the ball reached the desired speed
Consistency: a player’s ability to repeatedly perform successful shots over time
Accuracy: how far away a shot was from the target

Production Measures (Examples):
Amount of errors during motion: the stance is X cm narrower than tolerated; in the ready position, the stance of a tennis player should be approximately shoulder-width or slightly wider [KE04]
Absence of a motion phase: the tennis player did not follow through
Velocity: racket velocity at the contact point
Trajectory: pattern of the swing path
Joint angle: degree of knee bend during the stance

Other useful metrics described by Magill and Anderson are error measures such as absolute error, constant error, and variable error [MA10].

2.6.3 Expert-driven vs Data-driven Methods

There are three main ways to conduct quantitative motion assessment: data-driven, expert-driven, or a hybrid approach. While data-driven approaches rely on datasets of historical or exemplar data to interpret and analyze the motion, expert-driven approaches assess the motion based on the direct encoding of expert knowledge. Richter et al. [RWHH19] compare a data-driven approach with an expert-driven approach, and Frangoudes et al. [FMS+22] present machine learning (data-driven) methods for quality assessment. A list of advantages and disadvantages of expert-driven and data-driven methods can be found in Table 2.2. It must be noted that this list is not exhaustive, and some points might not apply to all implementations.

Table 2.2: Non-exhaustive list of advantages and disadvantages of expert- and data-driven approaches based on the following literature: [ZLER14, HGH+18, DKHH+15, GDF23, RWHH19, LVX+20, SPB+20, FMS+22].

Expert-Driven
Advantages: lightweight; deterministic results; accessible in-depth analysis; interpretability
Disadvantages: manual work; poor generalization

Data-Driven
Advantages: generalizability; handles complexity; no prior knowledge required
Disadvantages: data bias; dataset generation; lack of transparency / interpretability challenges; computationally resource-intensive

2.6.3.1 Expert-driven Methods

Expert-driven approaches are based on the explicit encoding of human expertise [SPB+20]. The knowledge of domain experts must be captured and encoded—often in the form of decision rules represented as logical IF-THEN statements, in addition to threshold values—which are then used to interpret input data, create assessments, or make decisions based on how the input matches the conditions set by the rules [GDF23, ÖS20, SPB+20].
Methods described as rule-based [ZLER14], algorithmic [ZLER14], or knowledge-based [SPB+20] all fall into the category of expert-driven approaches. Examples of rule definitions can be found in several works [ÖS20, GDF23, ZLER14, DKHH+15, KM02]. Returning briefly to the example of an ideal contact point of a tennis forehand introduced in Section 2.4.2.1, a simplified rule-based classifier for it could have the following form: Rule := IF contact point cp is around waist height THEN correct ELSE incorrect. To express ‘around’ mathematically, one could choose a symmetric tolerance t and implement the rule as Equation 2.1. What is more, this precise definition of a correctness rule—or, in other cases, of error patterns—allows the calculation of the exact motion error, as shown in Equation 2.2. Finally, a general correctness rule (Equation 2.3) can be expressed that takes any motion error and returns the classification label.

\[
\mathrm{Rule}(cp, \mathit{waist}, t) =
\begin{cases}
1 & \text{if } \mathit{waist}_y - t \le cp_y \le \mathit{waist}_y + t\\
0 & \text{otherwise}
\end{cases}
\tag{2.1}
\]

\[
t_{\min} = \mathit{waist}_y - t, \qquad t_{\max} = \mathit{waist}_y + t
\]

\[
\mathit{error} = \mathrm{Error}(cp, t_{\min}, t_{\max}) =
\begin{cases}
0 & \text{if } t_{\min} \le cp_y \le t_{\max}\\
cp_y - t_{\min} & \text{if } cp_y < t_{\min}\\
cp_y - t_{\max} & \text{if } cp_y > t_{\max}
\end{cases}
\tag{2.2}
\]

\[
\mathrm{Rule}(\mathit{error}) =
\begin{cases}
1 & \text{if } \mathit{error} = 0\\
0 & \text{otherwise}
\end{cases}
\tag{2.3}
\]
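To illustrate how such a rule could be encoded in practice, the following is a minimal sketch of Equations 2.1 to 2.3 (our own illustration, not the implementation used in this thesis; the tolerance value is an assumption):

def motion_error(cp_y, waist_y, tolerance=0.10):
    # Signed deviation (Equation 2.2) of the contact-point height cp_y from the
    # tolerated band [waist_y - tolerance, waist_y + tolerance], in meters.
    t_min, t_max = waist_y - tolerance, waist_y + tolerance
    if cp_y < t_min:
        return cp_y - t_min   # negative: contact point too low
    if cp_y > t_max:
        return cp_y - t_max   # positive: contact point too high
    return 0.0                # within the tolerated range

def contact_point_rule(cp_y, waist_y, tolerance=0.10):
    # Correctness rule (Equations 2.1/2.3): 1 if the contact point is 'around'
    # waist height, 0 otherwise.
    return 1 if motion_error(cp_y, waist_y, tolerance) == 0.0 else 0

# Example: estimated waist height 1.05 m, contact point at 1.30 m.
err = motion_error(1.30, 1.05)
print(contact_point_rule(1.30, 1.05), f"error: {err:+.2f} m")  # 0 error: +0.15 m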
A set of rules can be defined as a ground truth describing the ideal movement to analyze action correctness. A recorded motion can then be compared against these rules [ZLER14]. However, one would need an exhaustive and accurate set of rules to determine with absolute certainty that a motion is correct. Conversely, it is usually sufficient for motion error classification to define smaller, non-exhaustive sets containing key error patterns [ZLER14]. In the literature, expert-driven approaches are mainly applied to detect specific motion errors in order to provide more specific feedback [ZLER14, DKHH+15, HFS+16, ÖS20].

Expert-driven approaches have considerable advantages, as outlined in Table 2.2. Algorithmic approaches are usually lightweight and computationally efficient compared to some data-driven methods, such as similarity modeling, as they bypass expensive steps like scaling or motion synchronization (e.g., DTW), enabling real-time processing [ZLER14, HGH+18]. This real-time capability is demonstrated by De Kok et al. [DKHH+15]. Many expert-driven systems rely on precise calculations of motion errors or make such calculations accessible through well-defined metrics, as exemplified in the equations above. These error values enable detailed analysis and provide more specific feedback. In our example, a system could inform the user not only that the contact point was sub-optimal but also whether it should be adjusted upward or downward in the next trial and by how much. Moreover, the results of rule-based approaches are deterministic, making them robust and predictable [HGH+18, GDF23]. This allows rule-based approaches to be used as a ground truth to benchmark data-driven approaches, as done by Richter et al. [RWHH19] and Hülsmann et al. [HGH+18]. Finally, the deterministic nature and the precise definition of rules by experts make results more comprehensible and improve the interpretability of expert-driven systems.

A notable disadvantage of expert-driven approaches is the manual work required to realize them. Experts must precisely define rules. While some rules have been documented in the literature, they still need to be encoded, and thresholds and tolerances must be determined or learned [ZLER14, RWHH19, HGH+18]. Moreover, these rules are often specific to particular contexts or exercises [LVX+20] and hard-coded into applications, which limits generalization and extensibility [ZLER14]. Zhao et al. [ZLER14] propose an approach to address this limitation.

2.6.3.2 Data-driven Methods

Data-driven approaches derive their knowledge from data [GDF23]. This includes many machine-learning methods that learn from large datasets and pattern-matching approaches that directly compare the observed motion to a pre-recorded reference motion or to a reference model that is derived from multiple pre-recorded motions [FMS+22, RWHH19]. Methods described as template-based [FMS+22], similarity models [SKK20], distance function-based [LVX+20], or non-knowledge-based [SPB+20] all fall into the category of data-driven approaches. A survey on machine-learning approaches for human motion assessment in the context of exercises is presented by Frangoudes et al. [FMS+22]. Since data-driven methods are based on exemplar data [ZLER14], they do not require precise definitions of optimal motions or error patterns. However, this also means that the quality and accuracy of their results and validation depend heavily on the underlying dataset [FMS+22], requiring it to be carefully collected and labeled to ensure that it is representative. The creation of datasets can be particularly challenging if the method requires sample data from error patterns, as “Most abnormalities are rare and difficult to capture during training.” [TPD+16](Tao et al., 2016, p. 139) Moreover, biases embedded in data directly affect the outcome of the methods, which introduces problems, especially if datasets are unbalanced, too small, or non-representative of the problem statement and target group [GDF23, FMS+22]. Although expert-driven methods are also subject to biases, e.g., due to the subjectivity of individual experts, we believe that these are easier to recognize or evaluate than those hidden in large data sets. Nevertheless, it is important to be aware of biases if either method is used. Data-driven methods themselves are typically not exercise- or domain-specific (e.g., DTW and distance function-based approaches are used by many motion analysis applications [LVX+20, IHK+18, KGSK24]); only the datasets are. This makes data-driven methods more generalizable and more easily applicable to other domains compared to expert-driven approaches [LVX+20]. Lastly, some data-driven approaches are not deterministic, making them unpredictable and thus less verifiable, intuitive, and transparent [GDF23]. The logic underlying the decision-making process might be less intuitive than in expert-driven systems that apply logical reasoning. Black-box models in particular lack transparency and intuitiveness [SPB+20].
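As an illustration of the distance-function-based family mentioned above, the following minimal Dynamic Time Warping (DTW) sketch (our own simplified example, not the method used in this thesis) compares two one-dimensional joint trajectories, for instance wrist heights over time, recorded at different speeds:

import numpy as np

def dtw_distance(seq_a, seq_b):
    # Classic O(n*m) DTW distance between two 1D sequences.
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            # extend the cheapest of the three possible alignments
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Two wrist-height trajectories: the same swing, performed at different speeds.
reference = [0.9, 1.0, 1.2, 1.5, 1.3, 1.0]
trainee   = [0.9, 0.9, 1.0, 1.1, 1.2, 1.4, 1.5, 1.3, 1.1, 1.0]
print(f"DTW distance: {dtw_distance(reference, trainee):.2f}")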
Background & Related Work an overview of feedback types and their administration, while Chapter 3 contains some design recommendations. To get a deeper insight into feedback and its effects in the context of motor learning, as well as a greater collection of design guidelines, we refer to the books written by Magill and Anderson [MA10], Schmidt and Wrisberg [SW00], and Knudson and Morrison [KM02], as well as the review of Sigrist et al. [SRRW13]. 2.7.1 Feedback Source Feedback can be broadly classified as intrinsic and extrinsic—this classification is based on the source of the feedback [SW00]. In short, intrinsic feedback originates from the performer’s sensory systems, while extrinsic feedback is provided by external sources such as a coach or training device. Intrinsic feedback. As described by Schmidt and Wrisberg [SW00], intrinsic feedback refers to sensory information inherent to the action, which individuals can perceive directly and in real-time during or after the performance. It includes the physical sensation experienced during the motion, such as muscle tension or forces, and other sensory cues generated by the action, like visual or auditory effects. For example, a tennis player can sense various intrinsic feedback during a forehand stroke, like the sound and sensation of the racket moving through the air, the forces and sound created upon impact between the racket and ball, and the visual change of the ball’s flight. Extrinsic feedback. On the other hand, extrinsic feedback is information provided by an external source [SW00]. It is also known as augmented feedback, as it is presented in addition to intrinsic feedback to enhance or convey additional information [MA10]. The outside source that administers the feedback can be an individual (like a coach, judge, therapist, or teammate) who observes the motion. It can also be provided through technology, such as a smartwatch, video replay, or a motor learning system. As Schmidt and Wrisberg point out, extrinsic feedback is under the control of the one administering it and can, therefore, be designed to accommodate aspects such as a trainee’s learning or motion style [SW00]. Examples of extrinsic feedback administered to a tennis player are displaying ball spin and speed, calling whether the shot was in or out, replaying the swing, praising posture, correcting the swing path, and so on. 2.7.2 Feedback Administration The administration of extrinsic feedback is an integral part of motor learning systems, and its design is a multi-layered consideration. Brennan et al. [BDZC19] describe four main components of feedback that influence motor learning: content, mode, timing, and frequency. We use the same distinction. An overview is provided in Figure 2.6. The content of feedback and how it is administered can convey different messages and direct the trainee’s attention, behavior, and emotion [SW00]. Feedback is a powerful tool, and its delivery can affect the learning progress; but not necessarily in a good way. Feedback can also hinder learning and have adverse effects instead of facilitating it [MA10]. 28 2.7. 
Figure 2.6: Overview of the components of extrinsic feedback (content, mode, timing, and frequency), with Knowledge of Results (KR) referring to the outcome of an action and Knowledge of Performance (KP) referring to the technique and quality of the movement.

Inappropriate, incorrect, or too much intervention can all affect users negatively, for example by evoking frustration, so the feedback design should be well thought out [BDZC19, KM02].

2.7.2.1 Content

The content of extrinsic feedback is the actual information conveyed, which can relate to the result of an action or the performance itself. Extrinsic feedback can, therefore, be divided into two types: Knowledge of Results (KR) and Knowledge of Performance (KP) [MA10, SW00, KM02]. This distinction is related to the categorization of performance measures by Magill and Anderson [MA10] mentioned in the previous section. KR always refers to the outcome of an action: it either describes the results, say the spin and speed of the outgoing ball in a tennis shot, or indicates whether the performance goal has been achieved, for example, whether the ball hit the intended target. On the other hand, KP provides information on the technique and quality of the performance leading to that outcome. Feedback can play various roles by being informative, motivating, reinforcing, or guiding when the content is adjusted accordingly [MA10, SW00].

Informative and Guiding Role. As explained by Schmidt and Wrisberg [SW00], the purpose of information feedback is, as the name suggests, to inform the trainee about their performance in order to aid the learning process and to help adjust and refine their motion. Information feedback can help trainees understand their current level by pointing out strengths and weaknesses. By explaining how to achieve a desired goal or correct errors, information feedback provides instructions that the user can follow and guides their next attempt. Additionally, Schmidt and Wrisberg distinguish between descriptive and prescriptive feedback: Describing the errors made during a performance is referred to as descriptive feedback [SW00]. Addressing mistakes in feedback draws attention to specific performance issues. Examples include visually highlighting deviations from the desired outcome in a replay, as done by Ikeda et al. [IHK+18] and others [WNPK20, HGH+18], or giving verbal descriptions like “you initiated your swing too early”. However, as pointed out by Magill and Anderson [MA10], a limitation of descriptive feedback is that trainees must already understand how to correct their movement for it to guide adjustments in subsequent attempts effectively. Therefore, providing additional information on how to correct errors is essential if this knowledge is lacking and if the goal is to help trainees improve in subsequent attempts [KM02, MA10]. In contrast to descriptive feedback, prescriptive feedback refers to feedback that not only identifies errors but also provides specific instructions on how to correct them. In other words, it prescribes a solution. This type of feedback is intended to guide the trainee by offering actionable advice on improving their performance.
An example is “Initiate your swing as soon as the ball is shot” [SW00]. Verbal cues or short phrases such as “focus on the ball” can be established by a coach to direct attention [KM02, MA10].

Motivational Feedback and Positive Reinforcement. As stated by Schmidt and Wrisberg, motivation is dependent on the person’s perceptions of success. By informing users about their progress, highlighting achievements, and praising or emphasizing when they improve or implement instructions correctly, feedback takes on a mainly motivational role and can encourage continued effort [SW00]. On the contrary, if feedback on progress, strengths, and correct motions is omitted, this can have a demotivating or frustrating effect, as discussed by De Kok et al., where “participants complained that they were not informed about getting better, and thus lost motivation to try and improve” [DKHH+15](De Kok et al., 2015, p. 361). In their case, users did not know whether they were making progress or if their adjustments were correct, thereby feeling increasingly frustrated. Another role of encouraging and validating feedback is positive reinforcement, which strengthens the likelihood of a behavior being repeated [SW00]. Lastly, visual feedback can enhance the immersion in an otherwise dull task, making it more engaging and fun [FMS+22, BDZC19].

2.7.2.2 Mode

Feedback can be delivered through various sensory modalities, including visual, auditory, tactile, and proprioceptive channels [BDZC19, SW00]. In traditional coaching, feedback is often provided verbally via spoken instructions, but nonverbal methods, particularly visual feedback (e.g., through gestures or facial expressions), are also used [SW00]. A popular form of visual feedback is video replay, which allows trainees to observe their own movements while coaches highlight areas of strength and potential improvement [MA10, SW00]. When a coach physically adjusts a trainee’s posture, the feedback addresses tactile and proprioceptive sensory modalities, allowing the trainee to feel the correction through the coach’s touch and changes in position and movement of muscles and joints. Below is a small list of feedback strategies that address different sensory modalities. When feedback addresses a combination of sensory modalities, it is termed multimodal feedback [SRRW13].

Visual. Feedback provided through visual means, which includes text, replays, demonstrations, visual representations of the deviation from the correct motion, highlighted areas of motor errors, color coding, gestures, and facial expressions [SW00].

Auditory. Feedback delivered through sound, either through verbal components or sonification, which includes attention cueing, spoken feedback, a metronome, and audio tones in response to specific events [MA10, KM02].

Haptic. This refers to feedback delivered through tactile sensations, such as vibrations or pressure [SRRW13].

2.7.2.3 Timing

Another component of feedback is its timing. Feedback can be categorized as concurrent, terminal, or aggregated based on when it is delivered in relation to the action performed [SRRW13, SW00, MA10]:

Concurrent. Concurrent feedback (also known as real-time, simultaneous [BDZC19], or online feedback) is provided during the action, for example, at the moment of error occurrence [SRRW13].

Terminal. Terminal feedback, on the other hand, is provided after an action is executed and can be given immediately after the movement is completed or after a short delay [SRRW13, SW00].

Aggregated.
Feedback can also be given in an aggregated form after a series of attempts or actions, rather than after each individual attempt. Schmidt and Wrisberg [SW00] discuss two ways of aggregating feedback. The first is summary feedback, where details on each attempt are presented concisely, for example as graphs. The second is average feedback, where performance metrics are averaged over all attempts, and an overall evaluation is provided. Aggregated feedback can be presented at the end of a training session in addition to timely feedback to provide a broader overview of a trainee’s performance. It can also be the sole feedback in order not to interrupt a training session if no correction for injury prevention is warranted.

2.7.2.4 Frequency

The last component covered is the frequency of feedback administration, which determines how often and how regularly feedback is given. The most straightforward approach is to present feedback at a constant rate, after every attempt or selectively after certain attempts [SW00]. However, as feedback can create a dependency that causes learners to rely on it, excessive frequency may negatively impact long-term learning outcomes; to counteract this, methods that reduce the frequency of feedback are recommended [SRRW13, SW00, MA10]. One approach is to apply a fading frequency, where feedback is decreased over time as skill improves [SRRW13]. Another performance-based approach is bandwidth feedback, which is only provided when errors or performance metrics fall outside a predefined tolerance range or bandwidth [SW00]. Additionally, one can offer a customizable frequency, letting the trainees decide when to receive feedback [MA10].

2.8 Motor Learning Systems

Motor learning systems provide physical training where users learn movement patterns through practice [CCL+19] and automatic feedback [DMSD22]. They aim to provide complementary or alternative solutions to training methods that require the physical presence of a professional trainer to train, analyze, and provide feedback, with the goal of reducing the trainer’s/trainee’s workload or increasing efficiency in terms of learning time and quality in movement technique compared to existing training methods [RF20]. Designing a motor learning system is an extensive task. We, therefore, collected and expanded on design guidelines and considerations in Chapter 3. Examples of remote motor learning applications include martial arts, strength training, and physical exercise for rehabilitation. “ImmerTai” is a motion training system for Chinese Taichi described by Chen et al. [CCL+19] in which users learn a technique by imitating an expert within an immersive and collaborative environment. No direct feedback on the user’s performance is given. A Microsoft Kinect captures the user’s motion, allowing automatic assessment by calculating the similarity between the user’s movement and the prerecorded expert motion. The paper compares three motion assessment methods and evaluates the students’ learning outcomes on three platforms (PC, HMD, and Cave Automatic Virtual Environment (CAVE)), whereby Chen et al. used the resulting quality scores to monitor users’ performance. By utilizing a customized CAVE, their system reduces the training time a user requires to memorize a motion and increases motion quality compared to using a non-immersive PC for training. Hülsmann et al.
[HGH+18] propose a data-driven pipeline for detecting motor errors during strength training, focusing on squats and Tai Chi pushes in a VR coaching environment. Their system effectively classifies errors and provides real-time feedback, including predefined verbal cues and automatically generated visual augmentations. A rule-based assistance system for strength training that tracks exercises via the Kinect V2 and provides textual feedback to the user is presented by Örücü and Selek [ÖS20]. Other rule-based systems include the real-time assessment for rehabilitation exercises proposed by Zhao et al. [ZREL17] and an exergame for guiding seniors in exercising at home, as described by Fernandez-Cervantes et al. [FCNH+18]. Additionally, a concept for an assistance system for therapy exercises is proposed in a paper written by Richter et al. [RWA+17]. In a follow-up paper, Richter et al. [RWHH19] compare a data-driven with an expert-driven approach for this assistance system. Motor learning systems are also applied to other sports with more specific requirements, such as environmental dependencies in the case of alpine skiing, by using specialized equipment. For instance, Wu et al. [WNPK20] employ an indoor ski stand for a VR-based ski training simulator, in which users learn motion patterns from professional skiers by mimicking their prerecorded motions. An HTC Vive Pro and two VR trackers are utilized for partial motion capture. Additional visual feedback on the quality of the student’s motion (based on ankle rotation and lateral movement) supports the training process. In their paper, Wu et al. present six variants to aid learning visually, including visual feedback on the differences in the user’s and expert’s motion. Other papers on the same skiing system are written by Zhang et al. [ZWK21] (mainly focusing on the replay visualization), Hoffard et al. [HZW+22] (mainly focusing on the replay visualization and a comparison between visual and haptic feedback), and Matsumoto et al. [MWK22] (focusing on time-distortion effects to support ski training, e.g., by slowing time or applying other temporal modifications, such as dynamically changing the simulation speed based on motion comparison). Motor learning systems are not limited to sports. Di Mitri et al. [DMSD22] provide information about the data-driven architecture and evaluation of the “CPR Tutor”, a tutoring system for the medical field in which the lifesaving technique Cardiopulmonary Resuscitation (CPR) is trained through real-time auditory feedback. The student’s motion is captured via a Microsoft Kinect v2 and a Myo armband. The system also requires a ResusciAnne manikin where CPR can be performed and a Simpad for data collection and providing feedback. The system automatically detects five types of mistakes and informs the user when one is performed. Their results indicate that the system’s feedback has a short-term positive impact on the user’s CPR performance, as the error rate decreases during the 10 seconds after the prompt. A sport that shares a few similarities with tennis in terms of motion phases—particularly in the mechanics of generating power through the swing—and coaching rules is golf. Kooyman et al. [KJR13] propose a golf analysis tool that presents quantitative feedback on a graphical user interface based on video and gyroscopic data of a golf swing.
Feedback includes color coding, a score ∈ [1, 20] based on putt tempo compared to the ideal ratio, and quantitative data such as Knowledge of Results and angular velocity. The motion phase segmentation is based on the putt’s angular velocity. An Augmented Reality golf learning system using decayed DTW to compare a user’s and an expert’s motion is outlined by Ikeda et al. [IHK+18]. Their system provides multimodal feedback (e.g., replay, visual feedback like trajectories, audio feedback where a pitch sound indicates the degree of motion error).

2.8.1 Motor Learning in Tennis

Tennis, both as a recreational activity and a professional sport, has garnered considerable attention in the fields of biomechanics and motor learning, and there is a large body of literature on traditional training and definitions of proper technique [Knu06, KM02, KE04, RK19, REC15, LLS+10]. An early step toward an automated motor learning system in tennis involved the use of video recording to assist traditional training [KM02]. For example, Yu et al. [YWW+20] combined video recording with wearable sensors to facilitate reflective learning in traditional tennis training. Morel et al. [MAKD17] proposed a general method for movement assessment that identifies spatial and temporal errors by applying similarity modeling based on local and global DTW. The method compares a subject’s motion to a nominal motion learned from a set of experts’ correct motions while considering additional spatial tolerances. As a general solution, it can be applied to any sport as long as recordings of correct motions can be provided. For evaluation purposes, they applied it to the motions of a tennis serve and a karate tsuki. Oshita et al. [OIMK18] realized another prototype of a self-training system with broad applicability to improve trainees’ motion forms. The authors applied the system to the tennis forehand shot to evaluate their model’s motion feature assessment and prioritization. Trainees train by imitating a target motion and receiving visual and textual feedback on a 2D screen to indicate how to correct their motion form. The provided feedback conveys the disparities of one-dimensional spatial, rotational, and temporal motion features between the trainee’s full-body motion and a statistical model of correct expert motions. However, proper scaling or motion retargeting needs to be added to the current model to compare the motions of two different people validly. The authors mention this as part of future work. For evaluation purposes, they assumed that “the target motions were performed by experts whose skeletal models were sufficiently close to that of the trainee” [OIMK18](Oshita et al., 2018, p. 3). If the trainee’s movement deviates significantly from the correct model, the system visualizes the motion feature with the highest deviation. The authors conducted a user experiment to determine whether the participant could replicate the tennis motion forms using only the system’s feedback and how much the system’s selected motion features matched those deemed relevant by a tennis expert. Their setup uses an optical motion capture system with 12 cameras for full-body tracking. However, the system does not track or analyze the motion of the tennis racket, nor does it include real-ball interaction or ball simulation. For motion comparison, Oshita et al. detect three tennis key poses (take-back, impact, and follow-through).
Their study had a positive outcome, and the single participant (N=1) could replicate the motion forms after 18 trials. However, the evaluation has several limitations, including the limited number of subjects and the evaluation of the short-term learning effect only. Notably, in a subsequent study [OII+19], Oshita et al. conducted further user experiments to demonstrate the feasibility of their self-training system.

2.8.2 Tennis Training in Virtual Reality

Saito et al. [SMS+18] developed a virtual tennis environment in Unity to evaluate differences in service returns between expert and novice players. The simulation is set on a grass court and allows users to receive and return serves in VR. In their experiments, they used a partial mocap setup (tracking the waist, calves, and racket) to capture and virtualize serve movements and ball trajectories from a tennis expert to make them repeatedly reproducible in their virtual environment. Thereby, they created an environment where different players’ responses to identical serves can be analyzed in a comparable way. Masai et al. [MKM+22] presented a training system built upon Saito et al.’s [SMS+18] virtual tennis environment and partial motion capture framework. Their training method focuses on improving the hip movement of novice tennis forehand returns via auditory (sonification), textual (scores), and visual (ball, racket, and hip trajectories) feedback. The system generates feedback by comparing the user’s motion with that of experienced players and offers sonification as positive reinforcement when the user’s hip movements approach those of the experienced players. Overall, their study results are positive and suggest that sonification has a short-term learning effect on preparation timing for slow incoming balls. The limitations of their work include simplified ball physics, the absence of tactile feedback, and the use of sonification solely as positive reinforcement and not to provide corrective feedback. Liu et al. [LWMK20] presented a VR exercise tool for racket sports with an algorithm that automatically creates and optimizes drills based on user-specified parameters, such as objectives and intensity levels. Their procedural approach facilitates exercise design in a customizable manner, taking into account users’ individual needs. Le Noury et al. [LNBRF21] investigated the representativeness of virtual environments for tennis, thereby providing essential groundwork for developing virtual tennis skill training. Their findings demonstrate that critical aspects of motion behavior in tennis, such as adapting the stance to incoming balls, can be reproduced in a virtual tennis simulation, and a sense of presence can be conveyed. These are promising results for simulating various training scenarios in VR, with the prospect of transferring skills to the real world. However, the topic of skill transfer warrants further thorough research. Jiang and Rekimoto [JR20] proposed a motor skill training process called ‘mediated-timescale learning’ that utilizes VR to manipulate time, thereby letting players concentrate more on their motion forms and decreasing frustration due to failed tasks at beginner levels. They applied it to a VR tennis training system by manipulating the speed of the incoming ball without altering its trajectory. To examine its efficacy, they conducted a cross-over study to compare VR and traditional training.
Preliminary results suggest that their VR training can positively affect the number of forehand volley hits in real-world training but did not improve motion forms. However, the efficacy of their approach needs to be investigated further with more participants to get deeper and more significant insights. Another study on time-distortion effects on learning was done by Matsumoto et al. [MWK22] in the context of skiing. Hiramoto et al. [HASN23] implemented a VR tennis serve training system where trainees can observe and mimic predicted 3D motion derived from a 2D video of a professional player. The key advantage of this approach is the ease of acquiring single-viewpoint videos of professional tennis players as opposed to 3D mocap data. However, the achievable accuracy of the 3D motion prediction from a single view is lower than that of other motion-capturing solutions, such as multi-viewpoint solutions. Multiple papers on VR motor learning systems for table tennis were published in the past [MSS+19, WPNK21], including the paper by Oagaz et al. [OSC22]. Their system offers real-time posture feedback via similarity modeling. Finally, VR is also used to preserve cultural heritage, for example, in the form of an immersive reconstruction of traditional sports and games such as “real tennis”—the precursor of modern tennis [GBDM+22, GSADMG23].

CHAPTER 3
Design Guidelines and Considerations

Designing a motor learning system requires careful consideration of various factors that, in turn, influence the choice of motion capture technology, analysis methods, and feedback mechanisms. This section outlines key considerations and guidelines to aid in the design process. Each motor learning application has unique requirements that necessitate tailoring the system based on the context of the assessed motor task, the application’s intended use, its goals, and the planned target audience. For instance, a system for rehabilitating elderly patients might prioritize safety, ease of use, encouragement, and continuous monitoring, while one for the performance training of elite athletes might focus on high-intensity training and performance metrics. Determining which design considerations to prioritize and which to neglect or disregard entirely depends heavily on the specific application and its objectives, and the final system design must find some balance. Existing literature provides and discusses valuable guidelines for designing motor learning systems—however, the sources we found address only specific aspects of system design or focus on certain applications, such as Waltemate et al. [WHP+15], who propose a set of requirements focusing on motion capturing and rendering for VR applications, or Frangoudes et al. [FMS+22], who concentrate on machine learning applications. To fill this gap, we provide an overview and expand upon existing design guidelines and requirements proposed or stated by the following authors: Waltemate et al. [WHP+15], Frangoudes et al. [FMS+22], Brennan et al. [BDZC19] (focus on feedback design), Knudson and Morrison [KM02] (focus on traditional qualitative motion analysis and feedback strategies), Magill and Anderson [MA10] (focus on motor learning, practice methods, and feedback design), and, finally, Sutton et al. [SPB+20] and Khairat et al. [KMCAS18] (both publications discuss strategies for clinical decision support systems).
The following guidelines are categorized into four sections and serve as broad principles to guide design decisions before deciding on the exact technologies. The first category lists guidelines concerning motion capturing and hardware, the second focuses on user-centered design, the third, which mainly relates to the pre-processing steps and motion analysis, discusses system robustness and optimizations that enable real-time operation and scalability, and the fourth examines guidelines for informative and effective coaching, involving feedback design. We will come back to them in Chapter 4 to list and argue our own design decisions, as well as in Chapter 6.
3.1 System Setup, Tracking, and Mocap Data
This section describes guidelines centered around hardware setup and data acquisition. In particular, it covers facilitating natural movement and comfort, streamlining calibration and setup processes, and optimizing tracking and mocap data for enhanced performance.
G1 Facilitate Natural Movement and Comfort
“Participants should perform the motor actions as they would in a real training scenario.” [WHP+15](Waltemate et al., 2015, p. 141) Sufficient tracking volume and minimizing mobility barriers are essential to allow natural movements and the necessary range of motion for the trained motor tasks. Therefore, the equipment’s impact on mobility, comfort, and safety should be considered when deciding on mocap technology. For instance, cables can pose tripping hazards for exercises that require the user to move around the space freely, and lightweight and wireless options are, therefore, best for such use cases. Specialized equipment may also be required to replicate real-world scenarios more accurately. For instance, Wu et al. [WNPK20] utilized an indoor ski stand to teach alpine skiing, while Di Mitri et al. [DMSD22] designed a tutoring system that simulated CPR using a ResusciAnne manikin.
G2 Streamline Calibration and Setup Processes
Motor learning systems ideally have a quick and easy setup process to minimize barriers to system usage. While a lengthy initial one-time setup may be tolerable for stationary systems, recurring setup requirements hinder and delay training. For example, setup time can be prolonged due to wearable sensors, markers, or motion suits, which must first be put on before training. Additionally, some mocap systems require regular recalibration (e.g., whenever lighting conditions change or when too many markers are occluded), which disturbs training [WHP+15]. To reduce barriers and increase usability, it is advisable to consider hardware configurations with minimal setup requirements and to streamline setup and calibration through automation and intuitive procedures.
G3 Optimize Tracking and Mocap Data for Enhanced Performance
Before developing a motor learning system, it is integral to define the critical information required—specifically, what data needs to be tracked and analyzed and the accuracy, precision, and robustness necessary—as these findings will determine the appropriate technologies and methods for the system. This definition, in turn, is highly application-dependent. For instance, fast-paced activities require tracking with a higher frame rate than slow-paced ones. How fine or granular the captured data needs to be also depends on the motor task, as some body parts are unrelated to the successful execution and analysis of certain movements [HKB17] and, thus, do not need to be tracked.
Hand rehabilitation exercises, for example, might require the ability to track subtle movements of the arms, hands, and fingers, but the rest of the body is mostly irrelevant. As processing a large amount of data is computationally intensive [LVX+20], balancing aspects like resolution and accuracy with processing demands is crucial when selecting motion capture technologies. Additionally, approaches such as dimensionality reduction, feature engineering, and motion segmentation can improve performance when working on extensive mocap data [LVX+20]. Another aspect to consider is the possible occlusion of body parts from the mocap systems, which is especially prominent in complex sports movements recorded by single-viewpoint optical systems [WHP+15]. Ultimately, the selected mocap technologies and preprocessing methods must provide the required level of detail, accuracy, and precision to enable a reliable skill assessment for the motor task while avoiding unnecessary complexity to allow a real-time evaluation. 3.2 User-Centered Design This section focuses on guidelines around user-centered design. Specifically, these guide- lines promote an inclusive design by outlining the importance of accessibility, inclusiveness, usability, and personalization, as well as providing an engaging learning experience in a transparent way. G4 Inclusive Design via Accessibility, Inclusivity, and Usability Inclusive de- sign principles aim to ensure that the system is usable by a wide range of individuals and involves informed design decisions to address diverse needs. A fundamental component thereof is early and recurring end-user involvement (e.g., via field research, usability test- ing, pilot studies, and expert consultation) [KMCAS18]. Furthermore, measures should be taken to improve accessibility. A key aspect of accessibility in motor learning systems is an appropriate feedback strategy that is effective, discernible, and understandable for many people. This includes presenting alternative feedback modalities, such as auditory, visual, and haptic feedback, to accommodate sensory impairments and different learning preferences [BDZC19]. Other aspects of feedback design influence its comprehensibility, such as variable feedback formulations or focus on one error at a time [KM02]. Visual aids and audio support can improve usability for users with different sensory abilities, e.g., through high-contrast interfaces, colorblind-friendly color schemes, vibrating haptics, audio descriptions, and subtitles. To ensure usability, motor learning systems should provide easy-to-navigate interfaces, intuitive controls, and clear instructions. Users who may struggle with the system, such as those with cognitive impairments or unfamiliarity with technology, can benefit from additional guidance like step-by-step tutorials, visual cues, or demonstrations. Allowing users to customize certain aspects to their liking can 39 3. Design Guidelines and Considerations improve inclusiveness and make the system more enjoyable, for example, by providing adjustable difficulty levels or assistance options to accommodate different motor skill levels. Furthermore, offering support for multiple languages can expand the system’s reach and usability. In terms of practicality, the system should ideally be easy to set up and use and have low space requirements [FMS+22]. Depending on the use case, a portable system might be considered. 
Lastly, the overall cost of a motor learning system (purchase price, maintenance, and operation costs such as power consumption) can be a limiting factor for widespread adoption and should be taken into account [FMS+22]. G5 Customization and Personalization As mentioned above, allowing users to customize the experience to suit their abilities and preferences or having experts customize it for them is closely linked to an inclusive design. In addition to the typical settings such as language, volume, etc., certain aspects of a motor learning system can be configurable when pertinent, such as the training program, difficulty levels of exercises, type of feedback, or personal goals. In rule-based systems, rule parameters can be made adjustable when applicable, and subsets of rules can be activated or deactivated [ZLER14]. Especially in healthcare, it might be necessary to customize training plans, exercises, and analyses to the needs of every patient [LVX+20, FMS+22]. Motor learning systems can also be personalized automatically, for example, by recognizing the user’s current skill level via performance metrics, adapting the difficulty levels of exercises accordingly, and providing tailored feedback to help them improve effectively. Most importantly, motion analysis must be personalized to the user’s anthropometric features if a generic model that accounts for unique physiological characteristics is unavailable or inapplicable [FMS+22]. This is particularly noteworthy for similarity models where the user’s motion is compared to a template. Here, normalization becomes necessary. Additionally, individual motion styles should be considered in a motor learning system [HKB17]. G6 Promote User Engagement and Motivation An overall enjoyable and im- mersive learning experience is important to foster system use and encourage adherence to training plans, especially when training plans contain repetitive or tedious exercises [BDZC19]. Incorporating engaging and motivating elements is a vital part thereof. To begin with, utilizing gamification by integrating features such as rewards, personalized goals, progress tracking, and scoring systems can have an engaging and motivational effect [BDZC19]. Next, carefully designed feedback mechanisms are essential for maintaining user interest and for the system to positively affect learning [BDZC19, KM02]. Visual feedback can enhance immersion and engagement [FMS+22, BDZC19], while relevant feedback and positive reinforcements encourage continued effort [KM02]. The last aspect comprises communicating and rewarding trainees when they get better and using clear instructions and understandable feedback [KM02]. Inadequate, incomprehensible, or incorrect feedback can decrease motivation and generate user frustration, for example, by not informing trainees about their progress [DKHH+15]. Finally, competitive elements such as leaderboards and other social aspects, like the presence of others in a virtual 40 3.3. Optimizations and System Robustness environment, can influence motivation [NMT+18], and have been incorporated in various applications [CCL+19, TNZ+24, DPdSYMM17]. G7 Enhance Transparency and Interpretability Transparency in motor learning systems is concerned with, among other things, the explainability of how the model works, the rationale behind its decisions, and the presentation, interpretability, and verifiability of results and feedback [FMS+22, SPB+20, HGH+18, GDF23]. 
Not only is transparency necessary to verify that the motion assessment and decision-making process works reliably and adequately [HGH+18], but it also influences user acceptance and trust, which in turn affects whether a system is adopted [KMCAS18, GDF23]. Consequently, designing motor learning systems to be as transparent as reasonably possible can be beneficial. This process starts with the transparent design of the underlying motion assessment model: deterministic reasoning through rule-based models enables an explainable and verifiable process, leading to more intuitive and predictable outcomes. Such white box models are preferred regarding transparency, as black box models are inherently opaque [KMCAS18]; sometimes, however, the capabilities of black box models (or models that are viewed as such) outweigh this drawback. The explainability of these methods can then be improved by providing insights into the concepts behind them, e.g., through visualizations [GDF23]. Visualizations can also be used to present results and feedback more intuitively [BDZC19]. Alongside clear and intuitive feedback, explanations can help users understand why specific recommendations are made and how to implement them. Referencing expert knowledge or providing verifiable sources can further improve trust [SPB+20]. Finally, transparency also implies that users are informed about how data is collected, used, and protected.
G8 Assure Data Security and Privacy
Sensitive data must be handled with care to ensure ethical usage, transparency, data privacy, and security [KMCAS18, GDF23]. Preserving user privacy and respecting ethical considerations are regarded as particularly important in the healthcare domain [ZLER14, GDF23] but should also be upheld in other domains. Given that motion analysis often relies on data-driven approaches, including machine learning, that involve large volumes of data, we want to address the necessity of data anonymization, which protects individual identities. It should be noted that mocap data alone can be used to identify a person [LE04].
3.3 Optimizations and System Robustness
This section presents guidelines on system robustness and optimizations that enable real-time feedback and scalability of the system.
G9 Ensure Robustness and Error Tolerance
A robust system ensures consistent performance across different users and functions reliably under varying conditions. Mocap technologies can introduce small tracking and measurement errors (e.g., due to occlusion or drifting), such as slight deviations in the path of a moving object or noise in the data representation [ZLER14, FMS+22]. Motor learning systems should be robust to tracking inaccuracies to a certain degree. Since overly sensitive systems can have a discouraging effect, subtle user movement errors should be tolerated within reasonable limits [ZLER14]. On a related note, the system should also be capable of handling diverse individual movement styles and motion variations [HKB17], which may necessitate tolerances (i.e., a range of correctness) to assess motion [KM02].
G10 Real-Time: Low Latency
Timing of feedback is an integral part of the feedback design in motor learning systems. However, to allow real-time or timely post-activity administration of feedback, the performance and runtime of the preceding motion analysis and feedback generation processes must be fast enough [WHP+15, DKHH+15]. To enable the adoption of different timing strategies, Hülsmann et al.
[HGH+18] suggest that the components of the motion analysis should deliver their results as quickly as possible. Additionally, Frangoudes et al. [FMS+22] recommend exploring ways to include real-time feedback in motor learning systems. For a responsive system with concurrent feedback, real-time analysis with low latency and presenting data with minimal delay and high frame rate is a fundamental prerequisite [WHP+15, DKHH+15]. G11 Optimize Automation, Scalability, and Generalization Creating and label- ing datasets for machine learning or constructing and implementing rule-based approaches are time-consuming and manual tasks that limit scalability [HGH+18]. Manually as- signing feedback modalities to motor errors or adapting rules and retraining models for individual users further increases the workload [HGH+18]. Another challenge is maintaining automated systems and keeping the coded knowledge up to date, as changes in coaching or medical practices and knowledge require corresponding updates [SPB+20]. When knowledge is hardcoded into the system or obtained through training, extending or modifying it becomes difficult, complicating maintenance further [ZLER14]. To address these challenges, careful consideration should be given to minimizing manual work when designing components or selecting approaches for motion analysis [HGH+18]. This can be achieved through various methods, such as reducing the amount of data necessary for data-driven approaches [HGH+18] or applying generalizable standard data models and interpretation for encoding rules in expert-driven systems [ZLER14, GDF23]. Generalized syntax, like the Arden Syntax [GDF23] or utilizing eXtensible Markup Language (XML) as presented by Zhao et al. [ZLER14], facilitates extensibility and readability of rule definitions and enables knowledge transfer between applications. Furthermore, developing more generalized components for motor learning systems, that can be more easily adapted and extended to different domains, reduces the repeated development process necessary for domain-specific models [FMS+22]. 3.4 Assessment, Coaching, and Feedback Design This section describes guidelines centered around the information that is presented to the trainee. It mainly focuses on coaching and feedback strategies. 42 3.4. Assessment, Coaching, and Feedback Design G12 Skill Transferability: Feedback Strategy For motor learning systems— especially those using extended reality (XR)—skill transfer to the real world is necessary to be an effective tool to support traditional methods and should, therefore, be evaluated [RAB+00, MSS+19]. The main questions popping up are whether training with motor learning systems has the desired effect and whether and to what extent the acquired skills or knowledge are effectively applied and reproduced in real-world situations. Of course, the transfer of desired training effects is intended, but sometimes no transfer or the transfer of undesired effects can be observed [RAB+00], e.g., as mentioned by Michalski et al. [MSS+19], inconsistencies in a training simulation compared to the real world can lead to unnatural movements during training in the application. A requirement proposed by Hülsmann et al. [HGH+18] that might help with transferability is the connection to feedback strategies already established in real-world scenarios to address and correct common mistakes of trainees. 
To realize an existing feedback strategy, a motor learning system should be able to identify common mistakes and assess them in the necessary level of detail to provide feedback aligned with traditional coaching practices. The design of the whole motor learning system could also lean on existing models applied in traditional coaching, such as the Qualitative Analysis of Human motion described by Knudson and Morrison [KM02]. This might also help heighten acceptance of the system as the decision-making process resembles common methods that might align with the expectations of trainers and trainees [KMCAS18]. G13 Clear Instructions and Demonstrations of Exercises “Skills should be demonstrated several times before a beginner practices a skill, with additional demonstra- tions during practice as needed.” [MA10](Magill and Anderson, 2010, p. 340) Before training a motor skill, trainees should become familiar with it and build an initial mental model of the correct execution. Clear instructions, step-by-step tutorials, or demon- strations can be used to communicate to the trainee how to perform the motor skill [MA10, ZLER14, DKHH+15]. Demonstrations during training sessions can help trainees directly compare their performance to the correct execution [GK22]. G14 Support Self-Monitoring and Reflection “Reflection is the process of an indi- vidual recapturing their experience, thinking about it, and assessing it.” [YWW+20](Yu et al., 2020, p. 64) Trainees should be able to observe and assess their motion in one way or another to facilitate reflection [WHP+15]. Playback should also provide knowledge of re- sults, such as the outcomes of a tennis stroke [YWW+20]. Self-observation makes it easier for trainees to link feedback to their performance, making feedback more understandable and helping trainees to make more precise adjustments based on what they see [YWW+20]. It also allows trainees to compare their technique to the correct execution (demonstration, recording, or mental model) and reflect on the differences [YWW+20, GK22]. Besides, ways to perceive one’s own technique and progress allow self-monitoring, which can build self-awareness about strengths and weaknesses and encourage self-correction of errors without constant external feedback. In real-life training scenarios, self-observation is achieved through mirrors [WHP+15] or video recording [YWW+20]. In motor learning 43 3. Design Guidelines and Considerations systems, there is the additional possibility of playing back the captured motion in the form of a 3D avatar, either in real-time as a virtual mirror or after the performance as a playback in third-person view [WHP+15, GK22, ZLER14, OIMK18]. On top of that, there is the ability to couple mocap replays with visual feedback mechanics, which is portrayed in various XR applications [IHK+18, OIMK18, FCNH+18]. Finally, mocap recordings can also be used for post-exercise reviews by coaches or clinicians [ZLER14]. G15 Preplanning The first step in the qualitative human movement analysis model described by Knudson and Morrison [KM02] is the preparation step, which focuses on preplanning before teaching an activity. This step involves gathering knowledge from diverse sources, including research and expert opinions about the domain, trainees, and activity or movement trained (e.g., optimal technique, outcome measures, key features, common errors or injuries). 
Additionally, it encompasses insights about effective instruction methods, such as suitable cues and the impact of different teaching styles on learning. We believe this is an integral step not only in traditional coaching but also essential for motor learning systems to ensure their accuracy and effectiveness. Accordingly, we recommend that the system’s motion analysis and feedback components be grounded on a knowledge base gathered prior to the design process. Furthermore, knowledge in the form of data or rules is required to develop an automated motor error analysis. G16 Provide Informative Assessment Automated motion analysis allows the monitoring of motion error and feedback generation. However, the assessment must deliver enough relevant insights for informative and corrective feedback [FMS+22, HKB17, DKHH+15]. The level of detail and informative value of the motion analysis results depends on what information and feedback the user needs to receive while using the system and should be determined at an early stage as it influences the design and development of the motion analysis method [FMS+22]. Suppose corrective feedback and interventions are required in the event of movement errors. In this case, it is not sufficient to classify only the correctness of the movement or to make an overall assessment of the movement quality as discussed in Section 2.6—instead, motion errors must be recognized, and further analysis of these errors may be necessary to provide instructions for improvement. G17 Prioritizing Interventions and Feedback “The amount of information included in verbal instructions should take into account learners’ attention-capacity limitations.” [MA10](Magill and Anderson, 2010, p. 340) Too much feedback or technique adjustments at once can overwhelm the user and lead to analysis paralysis, hindering their ability to improve [KM02, BDZC19]. A related issue is alert fatigue, which arises when alerts (or interventions) of minor importance are provided too frequently, desensitizing users and leading to the potential dismissal of critical alerts [SPB+20]. “An analyst who focuses the performer’s attention on minor or symptomatic errors at the expense of more important problems may indirectly contribute to injury.” [KM02](Knudson and Morrison, 2002, 44 3.4. Assessment, Coaching, and Feedback Design p. 112) It is suggested in the literature that feedback should be limited and that only the most relevant intervention be prioritized and selected at any given time to mitigate these issues and to allow users to focus on meaningful improvements [KM02, SPB+20]. Furthermore, minimizing disruptive, excessive, or unnecessary interventions can help prevent alert fatigue [SPB+20]. The primary design challenge lies in identifying the most relevant intervention and determining the optimal timing for intervention. As an illustrative guideline, McGinnis [McG13] suggests that movement errors that pose a risk of injury demand immediate correction. 
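To make this prioritization tangible, the following sketch orders hypothetical per-rule results so that injury-related errors are corrected first, followed by errors in earlier motion phases and, finally, the lowest-scoring rule. The record fields, ordering criteria, and example values are illustrative assumptions, not the selection logic of any particular system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RuleResult:
    """Hypothetical outcome of evaluating one coaching rule for one trial."""
    name: str
    phase_index: int    # position of the associated motion phase in the movement
    score: float        # 0.0 (worst) .. 1.0 (rule fully satisfied)
    injury_risk: bool   # True if the detected error can lead to injury

def select_intervention(results: list[RuleResult]) -> Optional[RuleResult]:
    """Pick one intervention: injury-related errors first, then errors in
    earlier phases (correcting in sequence), then the lowest score."""
    violated = [r for r in results if r.score < 1.0]
    if not violated:
        return None  # nothing to correct; feedback can reinforce instead
    return min(violated, key=lambda r: (not r.injury_risk, r.phase_index, r.score))

# Example: a risky error later in the movement outranks an earlier, milder one.
results = [
    RuleResult("smooth backswing", phase_index=1, score=0.7, injury_risk=False),
    RuleResult("contact point in front", phase_index=3, score=0.4, injury_risk=True),
]
print(select_intervention(results).name)  # -> "contact point in front"
```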
Knudson and Morrison [KM02] summarize rationales for prioritizing certain interventions from other literature and conclude that there are “six logical rationales for prioritizing corrections to select the best intervention: relating actions to previous actions [i.e., follow-up errors], maximizing improvement, making the easiest corrections first (working in order of difficulty), correcting in sequence [e.g., correcting errors in the earlier motion phases first, as they may relate to errors in later phases], moving upwards from the base of support, and fixing critical features first” [KM02](Knudson and Morrison, 2002, p. 125).
G18 Key Considerations in Feedback Design
Choosing a feedback strategy and feedback design is a multi-layered consideration and an important part of motor learning systems, as feedback delivery can affect motor learning both positively and negatively [BDZC19, KM02]. Knudson and Morrison [KM02, p. 132–137] suggested seven key considerations for providing extrinsic feedback to guide the design of feedback, summarized in the following:
Limit Feedback. Feedback should be limited to one specific aspect at a time and provide relevant, clear, and direct information to avoid information overload.
Specific and Individualized Feedback. Feedback should be specific to the individual performance and trainee. General feedback may be less helpful and actionable.
Immediate Feedback. Feedback administration should not be delayed for a prolonged time and should be kept close to the performance or set of trials. Instantaneous or near-immediate feedback may help trainees link it to their own perception. Aggregated feedback may also be effective.
Positive and Actionable Cues. The feedback should highlight both strengths and weaknesses. It should inform trainees what they should do, not what they should not do.
Frequent Feedback. High-frequency feedback can help beginners by providing constant guidance, whereas more advanced learners might benefit from reduced feedback frequency, promoting self-reliance and internal error detection. A common approach is to start with high-frequency feedback and gradually decrease it as the learner progresses.
Concise Cue Words and Phrases. The meaning of short cues can be established with a trainee and can be an effective and concise way to provide feedback.
Variety. A repertoire of cues for the same intervention may help communicate and clarify the feedback, and various modes can be used to support different learning preferences.
CHAPTER 4 Methodology for Tennis Forehand Motion Learning in VR
Target Group: Beginner to intermediate tennis players
Motion Task: Modern tennis forehand topspin (Eastern Grip)
Our tennis learning methodology combines the versatility of a virtual training environment with principles of motor learning and automated motion analysis, resulting in a VR-based tennis motor learning system for the modern forehand groundstroke. The core components for practicing tennis in VR—including a virtual environment, input handling, and realistic ball physics—are provided by a virtual tennis simulation released on Meta Quest called Tennis Esports1, which forms the basis for our motor learning system. Tennis forehand training with our motor learning system is conceptualized as target training, where multimodal feedback is given after each completed forehand stroke, focusing on one error or improvement at a time.
A training session is structured as follows: To participate in the training, the trainee wears a head-mounted display (HMD) and holds a racket handle with an attached controller in their dominant hand and another controller in their non-dominant hand. The training takes place on a virtual indoor tennis court (Figure 4.1). The trainee stands on one side of the court, facing the net, while a ball machine shoots a ball with predetermined parameters in their direction. The ball machine’s position and the configuration of the ball’s flight path (e.g., ejection angle, speed, and spin) change throughout the session. A rectangular target is marked on the other side of the net; its position and area also change during training. The action goal for the trainee is to strike the incoming ball before it bounces more than once on the court with a ‘proper’ forehand, imparting topspin and hitting the marked target on the opposite side. The definition of a ‘proper’ forehand is encoded in the system as five consecutive motion phases that the trainee must carry out and a set of coaching rules 1Tennis Esports Website https://www.tennis-esports.com (11/14/2024) 47 4. Methodology for Tennis Forehand Motion Learning in VR Figure 4.1: Screenshot of the virtual environment from the trainees point of view. for each of these phases that must be satisfied. The motion phases and coaching rules were defined in collaboration with domain experts. The system utilizes the Meta Quest 2 sensor data to track the trainee’s motion, resulting in mocap data for the racket, weak hand, and head. During the execution and recording of the forehand stroke, features are extracted, and the stroke is segmented into motion phases. Subsequent rule-based motion analysis assesses the technique and performance of the forehand stroke. After each completed shot, the ball machine pauses, and the system selects a coaching rule to provide multimodal feedback. The choice of the coaching rule depends on several factors described in Section 5.4 and should guide the training step-by-step, help the trainee recognize mistakes, and reinforce improvements positively. A motion replay, showing the racket’s path segmented into motion phases and animating the mocap data, is presented to the trainee (Figure 4.2). An interactive user interface (UI) complements this by displaying textual feedback. On-demand, the trainee can view performance metrics and detailed descriptions of all motion phases and coaching rules, accompanied by illustrative images. Additionally, color coding in the replay and UI highlights errors in the motion. Finally, the ball machine and target training can be resumed anytime through controller input or the UI. Building upon Tennis Esports as the virtual application and selecting the Meta Quest 2 as the motion-tracking device helps maintain a manageable project scope. This choice allows us to focus on other critical aspects of the learning methodology, such as motion analysis and feedback design. However, they also introduce technological boundaries—particularly the partial motion tracking—which guides and constrains our design choices. This chapter outlines the design of our VR-based tennis forehand motion learning methodology and addresses our design-related research questions (see RQD 1–3 in Section 1.3). It also explains the rationale behind critical design decisions and highlights the guidelines we prioritized or neglected in our final design. 48 4.1. 
Figure 4.2: Screenshot of the visual feedback components from the trainee’s point of view.
4.1 Tennis Esports as Foundation
Tennis Esports2 is a virtual tennis application developed in Unity by the Austrian company VR Motion Learning GmbH & Co KG3, focusing on realistic ball physics and immersive gameplay. The application provides a virtual environment where players can compete in tennis matches against each other or train against AI opponents. It also contains other single-player features, such as arcade tennis modes or target training, where players can practice their aim with different tennis strokes. At the time of writing, Tennis Esports is available for the Meta Quest (formerly known as Oculus Quest) via the Meta Quest Store4. Additionally, an evolved version of our tennis learning methodology is publicly available as a purchasable add-on named TechZone5, with some free content available for trial use.
2Tennis Esports Website https://www.tennis-esports.com (04/29/2025)
3Company’s Homepage https://www.vr-motion-learning.com (04/29/2025)
4Store Page https://www.meta.com/experiences/tennis-esports/4872542182873415 (04/29/2025)
5TechZone’s Webpage https://www.tennis-esports.com/techzone (04/29/2025)
By providing a virtual environment and tennis simulation with the essential elements for playing tennis in VR, such as ball physics and input handling, Tennis Esports is a great foundation for realizing and evaluating our tennis learning methodology. The application includes the following functionalities, which are necessary as a basis for implementing our tennis learning methodology:
3D Models & Virtual Environment. The Tennis Esports team designed a custom environment for the TechZone (Figure 4.1), which we also used for our study. It features an indoor tennis court, a resizable target, a configurable ball machine, and info panels to display, for example, the current ball speed and spin.
Ball Physics and Collision Detection. Ball collisions are detected with the net, ground, targets, and racket. The ball behavior is simulated with in-house-developed ball physics. Simulation data such as ball spin or the optimal hit point can be accessed.
Mocap and Input Handling. Mocap via the Meta Quest 2, player height estimation, and input handling, such as teleportation or ray interaction, are implemented.
Racket Handle. Tennis Esports offers support for racket handles. Furthermore, at the time of writing, a custom Tennis Esports racket handle for the Meta Quest 2 was available, which we used for our study.
Environmental Sounds. This includes background music and sound effects for the ball machine, ball collisions, and the racket’s air resistance.
Feedback (KR). Haptic feedback indicates a racket-ball collision. For target training, visual and auditory feedback is provided when a player hits or misses the target.
4.2 Design of the Motor Learning System
The design of our motor learning system is guided by the objectives and requirements established throughout Chapter 1 and influenced by the predefined software and hardware components. While predefining the infrastructure (particularly the mocap technology) results in disregarding guideline G3 ‘Optimize Tracking and Mocap Data for Enhanced Performance’, knowing the given technological boundaries, such as those imposed by partial mocap, upfront enables us to design more effectively within them.
Based on our prerequisites, we prioritize the following guidelines (described in Chapter 3) in our design to achieve the previously outlined objectives:
• Facilitate Natural Movement
– G1 Facilitate Natural Movement and Comfort
• Ease-of-Use, Portability, Flexibility, and Accessibility
– G2 Streamline Calibration and Setup Processes
– G4 Inclusive Design via Accessibility, Inclusivity, and Usability
• Motivation and Enjoyment
– G6 Promote User Engagement and Motivation
• Independence (Self-Training with Guidance)
– G10 Real-Time: Low Latency
– G14 Support Self-Monitoring and Reflection
– G16 Provide Informative Assessment
– G17 Prioritizing Interventions and Feedback
• Determinism and Close Alignment with Existing Tennis Training Methods
– G7 Enhance Transparency and Interpretability
– G9 Ensure Robustness and Error Tolerance
– G12 Skill Transferability: Feedback Strategy
– G15 Preplanning
One prerequisite is to align the teaching methodology with traditional coaching practices and feedback strategies established in real-world tennis training. While we utilize quantitative analysis to evaluate the user’s forehand technique and performance, our motor learning system’s general design and analysis cycle closely follows the four-task model for qualitative human motion analysis described by Knudson and Morrison [KM02]. As illustrated in Figure 2.5, this model breaks down the coaching process into four sequential tasks: preparation, observation, evaluation with diagnosis, and intervention. Figure 4.3 depicts our adapted version of this model for automation and quantitative analysis.
Preparation in our model summarizes the preplanning, design, and implementation efforts that went into developing our motor learning system. This step includes gathering domain expertise, defining the parameters and coaching rules of a ‘proper’ tennis forehand technique, designing the teaching methodology, encoding the knowledge base, and developing the primary software component. The subsequent steps are part of the system software component itself. Each forehand stroke during target training is observed by capturing partial mocap data, pre-processing and organizing it, and extracting higher-level motion features. Based on this data, the system automatically evaluates the technique and performance of the shot. Once all motion phases are identified and analyzed, the system diagnoses the most suitable intervention – in our case, selecting a coaching rule through a simple decision tree. Finally, intervention occurs, which is the “administration of feedback, corrections, or other changes in the environment to improve performance” [KM02](Knudson and Morrison, 2002, p. 128) before target training continues and the process repeats.
Figure 4.3: Schematic overview of our design process and motor learning system.
4.2.1 Partial Mocap
The Meta Quest 2 is a standalone, wireless VR headset with controllers available for the consumer market. It meets our requirements for portable, easy-to-set-up hardware for motion tracking and feedback delivery (G2) without introducing tripping hazards or mobility barriers from motion suits or cables (G1). The device uses inside-out tracking via four built-in cameras in the HMD and IMUs in the controllers to track the hands and head. Unlike single-camera tracking solutions (e.g., smartphone-based systems), the Meta Quest 2 can capture the full range of a tennis racket swing. However, two main limitations of using the Meta Quest 2 for motion tracking (without the hand and body tracking feature active or additional tracking methods) must be addressed in the design of our learning methodology.
1. By default, the Meta Quest 2 tracks only the HMD and controllers, providing rotational and positional tracking in 3D space for each (six degrees of freedom). Consequently, only partial mocap data for the trainee’s hands and head is available for motion analysis and feedback generation.
2. Tracking relies on the controllers remaining visible to the HMD’s built-in cameras. The IMUs provide data that help estimate motion when a controller moves out of the cameras’ line of sight; however, if a controller remains in a blind spot for a few seconds, tracking may be temporarily lost.
In our setup, one controller is held in the non-dominant hand while the other is mounted on a racket handle. This configuration results in pose data (rotation and position) for the non-dominant hand, racket, and HMD. Given the known dimensions of the virtual racket, additional data points (e.g., the position of the racket’s top, butt cap, or mid-stringbed) can be derived. Furthermore, instantaneous velocity, speed, and acceleration can be estimated for each tracked or derived point. Since the Meta Quest 2 does not feature eye tracking, the HMD’s forward direction estimates the user’s view direction. This direction likely does not correspond to the actual gaze direction due to independent eye movement. Hence, gaze-related tolerances should not be too strict but must remain within the HMD’s field of view (FOV) limits, as values beyond these are not visible to the user. Tennis Esports supplies a user-height estimation, from which other height-related anthropometric features can be estimated based on average human body proportions (see Section 5.1). While these values are rough approximations, they can provide valuable contextual information for analysis.
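As a rough sketch of how such derived points and kinematic quantities might be computed from the tracked poses, the snippet below transforms an assumed local offset (e.g., toward the racket tip) into world space and estimates per-frame velocities with finite differences. The offsets, axis convention, and helper names are illustrative assumptions rather than the exact quantities used by our implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

RACKET_LENGTH = 0.68  # m, assumed overall racket length
TIP_OFFSET_LOCAL = np.array([0.0, 0.0, RACKET_LENGTH])   # tip along an assumed local forward axis
STRINGBED_OFFSET_LOCAL = np.array([0.0, 0.0, 0.53])      # rough mid-stringbed offset (assumed)

def derive_point(position, quat_xyzw, local_offset):
    """Transform a point given in the racket's local frame into world space.
    The quaternion is expected in (x, y, z, w) order, as used by SciPy."""
    return np.asarray(position, float) + R.from_quat(quat_xyzw).apply(local_offset)

def finite_difference_velocity(points, timestamps):
    """Estimate per-frame velocity of a tracked or derived point with backward differences."""
    points = np.asarray(points, dtype=float)        # shape (N, 3)
    dt = np.diff(np.asarray(timestamps, dtype=float))
    vel = np.zeros_like(points)
    vel[1:] = (points[1:] - points[:-1]) / dt[:, None]
    vel[0] = vel[1]  # pad the first frame
    return vel

# Scalar speed per frame, e.g., of the derived racket tip:
# speeds = np.linalg.norm(finite_difference_velocity(tip_positions, t), axis=1)
```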
Additionally, input data from the controller buttons and data from the tennis simulation, such as delta time, frames, ball state, trajectory prediction, and collision information, is available.
4.2.2 Tennis Forehand Motion Analysis Method
There are various ways in which a tennis player can execute a forehand groundstroke. We decided to teach the modern forehand topspin with an eastern grip for our learning methodology. In a modern forehand topspin, the racket draws a loop, as illustrated in Figure 4.4, to facilitate a smooth motion and continuous generation of racket speed.
Figure 4.4: Illustration of the C-looped stroke pattern characteristic of a modern forehand topspin stroke, spanning from the Ready Position up to the Contact Point.
The C-looped pattern of the swing has multiple advantages, including the utilization of gravity to accelerate the racket [KE04] and the low-to-high motion at contact to impart topspin. Continuous racket motion is considered ‘ideal’ [KE04, RW11], but sometimes, small pauses in the back or during the backswing can help correct timing issues. However, as the controllers of the Meta Quest 2 lose tracking after remaining outside the view of the HMD’s cameras for a prolonged time, we define the proper technique as having ‘no or minimal pause in the back or during the backswing’, which may be very strict but is necessary to reduce tracking problems in VR. Additionally, we decided to teach a follow-through over the opposite shoulder as it is considered beginner-friendly [RW11].
We chose a rule-based motion analysis approach to analyze the technique of a forehand stroke. The reasons for adopting an expert-driven method are manifold. First, as stated in Chapter 1, a deterministic motion assessment is required to support a testable, transparent, and interpretable analysis, aligning with guideline G7. Second, to provide the timely feedback outlined in guideline G10, straightforward coaching rules allow for a lightweight, rule-based implementation, enabling near-real-time feedback after each shot without delays caused by the runtime of the analysis. Third, the available resources make adopting a rule-based approach practical. Coaching rules co-designed with professional tennis trainers are accessible from a prior research project [KGSK24], and multiple domain experts are also actively supporting this project by sharing insights, assisting with refining thresholds, and offering valuable feedback. Additionally, various coaching rules for tennis have been published in the literature [KE04, BS13, REC15, SW00, MA10, And09, IAR+23, RW11]. While these coaching rules are not encoded for direct use in automatic motion analysis and require manual implementation, we also lack a suitable and sufficiently large dataset to adopt a data-driven approach for our partial mocap setup. Collecting and labeling such data would be necessary first, which would void the advantage of reduced manual work that a data-driven method offers when an existing dataset is available. Moreover, the limitations of our partial motion tracking system prevent implementing complex rules that analyze interlinked joint motions, which would warrant a data-driven approach. Our tennis coaching rules are tied to specific motion phases, meaning prior segmentation of these phases is required for evaluation. For this segmentation, the definitions of the phases’ bounds are adapted to the limitations of the partial mocap. The set of applicable coaching rules and the corresponding tolerances and thresholds have also been adapted for our infrastructure (G9).
4.2.3 Tennis Forehand Motion Phase Definition
There are different definitions for the tennis forehand motion phases in the literature, with varying levels of detail [LLS+10, KE04, ŠŠPM19]. The limitations of partial mocap require us to address RQD 1 in our design and establish custom definitions for the temporal boundaries of motion phases, ensuring they can be detected in near real-time using the limited data available. For instance, segmentations based on foot positions or body rotations are not applicable within our infrastructure. In addition, we cannot assume a proper technique for the segmentation. Since our system should detect errors and guide correct technique, the motion phases cannot be defined based on an optimal pose or pose transition, as trainees might not achieve these states. Our definition and segmentation of motion phases must account for various motion errors and individual motion styles. Therefore, criteria other than optimal poses or motions must be found to identify the motion phases.
Based on the racket’s movement, the upper body’s forehand topspin stroke is divided into five motion phases. Additionally, outcome-related performance aspects, such as ball spin and accuracy, are grouped under the category of execution. This results in the following six motion phases, also illustrated in Figure 4.5, for which coaching rules are evaluated and performance metrics calculated. The definitions of the phases are explained in the sections below.
1. Ready Position (RP)
2. Backswing (BS)
3. Forward Swing (FS)
4. Contact Point (CP)
5. Follow Through (FT)
6. Execution (EX)
Figure 4.5: Illustration of the six upper-body motion phases of the forehand topspin.
According to this segmentation, the groundstroke starts from the ready position. The racket is then brought back during the backswing. During the forward swing, the racket is accelerated towards the ball, which it should eventually hit. This moment is called the contact point. After impact, the player follows through and, ideally, finishes with the racket over the opposite shoulder. In our segmentation, we ignore the recovery phase, i.e., the phase after a completed forehand stroke in which the player transitions back to the ready position.
4.2.3.1 Ready Position
The initial phase of the forehand stroke is the ready position, which describes the posture and stance a player adopts between tennis shots to be ready and in an optimal starting position for the next incoming ball [RW11]. It occurs after recovery from the previous stroke and before the actual swing to hit the next ball. In simple terms, it describes the moment just before the actual forehand stroke begins. During this timeframe, the player should prepare their body and racket for the incoming shot. Accordingly, this phase is also called preparation in the literature [KE04]. We define this phase more precisely as the short period during which the racket undergoes minimal local displacement before it is moved back, away from the incoming ball, in the subsequent swing. While tennis players move a lot on the court between shots and make a slight jump during the split step [KE04], the local movement of the racket during a proper ready stance is relatively low.
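One possible way to operationalize this ‘minimal local displacement’ criterion is sketched below: scanning backwards from the start of the backswing for the last sufficiently long window in which the racket’s speed relative to the player stays below a threshold. The threshold and minimum duration are illustrative assumptions, not the values used by our system.

```python
def find_ready_position(racket_speed_rel, timestamps, backswing_start_idx,
                        speed_threshold=0.6, min_duration=0.25):
    """Return (start_idx, end_idx) of the last near-stationary racket window
    before the backswing, or None if no such window exists.
    racket_speed_rel: per-frame racket speed relative to the player (m/s)."""
    idx = backswing_start_idx - 1
    while idx >= 0:
        if racket_speed_rel[idx] < speed_threshold:
            # walk backwards to the start of this contiguous low-speed run
            start = idx
            while start > 0 and racket_speed_rel[start - 1] < speed_threshold:
                start -= 1
            if timestamps[idx] - timestamps[start] >= min_duration:
                return start, idx
            idx = start - 1  # window too short, keep searching earlier
        else:
            idx -= 1
    return None  # the caller can rate the ready position as absent
```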
A correct preparation consists of the body facing the opponent straight ahead, knees bent, feet about shoulder width apart to ensure a stable and balanced stance, and arms in a steady position with both hands holding the racket in front of them [KE04, RW11]. There is no significant body rotation or arm movement involved. Therefore, the racket stays almost stationary with negligible velocity relative to the player during the temporal extent of a proper ready position. The relative speed of the racket also stays low when a player lets the racket dangle around in between shots. We did not want to take factors of a proper technique for detecting the bounds of the ready position into account. Our more general definition makes it possible to detect a proper ready position, where the racket is held upwards in front, while also detecting rest periods with errors in posture, like letting the racket dangle or not facing the opponent. However, we assume that tennis players hold the racket at rest at some point in between shots, at least for a short period, even when they do not have a proper technique. Hence, no ready position will be detectable with this definition if a player moves the racket around a lot between shots. In this case, our system will determine the ready position as absent and rate it with the lowest score. 4.2.3.2 Backswing As soon as the ball approaches, the racket is brought back through body rotation and arm movement to prepare for the stroke toward the incoming ball [KE04, GRCR20, RW11]. This movement is referred to as backswing. The backswing begins with the unit turn, initializing the backward motion of the racket [RW11]. This phase continues until the racket reaches its farthest point back and “ends at the time preceding the first forward motion of the racket.” [GRCR20](Genevois et al., 2020, p. 2) 55 4. Methodology for Tennis Forehand Motion Learning in VR Figure 4.6: Examples of different possible racket paths during the backswing of a tennis forehand. They range from a high take-back characteristic of a modern topspin forehand over a flat forehand, a pendulum-like swing, and less conventional patterns such as multi-swings and looping motions. In our learning methodology, we decided to teach the modern forehand as the proper technique to play topspin, characterized by a C-shaped racket path and high take back, as illustrated in the leftmost image in Figure 4.6. However, the actual way a tennis player performs the backswing can vary greatly, as exemplified in Figure 4.6. Not only does the size of the loop of a modern forehand vary between players, but the trajectory of the racket can also take on different forms. An example is the flat forehand stroke, where the backswing is almost parallel to the court. Another example is a pendulum-like swing, where the racket swings back and forth, beginning with a drop from the ready position before being brought back up during the backswing. Additionally, less conventional patterns may occur. For instance, a multi-swing can arise when a player initiates the swing too early and compensates by executing an additional corrective swing. Similarly, players may exhibit looped swing patterns. Furthermore, players might pause the racket during the backswing or in the back. Our phase segmentation approach must be capable of recognizing all these variations as part of the backswing phase. Given that our mocap data does not include shoulder or trunk rotation, we define the backswing based on racket motion. 
Our definition is as follows: The backswing starts when the racket begins moving back in space, away from the net and ball (specifically, along the direction of the player’s swing motion at contact). The backswing ends with the first frame of the forward movement toward contact. If the swing contains no distinct forward movement (i.e., when the ball is hit with a backward motion), the backswing ends with the contact. Any pause that occurs during the backward movement of the racket, such as a pause at the back, is considered part of the backswing. Similarly, additional swings or looping motions are also counted as backswing. The only part excluded from the backswing is the final forward motion leading up to contact. Suppose the backswing is absent or not detectable, for example, when the player swings the racket directly forward from the ready position. In that case, our system will rate the backswing with the lowest score without evaluating its associated coaching rules. 4.2.3.3 Forward Swing The forward swing refers to the phase during which the racket moves toward the ball until contact. This phase is also known as the acceleration phase, as the racket is accelerated to build up speed and power to impart on the ball [ŠŠPM19]. Like the backswing, the 56 4.2. Design of the Motor Learning System racket trajectory of the forward swing can take on different forms. For a topspin shot, the racket must drop below the oncoming ball and then swing upward to impart spin, while, in a flat forehand shot, the racket path is nearly horizontal [GRCR20]. We define the bounds of the forward swing based on racket motion. The forward swing starts with the first frame of the forward movement that leads up to contact and ends just before the racket makes contact with the ball. If the system cannot segment the forward swing, for example, if the ball is hit with a backward motion and the racket path does not contain a pronounced forward motion, the phase will be rated with the lowest score. 4.2.3.4 Contact Point The contact point, sometimes referred to as ‘point of contact’ or just ‘contact’, refers to the short moment where the racket makes contact with the ball. The duration of the phase can be defined as the “brief moment in which the ball remains on the strings.” [ŠŠPM19](Šlosar et al., 2019, p. 3) In our approach, the boundary frames of the contact point are defined as the two consecutive frames that encapsulate the racket-ball collision. A valid contact point is only detected if the racket successfully contacts the ball before it bounces a second time on the court. If no collision occurs, the contact point is counted as absent and assigned the lowest score. However, the system will still evaluate the associated coaching rules if the player barely misses the ball and the most likely collision frame can be estimated. 4.2.3.5 Follow Through “The final temporal phase of tennis strokes is the follow-through” [KE04](Knudson and Elliott, 2004, p. 171), sometimes called ‘finsh’. It starts immediately after the contact and lasts until the stroke’s completion, when the racket stops, before recovery [GRCR20, ŠŠPM19]. There are various correct and incorrect ways to follow through on a forehand topspin stroke. An advanced technique is the reverse forehand [RW11]. Our system teaches a finish over the opposite shoulder and slightly different variations thereof as a proper technique, as it is a popular technique for the modern forehand topspin that is considered beginner-friendly and is advised in literature [RW11]. 
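The backswing and forward-swing bounds described above can be illustrated by projecting the racket’s frame-to-frame displacement onto the swing direction at contact: backward motion marks the backswing, and the last unbroken run of forward motion ending at contact marks the forward swing. The following sketch is a minimal illustration under that assumption and not the segmentation algorithm used in our implementation.

```python
import numpy as np

def segment_swing(racket_pos, contact_idx, swing_dir_at_contact):
    """Split frames before contact into backswing and forward swing using the
    signed displacement of the racket along the swing direction at contact.
    Returns (backswing_start, forward_swing_start); either may be None."""
    d = np.asarray(swing_dir_at_contact, dtype=float)
    d /= np.linalg.norm(d)
    # per-frame displacement projected onto the contact swing direction
    proj = np.diff(np.asarray(racket_pos, dtype=float), axis=0) @ d

    # forward swing: last unbroken run of forward motion ending at contact
    fs_start = contact_idx
    while fs_start > 0 and proj[fs_start - 1] > 0:
        fs_start -= 1
    if fs_start == contact_idx:
        fs_start = None  # ball hit with a backward motion; no forward swing

    # backswing: first backward motion before the forward swing (or contact);
    # pauses and extra loops in between are simply counted as backswing
    limit = fs_start if fs_start is not None else contact_idx
    bs_start = next((i for i in range(limit) if proj[i] < 0), None)
    return bs_start, fs_start
```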
As with the previous phases, proper technique cannot be assumed when defining and segmenting the follow- through. We defined the follow-through as the deceleration phase of the racket, beginning immediately after the contact point and ending when the racket has fully decelerated. Suppose the racket does not come to a complete stop before returning to the ready position. In that case, the follow-through is considered to end at the moment the racket’s speed and path change to initiate the recovery transition. The segmentation of the follow-through is detailed in Section 5.2. 4.2.3.6 Execution The execution refers collectively to the outcome and results of the shot. This includes the spin and speed of the outgoing ball and the shot accuracy. The execution of a forehand 57 4. Methodology for Tennis Forehand Motion Learning in VR stroke can only be evaluated after a successful contact point. 4.2.4 Tennis Forehand Coaching Rule Definition For a previous research project [KGSK24], tennis coaches helped us break down the fundamental technical and biomechanical aspects of the modern forehand topspin, formu- lated as coaching rules and thresholds. Additional coaching rules were extracted from the literature during the research phase of this thesis [KE04, BS13, REC15, SW00, MA10, And09, IAR+23, RW11]. Together, these rules form the knowledge base necessary for a rule-based analysis of the tennis forehand. Each coaching rule is associated with a motion phase and defines either the optimal state of features or characteristics of a specific error during that phase. However, not all of the collected coaching rules are directly (or at all) applicable to our system. The partial mocap setup limits the available data for automatic motion assessment, thus restricting the coaching rules to those that can be evaluated or reasonably estimated from the available information. RQD 2 is addressed as part of the design process for our motion learning methodology in order to select a set of applicable coaching rules. As discussed earlier, our system tracks only the racket, weak hand controller, and HMD. Although the HMD captures head movements that can provide indications about a person’s general posture and motion—such as the general direction of movement, trunk rotation [MA10], direction of gaze, or if standing upright—the data is insufficient to evaluate lower-body or full-body dynamics such as footwork or court positioning, which are critical for many coaching rules. For instance, a rule like ’the feet should be shoulder- width or slightly wider apart during the ready position’ cannot be assessed due to the lack of lower-body tracking. Consequently, the set of applicable coaching rules is narrowed down to those focusing on racket and upper-body movement. The absence of mocap data for upper-body joints other than the hands and head introduces additional constraints. While inverse kinematics could estimate some joints (e.g., elbows), we opted against it as the estimations do not represent the actual movement and are too imprecise. Furthermore, as the system cannot distinguish whether captured racket movements stem from wrist rotation or other joint motions, coaching rules on wrist rotations are also excluded. Additionally, the system does not track how the racket handle or controller is held, preventing the assessment of grip-related rules. Based on these limitations, we identify the following categories of coaching rules that can be evaluated using the available partial mocap data. Focus. 
Evaluate a player’s visual attention by comparing their estimated view direction to a defined optimal direction they should face or look at during any given frame. The HMD’s forward direction estimates the user’s view direction. Coaching rules concerning a player’s focus evaluate whether a user is watching the ball (illustrated in Figure 4.7b), focusing on the opponent, or the contact point. The relevant features used to evaluate focus are the HMD’s pose and the position of the ball, opponent, and contact point. 58 4.2. Design of the Motor Learning System Posture. Assess the posture and hand placement. Due to the limited tracking of body joints, this category contains only a few coaching rules that are applicable to our setup. These coaching rules include the indication of over-rotation during the backswing when the estimated strong hand or racket reaches too far behind the head and hand placement relative to the head during the follow-through. Key features are extracted from the mocap data. Timing. Evaluate temporal aspects such as reaction time to the ball machine, timing of phase transitions relative to the ball, continuity of the racket swing path, and timing/placement of the contact point relative to the HMD and ball. Most coaching rules in this category depend on features from the mocap data and simulation data, including ball position, collisions, and timing of the ball machine. Swing Path. Analyze the racket trajectory and assess key aspects of a C-shaped swing characteristic of a modern forehand topspin. This category consists of coaching rules mainly concerned with the shape and placement of the swing path relative to other features, such as the head and ball, using critical points on the racket’s path to assess its shape. Racket. Evaluate the racket’s pose and velocity throughout the swing. Examples include the analysis of the racket’s position relative to the HMD during the ready position to determine whether the racket impedes the player’s view of the opponent (illustrated in Figure 4.7a), analysis of the racket angle to assess whether the racket’s tip is below or above the wrist at a particular time, and the angle of the racket face at impact. While this category focuses on the racket, other features are used to assess its pose relative to them. Control. Concerned with outcome measures of the shot, such as shot accuracy during target training and the spin imparted on the ball. Although not included in this study, this category could also encompass assessing the ball’s collision point on the racket, specifically whether it occurred at the sweet spot. Phase. Assess whether a motion phase was performed or absent. This category becomes relevant when the system cannot detect a motion phase, either because the player did not perform it or because the system failed to detect it. Examples include the inability to recognize a ready position with our definition when the racket moves around non-stop or the absence of a backswing if a player swings the racket forward to hit the ball immediately after the ready position. Combinatorial Rules. They combine two or more coaching rules into a general guideline to provide broader feedback and a better overview of the forehand technique. Combinatorial rules do not have their independent definition or implementation; instead, they summarize the results of their component rules. Scores are combined by weighted arithmetic mean, with custom weights for each rule, and the rule is evaluated with a score of 100% only if all component rules reach 100%. 
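For illustration, the aggregation of a combinatorial rule can be sketched in a few lines of code. The following Python snippet is our own illustrative rendering, not the system's implementation; the function name, weights, and the rounding guard are assumptions.

```python
# Illustrative sketch (not the actual implementation): a combinatorial rule's
# score as the weighted arithmetic mean of its component scores, capped below
# 100% unless every component rule is fully satisfied.

def combine_rule_scores(component_scores, weights):
    """component_scores: values in [0, 100]; weights: positive per-rule weights."""
    total_weight = sum(weights)
    mean = sum(s * w for s, w in zip(component_scores, weights)) / total_weight
    if any(s < 100 for s in component_scores):
        mean = min(mean, 99.9)  # guard against rounding up to a perfect score
    return mean

# Hypothetical usage with three component rules and equal weights:
print(combine_rule_scores([80.0, 100.0, 95.0], [1.0, 1.0, 1.0]))  # ~91.7
```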
For example, the rule CP:COM:IdealContact defines the ideal contact point in a forehand by combining three specific rules: hitting the ball at waist height (Height), contacting it in front of the body (Front), and striking on the dominant side (Side).

Figure 4.7: Example images illustrating coaching rules for the Ready Position (RP) and the Backswing (BS): (a) RP:RKT:ClearView, (b) BS:FOC:WatchBall.

Table 4.1: Number of coaching rules implemented in our system per motion phase (Ready Position (RP), Backswing (BS), Forward Swing (FS), Contact Point (CP), Follow Through (FT), and Execution (EX)) and category. #Cb. denotes the number of combinatorial rules.
Phase | #Rules (#Cb.) | Categories
RP | 8 (2) | Phase [1], Focus [1], Racket [6]
BS | 18 (3) | Phase [1], Focus [1], Racket [1], Posture [4], Swing Path [7], Timing [4]
FS | 9 (2) | Phase [1], Focus [1], Racket [3], Swing Path [4]
CP | 10 (2) | Phase [1], Focus [1], Racket [4], Timing [4]
FT | 7 (1) | Phase [1], Focus [1], Racket [3], Posture [2]
EX | 6 (2) | Control [6]

Our system implements 58 coaching rules for the forehand topspin, including 12 combinatorial rules. The distribution of these coaching rules across the motion phases and categories is detailed in Table 4.1. Examples of coaching rules for each category and motion phase are provided in Table 4.2.

4.2.5 Feedback Design

The rule-based motion assessment provides detailed insights into the trainee's tennis forehand technique. While presenting all results at once may serve as a diagnostic tool, it does not comply with our requirements for a motor learning system that provides actionable instructions and clear feedback. For the insights to effectively guide skill improvement, the system must prioritize and communicate them in a way that is accessible, comprehensible, and actionable to users while neither overwhelming nor discouraging them. The primary goal of our feedback design is for users to perceive the provided insights and feedback as helpful for their skill development and learning. This challenge leads to our third research question, RQD 3, and guidelines that inform the design of our motion learning methodology (G4, G6, G7, G14, G16, G17, G18). Table 4.3 summarizes the main components of our feedback administration, which are explained in more detail below. The realization of the feedback design is detailed in Section 5.5.

Table 4.2: Examples of coaching rules used for our rule-based motion analysis of the modern tennis forehand topspin. Each rule is linked to a motion phase and category. Rule ID Phase Cat. Description Main Features RP:PHA:GetReady Ready Position Phase The racket is held stable (in front of the body) in preparation for the next shot, allowing a quick reaction to either the forehand or backhand side [BS13]. Segmentation result RP:FOC:FaceOpponent Ready Position Focus Facing the opponent before initiating the swing helps anticipate their shot and optimal positioning [BS13]. View dir, HMD & Ballmachine pos RP:RKT:ClearView Ready Position Racket Positioning the racket head slightly below eye level in the ready position ensures a clear view of the opponent and the ball [BS13]. Racket pos, HMD pos BS:FOC:WatchBall Backswing Focus Maintaining focus on the ball during the backswing supports the prediction of its trajectory and proper positioning [REC15, SW00, MA10]. View dir, Ball & HMD pos BS:TIM:Continuous Backswing Timing Timing should be adjusted to avoid pausing the racket in the back, as this disrupts the swing's flow, reduces power [KE04, RW11], and risks losing motion tracking.
Racket speed BS:SWP:Pendulum Back- swing Swing Path The racket is raised during a modern forehand’s take-back to build power, whereas pendulum-like motions, where the racket is first dropped from the ready position and only then brought up, reduce momentum [REC15]. Classifier based on swing path shape BS:SWP:MultiSwing Back- swing Swing Path Starting the swing from a proper ready position avoids unnecessary racket movements that could disrupt preparation for the shot. Excessive racket movement can lead to bad timing or the need to rush the stroke. Classifier based on swing path shape BS:POS:OverRotation Back- swing Posture The upper body should not be turned too far. Over-rotation increases the swing path and can lead to poor timing [RW11, KE04, REC15]. HMD & Strong Hand pos, Swing dir FS:SWP:LowToHigh Forward Swing Swing Path A low-to-high forward swing path facilitates generating topspin. The racket is dropped below the intended contact point from the high backswing position and then moved upward through the ball [KE04, RW11, REC15, And09]. Classifier based on swing path shape FS:RKT:BelowWrist Forward Swing Racket Dropping the racket head below the wrist during the forward swings allows an upward acceleration of the racket head just before contact. This motion helps impart topspin by brushing the strings over the ball upward [RW11]. Racket rot CP:PHA:Hit Contact Point Phase The ball must be hit with the racket. CP CP:RKT:FaceVertical Contact Point Racket The orientation of the racket face contributes to the direction and spin imparted on the ball. For most topspin shots, the racket face should be nearly perpendicular to the court at impact [KE04, REC15]. Racket rot CP:COM:IdealContact Contact Point Comb. The ideal contact is the position and timing where one can comfortably hit the ball with power and control. It depends on the incoming ball and desired outcome, but should be on the dominant side, slightly in front of the body, and around waist height [RW11, And09]. CP, HMD pos, estimated waist height based on HMD CP:TIM:WaistLevel Contact Point Timing The ball should be hit around waist height, which can be adjusted through court positioning or bending the knees [And09]. CP, estimated waist height CP:TIM:InFront Contact Point Timing A contact in front of the body (usually slightly in front of the front foot) ensures sufficient space between the body and the ball while facilitating imparting power and control [RW11, REC15]. CP, HMD pose CP:TIM:DominantSide Contact Point Timing The forehand stroke has evolved to hit the ball on a tennis player’s dominant side when leveraging a proper stance, court position, and timing. Reaching across the body to strike the ball lessens power and control. CP, HMD pose FT:FOC:KeepStable Follow- Through Focus Sustained focus on the contact point throughout impact keeps the head still, supporting body stability and control at impact. Lifting the head too soon may change the posture and can negatively affect the shot [IAR+23, REC15, MA10]. View dir, HMD pos, CP FT:POS:AcrossBody Follow- Through Posture A long follow-through across the body helps to gradually decelerate the movement, reducing muscle strain and minimizing injury risks [KE04]. HMD & Strong Hand pos, Swing dir EX:CTL:HitTarget Execution Control Ball should hit inside the marked area on the other side of the net. 
Ball-court collision, Target EX:CTL:HitTopspin Execution Control Topspin should be imparted on the ball by utilizing an up-and-forward swing while focusing on the racket angle and wrist motion throughout contact. Outgoing ball spin

Table 4.3: Main components of our feedback design.
Component | Feedback Design
Timing | Terminal (after the movement is completed and the outcome is observable)
Frequency | Constant rate after every attempt
Content | Extrinsic feedback (Type: Knowledge of Results (KR) and Knowledge of Performance (KP); Role: Informative and Motivational)
Mode | Multimodal (auditory, visual, haptic)

Figure 4.8: Snapshot from the motion replay that shows the entire captured ball trail. The recording ends with a short delay after the outcome of the shot is visible.

Timing. Feedback is given after the forehand stroke is completed and its outcome observed. A brief delay after the outgoing ball lands allows the player to assess whether the shot successfully hit the intended target (see Figure 4.8). Terminal feedback is chosen for three main reasons: 1) the fast nature of a tennis forehand in the context of target training, where providing feedback mid-stroke could disrupt the movement and divert the player's attention; 2) the technical limitations of our motion analysis method, which generates results only after a motion phase is completed; and 3) we want to accompany the feedback administration with a motion replay as a means for self-observation to support self-monitoring and reflection (G14). Concurrent feedback is only provided for KR. In future work, we plan to add aggregated feedback over multiple shots.

Frequency. For our user study, we opted for constant feedback administered after each shot, as the training phase lasts only ten minutes.

Content. After each forehand stroke, our system provides feedback to either positively reinforce performance improvements or inform about errors and give instructions on correcting them. An important aspect of feedback design is prioritizing individual interventions and feedback (G17) so as not to overwhelm users. Therefore, we opt for feedback that focuses on a single coaching rule at a time to guide the user step-by-step through the training process and provide more information on demand. The decision for which coaching rule feedback is provided is handled by the recommendation module, as detailed in Section 5.4, but is based mainly on the following elements:

• Has the user improved based on the previous feedback?
• Which coaching rule is rated worst, weighted by additional factors such as priority?
• Are there any relevant follow-up errors, as some motion errors lead to others?

Figure 4.9: Snapshots from the motion replay, aligned approximately with the sagittal, transverse, and frontal cardinal planes. The motion replay shows the captured joints (HMD, racket, and non-dominant hand) and the racket's path, segmented into the detected motion phases.

The type of feedback depends on the selected coaching rule. Our set of coaching rules evaluates KR and KP, whereby the focus of most lies on KP. The feedback's role depends on the user's performance and the previously administered feedback but focuses mainly on an informative role.
In the case of corrective feedback, we try to administer prescriptive feedback in order to make the feedback actionable. If the user implements correction based on feedback in the next attempt or improvement is apparent, the feedback takes on a motivational role to positively reinforce the correction and enhance motivation (G6). More detailed explanations and motion analysis results are provided on demand to enhance transparency and interpretability (G7). In addition to the feedback around coaching rules, KR is presented during the target training (provided by Tennis Esports). Mode. As verbal feedback and instructions are often used in traditional coaching, we opted for an auditory component to provide guidance. A replay of the captured motion (see Figure 4.9) accompanies the auditive feedback to support self-monitoring and reflection (G14) by facilitating users to review their actions and link verbal cues to their performance. The motion replay can be coupled with visual feedback mechanics. In our case, we display the detected motion phases, with the phase referenced in the verbal feedback highlighted. Additionally, color coding of motion errors and performance metrics serves as a visual cue to complement the verbal feedback and direct the user’s attention. In order to realize a more inclusive design (G4), we adopted colorblind-friendly color schemes, variable feedback formulations including verbal cues and phrases used in traditional coaching, and supplemental verbal feedback with text and images. Gamification is applied in the form of performance scores to try to promote user engagement and motivation (G6). 63 4. Methodology for Tennis Forehand Motion Learning in VR To provide informative assessment (G16), all motion analysis results are accessible on demand. Finally, the following feedback features are already integrated into Tennis Esports: 1) haptic feedback is provided when the racket collides with the ball; 2) visual and auditive feedback is realized in response to any ball collisions; 3) whether the target is hit is indicated by an audio tone and color coding; and 4) display of the ball’s spin and speed. 64 CHAPTER 5 Implementation Hardware: Meta Quest 2, Tennis Esports Racket Handle Software: Tennis Esports (virtual environment and tennis simulation), Unity (game engine), ElevenLabs (audio generation) The implementation of our motor learning system is based on the hardware and software components listed above. On top of this framework, we implemented a rule-based motion analysis and feedback to realize the motor learning system. Figure 5.1 illustrates the implementation as a sequential pipeline of modules—a simplification, as some processes run in parallel or intervened. Each forehand swing is captured and preprocessed during target training in the mocap module. Simultaneously to the mocap process, features are selected, calculated, and estimated by the feature extraction module. Based on these features, the phase detection module segments the mocap data into the motion phases. This process begins when the system registers a racket-ball collision or identifies a miss. This segmentation is a preprocessing step to the subsequent rule-based motion analysis. The rule evaluation module initializes an evaluation process as soon as a motion phase is detected. This process analyses each coaching rule associated with the motion phase, stores the resulting performance scores, and aggregates them to calculate one representing the motion phase. 
The module calculates an overall performance score once all motion phases have been identified and analyzed. The diagnosis module waits until all running processes are completed or aborted before selecting a coaching rule to recommend to the user. Based on this rule, the feedback module generates feedback and presents it. Most computations happen before the action is completed to guarantee timely feedback. Mocap Feature Extraction DiagnosisRule Evaluation Motion Phase Detection Feedback Figure 5.1: Simplified diagram illustrating the sequence of modules. 65 5. Implementation 5.1 Upper Body Motion Capturing & Feature Extraction Mocap can generate large amounts of data, so it is not sensible to iterate separately for each calculation or coaching rule, as this would be computationally expensive and negatively impact performance. Feature extraction is necessary as part of our pipeline to realize the requirement of near-real-time feedback on the Meta Quest 2. Furthermore, the recorded data contains jitter and outliers, which must be accounted for to achieve a more precise analysis. Input handling, access to the device’s mocap data, and some preprocessing steps are performed by Tennis Esports and were not part of our implementation. The mocap module collects all available data and stores it for further processing. Additional features—such as data points on the racket as discussed in Section 4.2.1 or boolean relational features (e.g., Is topspin?, Is racket head above shoulder?)—are extracted and estimated in the feature extraction module. This module estimates, for instance, instantaneous velocities using the central difference approximation. The estimation of anthropometric features and the extraction of critical points are explained in more detail below. 5.1.1 Anthropometric Feature Estimation Tennis Esports provides the estimated body height of users. In addition, waist and shoulder heights must be estimated to evaluate the contact point and follow through (e.g., for coaching rule CP:TIM:WaistLevel). To estimate these values as functions of the body height, we consider body proportions used in modern figure drawing [She13, Bar15] and experimentally derived ratios that approximate lengths of body segments [Ren72]. We define waist height as the vertical distance from the floor to the navel when the individual is standing upright. Shoulder height is defined as the vertical distance from the floor to any acromial landmark, located at the tip of each shoulder when the individual is standing upright. Let b denote the body height of an individual when standing upright, measured (or, in our case, estimated) as the vertical distance from the floor to the top of the head. The vertical length of a human head, h, measured from the chin up, is approximated as a fraction of b (see Equation 5.1), whereby we assume that the average head-to-body ratio in adults, regardless of sex, is 1:7.5 [Bar15]. This is a rough approximation, as body proportions depend on factors such as head shape [LHSI06], age, and gender, and are therefore not easily generalized. The resulting factor of 0.13 closely matches the value of 0.13 measured by Renato Contini for a group of male U.S. citizens published in 1972, but deviates from the factor for female participants of 0.125. The estimation for waist height in Equation 5.2 is additionally based on the assumption that the waist is located approximately three head-lengths below the top of the head [She13, Bar15]. 
Finally, the shoulder height in Equation 5.3 is derived assuming that the vertical distance between the chin and the acromial landmark is roughly a third of h.

h ≈ f̂(b) = b / 7.5 ≈ b · 0.13   (5.1)
waistheight ≈ b − (h · 3) = b · 0.6   (5.2)
shoulderheight ≈ b − (h + h/3) ≈ b · 0.82   (5.3)

The ANSUR II Dataset [Paq09], a database containing anthropometric measurements of 4,082 male and 1,986 female subjects representing the United States Army and stemming from a survey published in 2012, and a database (referred to by us as '3D Measure') using surface anthropometry based on the CAESAR dataset [RDP99], containing 4,464 subjects and published by Andy R. Terrel on kaggle¹, are used to evaluate our estimators. For the total body height b, we use the variable stature from the ANSUR II dataset and the variable height from the 3D Measure dataset. Based on these values, we calculated waist and shoulder height and compared our estimated values with the respective actual measurements from the datasets. The variable names of the actual measurements we compared our estimations against are stated in Table 5.1 and Table 5.2. These tables also report the mean absolute error (MAE), the standard error (SE) calculated based on the absolute errors, and the mean absolute percentage error (MAPE) of our estimators. As some measurements in the 3D Measure dataset are missing, we discarded the affected entries, resulting in a variable N for this dataset.

Table 5.1: Evaluation of the waist height estimator given in Equation 5.2.
Database | N | Measure Variable Names (height, waist height) | MAE | SE | MAPE
ANSUR II | 6068 | stature, waistheightomphalion | 17.93 mm | 0.177 mm | 1.74%
3D Measure | 4454 | height, waist_height_preferred | 30.15 mm | 0.326 mm | 2.95%

Table 5.2: Evaluation of the shoulder height estimator given in Equation 5.3.
Database | N | Measure Variable Names (height, shoulder height) | MAE | SE | MAPE
ANSUR II | 6068 | stature, acromialheight | 13.00 mm | 0.123 mm | 0.93%
3D Measure | 4326 | height, acromion_height | 14.40 mm | 0.197 mm | 1.03%

¹ 3-D Anthropometry Measurements of Human Body on kaggle: https://www.kaggle.com/datasets/thedevastator/3-d-anthropometry-measurements-of-human-body-sur (01/05/2025)
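For illustration, the estimators in Equations 5.1–5.3 can be written as a short Python sketch. The naming is our own and the snippet is not part of the system's code base; it only restates the ratios derived above.

```python
# Sketch of the anthropometric estimators in Equations 5.1-5.3
# (illustrative only; names are not taken from the actual implementation).

HEAD_TO_BODY_RATIO = 1 / 7.5  # assumed average adult head-to-body ratio

def estimate_landmark_heights(body_height):
    """Estimate head length, waist height, and shoulder height from body height."""
    head = body_height * HEAD_TO_BODY_RATIO        # Eq. 5.1 (~0.13 * b)
    waist = body_height - 3 * head                 # Eq. 5.2 (= 0.6 * b)
    shoulder = body_height - (head + head / 3)     # Eq. 5.3 (~0.82 * b)
    return head, waist, shoulder

# Example: a 1.75 m tall player yields roughly 0.23 m, 1.05 m, and 1.44 m.
print(estimate_landmark_heights(1.75))
```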
5.1.2 Critical Points and Curvature

Critical points in calculus are traditionally defined in the context of continuous functions and their derivatives. A critical point of a function f(x) is an argument x0 of f where the function's derivative is zero, f′(x0) = 0, or undefined (i.e., where f is not differentiable). Critical points mark points where the slope of f changes and include stationary points—local and global extrema—as well as inflection points—points where the concavity changes (i.e., the curvature changes sign). While human motion is continuous, the motion-capturing process involves sampling the motion at a finite rate, resulting in discrete mocap data, where the mathematical definition of critical points does not apply. Nevertheless, the concept of critical points remains useful for feature extraction. However, the approach must either adapt to the discrete nature of the data or involve estimating a continuous approximation of the discrete data, enabling the application of the mathematical definition. We chose the former approach. For a discrete time series X = (x1, x2, ..., xn), a data point xi is considered critical if it satisfies one of the conditions in Table 5.3. The definitions are based on direct numeric comparisons and approximations of derivatives for 1D data points using finite differences. The first difference for xi is given by Equation 5.4, and the second-order central difference is given by Equation 5.5. For the special case of plateaus (i.e., consecutive points with the same value), we consider both the start and end points critical.

Δxi = xi+1 − xi   (5.4)
Δ²xi = Δxi − Δxi−1 = (xi+1 − xi) − (xi − xi−1) = xi+1 − 2xi + xi−1   (5.5)

Table 5.3: Conditions to detect extrema and inflection points in our discrete data.
Critical Point | Condition
Minima | xi−1 > xi < xi+1
Maxima | xi−1 < xi > xi+1
Minima on Plateaus | xi−1 ≥ xi < xi+1 ∨ xi−1 > xi ≤ xi+1
Maxima on Plateaus | xi−1 ≤ xi > xi+1 ∨ xi−1 < xi ≥ xi+1
Inflection | Δ²xi−1 · Δ²xi < 0

This definition of critical points is directly applicable to time series where xi is one-dimensional, i.e., scalar features such as racket speed. For multi-dimensional data such as position data, the detection is done for each component separately (i.e., axis-wise) or for the magnitude. The axis-wise analysis allows insight into the motion in each direction. We apply filter mechanisms in our implementation to prevent the detection of too many critical points due to noise in the data (e.g., due to tracking jitter or minor irrelevant movements). These mechanisms allow the rejection of critical points based on conditions such as a minimum frame gap between critical points, a minimum difference in value, or interpolating data in some instances. They are configurable for each feature individually.

In addition to the critical points, information on the curvature of the time series can be extracted by looking at the second difference at each point xi, as shown in Table 5.4. This information is mainly used to analyze the shape of the racket path when viewed as a piecewise linear curve. Additionally, intersection points of the racket path projected onto a 2D plane (e.g., when viewed along a primary axis) are detected as features. 2D intersection points are defined as the points where 2D line segments (the direction vectors constructed between pairs of consecutive 2D position data points) intersect.

Table 5.4: Conditions to define the curvature in our discrete data.
Curvature | Condition
Up Concave | Δ²xi > 0
Down Concave | Δ²xi < 0
No Concavity (linear) | Δ²xi = 0
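To make the discrete definitions concrete, the following Python sketch detects extrema (including plateau edges) and inflection points according to the conditions in Tables 5.3 and 5.4. It is an illustrative reading of those tables with our own naming; the configurable noise filters described above are omitted.

```python
# Sketch of the extrema/inflection conditions in Tables 5.3-5.4 for a 1-D
# time series (illustrative; the noise/plateau filters are not shown).

def second_difference(x, i):
    """Second-order central difference at index i (Eq. 5.5)."""
    return x[i + 1] - 2 * x[i] + x[i - 1]

def critical_points(x):
    """Return indices of local minima, maxima (incl. plateau edges), and inflections."""
    minima, maxima, inflections = [], [], []
    for i in range(1, len(x) - 1):
        if (x[i - 1] >= x[i] < x[i + 1]) or (x[i - 1] > x[i] <= x[i + 1]):
            minima.append(i)
        if (x[i - 1] <= x[i] > x[i + 1]) or (x[i - 1] < x[i] >= x[i + 1]):
            maxima.append(i)
    for i in range(2, len(x) - 1):
        # Inflection: the second difference changes sign between i-1 and i.
        if second_difference(x, i - 1) * second_difference(x, i) < 0:
            inflections.append(i)
    return minima, maxima, inflections

def curvature(x, i):
    """Classify concavity at index i as 'up', 'down', or 'linear' (Table 5.4)."""
    d2 = second_difference(x, i)
    return "up" if d2 > 0 else "down" if d2 < 0 else "linear"
```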
5.2 Upper Body Motion Phase Segmentation

The phase detection module divides the recorded forehand stroke into five distinct upper-body motion phases: Ready Position (RP), Backswing (BS), Forward Swing (FS), Contact Point (CP), and Follow Through (FT), as described in Section 4.2.3 and illustrated in Figure 5.2. Each motion phase is defined by its temporal extent, specified as the two boundary frames marking the start and end of the phase. These boundary frames are denoted as an interval [FrameStart, FrameEnd]. The segmentation process begins with detecting the contact point of the racket with the ball using collision detection. The contact point is represented by the interval [FrameCP, FrameCP + 1], whereby FrameCP also marks the end of the forward swing and FrameCP + 1 marks the start of the follow-through.

Figure 5.2: Schematic representation of the motion phases' temporal extents and the segmentation process.

Contact Point Detection. The detection of the contact point, implemented through collision detection, is an existing feature of Tennis Esports and was not developed within the scope of this master's thesis. The contact point detection takes two possible cases into account: either the racket hits the ball, or it misses. If the ball is successfully hit, the contact point detection provides the two successive frames in which the collision occurs and the actual time at which the racket strikes the ball. If the racket barely misses the ball, the two consecutive frames where the racket came closest to colliding are returned. Otherwise, the estimated optimal frame span to hit the ball is returned.

The segmentation of the other motion phases is based on the contact point and extends in both temporal directions from its boundary frames (depicted in Figure 5.2). The end frame of the follow-through is estimated by moving forward in time from the contact point (CP → FT) and analyzing the motion features along the way to find a suitable upper bound. Similarly, moving backward in time from the contact point, the start frames of the Forward Swing, Backswing, and Ready Position are sequentially estimated (RP ← BS ← FS ← CP), with each phase estimated based on the boundaries of the previously identified phase. The segmentation process can be described as a boundary optimization, where generous bounds are set as initial values. The unknown boundary frame is then refined by analyzing motion features such as critical points and curvature to find a more suitable bound. The algorithm for the follow-through segmentation is described below. The segmentation of the remaining motion phases works similarly; however, their lower bounds are estimated instead of the upper bound. They mainly utilize the swing direction and critical points of the racket's speed and position as features. Figure 5.3 demonstrates our motion phase segmentation on a forehand swing where the captured racket positions are plotted and colored based on the associated motion phase.

Figure 5.3: 3D racket positions of a forehand swing, segmented into our motion phases (RP, BS, FS, CP, FT). The follow-through is highlighted in orange.

Follow Through Detection. We defined the starting frame of the follow-through as the frame immediately following the contact point (FrameCP + 1), making our segmentation of the FT dependent on the contact point detection. Once the contact point is identified, the algorithm outlined in the flowchart in Figure 5.4 is applied to estimate the end frame.

Figure 5.4: Flowchart outlining the segmentation of the follow-through. The boundary frames of the contact point are denoted by cpBounds.
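Before the individual steps are walked through below, the refinement outlined in the flowchart can be summarized in a short Python sketch. This is a simplified reading of Figure 5.4 under the assumption of a simple additive speed margin; the actual thresholding mechanism is more involved and configurable, and all names are our own.

```python
# Simplified sketch of the follow-through end-frame search from Figure 5.4
# (illustrative only; hypothetical data layout and a hypothetical speed_margin).

def follow_through_end(rel_racket_speed, cp_end_frame, speed_margin=0.5):
    """rel_racket_speed: per-frame racket speed relative to the HMD (m/s)."""
    last_frame = len(rel_racket_speed) - 1
    # Local minima of the relative racket speed after the contact point.
    minima = [i for i in range(cp_end_frame + 1, last_frame)
              if rel_racket_speed[i - 1] >= rel_racket_speed[i] < rel_racket_speed[i + 1]]
    if not minima:
        return last_frame  # racket never decelerated within the recording
    global_min = min(minima, key=lambda i: rel_racket_speed[i])
    # Keep only minima whose speed is close to the global minimum and pick the
    # earliest one, so a late stop during recovery does not stretch the phase.
    threshold = rel_racket_speed[global_min] + speed_margin
    candidates = [i for i in minima if rel_racket_speed[i] <= threshold]
    return candidates[0]
```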
The objective is to determine when the racket decelerates before transitioning back to the ready position. Initially, the end frame is set to the final frame of the recorded motion, which serves as the largest possible upper bound. The algorithm refines this upper bound by analyzing the frames where the racket top speed (relative to the HMD) reaches a local or global minimum after the contact point, as illustrated in Figure 5.5. In the first step, the algorithm retrieves all frames where the relative racket-top speed is at a minimum within the initial bounds. These critical points are pre-identified by the feature detection module and are only accessed and filtered by the algorithm. If no minimum lies within the bounds, the racket did not decelerate during the recorded motion. In such cases, the algorithm returns the initial bounds, as the recording duration was insufficient to capture the complete FT. If minima are found, the algorithm identifies the first frame corresponding to the lowest relative racket top speed (i.e., the global minimum) within the initial bounds. This frame represents the moment the racket reaches its minimal velocity after decelerating from the contact point. The upper bound is then updated to this frame, as it provides a more precise segmentation of the FT compared to the initial bounds.

Figure 5.5: The plot shows the captured racket speed relative to the HMD (in m/s, over frames) corresponding to the same swing depicted in Figure 5.3. The dots mark the local and global minima after the contact point, and the two vertical markers represent the temporal bounds of the follow-through (start and end), identified using the phase segmentation algorithm described in the text.

While testing our phase segmentation, we observed that some players do not fully decelerate the racket during the FT phase and only bring the racket to a complete stop when returning to the ready position. In these cases (Figure 5.5 illustrates such a case), our algorithm segments not only the follow-through but also the recovery phase (or part thereof). The third step is introduced to handle these cases and find a more suitable estimate. This step filters the previously identified minima based on whether their corresponding relative racket speeds fall within an acceptable range. The acceptable range involves a custom thresholding mechanism that is not detailed here. Minima that fall outside this range are removed from the collection. By design, the collection of minima always retains the previously determined global minimum. The final end frame of the FT is then set to the first minimum of this collection. Figure 5.3 and Figure 5.5 demonstrate the estimated follow-through bounds of a captured forehand swing.

5.3 Upper Body Motion Analysis Via Coaching Rules

As soon as a motion phase is detected, the rule evaluation module asynchronously evaluates the corresponding coaching rules. Evaluation may involve a single data point, a range of features, or an entire time series, depending on the coaching rule.

Evaluate Single Frame. Coaching rules that evaluate the mocap data of a single frame or specific point in time (e.g., data interpolated between successive frames, as in the case of the contact point).
An example is the rule CP:TIM:WaistLevel, which evaluates whether the contact point height is around waist height. Evaluate Time Series. Coaching rules that separately assess each frame within a specified temporal range (e.g., an entire motion phase or just a range thereof) and aggregate the results. Aggregation methods to obtain a single result vary depending on the coaching rule, including taking the minimal/maximal error or calculating the (weighted) arithmetic mean of absolute or percentage errors. Coaching rules using this form of evaluation are, for example, RP:FOC:FaceOpponent, BS:FOC:WatchBall, and FT:FOC:KeepStable. Evaluate Extracted Features. Coaching rules that evaluate extracted features, such as critical points, instead of specific frames. An example is the rule FT:POS:AcrossBody, which assesses whether the racket crosses the estimated sagittal plane from the dominant to the non-dominant side and reaches a minimal unilateral distance (threshold) during the temporal bounds of the follow-through. This assessment uses the global extrema of the racket position along the estimated mediolateral axis. Other examples are rules that assess the shape of the racket’s swing path. Each coaching rule results in a single performance score ∈ [0, 100]. A score of 100% indicates that the coaching rule is fully satisfied, while a lower score indicates a motion error. The lower the score, the more severe the motion error. The performance score for a given coaching rule is determined in one of three ways (excluding combinatorial rules): distance-based as described below, as a binary classifier, or a hybrid method (i.e., a combination of the other methods). A coaching rule using a hybrid method is BS:TIM:Continuous, which utilizes a classifier to determine whether the swing is continuous. If not, the total duration of pauses is evaluated and compared against a tolerance. Performance metrics for each motion phase are computed by aggregating the scores of all relevant coaching rules using a weighted arithmetic average. Multiple aspects influence the weights, such as the priority of specific rules and balancing biases. Finally, based on the analysis of all coaching rules, an overall performance score is generated that rates the technique for the entire swing. 5.3.1 Distance-based Implementation The performance score is evaluated by comparing the actual motion to a predefined optimum or set of thresholds. The optimum may be defined as a specific value, a tolerance range, or a threshold that a feature must either exceed or remain below, such as a cardinal plane. A distance measure such as Euclidean or angular distance quantifies the motion error, which is then used to derive the performance score. Examples of distance-based rules include BS:FOC:WatchBall, RP:RKT:ClearView, CP:RKT:FaceVertical, and FT:POS:AcrossBody. The implementation of the coaching rule CP:RKT:FaceVertical is presented in Algorithm 5.1. This rule evaluates whether the racket face is ‘nearly’ vertical to the court at impact by comparing the actual angle of the racket face to a tolerance range. Knudson and Elliott 73 5. Implementation [KE04] define ‘nearly’ vertical as an angle to vertical of less than 10◦ in both directions. We have chosen larger tolerances to account for potential inaccuracies. Algorithm 5.1: Evaluating the performance score for CP:RKT:FaceVertical (racket face is nearly vertical at impact). The positive y-axis corresponds to the upward direction. 
Input:
  Vector3 racketFaceNormal: the normal vector of the racket face on the hitting side at impact
  Vector3 courtSideAwareForward: forward axis pointing from the player towards the net, defined as forward · Sign((netCenterPos − playerPos).z)
  Tolerance [a, b]: tolerance with a ≤ 0° ≤ b; if the motion error lies within this range, the rule is classified as true (i.e., it defines the range of angles where the racket face is interpreted as 'nearly' vertical)
  float majorError: used to map the calculated motion error to a score; Abs(motionError) ≥ majorError results in a score of 0%
Output:
  float score: returns the performance score ∈ [0, 100]

MainFunction float EvaluateScore(Vector3 racketFaceNormal, Vector3 courtSideAwareForward, Tolerance [a, b], float majorError)
  float motionError ← MotionError(racketFaceNormal, courtSideAwareForward, [a, b], majorError)
  float errorInPercent ← motionError / majorError   /* error as a percentage, given by (motionError − 0) / (majorError − 0) */
  float score ← (1.0 − errorInPercent) · 100
  return score

float MotionError(Vector3 r⃗, Vector3 f⃗, Tolerance [a, b], float majorError)
  if Angle(r⃗, f⃗) ≥ 90 then
    return majorError   /* ball was not hit towards the net */
  end
  float angleToVertical ← Pitch(r⃗)   /* 0° means the racket face is vertical */
  if a ≤ angleToVertical ≤ b then
    return 0   /* no motion error, as the racket face angle lies within the tolerance range (i.e., the racket face is vertical or 'nearly' vertical) */
  end
  /* evaluate the motion error as the minimal absolute distance between angleToVertical and the tolerances */
  float motionError ← Min(Abs(angleToVertical − a), Abs(angleToVertical − b))
  return Clamp(motionError, 0, majorError)

float Angle(Vector3 d⃗1, Vector3 d⃗2)
  return arccos(d⃗1 · d⃗2) · RadToDeg

float Pitch(Vector3 d⃗)
  if ∥d⃗∥ = 0 then
    return 0
  end
  return arcsin(d⃗.y / ∥d⃗∥) · RadToDeg

5.4 Diagnosis

In our design, we specified that feedback in our system should focus on a single coaching rule at a time. The diagnosis module realizes this foundation of our feedback design. It identifies the coaching rule that the feedback should relate to and determines the nature of the auditory response—whether it should positively reinforce performance improvements or address errors by giving instructions for correction. Selecting a coaching rule involves considering various factors to ensure the provision of valuable feedback, including adherence to coaching rules, improvements in user performance, priority ratings, and follow-up errors.

Figure 5.6: Flowchart outlining the diagnosis of a coaching rule: depending on whether the user hit the ball, whether the previous selection can be retained (see Algorithm 5.2), and whether the user improved on it, the module either selects CP:PHA:Hit with short verbal cues, retains the previous rule with positive or corrective feedback, or searches for the most relevant rule (see Algorithm 5.3) and provides corrective feedback.

In order to incorporate these factors in the recommendation selection process, the diagnosis module relies on the motion analysis results as input—which enclose a collection of evaluated coaching rules and data from the user's previous stroke to assess improvement.
Hence, the diagnosis module only operates once the user has executed the forehand stroke and the evaluation module has completed its analysis. The decisions involved in the recommendation selection process are illustrated in Figure 5.6 through a flowchart and are explained in more detail in the subsequent paragraphs of this section.

First Decision Point: Did the user hit the ball? While our system also analyses coaching rules in case the user misses the ball, this analysis is less accurate, as it is based on either the determined ideal contact point or an estimated contact point in the event of nearly missing the ball by a small margin. Furthermore, our system cannot identify why a user missed the ball, making constructive feedback on improving their technique to hit successfully impossible. In case of a miss, short verbal cues for coaching rule CP:PHA:Hit are provided (e.g., 'Try again!', 'You've got this').

Second & Third Decision Point: Should the previous coaching rule selection be retained and has the user improved? To determine whether the previous selection should be retained, we use Algorithm 5.2. This function takes as input the coaching rule recommended for the previous forehand shot, the motion analysis results of the current forehand shot, and an integer that counts how often the same rule has already been recommended in series.

Algorithm 5.2: Determines if the previously selected coaching rule can be reapplied.
Input:
  Rule previousSelection: the coaching rule recommended to the user for the previous forehand swing
  Rule[] evaluatedRules: the evaluated coaching rules for the current forehand swing
  uint count: how often the selected rule was recommended in a row
Output:
  Returns true when the previousSelection can be reapplied

bool RetainPreviousSelection(Rule previousSelection, Rule[] evaluatedRules, uint count)
  if !previousSelection.IsValid then
    return false
  end
  if count < 2 then
    return true
  end
  Rule current ← evaluatedRules.Where(rule ⇒ rule.Id.Equals(previousSelection.Id))
  if current.Score > previousSelection.Score then
    return true
  end
  return false

As the first step, the function checks whether the previous selection is valid. If it is invalid, in other words, if no previous recommendation exists or the corresponding evaluation of the coaching rule did not lead to valid results, feedback cannot relate to the same coaching rule, and the function returns false. On the contrary, the function continues. If possible, the system selects the same rule at least twice in a row to let the user know whether or not the changes to their technique based on the previous feedback had a positive effect. Accordingly, the function returns true if the previously selected rule has not been reapplied yet. Otherwise, the function checks whether the user has improved the performance score of the respective coaching rule compared to the last forehand shot. If, on the one hand, there is no performance improvement, the system has already addressed the same error to the user with two different instructions without achieving a positive result. For a future improved version of our system, this could be the place to recommend customized exercises that concentrate on the specific coaching rule. However, for now, we decided that reapplying negative feedback for the same error yet again is not necessary, as it might be frustrating for the user.
Thus, the function returns false in this case, and the recommendation module is free to search for a more fitting recommendation. On the other hand, if the user has improved compared to the last shot, the recommendation should be retained with a positive audio response, regardless of how often the rule has already been recommended in a series, and the function returns true. This decision to prefer positive feedback on the same coaching rule over addressing another error reduces the risk of an overabundance of negative feedback and hopefully positively reinforces changes that lead to performance improvements. If Algorithm 5.2 returns true, the previous recommendation is reapplied. Otherwise, the recommendation module needs to select another coaching rule as the recommendation.

Select Recommendation. The selection of the most relevant coaching rule is depicted in Algorithm 5.3. The algorithm searches for the coaching rule with the highest rank, which combines the rule's error and priority as defined from line 19. When found, the algorithm checks whether the selected rule is a follow-up error of another rule. Since some mistakes lead to others—in tennis, for example, a missed ball might result from poor ball focus or timing—the user should be made aware of the error source, not its symptoms. Therefore, if the selected rule has known error sources and they are not evaluated with a perfect score, the rule representing the error source with the highest rank is recommended instead.

Algorithm 5.3: Select the coaching rule based on the highest rank [rank = rule.Error + weight · normalized(rule.Priority)] and error sources (rules that might lead to follow-up errors in later rules).
Input:
  Rule[] evaluatedRules: the evaluated coaching rules for the current forehand swing
  uint weight: controls how much influence the normalized priority of a coaching rule has on the calculated rank (weight ∈ [0, 100])
Output:
  Rule recommendation: returns the selected coaching rule as recommendation

1  MainFunction Rule SelectRecommendation(Rule[] evaluatedRules, uint weight)
2    Rule selection ← SelectHighestRankedRule(evaluatedRules, weight)
3    if !selection.IsValid or !selection.IsConsequentialError or all rule.ErrorSources have a perfect score then
4      return selection
5    end
6    selection ← search recursively through rule.ErrorSources and return the one with the highest rank
7    return selection
8  Rule SelectHighestRankedRule(Rule[] evaluatedRules, uint weight)
9    Rule selection ← default
10   float highestPriority ← evaluatedRules.Max(rule ⇒ rule.Priority)
11   float highestRank ← float.MinValue
12   foreach rule in evaluatedRules do
13     float rank ← CalculateRank(rule, weight, highestPriority); if rank > highestRank then
14       highestRank ← rank
15       selection ← rule
16     end
17   end
18   return selection
19 float CalculateRank(Rule rule, uint weight, float highestPriority)
20   if rule.Error ≤ 0 then
21     return 0
22   end
23   float normalizedPriority ← rule.Priority / highestPriority
24   return rule.Error + (weight · normalizedPriority)

5.5 Multimodal Feedback

After a forehand stroke is completed, the target training pauses and multimodal feedback is administered to the user. The feedback consists of a motion replay, verbal feedback, an interactive UI that presents textual feedback and motion analysis results, and visual cues.

Figure 5.7: The interactive UI below the motion replay. The frame's corresponding motion phase is highlighted.
5.5.1 Motion Replay Self-reflection is a key element of training. To enable self-reflection in our training setup, we decided to include a replay of the user’s motion, as it allows users to observe their forehand technique and assess their performance. The captured motion is played back as a 3D animation during the feedback cycle. The animation replays the captured motion of the HMD, racket, and non-dominant hand, as shown in Figure 4.9. We decided against showing a full-body avatar in the motion replay, as inverse kinematics would be necessary to animate it, resulting in a potentially inaccurate representation of the actual motion. In addition to the animated joints, the racket’s path, segmented into the detected motion phases, is drawn. The motion replay also contains a playback of the ball, accompanied by the ball’s trajectory, marker for the first bounce, and marker for the contact point, depicted in Figure 4.8. An interactive UI below the motion replay, as shown in Figure 5.7, highlights the current replayed motion phase. The UI provides options to select a motion phase in order to jump to its last frame and a button to pause and resume the replay at any frame. If activated, color coding (the color scheme can be found in Figure 5.11) in the replay and UI highlights errors in the motion and indicates performance scores (Figure 5.8). The color coding on the animated joints indicates at which frame a motion error occurs and how severe the error is. The color coding in the UI and motion phases indicates the corresponding performance score. Color coding is active during the training portion of our user study. 5.5.2 Verbal Feedback The main verbal feedback component is based on the selected coaching rule. Each coaching rule has five different formulations, as outlined below. The first three provide informative instructions on a motion error; the other two are positive reinforcements in case of improvement since the last attempt. The corresponding audio files are pre-generated with text-to-speech using ElevenLabs. If the user misses the ball, no verbal feedback is provided for the selected coaching rule. Instead, short verbal phrases indicating a miss are replayed. If the user achieves an overall score of more than 90, audio acknowledging 78 5.5. Multimodal Feedback Figure 5.8: Snapshot of a forehand swing with a pendulum-like motion, violating coaching rule BS:SWP:Pendulum. The image captures a frame during the backswing. The racket’s path during the backswing is highlighted to reflect the administered feedback, with yellow indicating a mediocre performance score of this motion phase. The racket is color-coded to highlight the motion error (specifically, the racket head dropped too far during the backswing). Below the replay, the UI displays the color-coded motion phases, with the backswing selected to match the current frame and active motion phase. this achievement is played before the feedback of the coaching rule. This secondary verbal feedback component is only played if the user scored lower than 90 in the previous attempt to reduce redundancy. Example phrases in case of a missed ball or score reached are provided in Table 5.5. Examples of the five verbal feedback components for coaching rules are given in Table 5.6. Detailed Explanation (LONG). A description of the coaching rule and associated motion error, often including how to correct the error or the reasoning behind the rule. 
It is an introductory guide to the coaching rule, presenting detailed insights and establishing verbal cues or analogies (e.g., ‘C-shaped swing’ or ‘drawing a C’ [REC15]). This feedback description is given the first time the coaching rule is recommended due to a motion error. On motion errors thereafter, only a short description is prompted. The LONG audio can be replayed via the UI on demand. Short Descriptions (SHORT1 and SHORT2). Two distinct short prompts provid- ing descriptive or prescriptive feedback for the coaching rule. They are selected in alternating order. We have at least two short formulations per rule to increase the variability and reduce the repetitiveness of verbal feedback. Positive Response (IMPROVE). Feedback on progress since the last attempt. The response may include an informative element highlighting how to refine the motion or a motivational element encouraging continued effort. Praise for Perfect Execution (PERFECT). Positive feedback acknowledging that the coaching rule is fully satisfied (i.e., the user achieves a 100% score for the rule). 79 5. Implementation Table 5.5: Examples of the verbal feedback formulations provided when the user misses the ball or reaches a certain overall score. Prompt Audio Message Ball Missed ‘Try again’ ‘You’ve got this. Give it another shot’ ‘Focus on hitting the ball’ [90,100)% ‘Good job! You’ve exceeded 90%’ ‘Incredible, you’ve hit more than 95%’ ‘Almost perfect’ 100% ‘Awesome, you’ve reached a perfect score’ ‘Incredible performance, well done’ ‘Congratulations, you’ve reached 100%’ Table 5.6: Examples of the verbal feedback formulations depending on users improvement for coaching rules BS:FOC:WatchBall, CP:TIM:WaistLevel, FT:POS:AcrossBody, and FS:SWP:LowToHigh. Prompt Audio Message LONG ‘Focus on the ball as you take your racket back. It helps you predict its path and adjust your position.’ ‘Hit the ball at waist height for the most comfortable and powerful forehand. You can control the impact height by your court position or by bending your knees.’ ‘Follow Through across your body and finish over your opposite shoulder. Avoid stopping short! ’ ‘Draw a C with your racket as you swing. From your high backswing, drop the racket head below your wrist and the ball, to swing up for best topspin forehand. With this swing, you can use momentum to generate power.’ SHORT1 ‘Keep your eyes on the ball’ ‘Hit the ball at waist height for the most comfortable and powerful forehand’ ‘Follow through across your body.’ ‘Drop the racket head and then swing up forming a C-shaped path.’ SHORT2 ‘Track the ball with your eyes as soon as it’s shot.’ ‘Try to hit the ball around waist height’ ‘Follow through all the way across your body.’ ‘From your high backswing, drop the racket head below the ball, to swing up for topspin.’ IMPROVE ‘Your focus during the backswing got better. Keep it up’ ‘Keep working on your ideal contact point. You’re improving.’ ‘Keep going. Bring the racket all the way across you body and don’t stop short.’ ‘Drop the racket even further to be able to hit the ball with an upward motion’ PERFECT ‘Great ball focus during the backswing.’ ‘Fantastic! You’ve hit the ball around waist height.’ ‘Well done. Your follow through ended up across the body.’ ‘Great improvement! ’ 80 5.5. Multimodal Feedback Figure 5.9: The main page of the UI feedback panel. The top row shows the overall performance and how many rules were performed correctly (note: this does not reflect all rules in our system). 
Small arrows indicate score changes since the previous attempt. Below that, feedback for the recommended coaching rule is displayed, including its score accompanied by color coding, a brief description, and an illustrative image. 5.5.3 Interactive UI Figure 5.9 shows the interactive UI panel placed to the left of the motion replay (as seen in Figure 4.2). It displays the overall performance score alongside the recommended coaching rule’s performance metric and textual description. The textual description is accompanied by an image depicting correct execution of the coaching rule (Figure 4.7b), sometimes additionally illustrate a possible motion error (Figure 4.7a). Small arrows next to the performance scores indicate whether the score has improved or declined since the last attempt. By selecting “Detailed Stroke Analysis” at the bottom, users can view detailed results of the motion analysis and description of all motion phases and coaching rules on demand, as illustrated in Figure 5.10. 5.5.4 Color Coding Color coding is a visual aid in the motion replay and indicates performance scores. In the UI, the color coding is supplemented by the actual numerical score or an alternative representation in the form of symbols. These icons (e.g., checkmarks) provide a quick overview of performance, indicating whether a coaching rule has been fully met, is mediocre, or has a low score (see Figure 5.11). For the color coding palette, we selected 81 5. Implementation Figure 5.10: The secondary page of the UI feedback panel, which users can switch to on-demand. While the top row remains identical to the main page, the lower section provides a more detailed performance breakdown. The left column displays performance metrics for individual motion phases. Users can select a phase to explore its coaching rules. The middle column lists the coaching rules for the selected motion phase, ordered by their score, with the lowest-scoring rule at the top. Users can select each coaching rule. The right column shows details for the selected coaching rule, similar to the main page but with the additional options to view a more detailed description or replay its verbal long description (LONG) audio file as needed. #CC4137 [0%,25%) [25%,50%) [50%,75%) [75%,100%) [100%] #E3870E #D4B922 #7DAD34 #20A145 Figure 5.11: Color palette for color coding motion errors and performance scores. The intervals indicate how score ranges are assigned to specific colors. five colors ranging from green (representing high scores) to red (denoting low scores), as these colors are sometimes associated with “bad” and “good” (depending on the region and culture). The color scheme is shown in Figure Figure 5.11. To ensure accessibility, we used Adobe’s color wheel and accessibility tools to design our palette2. 2Adobe Color https://color.adobe.com/create/color-accessibility (01/20/2025) 82 CHAPTER 6 Evaluation & Results This chapter describes the evaluation methodologies we applied to test our system and to answer research questions RQE 1–4 (see Section 1.3). It also details the results. We have adopted an iterative expert evaluation process throughout the implementation of our system to ensure that our implementation and coaching rules work as intended. After finalizing the project, we conducted a single group pre-and post-test user study (N = 26) with a preceding small pilot study (Np = 2) to gain insights into design considerations, user experience aspects, and the effectiveness of our approach. 
6.1 Iterative Expert Evaluation During the implementation of our system, we regularly invited tennis coaches and players to test our system and provide feedback. Part of the feedback was then adopted before the coaches tested the application again. This iterative evaluation and implementation process was critical to ensure that the implemented coaching rules aligned with the meaning the coaches intended when formulating the rules with us and that they were assessed correctly. It also allowed us to tweak and adjust any thresholds to balance the difficulty levels of coaching rules and performance scores (e.g., by adjusting them to be more or less lenient). The tennis coaches also helped us prioritize specific coaching rules over others to provide meaningful and actionable feedback. Thresholds and prioritization weights were adjusted multiple times over this iterative process and involved the experience, knowledge, and opinion of multiple tennis coaches and players. 6.2 User Study Our user study is designed to address our evaluation-oriented research questions RQE 1–4 (described in Section 1.3) and primarily focuses on gathering qualitative insights into participants’ experiences, opinions, feedback, and suggestions. These insights aim to 83 6. Evaluation & Results refine our approach and inform the design of future VR-based training programs. In addition, hypotheses H1–4 are investigated to assess our VR tennis training methodology concerning users’ performance improvement, motivation to play tennis, and confidence in their tennis skills: H1 Participants’ motivation to practice tennis after the VR training session is greater than their baseline motivation levels assessed during the pre-test. H2 The reported confidence levels are higher after the training than before. H3 Adherence to coaching rules increases when comparing performance tests conducted before and after training. H4 The calculated performance scores are higher in the post-test compared to the pre-test baseline. The study follows a within-subjects pretest-posttest design to compare measure- ments taken before and after our VR training via surveys and short performance tests. Survey evaluation follows a mixed-method approach, incorporating quantitative anal- ysis of Likert scales and qualitative coding of open-ended questions (OEQ). An overview of how each research question is evaluated is presented in Table 6.1. The main user study was conducted under supervision and guided to ensure that participants could fully experience the VR training without getting stuck for too long due to technical issues or unclear instructions. Participants’ questions were addressed throughout the study, and assistance was provided when necessary but kept to a minimum. When participants inquired about coaching rules or their technique, the supervisor initially directed them to relevant in-application resources (e.g., by pointing out images or more detailed expla- nations in the UI) before resorting to alternative explanations. When interpreting our results, it must be considered that such interventions may affect the learning effect and adherence to coaching rules measured during the performance tests. Notably, this study does not aim to evaluate the training’s effectiveness, knowledge retention, or skill transfer to the real world. 
The assessment of performance changes, regardless of their cause, solely serves as an initial indicator of short-term learning effects and was included because adverse effects would signal potential issues with the training approach. At the same time, improvements indicate a short-term learning effect, suggesting potential effectiveness and warranting further investigation using a control group to identify their cause (i.e., to differentiate whether performance improvements are caused by the intervention or other factors such as natural improvement over time).

Table 6.1: Overview of the experimental procedure, including the methodology used to address each evaluation-oriented research question (see Section 1.3). The full questionnaire can be found in Appendix A. Open-ended questions are abbreviated with OEQ.
Research Question | Methodology | Description
RQE 1 Feedback Helpfulness & Preference | Post-training survey: Likert scale, OEQ | The Likert scale assesses the perceived helpfulness of different feedback modalities, enabling quantitative comparison. OEQ provide qualitative insights into user preferences and reasons for favoring or disliking specific modalities.
RQE 2 Motivation | Pre- and post-training survey: Likert scale, OEQ | The Likert scale assesses motivation to practice tennis before and after the VR training for quantitative comparison. OEQ at the end to discern contributing factors.
RQE 2 User Experience | Post-training survey: OEQ | Qualitative insights into perceived usability, enjoyment, accessibility, and learning value of our VR training. Subjective user opinion on complementing real tennis training.
RQE 3 Short-Term Learning Effect | Pre- and post-training survey and performance tests: Metrics, Likert scale, OEQ | Quantitative and qualitative indications of a short-term learning effect or adverse effects stemming from the VR tennis training. Likert scale to quantitatively compare self-reported confidence in skill.
RQE 4 Trust | Post-training survey: OEQ | Qualitatively assess whether users trust or believe in the system's analysis and perceive it as accurate.

6.2.1 Pilot Study

Before the main user study, we performed a small pilot study (Np = 2) to identify potential issues and refine the setup of the experiment. This led to three main changes. (1) The number of balls in the performance test doubled from 8 to 16. (2) We decreased the duration of the VR tennis training from 12 to 10 minutes. (3) We implemented a mechanism in the tutorials that prevents participants from advancing to the next step via input while audio instructions are still playing, ensuring that all participants receive the same information. As one pilot study participant had a slight deuteranomaly (red-green color vision deficiency), we asked for feedback on our color scheme and involved the participant in changes. The participants from the pilot study were not included in the results of the main study.

6.2.2 User Study Protocol

Before participating, participants were informed about the study context, procedure, their task and objective during the experiment, their right to refuse or withdraw, and safety. They were also asked to read and sign a consent form. At the beginning of the study, the coordinator showed the participants the tracking space (an L-shaped tracking area with a total length of 3 m and a width ranging from 1.5 m to 2.7 m) and the initial position for the VR sessions, which were marked on the floor.
Afterward, controls on the HMD and controllers were explained, and participants were asked to put on the equipment. The interpupillary distance (IPD) on the Meta Quest 2 was adjusted if necessary. Once the participant was ready, the experiment started with the first VR session. The experimental procedure is depicted in Figure 6.1. Between the first and second VR sessions, participants were asked to fill out the demographics questionnaire and the Virtual Reality Sickness Questionnaire (VRSQ). Participants were again asked to fill out the VRSQ after the second VR session. The user study concluded with a questionnaire containing Likert scales and open-ended questions. The questionnaire is given in Appendix A. The average duration of the user study was approximately 47 minutes (SD ≈ 12 minutes), of which around 20 minutes were spent in VR (SD ≈ 1.5 minutes).

Figure 6.1: Procedure of our user study (DQ denotes the Demographic Questionnaire).

First VR Session. The first VR session began with a tutorial, followed by the initial performance test to assess participants' forehand technique. The tutorial allowed participants to familiarize themselves with the virtual tennis training environment, ball physics, and controls. Additionally, an avatar demonstrated the correct forehand technique, and the motion phases of the forehand groundstroke were explained. The steps and exact instructions provided during the tutorial are listed in Appendix A. While the tutorial had no time limit, it was relatively brief, with an average duration of approximately 3 minutes (SD ≈ 0.7 minutes).

Second VR Session. The second VR session started with an intro that presented the available training tools and introduced the participant to the procedure and feedback modalities of the VR tennis training. The motion replay, UI elements, automatic assessment based on coaching rules, and feedback were presented and explained one after the other. The steps and exact instructions given during the intro are also listed in Appendix A. There was no time limit for the intro. Participants could explore the different tools, UI elements, and coaching rules at their own pace but were not able to go back to previous steps. On average, participants spent around 3.1 minutes in the intro (SD ≈ 1.26 minutes). Following the intro, participants completed a 10-minute tennis forehand training session. The session concluded with another performance test.

Performance Test. The VR performance test was conducted at the end of both VR sessions and was designed as a target practice exercise without explicit feedback (apart from the Knowledge of Results (KR) provided in Tennis Esports, such as haptic feedback and indicators of whether the target was hit). Participants encountered sixteen balls in rapid succession without pause or feedback between shots. We used eight different parameter configurations that adjusted aspects such as ball speed or target size to gradually increase the difficulty. Each configuration was presented twice before switching to the next, giving participants two attempts per setting.

VRSQ. The Virtual Reality Sickness Questionnaire (VRSQ) [KPCC18] measures VR-related discomfort like motion sickness. In our user study, the VRSQ is administered to participants directly before and after the second VR session to assess the effects of the VR tennis training.
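To make the structure of the performance test described above more tangible, the following minimal sketch assembles a 16-trial sequence from eight difficulty configurations, each presented twice in a row. The configuration fields and values are hypothetical placeholders for illustration only and do not correspond to the actual Tennis Esports parameters.

# Hypothetical sketch of the performance test trial sequence.
# Field names and values are illustrative placeholders, not the real settings.
from dataclasses import dataclass

@dataclass
class BallConfig:
    ball_speed: float     # m/s; higher values assumed to be harder (placeholder)
    target_radius: float  # m; smaller targets assumed to be harder (placeholder)

# Eight configurations of gradually increasing difficulty (placeholder values).
configs = [BallConfig(ball_speed=8.0 + i * 0.5, target_radius=1.0 - i * 0.05)
           for i in range(8)]

# Each configuration is presented twice before switching to the next,
# yielding the 16 balls of one performance test.
trial_sequence = [cfg for cfg in configs for _ in range(2)]
assert len(trial_sequence) == 16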
6.2.3 Participants

A total of 26 participants (19 males, 7 females) aged between 21 and 60 years (M = 31.0, SD = 9.57, Mdn = 27) were recruited for the main study. Their tennis skill level ranged from none to intermediate (None = 10, Novice = 9, Intermediate = 7), with two participants playing tennis with their left hand. All but three participants had prior experience with VR, and six participants had tried VR only once before the study. Eight participants identified as novice or occasional users. The remaining nine reported using VR regularly, ranging from at least once per month to daily. Because our feedback relies on color coding, we asked participants to report any known color vision deficiencies. One participant reported problems differentiating "yellow/blue," while another stated, "Under certain light conditions, shades of green are difficult." Additionally, the participants' total body height was collected via the demographics questionnaire, as some coaching rules depend on height estimation. Participants' heights ranged from 160 to 198 cm (M = 176.77, SD = 9.24, Mdn = 177.5).

6.3 Experimental Results

This section presents the data and results from our user study, consisting of responses to questionnaires and data collected during the performance tests. We followed a mixed-methods approach for evaluating the user study, applying both quantitative and qualitative analysis methods. The results are presented accordingly. In Section 6.3.1, the quantitative results are presented, while Section 6.3.2 contains the qualitative results. An overview of our evaluation methods can be found in Table 6.1.

6.3.1 Quantitative Analysis

We used a pretest-posttest design to measure the differences between paired observations. In our case, we compared the following metrics assessed before and after the VR training.
• Self-reported motivation to play tennis
• Self-reported confidence in their forehand technique
• Indicators of motion sickness (VRSQ)
• Performance test results (performance scores, coaching rule adherence)
In order to compare our feedback modalities, we included a Likert scale in the final questionnaire. It assesses participants' perceived helpfulness for each modality. Additionally, we look at the average trend of performance metrics over the training. We hypothesize that different tennis experiences, baseline skill levels, and learning preferences affect the effectiveness of our system and how users experience it. We suspect that, especially between tennis beginners and those with tennis experience, differences might emerge that result in non-normal distributions of our metrics, such as bimodal or multimodal distributions. The results of Shapiro-Wilk Tests reveal that the pair differences between most of our paired observations likely do not follow a normal distribution, thereby supporting these assumptions. Accordingly, we used non-parametric methods for the quantitative evaluation.

Figure 6.2: Left: Reported motivation levels to practice tennis before and after our VR training, based on agreement (1 = strongly disagree, 7 = strongly agree) with the statement: "I feel motivated to practice tennis, whether in virtual reality or in real life". Right: Distribution of differences between post-test and pre-test motivation levels.
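As an illustration of this non-parametric analysis pipeline, the following sketch checks the pair differences of one metric for normality with a Shapiro-Wilk test and then applies a Wilcoxon signed-rank test with an approximate effect size. The data are placeholder values, and the effect size is derived from the normal approximation of the two-sided p value; this is an illustrative sketch, not our actual evaluation script.

# Illustrative analysis sketch for one paired pre/post metric (placeholder data).
import numpy as np
from scipy import stats

pre  = np.array([4, 5, 3, 6, 5, 4, 2, 6, 4, 5])   # placeholder pre-test Likert ratings
post = np.array([5, 6, 4, 6, 6, 5, 4, 7, 5, 7])   # placeholder post-test Likert ratings
diff = post - pre

# Normality check of the pair differences (Shapiro-Wilk).
print(stats.shapiro(diff))

# Non-parametric comparison of the paired observations (Wilcoxon signed-rank test).
res = stats.wilcoxon(pre, post)

# Effect size r = Z / sqrt(N); Z is approximated here from the two-sided p value.
z = stats.norm.isf(res.pvalue / 2)
r = z / np.sqrt(len(diff))
print(res.statistic, res.pvalue, round(r, 3))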
Specifically, we used the Wilcoxon Signed-Rank Test to assess differences between paired observations (pre- and post-test values) and the Friedman Test for k-related samples. For calculating the effect size r for the Wilcoxon Signed-Rank Test, we apply Rosenthal’s formula r = Z/ √ N , where Z is the standardized test statistic and N the number of pairs [RCH+94, FMR12]. 6.3.1.1 Motivation To Play Tennis In H1, we hypothesize that participants feel more motivated to practice tennis—whether in VR or real life—after completing the VR tennis training than before. To test hypothesis H1, we included a 7-point Likert item in the questionnaires administered before and after the training session. This item measures participants’ level of agreement with the statement: “I feel motivated to practice tennis, whether in virtual reality or in real life.” The responses to this statement are shown in Figure 6.2 (left chart). The median motivation level increased from 5 in the pre-test to 6 in the post-test. The results of a related-samples Wilcoxon Signed-Rank Test (Z = 2.808, p = .005, r = 0.551) indicate that the difference in motivation levels before and after the training session is statistically significant. The distribution of differences between post-test and pre-test motivation levels, as shown in the right bar chart of Figure 6.2, shows that the majority of participants (57.70%) reported an increase in motivation. These results support hypothesis H1. 88 6.3. Experimental Results Questionnaire Pre Post Li ke rt S ca le ( A gr ee m ne t) 5 6 7 4 3 2 1 Difference (Post - Pre) Fr eq ue nc y 5 6 9 7 8 4 3 2 1 -1 0 3210 10 Figure 6.3: Left: Reported confidence in the forehand tennis technique before and after our VR training, based on agreement (1 = strongly disagree, 7 = strongly agree) with the statement: “I am confident in my current forehand tennis technique”. Right: Distribution of differences between post-test and pre-test values. 6.3.1.2 Self-Reported Confidence in Forehand Technique We hypothesize that the training increases participants’ confidence in their forehand tennis technique, as the feedback is designed to acknowledge improvements, provide positive reinforcement, and offer metrics to track performance. To test hypothesis H2, we compared participants’ self-reported confidence levels in their forehand tennis technique, measured before and after completing the VR training using a 7-point Likert item in the questionnaires. The reported confidence levels are given in Figure 6.3. H2 is supported by the results of a Wilcoxon Signed-Rank Test (Z = 2.623, p = .009, r = 0.514), which indicate a statistically significant change in confidence levels, with the median confidence level increasing by 1 from the pre-test to the post-test. 53.85% of participants reported an increase in confidence, while 15.38% reported a decrease, as shown in the bar chart of Figure 6.3. 6.3.1.3 Performance Test Results We evaluated two types of performance metrics in order to compare the performance test results. The first metric, coaching rule adherence, is the ratio of the number of coaching rules the participant successfully adhered to (i.e., rules that received a score of 100%) to the total number of rules. The second metric consists of performance scores that quantify motion correctness by accounting for rule importance and motion error severity. 
These performance scores are calculated by evaluating each coaching rule with a score in the range [0, 100], as described in Section 5.3, and then aggregating them with a weighted arithmetic mean based on rule prioritization. Seven performance scores were considered in the evaluation: the overall score, which assesses the whole motion by combining the results of all rules, and a separate score for each motion phase, integrating only the rules associated with the respective phase. These summative performance scores were also displayed during training. It is important to note that while both metrics are related, one can improve the performance scores without increasing coaching rule adherence or get lower/higher performance scores than coaching rule adherence, depending on rule prioritization.

We utilize the Wilcoxon Signed-Rank Test to assess whether changes in performance from pre- to post-test are statistically significant and the Hodges–Lehmann estimator to estimate the location shift. As each performance test has 16 trials, we apply the arithmetic mean (additionally min and max) to summarize the performance metrics for each participant across trials.

First, let us look at the change in overall performance and coaching rule adherence. Figure 6.4 visually compares the related performance metrics between pre-test and post-test, showing an increasing trend across all comparisons, and the Wilcoxon Signed-Rank Test results (see Table 6.2) indicate significant positive changes. Additionally, the Hodges–Lehmann estimator suggests a location shift of 9.54, 95% CI [6.63, 12.93] in overall performance and 5.95, 95% CI [3.75, 8.52] in coaching rule adherence. As the performance test evaluated 58 coaching rules in total, the positive shift in coaching rule adherence translates to approximately three coaching rules (95% CI: approximately two to five coaching rules). The plots and results imply that participants tended to achieve a higher overall performance score and fulfilled more coaching rules in the post-test, thereby supporting H3 and H4.

Figure 6.4: Violin plots displaying performance metrics for the entire motion, measured during the performance tests pre and post VR training. The first three plots represent coaching rule adherence, and the last three represent overall performance scores. The plots display the lowest, average, and highest values achieved during the performance tests. Statistical differences were assessed using the Wilcoxon Signed-Rank Test (* = significant, n.s. = not significant).

Table 6.2: Comparison of coaching rule adherence [A] and overall performance [O] between pre-test and post-test. Statistics are presented for the lowest, average, and highest values achieved during performance tests. The positive/negative differences show how many participants increased/decreased the given metric from the pre-test to the post-test. The Wilcoxon Signed-Rank Test results are reported as Z and p values with α = 0.05.
Metric | Pre Mean (SD) | Post Mean (SD) | Differences Pos. / Neg. | Z | p | r | Hodges-Lehmann Est.
A (min) | 46.55 (16.62) | 54.31 (18.03) | 20 / 6 | 2.693 | .007 | .528 | 7.76, 95% CI [2.59, 12.93]
A (avg) | 72.15 (9.87) | 78.88 (7.89) | 24 / 2 | 4.229 | <.001 | .829 | 5.95, 95% CI [3.75, 8.52]
A (max) | 84.20 (9.66) | 90.73 (5.94) | 22 / 3 | 3.809 | <.001 | .747 | 5.88, 95% CI [3.45, 8.78]
O (min) | 38.25 (14.41) | 44.89 (16.79) | 19 / 7 | 2.426 | .015 | .476 | 6.10, 95% CI [1.33, 11.48]
O (avg) | 65.01 (14.19) | 74.96 (11.87) | 24 / 2 | 4.229 | <.001 | .829 | 9.54, 95% CI [6.63, 12.93]
O (max) | 82.55 (16.31) | 92.62 (8.50) | 22 / 4 | 3.594 | <.001 | .705 | 10.08, 95% CI [6.49, 13.42]

Now, let us examine how participants performed in the individual motion phases during the performance tests. Figure 6.5 provides a visual comparison between the pre- and post-test performances for each motion phase, while Table 6.3 contains the Wilcoxon Signed-Rank Test results. RP displays the highest estimated location shift of 20.00, 95% CI [9.35, 30.03], and the highest effect size, which signifies that participants tended to have the highest performance score improvement in their ready position. However, as coaching rules for the ready position tend to have higher prioritization (as motion errors in the first phase can lead to many follow-up errors later in the stroke), the prioritization weights might have influenced and skewed this magnitude. In Figure 6.5, it is observable that scores associated with CP and EX resulted in the smallest changes, supported by their small estimated location shifts (see Table 6.3). The Wilcoxon Signed-Rank Test results for CP and EX are not statistically significant for α = 0.05.

Table 6.3: Statistics and Wilcoxon Signed-Rank Test results (reported as Z and p values with α = 0.05) for the average motion phase performance scores measured during performance tests. The positive/negative differences show how many participants increased/decreased the given metric from pre-test to post-test.
Metric | Pre Mean (SD) | Post Mean (SD) | Differences Pos. / Neg. | Z | p | r | Hodges-Lehmann Est.
RP (avg) | 68.30 (24.71) | 89.49 (10.83) | 22 / 4 | 3.670 | <.001 | .720 | 20.00, 95% CI [9.35, 30.03]
BS (avg) | 72.75 (19.10) | 83.92 (15.43) | 18 / 8 | 2.679 | .007 | .525 | 9.37, 95% CI [2.65, 19.17]
FS (avg) | 86.83 (8.70) | 93.18 (5.21) | 22 / 4 | 3.467 | <.001 | .680 | 5.22, 95% CI [2.40, 9.53]
CP (avg) | 90.57 (6.41) | 90.75 (6.75) | 12 / 14 | .368 | .713 | .072 | .44, 95% CI [-1.56, 2.29]
FT (avg) | 62.17 (33.88) | 76.55 (18.64) | 17 / 9 | 2.197 | .028 | .431 | 15.91, 95% CI [1.41, 27.07]
EX (avg) | 41.12 (16.26) | 44.74 (21.28) | 15 / 11 | 1.435 | .151 | .281 | 3.71, 95% CI [-1.53, 8.97]

Figure 6.5: Violin plots displaying performance scores for the six individual motion phases, measured during the performance tests pre and post VR training. Statistically significant differences are based on the Wilcoxon Signed-Rank Test (* = significant, n.s. = not significant).
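To clarify how the two metrics compared above can diverge, the following minimal sketch computes coaching rule adherence and a weighted overall score for a single stroke. The rule identifiers follow the naming scheme from Table 5.6, but the scores and prioritization weights are placeholder values rather than the ones used in our system.

# Minimal sketch of the two performance metrics for one stroke (placeholder values).
import numpy as np

rule_scores  = {"BS:FOC:WatchBall": 100.0,   # rule scores in [0, 100] (placeholder)
                "CP:TIM:WaistLevel": 80.0,
                "FS:SWP:LowToHigh": 55.0}
rule_weights = {"BS:FOC:WatchBall": 1.0,     # hypothetical prioritization weights
                "CP:TIM:WaistLevel": 1.5,
                "FS:SWP:LowToHigh": 2.0}

# Coaching rule adherence: fraction of rules fully satisfied (score of 100%).
adherence = np.mean([score == 100.0 for score in rule_scores.values()])

# Overall performance score: weighted arithmetic mean of all rule scores.
scores  = np.array(list(rule_scores.values()))
weights = np.array([rule_weights[r] for r in rule_scores])
overall = float(np.average(scores, weights=weights))

print(f"adherence = {adherence:.2f}, overall score = {overall:.1f}")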
6.3.1.4 Performance Throughout Training

Figure 6.6 presents line charts illustrating coaching rule adherence and the overall performance score over time during the training session. Both metrics show an increasing estimated trend. Each participant had 10 minutes (precisely 605 seconds) for VR training. However, the resulting time series are unevenly spaced since participants could decide how long they wanted to remain in the feedback phase after each forehand stroke. Furthermore, the number of sample points and their time points differ between participants. We applied a sliding window performing a time-weighted average to, on the one hand, transform the data into equally spaced observations and, on the other hand, to smooth the data to better visualize the trend over time. The method is a form of weighted moving average to estimate the underlying trend of our time series. The first line chart in Figure 6.6 is generated with a 60-second window. The second chart is generated with a 300-second window, resulting in a higher smoothing of the data. The increasing trend suggests that participants improved over the training session. Figure 6.7 shows that, while most participants had a better overall performance score at the end of the training compared to the beginning, not all improved. The estimated trend is decreasing for eight participants.

Figure 6.6: The estimated trend of coaching rule adherence and overall performance over time during the training session. Left: 60s window; Right: 300s window; 1s increment.

Figure 6.7: Estimated trend of overall performance scores over the training session per participant. The time series are generated with a 300-second sliding window and 1-second time steps. The upper plot shows the trend of all 26 participants. The bottom left plot shows all trends that resulted in a better overall performance at the end compared to the beginning, while the bottom right one shows all decreasing trends.
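The following sketch shows one possible realization of the time-weighted moving average described above for unevenly spaced score samples. It is an illustrative reconstruction under assumed data structures (per-stroke timestamps and scores in [0, 1]) and not necessarily the exact implementation used to produce Figures 6.6 and 6.7.

# Illustrative reconstruction of a time-weighted moving average over uneven samples.
import numpy as np

def time_weighted_trend(t, y, window, step=1.0, t_end=605.0):
    """Resample an unevenly spaced series (t, y) onto a regular 'step' grid by
    averaging all samples inside a centered window, weighted by the time gap
    each sample covers until the next one (assumed weighting scheme)."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    grid = np.arange(0.0, t_end + step, step)
    trend = np.full(grid.shape, np.nan)
    for i, center in enumerate(grid):
        mask = (t >= center - window / 2) & (t <= center + window / 2)
        if not mask.any():
            continue
        ts, ys = t[mask], y[mask]
        gaps = np.diff(np.append(ts, ts[-1] + 1.0))  # last sample covers 1 s
        trend[i] = np.average(ys, weights=gaps)
    return grid, trend

# Placeholder per-stroke overall scores recorded at uneven times during training.
times  = [12.0, 31.5, 55.0, 83.2, 120.7, 160.3]
scores = [0.62, 0.71, 0.68, 0.75, 0.80, 0.78]
grid, smoothed = time_weighted_trend(times, scores, window=60.0)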
6.3.1.5 Helpfulness and Preference of Feedback Modalities

We included a 7-point Likert scale in the questionnaire to determine whether participants found specific feedback modalities more helpful than others (see Appendix A). The responses for each modality are displayed in Figure 6.8. We use the Friedman Test to test for the significance of differences. The results on perceived helpfulness (χ2 = 7.380, p = 0.287) are not statistically significant for α = 0.05. Therefore, we cannot reject the null hypothesis of the Friedman Test, which assumes that the seven feedback modalities are perceived as equally helpful. However, what is noticeable in Figure 6.8 is that the perceived helpfulness of the haptic feedback strongly varies between participants. Further, all feedback modalities, except haptic feedback, were rated by at least 75% of participants with a helpfulness score of 4 or greater and had a median perceived helpfulness ≥ 5. These results suggest that most participants found motion replay, color coding, auditory feedback, performance scores, textual feedback (including images), and the list of coaching rules in the UI helpful.

Figure 6.8: Perceived helpfulness of the feedback modalities for learning or practicing a tennis forehand stroke rated by a 7-point Likert scale ranging from least to most helpful (How helpful was feedback modality x?).

Additionally, participants were asked in one of the open questions what feedback modality they preferred and why. The exact formulation is 'Do you have a preference for one feedback modality? Please share which one and why.' For evaluation, multiple preferences are accepted. The evaluation of the answers resulted in the data displayed in Figure 6.9 (a). Three participants did not state a preference for any feedback modality and were excluded from the analysis of this question, as their non-responses could not be reliably interpreted. One participant explicitly mentioned that they had no preference. The results of a Cochran's Q Test on feedback modality preference (χ2 = 12.880, p = 0.045) are statistically significant for α = 0.05. This result suggests that at least one feedback modality stands out and is preferred significantly more or less than another. However, a post hoc pairwise test (McNemar's Test with Bonferroni adjustment) did not find pairs of relevant differences for α = 0.05. The highest difference can be observed between the motion replay, which 9 participants preferred, and the haptic feedback, which only 1 participant stated as a preference (χ2 = -.348, p = 0.058).

Figure 6.9: The left chart shows the number of participants who preferred each feedback modality. The chart on the right displays how many participants expressed a positive (pos.), negative (neg.) or neutral (neu.) sentiment about each modality. "UI List" refers to the list of coaching rules with their respective metrics provided in the UI.

6.3.1.6 VRSQ

We applied the VRSQ [KPCC18] during the user study to assess motion sickness before and after the virtual training session. The questionnaire uses a 4-point Likert scale with 9 items, which are grouped into three components: Oculomotor, Disorientation, and Total. The aggregated VRSQ scores for both the pre- and post-test are presented in Table 6.4 (calculated based on the formulas reported by Kim et al. [KPCC18]). To evaluate the impact of the virtual training session on motion sickness, we compared the pre- and post-test VRSQ scores by calculating the differences between paired observations (pre- and post-test values) and applying the Wilcoxon Signed-Rank Test. The results, given in Table 6.4, indicate that there is no significant difference between pre- and post-test values.

Table 6.4: VRSQ scores for the pre-test, post-test, and the difference (i.e., post − pre) are presented as mean (SD). Results from the Related-Samples Wilcoxon Signed-Rank Test are reported as Z and p values (N = 26, α = 0.05).
Components | Pre Mean (SD) | Post Mean (SD) | Diff Mean (SD, Median) | Z | p
Oculomotor | 5.77 (9.94) | 6.09 (10.42) | .32 (7.26, .0) | .258 | .796
Disorientation | 3.33 (6.32) | 4.62 (6.47) | 1.28 (4.22, .0) | 1.508 | .132
Total | 4.55 (7.62) | 5.35 (7.80) | .8 (5.02, .0) | .420 | .674

Table 6.5: Descriptions for negative, neutral, and positive sentiments.
Negative | A negative sentiment is assigned to statements highlighting drawbacks/flaws of the system, criticism, participants' dislikes, or responses that indicate negative effects, experiences, or a need for improvement.
Neutral | A neutral sentiment is assigned to general statements, observations that do not convey a clear sentiment, ideas, or suggestions that are not linked to criticism. We also include responses where participants express uncertainty in their judgment.
Positive | A positive sentiment is assigned to participants' expressions of enjoyment and appreciation. It consists of statements that describe favorable experiences, or highlight benefits of the system.

6.3.2 Qualitative Analysis

The findings of the qualitative analysis are based on the qualitative coding of responses to the open-ended questions in our user study. The questionnaire can be found in Appendix A.1.
We categorized participant responses into codes and sentiments using abductive coding—a hybrid approach combining deductive and inductive coding. From this coding, four themes emerged: (1) Outcome & Effect, (2) Experience & Usability, (3) Teaching Methodology Impressions, and (4) Feedback Design. The following subsections present the results of each theme in detail, including the corresponding codes and representative examples from participant responses. Tables 6.6, 6.7, 6.8, and 6.9 provide a summary of the identified codes, their descriptions, the number of unique participants who mentioned them, and the distribution of sentiments—i.e., the breakdown of how many participants expressed positive, neutral, or negative statements about each code. The sentiments are defined in Table 6.5. Notably, sentiments are not mutually exclusive. A single participant may express both positive and negative sentiments about the same code. 95 6. Evaluation & Results 6.3.2.1 Outcome & Effect The theme Outcome & Effect captures participants’ perceived impact of the VR training on learning, skill improvement, and motivation to practice tennis. The respective codes, descriptions, and response distributions can be found in Table 6.6. For instance, 25 out of 26 participants mentioned the VR training’s effect on skill in their answers, of which 22 reported skill improvement. At the same time, 5 participants made neutral statements, and another 5 addressed adverse outcomes. Overall, participants perceive the outcome and effect of the VR tennis training as positive. Skill Improvement. In their responses, 25 out of 26 participants mentioned the impact of VR training on performance. Of these, 22 participants reported experiencing a sense of progress, noticeable skill improvement, or the potential for improvement with extended practice. Among those who reported improvement, responses ranged from “I noticed some improvement”, and “I feel like I hit much more of the balls in the final evaluation,” to “I saw a huge improvement as I started from 0.” Participants became aware of improvements as “the motion overall started to feel better” and due to the feedback. While no participant reported exclusively negative effects on skill, a few expressed mixed sentiments, such as “Now I know more and can do it a bit better. It still does not feel quite right”, and “The only thing I did not quite get is how to aim the stroke.” Some noticed improvement “technically” but not “on hitting the target.” A participant wrote: “My forehand stroke got better in the execution, but I wasn’t able to hit the target as well as in the first trial.” Three participants expressed uncertainty about whether or not their forehand has improved; one clarified that they “need to practice the techniques in real life to assess if [their] skills are or will be improved.” Increased Motivation. Many participants expressed an increased motivation to practicing tennis in VR or real life, and a few who had played tennis in the past reported renewed enthusiasm. One respondent stated: “It’s been a while since I played tennis, and now I would like to try it again.” Participants listed accessibility, instant feedback, game design, learning effect, fun, and an engaging experience as key contributing factors to motivation. 
Example responses include: “I would really like to practice something like this long term, it was really enjoyable and really useful to try to learn a tennis movement”, “It increased my motivation because I feel like I’ve gotten slightly better, but it showed me how much I’m still doing wrong. So it feels like there is a lot of potential left to tap into”, and “I want to try in real life what I learned during the VR training.” Knowledge Gain. Most responses for this code conveyed that participants learned something new about the general forehand technique, such as “where to focus”, about “positioning, timing, and [the] swing”, or “why certain movements will make the stroke better.” Others pointed out that they learned “something about [their own] forehand swing”, such as what they are “doing wrong and how it should technically look like.” Participants without prior knowledge specified that they learned basic principles such as 96 6.3. Experimental Results Table 6.6: Description of codes for the theme Outcome & Effect and the number of participants who mentioned them. The final column lists how many participants expressed a negative , neutral , or positive sentiment for each code (from top to bottom). Code Description Count 5 5Skill Improvement Participants noticed improvements in their forehand technique or perfor- mance during or as a result of the VR training. Some expressed efficient training success, uncertainty, or a decrease in accuracy. 25 22 - -Motivation To Practice Tennis Participants expressed increased motivation to practice tennis or a renewed enthusiasm for tennis after the VR training. 19 19 - 2Knowledge Gain Participants mentioned a refreshment/gain in knowledge or expressed their belief that the application has a learning value, especially for beginners. 19 19 “how the general motion of a forehand stroke is to be executed" and "what to do at certain points throughout the stroke.” Some participants with tennis experience mentioned that “the VR training definitely increased [their] knowledge about tennis movements.” Other intermediate players remarked that it “kinda refreshed [their] knowledge.” One elaborated: “I played a long time ago, so I realized how my knowledge returned.” Other responses conveyed that the application has the potential of learning value. For instance, a participant stated: “I think that you can learn to play tennis with this program. Or improve your technique if you have already played the sport.” 6.3.2.2 Experience & Usability The theme Experience & Usability focuses on the user experience and usability of the VR training system. Participants highlighted both positive aspects and challenges in their interactions with the system. The codes assigned most often are provided in Table 6.7. Usability. Participants generally found the VR training and the UI “intuitive”, “quick to learn”, and “easy to use”. However, there were some mixed responses about the navigation and interaction. One participant stated that the “UI is mostly intuitive,” but “sometimes feedback after [the] stroke took some time to find in the menu.” Others critiqued the usability and functionality of the motion replay and its UI. One participant wrote: “One minor issue I had with the UI was the phase selector for the stroke replay. 
Selecting a phase was possible to me, but I then could not really observe it, because the phases all are really short.” This participant suggested that “a slow motion function should be provided for replaying the stroke, so [that users] can clearly see which part went wrong.” A second participant recommended an interactive link between the replay and the feedback panel, so that when a motion phase is selected on the replay, the associated information is displayed on the feedback panel. Additionally, a bug in the user study (but not in the VR training itself) lead to a “immediate confirming of follow up menus” when 97 6. Evaluation & Results Table 6.7: Description of codes for the theme Experience & Usability and the number of participants who mentioned them. The final column lists, for each code, how many participants expressed a negative , neutral , or positive sentiment (from top to bottom). Code Description Count 8 -Usability The VR system and the user interfaces were rated as intuitive, easy to use, and navigate. Mixed responses brought up challenges in certain UIs and a bug. 26 26 1 -Enjoyment Expressions of enjoying the VR training or experience in general. Some participants indicated that aspects exceeded their expectations. 22 22 8 -Realism Opinions on the realism of VR tennis training in terms of visual realism and similarity to real life regarding ball physics, tactile response, environmental conditions, and feeling of movements were mixed. 17 12 14 2 Mobility & Controller in Weak Hand Participants reported inhibitions in natural movements due to the fear of hitting something outside the tracking volume, drifting and being restricted by the small space and controller in the nondominant hand. 15 - 11 2 Visibility Issues & HMD-related Problems Some participants voiced struggles seeing the ball and ball machine. Participants also reported HMD-related drawbacks such as eye strain, heat, FOV, and resolution. 11 - navigating in-between the tutorial, VR training, and the performance test. This bug was fixed after a few participants but led to points of criticism noted in the questionnaire. Enjoyment. Many participants expressed enjoyment, stating that they “liked the whole experience”, “really enjoyed the VR training”, and found it “super cool”, “engaging”, and a “good workout.” However, one participant noted that while the gameplay was enjoyable, it was also frustrating “not hitting the target because [they] had to follow the instructions.” Six participants indicated that the experience exceeded their expectations. For instance, they wrote: “It was quite surprising how well the different techniques and errors I’ve used and made have been recognized”, “It was fun and surprisingly complex”, and “I like it and am surprised at how realistic it is to play.” Participants highlighted various aspects that contributed to their enjoyment, including usability, accessibility, realism, “training success”, feedback design, and the motion replay. They also mentioned the value of “instant feedback and very clear instructions on what to do better”. Realism. Opinions on realism were mixed. On the one hand, participants found the virtual environment “immersive” and realistic looking. Movements were described as “pretty close to real life tennis,” and haptic feedback “felt very natural.” Several participants commented that the gameplay and ball physics felt “like playing tennis in real life.” One specified that the “accurate” and “quick” ball reaction contributed to their enjoyment. 
Other participants highlighted that “using a real racket as controller significantly improves the immersiveness [...] and enhances the overall reality and precision of hits.” On the other hand, participants expressed that “VR felt very different from 98 6.3. Experimental Results reality.” They remarked that “the feeling of playing real tennis with a real racket and ball is different” and “being in VR is not as good as in real life (feeling fresh air, body movement).” Other factors affecting realism included not being able to see the feet, the secondary controller, “limits in physics”, tracking issues, unrealistic animation, and insufficient haptic feedback. One participant explained: “The major drawback of VR, in my opinion, is an absence of haptic feedback, i.e. when the ball hits the racket, there is no shock through the handle in VR, which clearly there is in the real world.” Mobility & Controller in Weak Hand. Nine participants reported mobility issues. Five feared hitting objects or walls outside the tracking boundary or “putting too much force behind a swing to not damage the controllers”, which inhibited their movements. This was partly a flaw in the user study, as the tracking area was “a bit small.” Another issue reported by two participants was that "the room boundary was not taken into account by the tennis app, and [they] started drifting towards the wall/table." During the user study, multiple participants had to recenter their boundary or were instructed to return to their initial position, which might have influenced their comfort and mobility. Ten participants highlighted that the controller in the weak hand is distracting, disturbing, or even “hinders free movement” and alters the forehand technique as “playing tennis you should have one hand free.” One participant explained: “Having a controller on my non dominant hand disturbed me a bit at first since I wanted to hold the handle with both hands at the beginning of every trial.” Visibility and VR-Related Issues. Four participants struggled with the visibility of the ball and ball machine (drone), especially when “further away”. Participant reasoned that the visibility issues might arise due to the limited resolution, FOV, distance, and colors that are too similar to the background. One suggested: “A brighter [drone] would help with larger distances - it merges with background right now.” Nine participants reported VR or HMD-related problems. The downsides mentioned were eye stress/load resulting in “blinking a lot” or bad sight and focus, limited resolution, heat and sweating under the headset, and estimating distances to the ball. 6.3.2.3 Impressions of Teaching Methodology This theme explores how participants perceived the VR tennis training as a learning tool in relation to traditional coaching methods. It focuses on the participant’s impressions, trust, and thoughts on the teaching methodology. Relevant codes are summarized in Table 6.8. The general impression of our teaching methodology were positive. Participants wrote appreciations such as “good approach,” “it is such a useful tool, especially (in this case) due to the instant analysis of performance,” “definitely [a] good support,” “felt like a one on one session which is very helpful,” and “could help home practice a lot.” However, participants also pointed out negative aspects, such as feedback overload, and made suggestions for improvements. 99 6. 
Evaluation & Results Table 6.8: The most relevant codes for the theme Teaching Methodology Impressions, along with their descriptions, and the number of participants who mentioned the code. The last column shows the sentiment breakdown for each code, listing from top to bottom how many participants expressed a negative , neutral , or positive sentiment. Code Description Count | Sent. 8 2 Complements Traditional Coaching Participants think that the VR tennis training can effectively complement traditional coaching to a certain degree. 26 26 8 1 Subjective correctness of analysis Participants mostly found the system’s analysis and visual representation of their motion accurate. For some, the physics, score, or feedback did not align with their expectations. 21 15 4 -Trust The majority of participants perceived the system analysis and feedback as trustworthy. The motion replay and agreement with the analysis increased trust, while insufficient realism and explanations decreased it. 18 16 11 2Teaching Focus & Recommendation Some critique was expressed on the teaching focus. Participants emphasized that the explanations and instructions on topspin were insufficient and unclear. 13 - - -Accessibility Participants mentioned accessibility as an advantage of VR tennis training and named factors such as flexible training, opportunity for self-training, affordance, easy access without traveling, and injury. 10 10 6 -Overload Participants expressed being overwhelmed by the amount of feedback and unable to focus or implement all at once. 6 - Complements Traditional Coaching. All participants think our VR tennis training can complement traditional coaching. However, one specified “but maybe only for tracking of the movement or to learn the basic movements and concepts.” Some participants see limitations regarding realism and mobility. One participant pointed out that being unable to ask questions for clarification is a drawback of our solution and suggested utilizing a chatbot to mitigate it. Others observed that the stance, court positioning, as well as “walking and running patterns” were not part of the training, but “play a big role in tennis.” Nevertheless, participants with mixed responses still see our VR tennis training as a helpful “supplementary training” and “think that training tennis in VR is a great additional bonus to training tennis in ‘real life’.” Participants believe that VR training is “a good addition” to traditional training “because the movements for the correct technique can be learned”, “because the motion of strokes in tennis is the same in VR as in the real world”, “as you can actually look at your motion and see the weaknesses of it”, “as it gives a very detailed analysis” and “delivers so much information about your forehand directly, which is more difficult to extract from something like a video recording of your play”, as well as “practice conditions are repeatable”. One participant stated that the “quick feedback loop of seeing the exact motion and what exactly was done wrong is something that can’t really be done in reality,” while another wrote: “Caused by today’s positive experience I made, I would say that VR tennis could be a pretty good teacher for beginners or people who want to improve their skills. In some aspects it might be a better trainer than a human one, cause there is a consequent feedback, which is qualitatively assessable. 
Most human trainers aren’t able to give feedback like this.” Furthermore, multiple participants mentioned accessibility as a major advantage of VR 100 6.3. Experimental Results training compared to traditional tennis coaching on a dedicated court. Alignment and Trust. A majority of participants responded that they “trusted the system’s analysis very much.” Nine of them reported that they also found it “quite accurate”, and that the feedback and motion replay “felt aligned to [their] perception.” According to participants, trust and confidence was increased due to the visual represen- tation of the stroke that “looked very accurate”, “the associated analysis [which is] easy to understand”, and agreement with the assessment. Example responses are: “When [the system] said I didn’t do something right or wrong, I agreed with it,” “I never had the feeling it was contradictory to my own perception”, “I could tell that I made a mistake and the system would confirm that”, “I trusted it a lot, seeing the statistics and then realizing the mistakes it told me I’m making increased my trust”, and “Sometimes when a stroke ‘felt’ bad, I was confirmed through the feedback as to which part was faulty, and it felt to me that that was actually the part of the stroke where I made a mistake.” One participant reported that they did not “agree with [the system] in every situation”, but still trusted it in general. At the same time, another found that “it seems accurate, although sometimes it gives suggestions that are not aligned with what I was expecting.” What decreased confidence in the system was that the ball flight, contact point, and aim were not aligned with participants’ expectations (realism). One participant noted that following the instructions made the swing too strong and more difficult to aim. Other factors are insufficiently clear instructions, being able to “trick [the system] a little,” and that “the score was quite high even though I hit a terrible shot off the court.” Teaching Focus & Recommendations. Some participants critiqued the teaching focus or suggested how to improve it. Seven participants emphasized that the explanations and instructions on topspin were insufficient and unclear. Better and more “detailed instruction on how to achieve topspin” and “which movement to perform to increase topspin” are needed. Two participants critiqued that “learning the technique is more important than hitting the target area.” One of them suggested “focus[ing] a little more on hitting the target” and incorporating options for customizing the training, for example, by adjusting parameters individually to help with the training. The other participant elaborated: “I had the impression that the correctness of the C-swing was the most important element, no matter whether one hits the target or not” and suggested “to rank the importance of different rules.” The same participant also wrote that “it feels like there is no memory of progress overall - it is always just the last ball that is being discussed; little continuity.” They also suggested incorporating several past swings in the recommendation and feedback process. Finally, a participant suggested it could be more “beneficial to have separate sections for the different parts of the forehand.” Accessibility. Ten participants mentioned accessibility as an advantage of VR tennis training compared to on-court training. 
They pointed out that the VR training is “more easily accessible than traveling to a tennis center”, as one can “play ‘anywhere’”, “anytime”, “without a fixed training time and date”, and provides a “cheaper and more 101 6. Evaluation & Results Table 6.9: Most relevant codes for the theme Feedback Design, along with their descrip- tions, and the number of participants who mentioned the code. The last column shows the sentiment breakdown for each code, listing from top to bottom how many participants expressed a negative , neutral , or positive sentiment. Code Description Count | Sent. 17 6Understandability There were mixed responses regarding feedback understandability, clarity of instructions, and ease of implementing it. Participants noted topics that were insufficiently covered and suggested improvements. 22 13 2 -Helpfulness Overall, participants found the VR tennis training helpful. 15 15 11 1Feedback Variability Participants found that the verbal feedback got repetitive when they made the same error multiple times and expressed a need for diverse feedback and higher variability in formulations. 13 3 2 -Detailed Participants liked the detailed analysis of stroke patterns, abstraction level, detailed responses, and specific tips for improving. However, some instructions missed details or had too much. 10 9 - -In-Game Motivation A few participants mentioned that the feedback, performance score, and praise were motivating factors in-game, encouraging them to keep practicing. 6 6 1 4Timing Participants found the immediate feedback useful but also suggested that feedback could be given in an aggregated form after a couple of shots rather than after each attempt. 9 7 flexible training opportunity.” One outlined: “I got lots of the benefits of a coach, while not having to rent a court and book a coach/machine [...], which can get expensive.” Another participant conveyed accessibility for those with injury and reported: “I had to stop playing tennis due to knee problems, this system would give me the opportunity to at least play tennis in the virtual world,” while another participant noted that when practicing alone at home, “no one sees you fail.” Overload. Six participants expressed being overwhelmed by the amount of feedback and unable to focus or implement all at once. For instance, participants reported: “It was hard to focus on all the things at once during training. When I would do the thing the tutorial told me to focus on, I noticed how I lost track of other things I was doing right before,” “There are quite a lot of things to concentrate on at the same time,” “For me as a total beginner it was a bit much to start,” and “A lot to keep on your mind.” This overload might be related to the timing of feedback and the feedback design in general. 6.3.2.4 Feedback Design The final theme focuses on how participants perceived the feedback provided by the system. It includes participants’ preferences regarding feedback modality and statements on clarity, helpfulness, timing, and other aspects of feedback design. Codes on the general feedback design are summarized in Table 6.9. 102 6.3. Experimental Results Understandability. Responses regarding feedback understandability were mixed. In general, the feedback was perceived as clear and understandable. 
Contributing factors were “clear and concise” instructions on “what to do better”, “short and simple” feedback, and a “motion [replay] combined with coaching rules[, as this] is very powerful to understand what went wrong.” A participant remarked: “It was very nice that it was possible to adjust to the feedback so easily and to visually see how [one] should move to make it better.” However, other participants expressed uncertainty, difficulties in implementing feedback, or mentioned “misunderstanding[s] about [tennis specific] terminology.” Participants linked this to insufficient variability of feedback formulations and inadequate explanations of certain topics and coaching rules. The most prominent topic was topspin; eight participants indicated that this concept was not sufficiently explained. One of them wrote: “Instructions helped to improve. Only point left unclear was which movement to perform to increase topspin.” The stance and swing pattern were other unclear topics and “the position of the bat should also be explained and assessed.” Many participants made suggestions on how the feedback design could be improved to enhance understandability. Multiple participants addressed that “the feedback was not always clear and a more specific example of the correct technique would have been helpful”. They suggested adding or substituting images with animations, holograms, or “videos showing how to do the correct movements,” as images and text alone are not sufficient. Three participants suggested showing an ideal motion in addition to the replay. One of them wrote: “I like the replay, but I am lacking the suggestion of how to correct the error [, for example, a] sample ghost next to [the] replay doing things right.” Another proposed visualizing “a comparison between [their] best movement and the current one.” Helpfulness & Preference of Modality. Overall, participants found the VR training helpful, mostly due to the “instant analysis of performance”, “actionable feedback that helps quickly improve technique”, and because it “felt like a one on one session.” Multiple participants found the diagnostic and corrective feedback useful. However, not all feedback was perceived as helpful. Figure 6.9 (b) displays how many participants mentioned each feedback modality with either a positive, negative, or neutral sentiment, including preference statements. The motion replay was mentioned the most often. Participants “enjoyed being able to see what [they] did and learn from it” as it “can help to reflect [one’s] own motion”. They found the “3rd person view” of their motion a “trust-able representation”, as they “could see [their] own size and position reflected”. The only negative aspects participants reported for the motion replay were usability issues with the associated UI and color coding, which was unclear to some. One participant emphasized, that “Being able to replay the motion was perfect! That’s by far the most important aspect for me. The other features helped a lot too though, the first one that comes to mind is the haptic and the auditory feedback. The feedback is my go-to learning experience.” Participants found the verbal feedback and instruction useful for improving their technique, but also found it repetitive, not variable enough, or annoying and preferred to skip it after a while. A participant mentioned: “sometimes it’s hard to improve with the feedback ’increase your aim’ or ’aim at the target’.” Participants 103 6. 
Participants remarked that it was helpful that the images showed how to perform the coaching rule correctly, but also suggested using animations or videos instead of images. For some, the text provided useful information that taught “[them] things about the forehand technique that [they] haven’t heard of before.” For others, it was “not very clear.” A few participants mentioned that they liked the score and color coding of the feedback panel. One participant preferred it “since it was easier to perceive quickly [...] and the clear and easy to interpret color encoding and scores drew my focus towards that panel.” Another responded that the “total score was a good measure since it’s for everything.” Responses were most mixed about the haptic feedback. Many did not notice it and would have liked a stronger tactile response when the ball collides with the racket. Others found it “awesome” and that “it felt very natural”, and some even preferred it over other feedback modalities.

Feedback Variability and Repetitiveness. Many participants found the verbal feedback component repetitive. Especially “when struggling with the same task for multiple balls”, “it gets annoying to hear the same thing over and over again.” Participants suggested increasing the variability by including short feedback variations, “rephrased feedback”, and “metaphors.” Higher variability would have helped “to clarify the specific point that is being focused on.” One participant also proposed that the system could “just try to motivate the player instead of repeating the feedback.”

In-Game Motivation. Participants mentioned that the praise and “immediate success report” were motivating. Three participants highlighted that the score was a good motivator for them in-game, as it “encouraged [them] to keep going, aiming for a 100% rules evaluation.” Another remarked that “even if [they] did not make much progress, [the feedback] kept encouraging [them], which felt good.”

Timing. Some participants “really liked the instantaneous feedback after a stroke” and found “the instant analysis of performance” and “feedback step by step” “very useful”. One participant found “the immediate success report motivating,” another expressed that “it is good to have the most recent mistake announced by the trainer—although quite annoying as well,” while a third mentioned they “just needed the feedback for the first few hits” and later skipped it. Two participants suggested implementing aggregated feedback with “multiple balls before feedback” “to have a more dynamic training and see whether [their] shots are consistent across time.” Another suggested a “statistical overview” at the end where “best, average, worst overall scores are shown.”

6.4 Discussion

The quantitative and qualitative evaluation of the user study provides answers to our research questions RQE 1–4 and highlights the advantages and disadvantages of our teaching methodology. Furthermore, the user study helped to identify areas for future improvement, which are discussed in Section 7.3, as well as limitations in both our system architecture and user study design, which are discussed in Section 7.2.

Feedback Helpfulness & Preference [RQE 1]. Our evaluation shows that participants had no clear preference for one feedback modality over the others. While some feedback modalities, such as the motion replay, were perceived as more helpful than others, all were found helpful by most participants (except the haptic feedback, which we will cover below).
Preferences varied between individual participants, with some stating that their general preference for specific modalities influenced what they found most helpful in VR training. These findings align with guideline G4 and underline the importance of multimodal feedback in complex training applications to accommodate different learning preferences. Participants rated the performance scores, motion replay, and text (with example illustrations) as the most helpful and mentioned them favorably in open-ended responses. Based on participants’ responses, the performance scores, immediate success report, and praise (auditory feedback) promoted user engagement and increased in-game motivation, aligning with guideline G6. The motion replay was mentioned most often and regarded as helpful for understanding feedback and self-reflection (G14). Furthermore, it contributed positively to participants’ trust (G7) in the system and motion analysis by providing a visual representation aligned with participants’ perceptions.

Participants’ responses also revealed drawbacks in our realizations of certain feedback modalities, presumably influencing how participants perceived and rated them. These factors are limitations of our implementation and can be improved in future work. Our haptic feedback received mixed responses as it felt insufficient as tactile feedback, an important component in tennis. Therefore, it also affected how participants perceived the realism of our system and simulation. Notably, the racket handle we used during the user study weakens the haptic feedback from the controller and, therefore, affects its effectiveness. This probably also affected participants’ assessment of its helpfulness. We suggest not neglecting haptic feedback in future motion learning applications, especially in sports where tactile feedback is integral. Another flaw is the design of our auditive feedback. While participants overall found it helpful, it got repetitive for many, and participants expressed a need for more diverse feedback and higher variability in formulations. While we tried to implement variable feedback aligned with G4, our results make it apparent that our implementation is not sufficient and that improvements or other approaches, such as Large Language Models (LLMs) and text-to-speech, are necessary.

User Experience & Judgment [RQE 2]. Overall, participants perceived our system’s user experience and teaching methodology as positive, enjoyed the experience, and saw benefits, such as learning value and accessibility, both in our approach and in VR. Motivation to practice tennis increased after the VR training, and many participants also expressed increased motivation or renewed enthusiasm for tennis in their responses to open-ended questions. Participants mentioned the following key contributing factors of our VR training: a fun and engaging experience, accessibility, instant feedback, learning effect, sense of progress, usefulness, game design, and the knowledge of motion errors and, thereby, the realization that there is still much potential left to improve. Many of these factors are also discussed in guidelines G4 and G6. Participants also critiqued certain aspects and design decisions.
They made suggestions for improvements, which we will go into further in Section 7.2 and Section 7.3. The flaws and limitations of our system that participants mentioned mainly concern (1) mobility and HMD-related problems, (2) realism, (3) feedback understandability and insufficient coverage of important topics such as topspin, and (4) overload from the feedback content, timing, and frequency, as well as from the detailed motion analysis. For us, the critique and suggestions on those topics highlight the importance of facilitating natural movement and comfort in motion learning applications (G1), of including clear instructions and demonstrations of exercises (G13), and of usability, which includes comprehensive feedback (G4). Additionally, a better balance must be found between providing informative/detailed assessment (G16) and prioritizing interventions and feedback (G17) so as not to overwhelm users with the amount of information while still providing a valuable and helpful tool.

Finally, all participants believe that our VR training, or VR in general, can—to a certain degree and not without limitations—complement traditional tennis training and would be a great additional bonus. Participants provided multiple justifications. First, some participants indicated that skill transfer to the real world (G12) is possible, which we assume to be an essential property to complement traditional training. Second, participants mentioned various advantages of VR training over traditional tennis coaching due to the properties of VR. Participants mainly listed accessibility aspects such as lower cost and higher flexibility, but they also mentioned other aspects such as repeatable practice conditions, a quick feedback loop (G10), detailed analysis (G16), and the motion replay, which supports self-monitoring and reflection (G14).

Adherence to coaching rules & learning effect [RQE 3]. Our evaluation indicates that the VR training enhanced participants’ confidence in their forehand technique. Additionally, multiple participants reported a noticeable skill improvement or a sense of progress during training, which aligns with the measured increase in confidence. However, some participants reported that while they might have improved in technique, their aim did not improve. This subjective assessment also aligns with our evaluation results. We found measurable performance gains from the pre-test to the post-test for coaching rules regarding the Ready Position (RP), Backswing (BS), Forward Swing (FS), and Follow Through (FT), which can all be interpreted as more technique-focused coaching rules, some of which might not directly influence the outcome of the shot. The results also reveal that participants did not improve significantly in the assessed coaching rules for the Contact Point (CP) and Execution (EX) from pre-test to post-test, which are directly related to aim (CP affects aim and EX assesses aim). Importantly, our results do not show a decrease in outcome-related performance. Overall, the measurable performance gains from the pre- and post-tests and during training suggest a positive short-term learning effect for technique-focused coaching rules. Future research must investigate this further to determine whether the VR training or other factors cause the measurable improvements. However, it is an initial positive indicator of the usefulness and effectiveness of our system.
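The pre/post comparisons referenced here rely on the Wilcoxon Signed-Rank Test reported in Tables 6.2 and 6.3. As a minimal sketch of such a paired comparison, assuming the per-participant average scores are available as two arrays (the values below are illustrative placeholders, not the study data), one could compute:

```python
import numpy as np
from scipy import stats

# Illustrative per-participant average overall scores (placeholders, not the study data):
# one value per participant for the pre-test and one for the post-test.
pre = np.array([0.52, 0.61, 0.47, 0.58, 0.66, 0.55, 0.49, 0.63])
post = np.array([0.60, 0.63, 0.55, 0.57, 0.71, 0.62, 0.54, 0.66])

# Paired, non-parametric comparison of pre- vs. post-test scores
# (two-sided, alpha = 0.05), the test reported in Tables 6.2 and 6.3.
result = stats.wilcoxon(pre, post)
print(f"W = {result.statistic:.1f}, p = {result.pvalue:.4f}")
```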
Additionally, participants expressed potential for skill improvements with extended practice with our system and the belief that it has a learning value, especially for beginners. Many participants mentioned a gain in knowledge from our user study and VR training, or a refreshment thereof.

Trust & Alignment [RQE 4]. Based on the responses to the open-ended questions, a majority of participants both trusted the system’s underlying swing analysis and feedback and found them accurate and aligned with their perceptions. Contributing factors that enhanced trust were the motion replay, feedback understandability, agreement with the system’s analysis and feedback, and confirmation of mistakes by the system that participants had already realized themselves, aligning with guideline G7. Insufficient realism and unclear instructions on terminology or on how to improve a motion error negatively affected participants’ trust and the perception of the system’s accuracy.

CHAPTER 7 Conclusion & Future Work

This chapter contains a summary of our work and findings. Limitations in our user study design, VR tennis training methodology, and implementation are discussed. Additionally, directions for future work and suggested improvements are outlined.

7.1 Conclusion

In this thesis, we presented a novel approach for training the correct technique of a modern tennis forehand topspin in VR. Our method combines the advantages of an immersive and versatile virtual environment with automated, rule-based motion analysis and multimodal feedback to support motor learning in an engaging and motivating setting. We reviewed and outlined a range of guidelines from the literature to inform the design process of our VR-based tennis training methodology. Our design prioritizes factors related to guided self-training, accessibility (e.g., portability, flexibility, affordability, and ease of use), user engagement, motivation, determinism, and the facilitation of natural movements.

The resulting motor learning system demonstrates how traditional tennis coaching principles can be applied in VR. Our approach utilizes partial motion capture and an expert-driven motion analysis method to evaluate coaching rules—which break down the fundamental technical and biomechanical aspects of the modern forehand topspin into simple rules—in an automated fashion. Tailored feedback on a user’s performance is administered after each completed forehand shot based on the motion analysis results. We applied a decision tree to prioritize feedback on a single coaching rule at a time. The feedback acknowledges and positively reinforces when a user improves a coaching rule based on the previous instruction and otherwise addresses errors by providing corrective feedback. In order to support different learning preferences, multiple feedback modalities are incorporated in our design. Feedback on a coaching rule is provided through verbal cues, textual descriptions, illustrations, and scores. A motion replay enables self-monitoring and supports the given auditive feedback with color coding. Additionally, performance scores on the technique serve as an in-game motivational factor and indicate progress.
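The rule-selection step can be sketched briefly. Algorithm 5.3 ranks the candidate coaching rules by rank = rule.Error + weight * normalized(rule.Priority); the sketch below assumes that each analyzed rule exposes an error magnitude and an expert-assigned priority, normalizes the priority by its maximum over the candidates, and omits the additional handling of error sources and of re-applying the previously selected rule (Algorithm 5.2). Names and values are illustrative, not taken from the implementation.

```python
from dataclasses import dataclass

@dataclass
class CoachingRule:
    name: str        # rule identifier, e.g. "FS:SWP:LowToHigh"
    error: float     # error magnitude from the motion analysis (0 = performed correctly)
    priority: float  # expert-assigned priority of the rule

def select_feedback_rule(rules, weight=0.5):
    """Pick the single coaching rule to give feedback on, ranked by
    error plus weighted, normalized priority (cf. Algorithm 5.3)."""
    max_priority = max(r.priority for r in rules) or 1.0
    return max(rules, key=lambda r: r.error + weight * (r.priority / max_priority))

rules = [
    CoachingRule("BS:FOC:WatchBall", error=0.2, priority=3.0),
    CoachingRule("FS:SWP:LowToHigh", error=0.7, priority=2.0),
    CoachingRule("CP:TIM:WaistLevel", error=0.3, priority=5.0),
]
print(select_feedback_rule(rules).name)  # "FS:SWP:LowToHigh" in this example
```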
We conducted a user study with 26 participants to evaluate our VR tennis training based on the evaluation-oriented research questions RQE 1–4 described in Section 1.3. The study was supervised and followed a within-group pretest-posttest design to compare measurements taken before and after a 10-minute virtual tennis training session. For evaluation, we applied a mixed-method approach. The quantitative results demonstrated a significant improvement in participants’ motivation to play tennis, confidence in their tennis skills, and performance metrics, suggesting a motivational and short-term learning effect of our VR-based tennis training. Further, most feedback modalities were perceived as helpful. Qualitative insights indicate that participants enjoyed the VR tennis training, experienced a subjective sense of a learning effect, and mostly trusted the system’s motion analysis and feedback. Overall, participants believe our VR tennis training can effectively complement traditional tennis training to a certain degree.

7.2 Limitations

A key constraint of our VR tennis training is the limited mocap data due to the exclusive use of the Meta Quest 2 for motion capture. While this design decision allows the minimal hardware setup necessary for a more accessible and flexible alternative to on-court tennis training and to existing self-training setups with stationary hardware, it introduces technical limitations. The limited number of tracked joints restricts the capabilities of our automated motion analysis and feedback administration, as numerous coaching rules cannot be evaluated with the available mocap data. Therefore, full-body dynamics or lower-body techniques, such as footwork, cannot be analyzed, limiting the helpfulness of the feedback and the potential attainable learning effect. Additionally, the motion replay can only represent the tracked body joints.

Our haptic feedback received mixed responses and may have negatively affected how some participants perceived the realism. Participants rated it as too weak for the tactile response of an actual tennis racket-ball collision. The racket handle notably reduces the haptic feedback from the controller, and other hardware solutions would be necessary to increase realism. We suggest not neglecting haptic feedback in future motion learning applications, especially in sports such as tennis, where tactile feedback is integral. Based on participants’ responses, we recommend realism as an additional key consideration in the design of VR-based motor learning systems. Factors influencing perceived realism include haptic responses, visual cues, simulated physics, and the behavior of avatars. To effectively support motor learning and skill transfer to real life, it may be necessary to identify which real-world factors are integral to the target motion and replicate them in VR.

Another flaw of our feedback design is the limited variability of verbal feedback. Although the auditive feedback modality was generally perceived as helpful, participants noted that it became repetitive when they repeated the same error multiple times and expressed a desire for higher variability in formulations.
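One lightweight mitigation, anticipating the future-work item on feedback variability in Section 7.3.1, would be to store several formulations per coaching rule (in the spirit of Tables 5.5 and 5.6) and avoid repeating the most recently used one. A minimal sketch with hypothetical phrasings, not the formulations actually used by the system:

```python
import random

# Hypothetical phrasing pool per coaching rule; the system's actual
# formulations differ (see Tables 5.5 and 5.6).
FORMULATIONS = {
    "FS:SWP:LowToHigh": [
        "Swing from low to high.",
        "Brush up the back of the ball.",
        "Start the forward swing below the intended contact point.",
    ],
}
_last_used = {}

def next_formulation(rule_id):
    """Return a phrasing for the given rule, skipping the one used last time."""
    options = FORMULATIONS[rule_id]
    candidates = [p for p in options if p != _last_used.get(rule_id)] or options
    choice = random.choice(candidates)
    _last_used[rule_id] = choice
    return choice
```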
Finally, our user study design has limitations that affect the informative value of our results. First, as our user study lacks a control group, we cannot determine whether the measured performance improvements from the pre-test to the post-test stem from our VR tennis training or from other factors. Participants may have improved not because of the feedback but because of natural improvement from repetition, or because they became more familiar with the virtual environment and controls. Further research needs to be conducted to investigate the causes behind the measured improvements. Second, we do not evaluate how our VR tennis training affects aspects of technique not covered by our automated motion analysis. Assessing additional coaching rules, such as those related to footwork, utilizing a secondary mocap system could provide interesting insights. Third, performance was measured only immediately after the VR training session. Therefore, our results can only suggest short-term effects. Conclusions about long-term effects, knowledge retention, or skill transfer to real-world scenarios cannot be drawn. Last, the user study was conducted in a room with a narrow tracking space, and participants were required to hold a controller in their non-dominant hand for mocap purposes despite it not being necessary for input or for the evaluation of our implemented coaching rules. These two avoidable shortcomings in the study setup may have restricted movement and realism.

7.3 Future Work

This section addresses potential improvements and extensions of our implementation that are considered future work. Furthermore, we discuss additional experiments that could be conducted in the future to gain deeper insights into the effects of our VR tennis training and to investigate cause-and-effect relationships.

7.3.1 Potential Future Improvements and Features

Many potential improvements and features could be added to our implementation. Examples are provided below, some of which were suggested by participants during our user study. Furthermore, our motor learning methodology could be extended to other tennis techniques, such as the serve and slice.

Reducing feedback repetitiveness. Experienced repetitiveness of feedback could be reduced through numerous approaches: (a) introducing a higher variability in feedback, which could be achieved via large language models and text-to-speech; (b) establishing shorter verbal cues (e.g., “watch the ball”, “hit low-to-high”, “draw a C”); (c) reducing feedback frequency, for instance, by only providing feedback after a certain number of forehand shots; and (d) adapting the algorithm for selecting the most relevant coaching rule to reduce repetitiveness.

Concurrent feedback. Certain coaching rules could be suitable for very short audio cues provided concurrently with the performance, immediately when an error occurs. Examples are coaching rules corresponding to the ready position, where cues like “Racket ready!” or “Face your opponent!” may be helpful.

Support Questions. Participants suggested incorporating a chatbot or similar features to be able to ask questions. This would allow users to ask for clarification on feedback or on aspects of the technique they did not understand.

Additional visual cues. Visual representations of a motion error, such as the deviation from the correct motion or the tolerance thereof, could be a helpful addition to the motion replay. One participant also suggested a visual comparison between their best motion and their current attempt.

Showcasing correct motion. Better representations of the correct technique should be included (in addition to or as an alternative to the illustrations). Participants suggested holograms, videos, or an animated avatar demonstrating the technique.

Better explanations. Certain topics, particularly the topspin, need better explanations.

Tailored Exercises. Tailored exercises could be provided to train specific aspects of technique based on the motion analysis.

Configurability.
Coaching rules and parameters for the target practice (e.g., ball speed, spin, and target size) could be made configurable to adjust for different skill levels. Improve Motion Replay UI. The user interface below the motion replay could be extended to pause the motion at a specific frame or to display slow motion. Summative Statistics. Statistics after training could display the progress over time or provide aggregated performance measures. Adapt newer mocap methods. Technologies for motion capture are rapidly evolving and could be integrated into our system. An example is the inside-out body tracking on the Meta Quest 3. 7.3.2 Additional Experiments Additional experiments could be conducted as future research, evaluating further effects of our VR tennis training and investigating causation: Causality. Causal relationships could be evaluated via controlled studies to investi- gate whether the observed improvements in motivation and performance can be attributed directly to the VR tennis training. It would be valuable to examine whether the administered feedback contributes positively to motor learning or if similar outcomes could be achieved without it. Demographics. We hypothesize that specific demographic attributes, such as tennis skill level and learning preferences, affect how users experience our VR tennis training and its effectiveness. A larger user study would be necessary to evaluate this hypothesis with reasonable statistical power. Overall technique. An interesting research question would be how our VR tennis training affects the overall tennis forehand technique, including the evaluation of coaching rules not implemented in our system due to the partial mocap (e.g., related to footwork, grip, balance, and hip motion). 112 APPENDIX A Appendix: User Study A.1 Questionnaire 1. Please rate your agreement with the following statement: I feel motivated to practice tennis, whether in virtual reality or real life. 1 2 3 4 5 6 7 Strongly Disagree ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Strongly Agree 2. Did the VR training increase your motivation for future tennis training? Why or why not? What aspects contributed to your enjoyment, and what did you like or dislike about the VR training? 3. Please rate your agreement with the following statement: I am confident in my current forehand tennis technique. 1 2 3 4 5 6 7 Strongly Disagree ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Strongly Agree 4. How has the VR training influenced your tennis skills and knowledge? Did you notice any improvements in your forehand stroke during today’s VR training? 5. What are your impressions of the teaching methodology used for learning and training tennis skills in today’s VR session? 6. Do you think VR training can effectively complement traditional tennis training, and why or why not? 113 A. Appendix: User Study 7. How much did you trust the system’s analysis of your tennis forehand? Was it accurate and aligned with your own perception? What influenced your confidence in the system? 8. How intuitive did you find the VR tennis training system and the user interfaces? Were you able to navigate through the menus and options easily? 9. Please describe any challenges or difficulties you had during the VR training. 10. How helpful were the following items in today’s VR training for learning or practicing a tennis forehand stroke? 10.a. How helpful was the replay of your motion? 1 2 3 4 5 6 7 Least Helpful ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Most Helpful 10.b. How helpful was the color coding (visible in the UI and motion replay)? 
1 2 3 4 5 6 7 Least Helpful ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Most Helpful

10.c. How helpful was the auditory feedback? 1 2 3 4 5 6 7 Least Helpful ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Most Helpful

10.d. How helpful was the haptic feedback on hit? 1 2 3 4 5 6 7 Least Helpful ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Most Helpful

10.e. How helpful were the scores on your performance? 1 2 3 4 5 6 7 Least Helpful ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Most Helpful

10.f. How helpful was the textual feedback and explanation with images? 1 2 3 4 5 6 7 Least Helpful ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Most Helpful

10.g. How helpful was the list of all coaching rules? 1 2 3 4 5 6 7 Least Helpful ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ ⃝ Most Helpful

11. Do you have a preference for one feedback modality? Please share which one and why. Did you find the instructions and feedback clear and concise? Were there any moments that you found frustrating, confusing, or repetitive?

12. Optional - What additional features or improvements would you like to see in the VR training? How could the training be improved to make it more effective?

13. Optional - Do you have any additional comments or suggestions regarding your experience with the VR training?

A.2 Experimental Procedure and Verbal Instructions

This section details the steps participants follow in each part of the VR sessions.

A.2.0.1 VR Tutorial

The VR tutorial is part of the first VR session and explains the necessary controls and tennis terminology. Short tennis target exercises on an outdoor court serve as familiarization with the virtual environment and ball physics. There was no time limit. Each verbal instruction during this tutorial is accompanied by a textual instruction and an animated avatar indicating where to look or demonstrating movements.

Step 1: Optimal position—A circle on the ground of the VR environment indicates where the center of the tracking space is. This is explained and shown to the participant. ‘Let’s find your optimal position. Look down to see the circle. It shows you where you should stand.’

Step 2: Tap the ball—Three floating balls appear one after the other in front of the participant. The task is to tap or hit them with the racket. ‘Hit the floating ball with your racket. You have to hit 3 balls to continue.’

Step 3: Forehand Groundstroke—An avatar demonstrates a forehand groundstroke, which the participant should observe. ‘It’s time for a bit of tennis background. The tennis player to your right performs a forehand groundstroke. It’s one of the most common strokes in tennis. Observe the technique and continue when you are ready.’

Step 4: Explanation of motion phases—In addition to the animation of a forehand groundstroke, the five motion phases are listed and explained briefly. ‘The forehand stroke begins from the ready position; the racket is then brought back during the backswing. In the forward swing, the racket is accelerated toward the ball, where it should eventually hit it. This moment is called the contact point. After impact, follow through and finish the stroke with the racket over your shoulder.’

Step 5: Hit over the net—Floating balls appear one after the other on the dominant side of the participant. The objective is to hit three of them over the net and into the target. ‘Try out the forehand groundstroke. You will be automatically teleported to the ball. Hit 3 shots into the target to continue.’

Step 6: Meet Ball-E—Again, the objective is to hit three balls over the net and into the target, but this time the balls are fired from a ball machine towards the participant.
‘Meet Ball-E, our flying ball machine. Hit three shots with your forehand into the target to continue.’

Step 7: Controls—A 3D model of the controllers shows the participants where the A and X buttons are that they need to press to continue. ‘Well done! Press A or X on your controller to continue.’

After the completion of the VR tutorial, the participant is teleported to an indoor court, where the first performance test, called "Skill Evaluation", takes place.

A.2.0.2 VR Performance Test

The VR performance test is conducted at the end of both VR sessions. It takes place on a virtual indoor tennis court that is also used during the tennis training session. The test is introduced to the participant with the following text and audio message. ‘Here, you’ll encounter sixteen targeted balls to assess your forehand performance. The test starts with a simple floating ball and then gradually increases in difficulty. Try to use a good forehand technique to the best of your knowledge.’ When the participant is ready, the performance test starts. After the test is completed, a panel pops up, instructing the participant to put down the headset to continue with some questionnaires.

A.2.0.3 Intro to VR Training

The second VR session starts with an intro that introduces the participant to the procedure and feedback modalities of the VR tennis training. There was no time limit.

Step 1: Motion Replay—This step introduces the training process and motion replay. It demonstrates that the exercise is paused after each shot to provide the user with a replay of their motion. At the beginning of this step, the following audio message is played: ‘Hit the floating ball to see a replay of your swing.’ After the participant has performed the swing and hit the floating ball, another message is played. ‘You can now see a replay of your last shot. The line shows the path of your racket. When you’re ready to proceed to the next ball, press A or X on your controller.’

Step 2: UI below the Motion Replay—A UI is added to the motion replay, and it is briefly explained how to interact with it. ‘Below the replay, you will find a user interface displaying the five phases of a forehand stroke. Interact with this interface to control the replay. Start by selecting the contact point. The replay will pause at the moment of impact. To resume, press the play button in the middle. When you are ready, move on to the next ball.’

Step 3: Analysis—The UI containing the coaching rules appears and is briefly explained. ‘Your forehand technique is analyzed using coaching rules. Look to your left. There, you will find an interactive panel. The total score at the top reflects your performance and shows how many rules you got right. Below that, you’ll find a more detailed breakdown of your performance. Take a look to find out how well you did in the individual motion phases or gain further insights into the coaching rules. Continue to the next ball whenever you are ready.’

Step 4: Feedback—The verbal feedback component and the main tab on the UI are introduced to the participant. ‘From now on, I’ll give feedback after each shot, concentrating on one coaching rule at a time. You’ll find the suggested rule on the main tab on your left. Track your score there and check detailed instructions if needed. Continue to start with the training.’

A.2.0.4 VR Forehand Tennis Training

‘Let’s work on your forehand groundstroke. Practice the proper forehand technique for 10 minutes.
After each shot, you can review your motion and I’ll offer auditory feedback on your technique. Another performance test will start after the training session. Begin when you are ready.’ 117 Overview of Generative AI Tools Used Grammarly and ChatGPT were used for spell-checking, grammar enhancement, and text editing to refine wording and sentence structures. Additionally, ChatGPT was used to generate utility scripts to help plot charts. DeepL Translator was used to translate some words or phrases between German and English. Grammarly. Grammarly Inc. https://www.grammarly.com ChatGPT. OpenAI. (GPT-4o and earlier versions). https://openai.com/chatgpt DeepL Translator. DeepL SE. https://www.deepl.com/en/translator 119 List of Figures 2.1 The first two sketches illustrate the properties of linear motion in its two types: (1) rectilinear motion, which follows a straight path, and (2) curvilinear motion, which follows a curved path. In both motions, points p and q travel the same distance at the same speed, where the lines connecting p and q stay parallel and remain the same length throughout the entire motion. The third sketch (3) illustrates the properties of angular motion. The points p and q rotate at the same angle around the center (axis of rotation) simultaneously. As point p is farther away from the center, it needs to cover a longer distance at the same time than q, which is closer. The lines connecting p and q remain the same length but not parallel [HKD15, McG13]. . . . . . . . . . . . . . 10 2.2 Illustration of the sensors and sources placement in outside-in, inside-out, and inside-in motion capture setups. . . . . . . . . . . . . . . . . . . . . . . . . 14 2.3 Cardinal planes and their respective principal axes. . . . . . . . . . . . . . 16 2.4 Illustration of boolean relational features. . . . . . . . . . . . . . . . . . . 18 2.5 The four-task model of qualitative motion analysis, as presented by Knudson and Morrison [KM02]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.6 Overview of the components of extrinsic feedback, with Knowledge of Results (KR) referring to the outcome of an action and Knowledge of Performance (KP) referring to the technique and quality of the movement. . . . . . . . 29 4.1 Screenshot of the virtual environment from the trainees point of view. . . 48 4.2 Screenshot of the visual feedback components from the trainee’s point of view. 49 4.3 Schematic overview of our design process and motor learning system. . . . 51 4.4 Illustration of the C-looped stroke pattern characteristic of a modern forehand topspin stroke, spanning from the Ready Position up to the Contact Point. 53 4.5 Illustration of the six upper-body motion phases of the forehand topspin. 54 4.6 Examples of different possible racket paths during the backswing of a tennis forehand. They range from a high take-back characteristic of a modern topspin forehand over a flat forehand, a pendulum-like swing, and less conventional patterns such as multi-swings and looping motions. . . . . . . . . . . . . . 56 4.7 Example images illustrating coaching rules for the Ready Position (RP) and the Backswing (BS). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.8 Snapshot from the motion replay that shows the entire captured ball trail. The recording ends with a short delay after the outcome of the shot is visible. 62 121 4.9 Snapshots from the motion replay. Aligned approximately with the sagittal, transverse, and frontal cardinal planes. 
The motion replay shows the cap- tured joints (HMD, racket, and non-dominant hand) and the racket’s path, segmented into the detected motion phases. . . . . . . . . . . . . . . . . 63 5.1 Simplified diagram illustrating the sequence of modules. . . . . . . . . . 65 5.2 Schematic representation of the motion phase’s temporal extends and segmen- tation process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.3 3D racket positions of a forehand swing, segmented into our motion phases. The follow-through is highlighted in orange. . . . . . . . . . . . . . . . . . 70 5.4 Flowchart outlining the segmentation of the follow-through. The boundary frames of the contact point are denoted by cpBounds. . . . . . . . . . . . 71 5.5 The plot shows the captured racket speed relative to the HMD corresponding to the same swing depicted in Figure 5.3. The dots mark the local and global minima after the contact point and the two vertical markers represent the temporal bounds of the follow-through, identified using the phase segmentation algorithm described in the text. . . . . . . . . . . . . . . . . . . . . . . . . 72 5.6 Flowchart outlining the diagnosis of a coaching rule. . . . . . . . . . . . . 75 5.7 The interactive UI below the motion replay. The frame’s corresponding motion phase is highlighted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.8 Snapshot of a forehand swing with a pendulum-like motion, violating coaching rule BS:SWP:Pendulum. The image captures a frame during the backswing. The racket’s path during the backswing is highlighted to reflect the admin- istered feedback, with yellow indicating a mediocre performance score of this motion phase. The racket is color-coded to highlight the motion error (specifically, the racket head dropped too far during the backswing). Below the replay, the UI displays the color-coded motion phases, with the backswing selected to match the current frame and active motion phase. . . . . . . . 79 5.9 The main page of the UI feedback panel. The top row shows the overall performance and how many rules were performed correctly (note: this does not reflect all rules in our system). Small arrows indicate score changes since the previous attempt. Below that, feedback for the recommended coaching rule is displayed, including its score accompanied by color coding, a brief description, and an illustrative image. . . . . . . . . . . . . . . . . . . . . 81 122 5.10 The secondary page of the UI feedback panel, which users can switch to on-demand. While the top row remains identical to the main page, the lower section provides a more detailed performance breakdown. The left column displays performance metrics for individual motion phases. Users can select a phase to explore its coaching rules. The middle column lists the coaching rules for the selected motion phase, ordered by their score, with the lowest-scoring rule at the top. Users can select each coaching rule. The right column shows details for the selected coaching rule, similar to the main page but with the additional options to view a more detailed description or replay its verbal long description (LONG) audio file as needed. . . . . . . . . . . . . . . . . 82 5.11 Color palette for color coding motion errors and performance scores. The intervals indicate how score ranges are assigned to specific colors. . . . . . 82 6.1 Procedure of our user study (DQ denotes the Demographic Questionnaire). 
86 6.2 Left: Reported motivation levels to practice tennis before and after our VR training, based on agreement (1 = strongly disagree, 7 = strongly agree) with the statement: “I feel motivated to practice tennis, whether in virtual reality or in real life”. Right: Distribution of differences between post-test and pre-test motivation levels. . . . . . . . . . . . . . . . . . . . . . . . . 88 6.3 Left: Reported confidence in the forehand tennis technique before and after our VR training, based on agreement (1 = strongly disagree, 7 = strongly agree) with the statement: “I am confident in my current forehand tennis technique”. Right: Distribution of differences between post-test and pre-test values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 6.4 Violin plots displaying performance metrics for the entire motion, measured during the performance tests pre and post VR training. The first three plots represent coaching rule adherence, and the last three represent overall performance scores. The plots display the lowest, average, and highest values achieved during the performance tests. Statistical differences were assessed using the Wilcoxon Signed-Rank Test (* = significant, n.s. = not significant). 90 6.5 Violin plots displaying performance scores for the six individual motion phases, measured during the performance tests pre and post VR training. Statistically significant differences are based on the Wilcoxon Signed-Rank Test (* = significant, n.s. = not significant). . . . . . . . . . . . . . . . . . . . . . . 92 6.6 The estimated trend of coaching rule adherence and overall performance over time during the training session. Left: 60s window; Right: 300s window; 1s increment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 6.7 Estimated trend of overall performance scores over the training session per participant. The time series are generated with a 300-second sliding window and 1-second time steps. The upper plot shows the trend of all 26 partici- pants. The bottom left plot shows all trends that resulted in a better overall performance at the end compared to the beginning, while the bottom right one shows all decreasing trends. . . . . . . . . . . . . . . . . . . . . . . . . 93 123 6.8 Perceived helpfulness of the feedback modalities for learning or practicing a tennis forehand stroke rated by a 7-point Likert scale ranging from least to most helpful (How helpful was feedback modality x?). . . . . . . . . . . . . 94 6.9 The left chart shows the number of participants who preferred each feedback modality. The chart on the right displays how many participants expressed a positive (pos.), negative (neg.) or neutral (neu.) sentiment about each modality. “UI List” refers to the list of coaching rules with their respective metrics provided in the UI. . . . . . . . . . . . . . . . . . . . . . . . . . . 94 124 List of Tables 2.1 Performance outcome and production measures in the tennis context based on examples provided by Magill and Anderson [MA10]. . . . . . . . . . . 24 2.2 Non-exhaustive list of advantages and disadvantages of expert- and data-driven approaches based on the following literature: [ZLER14, HGH+18, DKHH+15, GDF23, RWHH19, LVX+20, SPB+20, FMS+22]. . . . . . . . . . . . . . . 25 4.1 Lists the number of coaching rules implemented in our system per motion phase (Ready Position (RP), Backswing (BS), Forward Swing (FS), Contact Point (CP), Follow Through (FT), and Execution (EX)) and category. #Cb. 
denotes the number of combinatorial rules. . . . . . . . . . . . . . . . . . 60 4.2 Examples of coaching rules used for our rule-based motion analysis of the modern tennis forehand topspin. Each rule is linked to a motion phase and category. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.3 Main components of our feedback design. . . . . . . . . . . . . . . . . . . 62 5.1 Evaluation of the waist height estimator given in Equation 5.2. . . . . . . 67 5.2 Evaluation of the shoulder height estimator given in Equation 5.3. . . . . 67 5.3 Conditions to detect extrema and inflection points in our discrete data. . 68 5.4 Conditions to define the curvature in our discrete data. . . . . . . . . . . 69 5.5 Examples of the verbal feedback formulations provided when the user misses the ball or reaches a certain overall score. . . . . . . . . . . . . . . . . . . 80 5.6 Examples of the verbal feedback formulations depending on users improvement for coaching rules BS:FOC:WatchBall, CP:TIM:WaistLevel, FT:POS:AcrossBody, and FS:SWP:LowToHigh. . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 6.1 Overview of the experimental procedure, including the methodology used to address each evaluation-oriented research question (see Section 1.3). The full questionnaire can be found in Appendix A. Open-ended questions are abbreviated with OEQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 125 6.2 Comparison of coaching rule adherence [A] and overall performance [O] be- tween pre-test and post-test. Statistics are presented for the lowest, average, and highest values achieved during performance tests. The positive/negative differences show how many participants increased/decreased the given metric from the pre-test to the post-test. The Wilcoxon Signed Rank Test results are reported as Z and p values with α = 0.05. . . . . . . . . . . . . . . . . 91 6.3 Statistics and Wilcoxon Signed Rank Test results (reported as Z and p values with α = 0.05) for the average motion phase performance scores measured during performance tests. The positive/negative differences show how many participants increased/decreased the given metric from pre-test to post-test. 91 6.4 VRSQ scores for the pre-test, post-test, and the difference (i.e., post−pre) are presented as mean (SD). Results from the Related-Samples Wilcoxon Signed Rank Test are reported as Z and p values (N = 26, α = 0.05). . . . . . . 95 6.5 Descriptions for negative, neutral, and positive sentiments. . . . . . . . . 95 6.6 Description of codes for the theme Outcome &Effect and the number of par- ticipants who mentioned them. The final column lists how many participants expressed a negative , neutral , or positive sentiment for each code (from top to bottom). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 6.7 Description of codes for the theme Experience &Usability and the number of participants who mentioned them. The final column lists, for each code, how many participants expressed a negative , neutral , or positive sentiment (from top to bottom). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6.8 The most relevant codes for the theme Teaching Methodology Impressions, along with their descriptions, and the number of participants who mentioned the code. The last column shows the sentiment breakdown for each code, listing from top to bottom how many participants expressed a negative , neutral , or positive sentiment. . . . . . . . . . . . . . . . . . . . . . . . 
100 6.9 Most relevant codes for the theme Feedback Design, along with their descrip- tions, and the number of participants who mentioned the code. The last column shows the sentiment breakdown for each code, listing from top to bottom how many participants expressed a negative , neutral , or positive sentiment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 126 List of Algorithms 5.1 Evaluating the performance score for CP:RKT:FaceVertical (racket face is nearly vertical at impact). The positive y-axis corresponds to the upward direction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.2 Determines if the previously selected coaching rule can be reapplied. . . 76 5.3 Select the coaching rule based on the highest rank [rank = rule.Error + weight ∗ normalized(rule.Priority)] and error sources (rules that might lead to follow-up errors in later rules). . . . . . . . . . . . . . . . . . . . 77 127 Acronyms BS Backswing. 54, 60, 69, 70, 106, 121, 125 CAVE Cave Automatic Virtual Environment. 32 CP Contact Point. 53, 54, 60, 61, 69, 70, 91, 106, 121, 125 CPR Cardiopulmonary Resuscitation. 33, 38 DTW Dynamic Time Warping. 19, 26, 27, 33 EX Execution. 54, 60, 91, 106, 125 FOV field of view. 52, 98, 99 FS Forward Swing. 54, 60, 69, 70, 106, 125 FT Follow Through. 54, 60, 69–72, 106, 125 HMD head-mounted display. 1, 32, 47, 52, 53, 58, 59, 61, 63, 71, 72, 85, 98, 99, 106, 122 IMU Inertial Measurement Units. 13, 14, 52 IPD interpupillary distance. 85 KP Knowledge of Performance. 29, 62, 63, 121 KR Knowledge of Results. 29, 33, 50, 62, 63, 86, 121 mocap motion capture. 3–5, 13–19, 21, 34, 35, 38, 39, 41, 44, 48, 50–54, 56, 58, 59, 65, 66, 68, 73, 110–112 OEQ open-ended questions. 84, 85, 125 RP Ready Position. 53, 54, 60, 69, 70, 90, 106, 121, 125 UI user interface. 48, 77–79, 81, 82, 84, 86, 93, 94, 97, 98, 103, 122–124 VR virtual reality. 1–7, 9, 14, 32, 34, 35, 37, 47–49, 52, 53, 84–92, 96–103, 105–107, 109–112, 123 VRSQ Virtual Reality Sickness Questionnaire. 86, 87, 94, 95, 126 XR extended reality. 43, 44 129 Bibliography [ACF+18] Danilo Avola, Luigi Cinque, Gian Luca Foresti, Marco Raoul Marini, and Daniele Pannone. Vrheab: a fully immersive motor rehabilitation system based on recurrent neural network. Multimedia Tools and Applications, 77:24955–24982, 2018. [ADDR04] Dennis A. Attwood, Joseph M. Deeb, and Mary E. Danz-Reece. 2 - personal factors. In Ergonomic Solutions for the Process Industries, pages 29–63. Gulf Professional Publishing, Burlington, MA, USA, 2004. [And09] K. H. Anderson. Coaching tennis: technical and tactical skills. Human Kinetics, Champaign, IL, USA, 2009. [ASL+19] Kamel Aouaidjia, Bin Sheng, Ping Li, Jinman Kim, and David Dagan Feng. Efficient body motion quantification and similarity evaluation using 3-d joints skeleton coordinates. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(5):2774–2788, 2019. [Bar15] Barrington Barber. Anatomy for Artists: A Complete Guide to Drawing the Human Body. Arcturus Publishing Limited, London, UK, 2015. [BDZC19] Louise Brennan, Enrique Dorronzoro Zubiete, and Brian Caulfield. Feed- back design in targeted exercise digital biofeedback systems for home rehabilitation: A scoping review. Sensors, 20(1):181, December 2019. [BS13] Jim Brown and Camille Soulier. Tennis: Steps to success. Human kinetics, Champaign, IL, USA, 2013. [CCL+19] Xiaoming Chen, Zhibo Chen, Ye Li, Tianyu He, Junhui Hou, Sen Liu, and Ying He. Immertai: Immersive motion learning in vr environments. 
Journal of Visual Communication and Image Representation, 58:416–427, January 2019. [CR07] Miguel Crespo and Machar M Reid. Motivation in tennis. British Journal of Sports Medicine, 41(11):769–772, November 2007. 131 [DBW+15] Joshua S Dines, Asheesh Bedi, Phillip N Williams, Christopher C Dodson, Todd S Ellenbecker, David W Altchek, Gary Windler, and David M Dines. Tennis injuries: epidemiology, pathophysiology, and treatment. JAAOS-Journal of the American Academy of Orthopaedic Surgeons, 23(3):181–189, 2015. [DKHH+15] Iwan De Kok, Julian Hough, Felix Hülsmann, Mario Botsch, David Schlangen, and Stefan Kopp. A multimodal system for real-time action instruction in Motor Skill Learning. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 355–362, Seattle, WA, USA, 2015. Association for Computing Machinery. [DMSD22] Daniele Di Mitri, Jan Schneider, and Hendrik Drachsler. Keep me in the loop: Real-time feedback with multimodal data. International Journal of Artificial Intelligence in Education, 32(4):1093–1118, December 2022. [Dou05] Stavros J Douvis. Variable practice in learning the forehand drive in tennis. Perceptual and motor skills, 101(2):531–545, October 2005. [DPdSYMM17] Augusto Dias Pereira dos Santos, Kalina Yacef, and Roberto Martinez- Maldonado. Let’s dance: how to build a user model for dance students using wearable technology. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pages 183–191, 2017. [FCNH+18] Victor Fernandez-Cervantes, Noelannah Neubauer, Benjamin Hunter, Eleni Stroulia, and Lili Liu. Virtualgym: A kinect-based system for seniors exercising at home. Entertainment Computing, 27:60–72, 2018. [Fed24] International Tennis Federation. ITF Global Tennis Report 2024: A survey of tennis participation and performance worldwide, 2024. [FMR12] Catherine O Fritz, Peter E Morris, and Jennifer J Richler. Effect size estimates: current use, calculations, and interpretation. Journal of experimental psychology: General, 141(1):2, 2012. [FMS+22] Fotos Frangoudes, Maria Matsangidou, Eirini C. Schiza, Kleanthis Neok- leous, and Constantinos S. Pattichis. Assessing human motion during exercise using machine learning: A literature review. IEEE Access, 10:86874–86903, August 2022. [GBDM+22] Ronan Gaugne, Jean-Baptiste Barreau, Pierre Duc-Martin, Elen Esnault, and Valérie Gouranton. Sport heritage in vr: Real tennis case study. Frontiers in Virtual Reality, 3:922415, 2022. [GDF23] Robert A. Greenes and Guilherme Del Fiol. Clinical decision support and beyond: Progress and opportunities in knowledge-enhanced health and healthcare. In Clinical Decision Support and Beyond, pages 811–831. Elsevier, 2023. 132 [GK22] Mai Geisen and Stefanie Klatt. Real-time feedback using extended reality: A current overview and further integration into sports. International Journal of Sports Science & Coaching, 17(5):1178–1194, October 2022. [GRCR20] Cyril Genevois, Machar Reid, Thomas Creveaux, and Isabelle Rogowski. Kinematic differences in upper limb joints between flat and topspin forehand drives in competitive male tennis players. Sports biomechanics, 2020. [GSADMG23] Ronan Gaugne, Sony Saint-Auret, Pierre Duc-Martin, and Valérie Gouranton. Virtual reality for the preservation and promotion of histor- ical real tennis. In Computer Graphics International Conference, pages 400–411. Springer, 2023. [HASN23] Yuichiro Hiramoto, Mohammed Al-Sada, and Tatsuo Nakajima. 
Design- ing a 3d human pose estimation-based vr tennis training system. In 2023 IEEE 29th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 267–268. IEEE, 2023. [HFS+16] Felix Hülsmann, Cornelia Frank, Thomas Schack, Stefan Kopp, and Mario Botsch. Multi-level analysis of motor actions as a basis for effective coaching in virtual reality. In Proceedings of the 10th International Symposium on Computer Science in Sports (ISCSS), pages 211–214, Loughborough, UK, 2016. Springer. [HGH+18] Felix Hülsmann, Jan Philip Göpfert, Barbara Hammer, Stefan Kopp, and Mario Botsch. Classification of motor errors to provide real-time feedback for sports coaching in virtual reality — a case study in squats and tai chi pushes. Computers & Graphics, 76:47 – 59, November 2018. [HKB17] Felix Hülsmann, Stefan Kopp, and Mario Botsch. Automatic error analysis of human motor performance for interactive coaching in virtual reality, September 2017. [HKD15] Joseph Hamill, Kathleen M. Knutzen, and Timothy R. Derrick. Biome- chanical basis of human movement. Wolters Kluwer Health, Philadelphia, PA, USA, 4th edition, 2015. [HZW+22] Jana Hoffard, Xuan Zhang, Erwin Wu, Takuto Nakamura, and Hideki Koike. Skisim: A comprehensive study on full body motion capture and real-time feedback in vr ski training. In Proceedings of the Augmented Humans International Conference 2022, pages 131–141, 2022. [IAR+23] Ricko Irawan, Mahalul Azam, Setya Rahayu, Heny Setyawati, S Adi, Bambang Priyono, Anan Nugroho, et al. Biomechanical motion of the tennis forehand stroke: Analyzing the impact on the ball speed using 133 biofor analysis software. Physical Education Theory and Methodology, 23(6):918–924, 2023. [IHK+18] Atsuki Ikeda, Dong-Hyun Hwang, Hideki Koike, Gerd Bruder, Shunsuke Yoshimoto, and Sue Cobb. Ar based self-sports learning system using decayed dynamic timewarping algorithm. In Proc. ICAT-EGVE, pages 171–174, Limassol, Cyprus, 2018. Eurographics. [JGZ+13] Hueihan Jhuang, Juergen Gall, Silvia Zuffi, Cordelia Schmid, and Michael J Black. Towards understanding action recognition. In Pro- ceedings of the IEEE international conference on computer vision, pages 3192–3199, 2013. [JR20] Shixin Jiang and Jun Rekimoto. Mediated-timescale learning: Manipu- lating timescales in virtual reality to improve real-world tennis forehand volley. In Proceedings of the 26th ACM Symposium on Virtual Real- ity Software and Technology, pages 1–2, Virtual Event, Canada, 2020. Association for Computing Machinery. [KE04] Duane Knudson and Bruce Elliott. Biomechanics of Tennis Strokes, pages 153–181. Springer US, Boston, MA, USA, 2004. [KGSK24] Peter Kán, Georg Gerstweiler, Anna Sebernegg, and Hannes Kauf- mann. Analysis of tennis forehand technique using machine learning. In ICAT-EGVE 2024 - International Conference on Artificial Reality and Telexistence and Eurographics Symposium on Virtual Environments, Tsukuba, Japan, 2024. Eurographics. [KJR13] David J Kooyman, Daniel A James, and David D Rowlands. A feedback system for the motor learning of skills in golf. Procedia Engineering, 60:226–231, 2013. [KM02] Duane V. Knudson and Craig S. Morrison. Qualitative analysis of human movement. Human Kinetics, Champaign, IL, USA, 2nd edition, 2002. [KMCAS18] Saif Khairat, David Marc, William Crosby, and Ali Al Sanousi. Reasons For Physicians Not Adopting Clinical Decision Support Systems: Critical Analysis. JMIR Medical Informatics, 6(2):e24, April 2018. [Knu06] Duane Knudson. 
Biomechanical principles of tennis technique: using science to improve your strokes. Racquet Tech Publishing, Vista, CA, USA, 2006. [KP20] Myeongsub Kim and Sukyung Park. Golf swing segmentation from a single imu using machine learning. Sensors, 20(16):4466, 2020. 134 [KPCC18] Hyun K. Kim, Jaehyun Park, Yeongcheol Choi, and Mungyeong Choe. Virtual reality sickness questionnaire (vrsq): Motion sickness measure- ment index in a virtual reality environment. Applied Ergonomics, 69:66– 73, 2018. [KRK23] Peter Kán, Martin Rumpelnik, and Hannes Kaufmann. Embodied Con- versational Agents with Situation Awareness for Training in Virtual Reality. In ICAT-EGVE 2023 - International Conference on Artifi- cial Reality and Telexistence and Eurographics Symposium on Virtual Environments. The Eurographics Association, 2023. [LE04] Chan-Su Lee and Ahmed Elgammal. Gait style and gait content: bilinear models for gait recognition using gait re-sampling. In Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings., pages 147–152. IEEE, 2004. [Lee02] Adrian Lees. Technique analysis in sports: a critical review. Journal of Sports Sciences, 20(10):813–828, 2002. [LHSI06] Jin-hee Lee, Su-Jeong Hwang Shin, and Cynthia L Istook. Analysis of human head shapes in the united states. International journal of human ecology, 7(1):77–83, January 2006. [LKK16] Jonathan Feng-Shun Lin, Michelle Karg, and Dana Kulić. Movement primitive segmentation for human motion modeling: A framework for analysis. IEEE Transactions on Human-Machine Systems, 46(3):325–339, 2016. [LLS+10] Johannes Landlinger, Stefan Lindinger, Thomas Stöggl, Herbert Wagner, and Erich Müller. Key factors and timing patterns in the tennis forehand of different skill levels. Journal of sports science & medicine, 9(4):643, 2010. [LNBRF21] Peter Le Noury, Tim Buszard, Machar Reid, and Damian Farrow. Ex- amining the representativeness of a virtual reality environment for simu- lation of tennis performance. Journal of Sports Sciences, 39(4):412–420, 2021. [LVX+20] Yalin Liao, Aleksandar Vakanski, Min Xian, David Paul, and Russell Baker. A review of computational approaches for evaluation of reha- bilitation exercises. Computers in biology and medicine, 119:103687, 2020. [LWMK20] Huimin Liu, Zhiquan Wang, Christos Mousas, and Dominic Kao. Vir- tual reality racket sports: Virtual drills for exercise and training. In 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 566–576. IEEE, 2020. 135 [MA10] Richard Magill and David I. Anderson. Motor learning and control: concepts and applications. McGraw-Hill, New York, NY, USA, 2010. [MAKD17] Marion Morel, Catherine Achard, Richard Kulpa, and Séverine Dubuis- son. Automatic evaluation of sports motion: A generic computation of spatial and temporal errors. Image and Vision Computing, 64:67–78, 2017. [McG13] Peter Merton McGinnis. Biomechanics of sport and exercise. Human Kinetics, Champaign, IL, USA, 3rd edition, 2013. [Men11] Alberto Menache. Understanding motion capture for computer animation. In Understanding Motion Capture for Computer Animation. Elsevier Science & Technology, Burlington, MA, USA, 2nd edition, 2011. [MKM+22] Katsutoshi Masai, Takuma Kajiyama, Tadashi Muramatsu, Maki Sugi- moto, and Toshitaka Kimura. Virtual reality sonification training system can improve a novice’s forehand return of serve in tennis. In 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pages 845–849. IEEE, 2022. 
[MR08] Meinard Müller and Tido Röder. A relational approach to content-based analysis of motion capture data. In Human Motion, pages 477–506. Springer, 2008.
[MRC05] Meinard Müller, Tido Röder, and Michael Clausen. Efficient content-based retrieval of motion capture data. In ACM SIGGRAPH 2005 Papers, SIGGRAPH ’05, pages 677–685, New York, NY, USA, 2005. Association for Computing Machinery.
[MSL19] Stefan C Michalski, Ancret Szpak, and Tobias Loetscher. Using virtual environments to improve real-world motor skills in sports: a systematic review. Frontiers in psychology, 10:2159, 2019.
[MSS+19] Stefan Carlo Michalski, Ancret Szpak, Dimitrios Saredakis, Tyler James Ross, Mark Billinghurst, and Tobias Loetscher. Getting your game on: Using virtual reality to improve real table tennis skills. PLOS One, 14(9):e0222351, September 2019.
[MWK22] Takashi Matsumoto, Erwin Wu, and Hideki Koike. Augmenting vr ski training using time distortion. In 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pages 570–571. IEEE, 2022.
[NMT+18] David L. Neumann, Robyn L. Moffitt, Patrick R. Thomas, Kylie Loveday, David P. Watling, Chantal L. Lombard, Simona Antonova, and Michael A. Tremeer. A systematic review of the application of interactive virtual reality to sport. Virtual Reality, 22(3):183–198, September 2018.
[Nog12] Pedro Nogueira. Motion capture fundamentals: A critical and comparative analysis on real-world applications. In Doctoral symposium in informatics engineering, volume 1, pages 303–314, 2012.
[NSBB23] Dario Novak, Filip Sinković, Zlatan Bilić, and Petar Barbaros. The effects of a short virtual reality training program on dynamic balance in tennis players. Journal of Functional Morphology and Kinesiology, 8(4):168, December 2023.
[OII+19] Masaki Oshita, Takumi Inao, Shunsuke Ineno, Tomohiko Mukai, and Shigeru Kuriyama. Development and evaluation of a self-training system for tennis shots with motion feature assessment and visualization. The Visual Computer, 35(11):1517–1529, 2019.
[OIMK18] Masaki Oshita, Takumi Inao, Tomohiko Mukai, and Shigeru Kuriyama. Self-training system for tennis shots with motion feature assessment and visualization. In 2018 International Conference on Cyberworlds (CW), pages 82–89. IEEE, 2018.
[ÖS20] Serkan Örücü and Murat Selek. Design and validation of rule-based expert system by using kinect v2 for real-time athlete support. Applied Sciences, 10(2):611, 2020.
[OSC22] Hawkar Oagaz, Breawn Schoun, and Min-Hyung Choi. Real-time posture feedback for effective motor learning in table tennis in virtual reality. International Journal of Human-Computer Studies, 158:102731, 2022.
[Paq09] Steve Paquette. Anthropometric survey (ANSUR) II pilot study: methods and summary statistics. Anthrotch, US Army Natick Soldier Research, Development and Engineering Center, 2009.
[RAB+00] F. D. Rose, E. A. Attree, B. M. Brooks, D. M. Parslow, and P. R. Penn. Training in virtual environments: transfer to real world tasks and equivalence to real task training. Ergonomics, 43(4):494–511, April 2000.
[RCH+94] Robert Rosenthal, Harris Cooper, Larry Hedges, et al. Parametric measures of effect size. The handbook of research synthesis, 621(2):231–244, 1994.
[RDP99] Kathleen M Robinette, Hans Daanen, and Eric Paquet. The caesar project: a 3-d surface anthropometry survey. In Second international conference on 3-D digital imaging and modeling (cat. No. PR00062), pages 380–386. IEEE, 1999.
[REC15] Machar Reid, Bruce Elliott, and Miguel Crespo. Tennis Science: How Player and Racket Work Together. University of Chicago Press, Chicago, IL, USA, 2015.
[Ren72] Renato Contini. Body segment parameters, part II. In Artificial limbs. A review of current developments: Spring 1972. The National Academies Press, Washington, DC, USA, 1972.
[RF20] Alen Rajšp and Iztok Fister. A systematic literature review of intelligent data analysis methods for smart sport training. Applied Sciences, 10(9):3013, 2020.
[RG01] Paul Roetert and Jack L Groppel. World-class tennis technique. Human Kinetics, Champaign, IL, USA, 2001.
[RK19] E. Paul Roetert and Mark Kovacs. Tennis anatomy. Human Kinetics, Champaign, IL, USA, 2019.
[Röd06] Tido Röder. Similarity, retrieval, and classification of motion capture data. PhD thesis, Rheinische Friedrich-Wilhelms-Universität, Bonn, 2006.
[RW11] Joey Rive and Scott C Williams. Tennis skills & drills. Human Kinetics, Chicago, IL, USA, 2011.
[RWA+17] Julia Richter, Christian Wiede, André Apitzsch, Nico Nitzsche, Christiane Lösch, Martin Weigert, Thomas Kronfeld, Stefan Weisleder, and Gangolf Hirtz. Assisted Motion Control in Therapy Environments Using Smart Sensor Technology: Challenges and Opportunities, pages 119–132. Springer, March 2017.
[RWHH19] Julia Richter, Christian Wiede, Ulrich Heinkel, and Gangolf Hirtz. Motion evaluation of therapy exercises by means of skeleton normalisation, incremental dynamic time warping and machine learning: A comparison of a rule-based and a machine-learning-based approach. In VISIGRAPP (4: VISAPP), pages 497–504, 2019.
[Sch17] Andrea Schärli. Functional movement analysis in dance. Handbook of Human Motion, 2017.
[She13] Joseph Sheppard. Anatomy: A complete guide for artists. Courier Corporation, 2013.
[SKK20] Anna Sebernegg, Peter Kán, and Hannes Kaufmann. Motion similarity modeling - a state of the art report. Technical Report TR-193-02-2020-5, Research Unit of Computer Graphics, Institute of Visual Computing and Human-Centered Technology, Faculty of Informatics, TU Wien, August 2020.
[SMS+18] Kei Saito, Katsutoshi Masai, Yuta Sugiura, Toshitaka Kimura, and Maki Sugimoto. Development of a virtual environment for motion analysis of tennis service returns. In Proceedings of the 1st International Workshop on Multimedia Content Analysis in Sports, pages 59–66, 2018.
[SPB+20] Reed T. Sutton, David Pincock, Daniel C. Baumgart, Daniel C. Sadowski, Richard N. Fedorak, and Karen I. Kroeker. An overview of clinical decision support systems: benefits, risks, and strategies for success. npj Digital Medicine, 3(1):17, February 2020.
[SRRW13] Roland Sigrist, Georg Rauter, Robert Riener, and Peter Wolf. Augmented visual, auditory, haptic, and multimodal feedback in motor learning: A review. Psychonomic Bulletin and Review, 20(1):21–53, 2013.
[ŠŠPM19] Luka Šlosar, Boštjan Šimunič, Rado Pišot, and Uros Marusic. Validation of a tennis rating score to evaluate the technical level of children tennis players. Journal of Sports Sciences, 37(1):100–107, 2019.
[SW00] Richard A. Schmidt and Craig A. Wrisberg. Motor learning and performance. Human Kinetics, Champaign, IL, USA, 2nd edition, 2000.
[TAHK12] P. E. Taylor, G. J. M. Almeida, J. K. Hodgins, and T. Kanade. Multi-label classification for the analysis of human motion quality. In 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 2214–2218, San Diego, CA, USA, August 2012. IEEE.
[TCKL13] Tran Thang Thanh, Fan Chen, Kazunori Kotani, and Bac Le. An apriori-like algorithm for automatic extraction of the common action characteristics. In 2013 Visual Communications and Image Processing (VCIP), pages 1–6. IEEE, 2013.
[TF18] Xinmei Tian and Jiayi Fan. Joints kinetic and relational features for action recognition. Signal Processing, 142:412–422, 2018.
[TNZ+24] Feng Tian, Shuting Ni, Xiaoyue Zhang, Fei Chen, Qiaolian Zhu, Chunyi Xu, and Yuzhi Li. Enhancing tai chi training system: Towards group-based and hyper-realistic training experiences. IEEE Transactions on Visualization and Computer Graphics, 2024.
[TPD+16] Lili Tao, Adeline Paiement, Dima Damen, Majid Mirmehdi, Sion Hannuna, Massimo Camplani, Tilo Burghardt, and Ian Craddock. A comparative study of pose representation and dynamics modelling for online motion quality assessment. Computer vision and image understanding, 148:136–152, 2016.
[Val16] Jakub Valčík. Similarity models for human motion data. PhD thesis, Masaryk University - Faculty of Informatics, Brno, Czechia, April 2016.
[VeaKSS22] Verena Venek, Stefan Kranzinger, Hermann Schwameder, and Thomas Stoeggl. Human movement quality assessment using sensor technologies in recreational and professional sports: a scoping review. Sensors, 22(13):4786, 2022.
[VW18] Jan P Vox and Frank Wallhoff. Preprocessing and normalization of 3d-skeleton-data for human motion recognition. In 2018 IEEE Life Sciences Conference (LSC), pages 279–282. IEEE, 2018.
[WHP+15] Thomas Waltemate, Felix Hülsmann, Thies Pfeiffer, Stefan Kopp, and Mario Botsch. Realizing a low-latency virtual reality environment for motor learning. In Proceedings of the 21st ACM symposium on virtual reality software and technology, pages 139–147, 2015.
[WNPK20] Erwin Wu, Takayuki Nozawa, Florian Perteneder, and Hideki Koike. Vr alpine ski training augmentation using visual cues of leading skier. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 878–879, 2020.
[WPNK21] Erwin Wu, Mitski Piekenbrock, Takuto Nakamura, and Hideki Koike. SPinPong - Virtual reality table tennis skill acquisition using visual, haptic and temporal cues. IEEE Transactions on Visualization and Computer Graphics, 27(5):2566–2576, May 2021.
[YGVG12] Angela Yao, Juergen Gall, and Luc Van Gool. Coupled action recognition and pose estimation from multiple views. International journal of computer vision, 100:16–37, 2012.
[YWW+20] Chih-Hung Yu, Cheng-Chih Wu, Jye-Shyan Wang, Hou-Yu Chen, and Yu-Tzu Lin. Learning tennis through video-based reflective learning by using motion-tracking sensors. Journal of Educational Technology & Society, 23(1):64–77, 2020.
[Zat98] Vladimir M. Zatsiorsky. Kinematics of Human Motion. Human Kinetics, Champaign, IL, USA, 1998.
[Zat02] Vladimir M Zatsiorsky. Kinetics of human motion. Human Kinetics, Champaign, IL, USA, 2002.
[ZLER14] Wenbing Zhao, Roanna Lun, Deborah D. Espy, and M. Ann Reinthal. Rule based realtime motion assessment for rehabilitation exercises. In 2014 IEEE Symposium on Computational Intelligence in Healthcare and e-health (CICARE), pages 133–140, December 2014.
[ZLX17] Songyang Zhang, Xiaoming Liu, and Jun Xiao. On geometric features for skeleton-based action recognition using multilayer lstm networks. In 2017 IEEE winter conference on applications of computer vision (WACV), pages 148–157. IEEE, 2017.
[ZREL17] Wenbing Zhao, M. Ann Reinthal, Deborah D Espy, and Xiong Luo. Rule-based human motion tracking for rehabilitation exercises: Realtime assessment, feedback, and guidance. IEEE Access, 5:21382–21394, 2017.
[ZWK21] Xuan Zhang, Erwin Wu, and Hideki Koike. Watch-your-skiing: Visualizations for vr skiing using real-time body tracking. In 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pages 387–388. IEEE, 2021.