SynVis
Darstellung von Graphem-Farb-Synästhesie
mittels Augmented Reality
MASTERARBEIT
zur Erlangung des akademischen Grades
Master of Science
im Rahmen des Studiums
Media and Human-Centred Computing
eingereicht von
Christina Tüchler, BSc
Matrikelnummer 11908107
an der Fakultät für Informatik
der Technischen Universität Wien
Betreuung: Univ.Prof. Mag.rer.nat. Dr.techn. Hannes Kaufmann
Mitwirkung: Projektass.in Dipl.-Ing.in Dr.in techn. Katharina Krösl, BSc
Dipl.-Ing. Dr.techn. Daniel Cornel, BSc
Wien, 24. August 2024
Christina Tüchler Hannes Kaufmann
SynVis
Digitising Grapheme-Colour Synaesthesia
Through Augmented Reality
MASTER’S THESIS
submitted in partial fulfillment of the requirements for the degree of
Master of Science
in
Media and Human-Centred Computing
by
Christina Tüchler, BSc
Registration Number 11908107
to the Faculty of Informatics
at the TU Wien
Advisor: Univ.Prof. Mag.rer.nat. Dr.techn. Hannes Kaufmann
Assistance: Projektass.in Dipl.-Ing.in Dr.in techn. Katharina Krösl, BSc
Dipl.-Ing. Dr.techn. Daniel Cornel, BSc
Vienna, 24th August, 2024
Christina Tüchler Hannes Kaufmann
Erklärung zur Verfassung der
Arbeit
Christina Tüchler, BSc
Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwen-
deten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der
Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder
dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter
Angabe der Quelle als Entlehnung kenntlich gemacht habe.
Wien, 24. August 2024
Christina Tüchler
Danksagung
An dieser Stelle möchte ich mich bei allen bedanken, die mich während der Erstellung
dieser Masterarbeit unterstützt haben.
Ein besonderer Dank gilt der VRVis Zentrum für Virtual Reality und Visualisie-
rung Forschungs-GmbH. Diese wird vom BMK, BMAW, Land Steiermark, Steirische
Wirtschaftsförderung - SFG, Land Tirol und Wirtschaftsagentur Wien - Ein Fonds der
Stadt Wien im Rahmen des von der FFG abgewickelten COMET - Competence Centers
for Excellent Technologies (879730) gefördert. Hier konnte ich im Rahmen eines dreimo-
natigen Praktikums viel Wissen aus der Forschung mitnehmen und meine Masterarbeit
starten.
Ich möchte meinen herzlichen Dank an Hannes Kaufmann aussprechen, der mich
während dieser Arbeit betreut hat. Er war stets per E-Mail erreichbar und hatte immer
ein offenes Ohr sowie Lösungen für meine Anliegen, sei es in Bezug auf Hard- oder
Software. Ebenso bedanke ich mich bei Katharina Krösl und Daniel Cornel, die mich
während meines Praktikums und darüber hinaus tatkräftig unterstützt und gefördert
haben. Unsere fast zweiwöchigen Jour fixes waren äußerst hilfreich, und ich konnte mich
immer auf ihr zeitnahes Feedback verlassen. Vielen Dank an euch alle, die Zusammenarbeit
mit euch hat mir große Freude bereitet!
Darüber hinaus möchte ich mich bei allen Teilnehmern der Evaluierungsphase (Exper-
teninterviews und Nutzerstudie) bedanken, ohne die diese Arbeit nicht möglich gewesen
wäre.
Nicht zuletzt möchte ich mich bei meinem Partner, meiner Familie und meinen Freun-
den bedanken, die mich jederzeit emotional unterstützt haben. Danke euch!
Acknowledgements
I would like to take this opportunity to thank everyone who supported me during the writing of this Master's thesis.
Special thanks go to VRVis Zentrum für Virtual Reality und Visualisierung Forschungs-GmbH. VRVis is funded by BMK, BMAW, the Province of Styria, the Styrian Business Promotion Agency (SFG), the Province of Tyrol and the Vienna Business Agency (A Fund of the City of Vienna) within the scope of COMET - Competence Centres for Excellent Technologies (879730), which is managed by the FFG. During a three-month internship there, I was able to gain a great deal of knowledge from research and to start my Master's thesis.
Many thanks to Hannes Kaufmann, who supervised me during this work. He was always just an email away, ready to listen and provide solutions to my concerns, whether hardware- or software-related. I am also grateful to Katharina Krösl and Daniel
Cornel, who supported and encouraged me during my internship and beyond. Our
almost bi-weekly meetings were extremely helpful, and I could always rely on their
prompt feedback. Thank you all so much; working with you has been a truly enjoyable
experience!
I would also like to thank all participants in the evaluation phase (expert interviews
and user study), without whom this work would not have been possible.
Last but not least, I would like to thank my partner, my family and my friends, who
have supported me emotionally at all times. Thank you all!
Kurzfassung
Die häufigste Form der Synästhesie, die Graphem-Farb-Synästhesie, verursacht einzigar-
tige Empfindungen, bei denen Buchstaben und Zahlen mit bestimmten Farben assoziiert
werden. Angesichts der rasanten technologischen Entwicklung, insbesondere im Bereich der
unterstützenden Technologien, untersucht diese Masterarbeit die visuelle Reproduktion
der Graphem-Farb-Synästhesie mithilfe von Augmented Reality (AR).
Ziel dieser Arbeit ist es herauszufinden, ob die individuell unterschiedlichen Wahrnehmun-
gen von Synästheten in einfache, maschinell implementierbare Regelwerke kodiert werden
können, die es Synästheten ermöglichen, schwarzen Text vor dem Lesen einzufärben.
Außerdem sollen die technischen Voraussetzungen für die Implementierung eines solchen Systems ermittelt werden.
Daher wird eine Literaturrecherche durchgeführt, um herauszufinden, ob es bestimmte
wiederkehrende Muster in der Farbwahrnehmung von Synästheten auf Wortebene gibt.
Auf der Grundlage früherer Forschungsarbeiten und Experteninterviews mit einem Syn-
ästhesieforscher und einem Synästheten wird die Identifizierung und Formalisierung von
regulierenden Faktoren, die bestimmte Farben bei Synästheten hervorrufen, validiert.
Dies ermöglicht die Entwicklung eines Prototyps, der mobile AR zur Darstellung von
Graphem-Farb-Synästhesie-Wahrnehmungen verwendet. Die Anwendung ermöglicht die
Umfärbung von Text in der realen Welt mithilfe des Kamerabildes des Geräts nach
verschiedenen vordefinierten Regeln.
Zur qualitativen und quantitativen Analyse werden Experteninterviews, Benchmark-Tests
zur Bewertung der Performance der Anwendung (Framezeit, Antwortzeit und Fehlerquote)
sowie eine Nutzerstudie zur Ermittlung der technischen Machbarkeit durchgeführt. Die
Bewertung der verschiedenen Visualisierungsalternativen ergibt eine Präferenz für die am
wenigsten invasive Visualisierung. Die effektivste Methode ist die Einfärbung des Textes
in den entsprechenden Farben direkt auf der Pixelbasis des Kamerabildes. Aus Gründen
der Lesbarkeit bei dunklen Farben wird die Darstellung von Farbtönen als Hintergrund
hinter schwarzer Schrift nicht bevorzugt.
Die Arbeit befasst sich mit den technischen Hindernissen und schlägt Optionen vor, die den Weg für zukünftige Forschung auf diesem Gebiet ebnen.
Abstract
The most common type of synaesthesia, known as grapheme-colour synaesthesia, causes
unique sensations in which letters and numbers are associated with specific colours. With
the rapid expansion of technology, particularly in the field of assistive technology, this
Master’s thesis investigates the visual replication of grapheme-colour synaesthesia using
Augmented Reality (AR).
The goal of this thesis is to determine whether synaesthetes' different individual perceptions can be encoded into simple, machine-implementable rule-sets that would allow synaesthetes to pre-colour achromatic text before reading. Furthermore, it aims to establish the technical requirements for implementing such a system.
Therefore, a literature review is conducted to find out if there are specific recurring
patterns in how synaesthetes perceive colours on the word level. Based on previous
research and expert interviews with a synaesthesia researcher and a synaesthete, the
identification and formalisation of the regulatory factors that elicit specific colours in
synaesthetes are validated. This allows for the creation of a prototype that uses mobile AR
to represent grapheme-colour synaesthesia perceptions. The app enables the recolouring
of real-world text using the device’s camera input, based on different rule-sets provided.
To analyse this qualitatively and quantitatively, the thesis includes expert interviews, benchmark tests to assess the app's performance (frame time (FT), response time (RT), and error rate (ER)), and a user study to determine technological feasibility. Evaluating the various visualisation alternatives reveals a preference for the least invasive visualisations. The most effective method is to colour the text in the appropriate colours directly at the pixel level of the camera image. Visualising hues as backdrops behind black lettering, on the other hand, is not preferred due to readability concerns with dark colours.
The work addresses technical hurdles and suggests options for future research, paving the way for further work in this area.
Contents

Kurzfassung
Abstract
Contents

1 Introduction
  1.1 Motivation and Problem Statement
  1.2 Research Questions and Approach
  1.3 Contribution
  1.4 Thesis Structure

2 Background and Related Work
  2.1 Psychological Background
    2.1.1 Word Origin and Definition
    2.1.2 Types of Synaesthesia
    2.1.3 Perception, Cognition and Personality
  2.2 Grapheme-Colour Synaesthesia
    2.2.1 Related Types
    2.2.2 Terminology
      Inducer and Concurrent
      Projective and Associative
      Lower and Higher Distinction
      Natural and Artificial
    2.2.3 Development
      Consistency
      Learning Synaesthesia
    2.2.4 Neurological Perspective
    2.2.5 Regulatory Factors
      Shared Codes
      Different Appearances
  2.3 Colouring Graphemes
    2.3.1 Colour Pickers
    2.3.2 Studies and Applications
  2.4 Data Augmentation and Visualisation
    2.4.1 Text Detection and Recognition
    2.4.2 Relevant Visualisation Techniques
  2.5 Reading Performance

3 Design
  3.1 Requirements
    3.1.1 Functional
    3.1.2 Non-Functional
  3.2 Design of the System
  3.3 Word Colouring Rules
  3.4 Visualisation and Colouring Types
  3.5 Application Features
    3.5.1 Colour Definition for Each Grapheme
    3.5.2 Rule-Set Definition for Word Colouring
    3.5.3 Visualisation Style Definition
    3.5.4 Text Scan Functionality

4 SynVis Implementation
  4.1 Tech Stack
    4.1.1 Unity Libraries
    4.1.2 Text Detection and Recognition
    4.1.3 Used Devices
  4.2 User Flow
  4.3 User Interface
    4.3.1 Intuitiveness and How-To Guidance
    4.3.2 Simplicity and Mode Indication
    4.3.3 Consistency and Colour Scheme
    4.3.4 User Data Persistence
  4.4 Algorithms and Techniques
    4.4.1 User Data Structure
    4.4.2 Detection and Colourisation Procedure
    4.4.3 Texture Direct Pixel Manipulation
    4.4.4 TMP Overlay

5 Testing and Evaluation Design
  5.1 Benchmark Testing
    5.1.1 Performance Metrics
    5.1.2 Implementation of Testing Environment and Tools
    5.1.3 Testing Procedure
  5.2 Pilot Study
    5.2.1 User Study Protocol
  5.3 User Study
    5.3.1 Participants
    5.3.2 Data Collection
  5.4 Expert Interviews
    5.4.1 Selection of Experts
    5.4.2 Interview Procedure

6 Results
  6.1 Quantitative Analysis
    6.1.1 Benchmark Tests
    6.1.2 Pilot Study
      Input Field Adjustment
      Display Adjustments
      Focus Mode Activation
      Questionnaire Format
    6.1.3 User Study
      System Usability Scale
      Visualisation Preferences
      Readability
  6.2 Qualitative Analysis
    6.2.1 Identified Themes
      Rule-Set
      Visualisation
      Challenges
      Interaction

7 Discussion of Results
  7.1 Limitations
    7.1.1 Colouring
    7.1.2 Rule-Sets
    7.1.3 Recognition and Detection
  7.2 Potential Future Work
    7.2.1 Colouring
    7.2.2 Rule-Sets
    7.2.3 Recognition and Detection
    7.2.4 Performance
    7.2.5 Further Research and Development

8 Conclusion

List of Figures
List of Tables
Acronyms
Bibliography
Appendix
  User Study Questionnaire
  Texts for Testing
CHAPTER 1
Introduction
This thesis explores the technical feasibility of a prototype called SynVis, which enables individuals with grapheme-colour synaesthesia to modify the colour of real-world text so that it aligns with their personal perceptions. It demonstrates the potential of achieving this in augmented reality (AR), with a System Usability Scale (SUS) score of 88.75 indicating an excellent user experience according to the Sauro-Lewis curved grading system [LUM15].
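For reference, a SUS score is computed from ten five-point questionnaire items. The sketch below shows the standard scoring formula in Python; the responses are purely illustrative and are not study data:

    def sus_score(responses):
        """System Usability Scale score from ten 1-5 Likert responses.

        Odd-numbered items (positively worded) contribute (response - 1),
        even-numbered items (negatively worded) contribute (5 - response);
        the sum is scaled by 2.5 onto a 0-100 range.
        """
        assert len(responses) == 10
        total = sum((r - 1) if i % 2 == 0 else (5 - r)  # 0-based i: even i = odd item
                    for i, r in enumerate(responses))
        return total * 2.5

    print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))  # -> 90.0 (illustrative responses)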
Synaesthesia is a condition in which the reception of ordinary sensory inputs involuntarily gives rise to unusual concurrent experiences. For example, one of the most prevalent and most researched forms of synaesthesia is grapheme-colour synaesthesia, the phenomenon of experiencing colour when reading graphemes such as letters or digits. This happens because, in people with this condition, the brain's colour-processing area V4, part of the visual cortex, is abnormally strongly connected to the regions that process graphemes [BHC+10, MS22]. Around 4% of the total population experiences synaesthesia, with over 1% experiencing grapheme-colour synaesthesia, which amounts to more than 90 million people worldwide [SMS+06, RG19, WS20]. Although synaesthesia is linked to high levels of creativity and improved recognition recall, it also requires additional effort to resolve conflicts between sensory perceptions and concurrent experiences. As synaesthesia is strongly linked to learning, a lack of awareness can lead to stigmatisation and inadequate support [WAS+14].
1.1 Motivation and Problem Statement
While some advantages of synaesthesia, such as enhanced memory recall and increased
creativity, have been identified, the benefits and downsides of grapheme-colour synaes-
thesia, particularly in the context of reading, remain largely unexplored [GRJ12, Sim07].
It is unknown to what extent the downsides can be alleviated with assistive technology,
and if the benefits can be exploited for increased reading efficiency. Key knowledge and
technologies to conduct this research are missing. While there are already tools that
allow the definition of colours for individual graphemes, like the "Synesthesia Battery"
[EKT+07] and the colouring of words like the "SeeSynesthete" Chrome Extension [Chr20],
these applications only function at the individual grapheme level. Given the considerable
individuality of induced colour experiences, it is challenging to determine the most
appropriate methodology for formalising them into a finite set of rules. It is also unknown
how to use such rules to approximate colour experiences algorithmically to maximise
congruence between a reproduction and the synaesthetic experience it induces. It has not even been studied whether and how colour perceptions induced by single letters transfer to words and full sentences. Answering these questions is hard, as synaesthesia is a complex and highly individual phenomenon. This thesis addresses this gap.
In this thesis, visual stimuli are aligned with synaesthetic experiences in arbitrary texts
to reinforce the impression of the read text through targeted bundling of neurological
signals. This is realised on a novel research platform that allows for the visualisation of
individually coloured texts in AR.
1.2 Research Questions and Approach
This thesis aims to determine the feasibility of facilitating and reinforcing potential
benefits related to synaesthesia using contemporary technology.
The research questions of this thesis can be formulated as follows:
RQ1: How can we formalise and reproduce individual grapheme-colour synaesthetic
experiences on a digital screen?
RQ2: What technical developments are necessary to align an AR visualisation
with the experience of grapheme-colour synaesthesia?
To answer RQ1, an extensive literature review is conducted with the objective of for-
malising rule-sets, which are subsequently validated through an expert interview with a
psychologist specialised in synaesthesia research.
To answer RQ2, an application is created that allows users to select colours for graphemes
using a colour picker widget and a variety of suggested stylised text visualisations (e.g.,
solid colours or outlines) to define the appearance of graphemes and reproduce their visual
perception of grapheme-colour synaesthesia on a digital screen as closely as possible.
These visualisation techniques are adapted from previous research [EKT+07]. The
prototype is then expanded with character/text detection and AR to test its use with
printed text in any environment. Finally, the recoloured and restyled text is displayed on
the screen, see Figure 1.1.
Figure 1.1: Screenshot of the prototype showing the text recolouring feature, which
allows users to personalise their reading experience.
In order to evaluate this approach and ascertain its usability, user testing is conducted with twelve volunteers. Benchmark testing is carried out to measure performance and to ascertain the extent of potential errors. A psychologist specialising in synaesthesia research and a synaesthete are interviewed to validate the usefulness of the approach as a whole. This mixed-methods approach is primarily designed to demonstrate technical feasibility, while leaving detailed functional assessment to psychologists.
The application of these evaluation techniques enables the determination of the technical
feasibility and utility of this type of assistive technology.
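To make the evaluated approach more tangible, a user profile for such a prototype could pair per-grapheme colours with a word-colouring rule and a visualisation style. The following sketch is illustrative only; the field names and values are hypothetical and not SynVis's actual data structure:

    # Hypothetical user profile; all field names and values are illustrative.
    profile = {
        "grapheme_colours": {             # per-grapheme colour definitions (hex)
            "A": "#DC1E1E", "B": "#1E3CC8", "7": "#F2C200",
        },
        "word_rule": "first_letter",      # how letter colours combine at word level
        "visualisation": "outline",       # e.g. "solid", "outline", "background"
    }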
1.3 Contribution
Since reading is an important task in everyday life, this thesis serves as a cornerstone for
further research in the direction of natural synaesthetic reading. Synaesthetic experience
is an essential factor in text reading that can be managed for improved reading compre-
hension, which can be quantified with established reading comprehension assessments at
word, sentence, and text level [LFJH19a]. This thesis contributes by presenting:
• Validated formalised rule-sets that describe how synaesthetic colour perceptions
can be applied to words.
• A prototype that is capable of reconstructing synaesthetic perceptions through the
medium of AR.
• The results of a user study on the usability of this approach, as well as the preferred
visualisation style combined with readability ratings.
• Benchmark test results and potential further development and improvement ideas.
• Insights into the general usage of assistive technology for synaesthesia.
Therefore, this project lays a solid foundation for further research and development.
1.4 Thesis Structure
Chapter 2 provides the necessary foundational knowledge on synaesthesia. Because this thesis combines psychology, computer science, user experience, and
visualisation, this chapter goes over the psychological foundations of synaesthesia, with
a particular emphasis on grapheme-colour synaesthesia and related types, as well as
the neurological perspective. It investigates various methods of determining colours for
graphemes, citing relevant research and applications. It also evaluates earlier research on
text detection and recognition, as well as various possible visualisation techniques.
Chapter 3 defines the methodological approach and design decisions. It encompasses the
functional and non-functional requirements, the system’s design, the determination of
the word colouring rules, the selection of different visualisation and colouring styles, and
the application features.
Chapter 4 focuses entirely on the technical aspects of the development phase of
SynVis. It details the technology stack, the user flow, the user interface (UI), as well as
specific algorithms and techniques used in the implementation.
The testing strategy to thoroughly evaluate the application is described in Chapter 5.
Chapter 6 presents the results, which are divided into two sections: quantitative results
and qualitative results. The benchmark tests and the user study are evaluated statistically,
whereas the expert interviews are evaluated thematically.
Chapter 7 discusses the findings, stressing the study’s shortcomings and proposing
possibilities for future research.
The thesis concludes with Chapter 8, which provides a brief summary of the entire work.
CHAPTER 2
Background and Related Work
The following chapter presents the background that forms the theoretical basis for this thesis, as well as the literature already available on these topics. Additionally, the state of the art is elaborated on.
2.1 Psychological Background
Researchers and academics from multiple disciplines have been drawn to synaesthesia, a neurological condition characterised by the involuntary mixing of sensory experiences. People who have this condition are referred to as synaesthetes; for them, certain stimuli, such as letters, numbers, or sounds, evoke additional sensory reactions, such as colours, shapes, or tastes. This section investigates the psychological aspects of synaesthesia to understand its typology and its effects on cognition, emotion, and perception.
2.1.1 Word Origin and Definition
The word synaesthesia comes from Greek and consists of two parts: "syn", meaning "union" or "together", and "aisthesis", meaning "sensation" or "perception"; it can thus be understood as joint sensation or a union of the senses [Cyt89, Cyt95]. In other words, the term "synaesthesia" was created to characterise the state in which sensory experiences converge, leading to the simultaneous and uncontrollable perception of several senses in response to a single sensory stimulus. The word has been used as an umbrella term for the mixing of senses in various contexts [Mei22]. It was first used by the German physician and philosopher Georg Sachs in the 19th century [JDW09].
2.1.2 Types of Synaesthesia
According to Sean A. Day [Day22], there are at least 75 different types of synaesthesia (see Figure 2.1), and more types continue to be documented as research progresses, since each individual with synaesthesia may have a unique combination of sensory associations.
Figure 2.1: Seventy-five types of synaesthesia (Sean A. Day). Left column: inducers; top
row: concurrents. White: documented; red: unrecorded; black: not a type. (Reprinted
from: [Day22])
According to Sean A. Day [Day22], the most prevalent form of synaesthesia is grapheme-vision synaesthesia, in which, for instance, letters, numbers, or shapes evoke specific colours; it is currently estimated to affect 162 million people worldwide. The second-best-known form is time unit-vision synaesthesia, in which people perceive time units such as days or months as distinct colours, shapes, or patterns. Chromaesthesia, in which, for instance, sounds stimulate perceptions of colour, combining music with a spectacular visual experience, is also a common form of synaesthesia. Another common kind is spatial-sequence synaesthesia, which arranges numerals, months, and days of the week into spatial patterns. Less common forms include, for instance, lexical-gustatory synaesthesia, in which words or specific sounds evoke tastes, or mirror-touch synaesthesia, in which individuals physically feel the sensations of others when seeing them being touched or physically interacted with. [SS15, MR13a]
People who have synaesthesia can, in some cases, experience numerous forms at once, a
condition known as "co-occurrence" or "multiple synaesthesia" [SMS+06, NCE11, Mei22].
2.1.3 Perception, Cognition and Personality
Developing synaesthesia is regarded as a typical cognitive variation in the general population [War12]. Unfortunately, synaesthetes are frequently misunderstood, which leads many to avoid talking about their experiences and causes scientific research to underestimate the incidence of synaesthesia in the general population [Day13, SS15].
Most research on synaesthesia focuses on its causes and the neurological mechanisms involved. Studies investigating the positive and negative effects of synaesthesia on cognitive abilities mostly focus on enhanced memory abilities in grapheme-colour synaesthetes as compared to non-synaesthetes; they consistently suggest a persistent advantage in memory/recall tasks [LM20, GNCHCG11, SS15]. The study by Simner and Bain [SB17] mentions benefits of synaesthesia in tasks testing processing speed and memory/recall of letters. A follow-up study by Smees et al. [SHCS19] reports significantly enhanced performance in expressive and receptive vocabulary tests compared to non-synaesthetes, but no benefits in sentence comprehension. The study by Palmeri et al. [PBM+02] also clearly shows that synaesthetes recognise shapes or patterns considerably better than non-synaesthetes because of the colouring of the graphemes. Mannix and Sørensen [MS22], however, find that synaesthetes are significantly poorer at recognising people's faces. Regarding further impacts on cognitive function, the study by Sinke et al. [SNZ+14] demonstrates that synaesthetes perform worse in speech perception, while the study by McCarthy and Caplovitz [MC14] reveals a similar finding with regard to motion perception.
Early studies indicate that synaesthesia occurs early in the processing of perception, reflecting real sensory connections. In line with this evidence, a study by Ramachandran and Hubbard [RH01] finds that synaesthetes demonstrate a greater ability to recognise geometric shapes composed of digits when compared to their non-synaesthete counterparts. Researchers have found that synaesthetes score higher on the personality trait of openness to experience, may score lower on the traits of agreeableness and neuroticism, and have greater levels of schizotypy and inventiveness [Mei22]. The first of these suggests a stronger appreciation for novelty and creativity [LM18, WTLEK08, Mei22, SS15].
A number of studies have already been conducted with the aim of making the benefits
of synaesthesia accessible to a wider audience. These studies span a variety of fields,
including the work of Reif and Alhalabi [RA18] on virtual reality (VR)-induced artificial
synaesthesia, which seeks to guide patients’ attention for medical and therapeutic purposes,
such as pain relief.
2.2 Grapheme-Colour Synaesthesia
Grapheme-colour synaesthesia is documented as one of the most prevalent among the
75 known types [Day22]. Individuals with this form involuntarily and automatically
experience non-coloured, achromatic graphemes (letters and digits/numbers) as coloured
[HYS20, PVdSN11, PBM+02]. Consequently, this type involves a blending of two senses:
visual perception and colour perception.
Patricia Duffy, a colour synaesthete, describes in her book "Blue Cats and Chartreuse
Kittens: How Synesthetes Color Their Worlds" [Duf01] a scenario that occurred while
she was learning to write the alphabet:
To make an R, all I had to do was first write a P and then draw a line down
from its loop. And I was surprised that I could turn a yellow letter into an
orange letter just by adding a line.
This accurately describes the experience of synaesthetes, for whom graphemes (letters
and numerals) consistently elicit or trigger specific colour experiences. The "emotional
meaning" of this form of synaesthesia is crucial to note, since it might make someone
feel uncomfortable or disturbed to see symbols in hues that do not match their personal
associations [The21]. Patrizia Puff, for instance, may feel a sense of harmony and rightness when she sees the letter "P" in yellow, but uncomfortable and wrong when she sees it in green or violet, which might slow down her processing of letters [ANS15].
What should be emphasised is that the colours of each letter/digit are perceived highly individually by each and every person, and no single solution fits everyone. Even monozygotic twins do not share the same colour associations, as found by Rich et al. [RBM05], but some studies based on interviews with synaesthetes have shown certain similarities (see Section 2.2.5).
As mentioned in Section 2.1.3, different cognitive effects of synaesthesia, such as those
on memory, creativity, and imagery, may be experienced by synaesthetes. Researchers
in the field of synaesthesia often mention the "atypical cross activation" [SMS+06] or
the "hyperconnectivity" [Mei22] of the brain. This can be described by the "Semantic
Representation of Synaesthesia" [Mei13] by Beat Meier (see Figure 2.2).
Figure 2.2: Illustration of semantic network activations in response to the letter "A":
synaesthete with red colour experience (left) vs. non-synaesthetic control (right).
(Reprinted from: [Mei13])
The paper demonstrates how a synaesthete who perceives colours for both letters and words may link words depending on prominent vowels or initial letters. For instance, the colour red is evoked by the letter "A", and words like "animal" or "apple" evoke it as well. Thanks to these synaesthetic associations, which generate an enhanced semantic network, links between colours and other items, like a rose and a fire engine, are made more easily by synaesthetes than by non-synaesthetes. This enhanced semantic network enables synaesthetes to produce intriguing ideas and thoughts, broadening the scope of their experiences. [Mei13]
2.2.1 Related Types
Several types of synaesthesia are closely related to grapheme-colour synaesthesia. Beyond basic colour associations, some synaesthetes have grapheme-shape/colour/texture/image synaesthesia: in addition to seeing colours, they might also perceive other sensory elements such as shapes or textures.
One of these linked types is "phoneme-colour" synaesthesia, where colours are connected to
spoken words based on how they sound rather than their written form [BMB+23, Sim07].
The difference is that a grapheme is the smallest unit of written language (a letter),
whereas a phoneme is the smallest unit of speech distinguishing one word from another
(a sound). Thus, hearing particular phonemes in a word may cause one to perceive
particular colours.
In some forms, such as "lexeme-colour" and "morpheme-colour" synaesthesia, colours are associated with different parts of words, not only with particular letters or phonemes [BCG16]. The "morpheme", which provides grammatical meaning, and the "lexeme", which is the word's root, both have distinctive colour connotations.
2.2.2 Terminology
Specific synaesthetic impressions are described with specialised vocabulary, which is explained in this section.
Inducer and Concurrent
There are two important terms in regard to synaesthesia, inducer and concurrent, which describe specific aspects of perception. The inducer refers to the "triggering stimulus", meaning the information or stimulus that causes or elicits the synaesthetic perception [CR14, Mei22]. This can be, for example, a sound, a letter, a number, a scent, a flavour, or even a concept.
The concurrent, on the other hand, refers to the "resultant experience": the additional sensory experience or perception elicited in reaction to the stimulus [CR14, Mei22]. This can be anything related to another sense, for example a colour, a shape, a tactile sensation, a temperature sensation, or a spatial perception.
Based on the type of synaesthesia, this inducer-concurrent relationship includes different
sensory modalities [CR14].
Projective and Associative
Generally, synaesthetic experiences can be perceived by individuals in two different ways, either projective or associative (see Figure 2.3), referring to differences in the concurrent [WLSS07]. The term "projector" refers to those who perceive their synaesthetic connections as if the additional sensory experiences are projected outwardly, appearing in exterior space as if they were actually present in the environment, also described as "out there on the page" [WLSS07, HYS20, CWT+15]. An individual with grapheme-colour synaesthesia, for instance, may perceive the colours connected to letters or numbers as hovering in front of or covering the actual objects. This type of experience is perceived only by a minority of synaesthetes, approximately 10% [DS05].
Individuals who experience synaesthetic correlations in an internal, subjective sense are referred to as "associators"; they do not perceive the associations in their real surroundings but only internally, "in their mind's eye" [WLSS07, HYS20, CWT+15]. An associator with grapheme-colour synaesthesia, for example, may mentally visualise the colours associated with letters or numbers without actually seeing them in the external world.
Individual Synaesthesia Experience Questionnaires (ISEQs) [SLM09] were developed to
distinguish between projectors and associators, enabling researchers to classify people
based on their subjective experiences and perception of synaesthetic relationships.
Figure 2.3: Differences in perception of letters, numbers, or words by projectors and
associators. Top: two projectors; bottom: three associators. (Reprinted from: [The21]
and [SLM09])
Lower and Higher Distinction
One can distinguish between two levels at which the synaesthetic experience is triggered, the lower and the higher distinction, referring to differences in the inducer [WLSS07]. Lower synaesthesia is triggered by immediate sensory or perceptual elements of the stimulus, whereby physical aspects such as its form, colour, or texture might influence the synaesthetic perception [WLSS07].
Higher synaesthesia, on the other hand, is characterised by the involvement of abstract or conceptual variables in the initiation of synaesthetic sensations, implying that a stimulus's symbolic or linguistic meaning affects how the brain perceives it [WLSS07].
Natural and Artificial
Generally, a distinction can be made between natural synaesthesia and artificial synaes-
thesia. Natural synaesthesia describes synaesthetic experiences that occur spontaneously
in people without outside help or modification. It typically emerges in early stages
of development [SHC+08]. Natural synaesthetes have constant and involuntary links
between sensory inputs, allowing them to perceive extra sensory experiences that are not
typically associated with the initial stimulus [RA18].
Artificial synaesthesia, on the other hand, refers to synaesthetic experiences that are
generated or assisted through external methods, frequently requiring some sort of sensory
stimulation or techniques, to enable anyone to experience synaesthesia. To create artificial
synaesthesia, so-called sensory substitution devices (SSDs) are used, for instance virtual reality or transcranial magnetic stimulation [SS15]. These seek to elicit
cross-modal associations or sensations from people who do not already have synaesthesia.
[RA18, WJG+21]
The study by Luke et al. [LLFT22] shows that synaesthesia can also appear in temporarily altered states of consciousness in which visual and aural hallucinations co-occur, frequently induced by psychedelic substances [Mei22].
2.2.3 Development
Some researchers are investigating the development of synaesthesia in individuals. In
a particular study by Witthoft et al. [WWE15], the authors explore whether there are
similarities in how grapheme-colour synaesthesia develops. Some literature suggests that
coloured toys, television, or even refrigerator magnets might impact the development of
this type of synaesthesia [WW06, WW13, MR13b]. However, a user study by Witthoft et al. [WWE15] of 6588 synaesthetes shows that only about 1 in 6 grapheme-colour synaesthetes appears to have learned the associations through a coloured toy within a span of 10 years. Some literature indicates that the development of synaesthesia is a lengthy process in which the colours vary during the developmental phase, as the longitudinal study by Simner and Bain [SB13] finds that 34% of these letter-colour associations are fixed at the age of 6 to 7, 48% at the age of 7 to 8, and 71% at the age of 10 to 11.
Consistency
There is also literature on its later stages, such as whether letter-colour associations
remain consistent throughout a person’s life or change in some way, since consistency
is a fundamental characteristic of synaesthesia. For instance, a study by Meier et al.
[MRW14] shows that associations with primary colours remain largely unchanged, but
that as people get older, bright colours become less frequent and more subdued colours
like brown and achromatic tones become more common. Uno et al. [UAY21] conduct a
longitudinal study examining the letter-colour associations of alphanumeric and Japanese
characters. They discover that whether the colour remains the same or not depends on
the grapheme, with more frequently used grapheme associations remaining the same and
less frequently used graphemes changing their colour over time.
Learning Synaesthesia
According to studies by Colizoli et al. [CMR12, CMSR17], learning to associate graphemes with colours is interesting in the context of colouring text because of the potential advantages. The authors pre-colour books for the participants and run Stroop tasks [Str35] with them before and after reading the book to compare the outcomes. They find that this "pseudo synaesthesia" can be trained. However, because this study focuses on the outcome of the trained type of synaesthesia rather than the process and experience of reading per se, the coloured book serves just as an artefact and is not central to the study. A further study is conducted by Schwartzman et al. [SOR+23] to ascertain the extent to which two groups, an induced synaesthesia-like (ISL) group and naturally occurring grapheme-colour synaesthetes (NOS), can be compared. They find that the ISL group shares many similarities with the NOS group, although participants report that the associations occur more or less "wilfully" in the ISL group and automatically in the NOS group. They conclude that an intensive "training of letter-colour associations can alter the conscious perceptional experiences of non-synaesthetes".
2.2.4 Neurological Perspective
Researchers, especially neuroscientists, interested in understanding the nature of perception, cognition, and the adaptability of the human brain have identified several brain regions that are consistently active during grapheme-colour synaesthesia. Specifically, the colour area V4 and the posterior temporal grapheme areas (PTGA) are consistently active during the occurrence of this condition and are studied as significant regions of interest. When viewing achromatic (black and white) graphemes, the PTGA is activated in both synaesthetes and non-synaesthetes because it is involved in processing letters and numbers, but V4 is activated only in synaesthetes, because only synaesthetes perceive the sensation of colour (see Figure 2.4). [BHC+10, HARB05, SPL+06]
Figure 2.4: Representative synaesthete (A–C) and control (D–F) brains. Grapheme ROI
(light blue) and V4 ROI (dark blue) are shown. A and D: ROIs on non-inflated cortical
surfaces. B and E: ROIs on inflated brains; yellow box highlights region in C and F.
Synaesthetes showed activation in both grapheme (light blue) and V4 (dark blue) ROIs
when viewing achromatic letters and numbers (C). Controls showed activation only in
the grapheme ROI (light blue) (F). (Reprinted from: [BHC+10])
Given these findings, conventional accounts based on the physical colours perceived through the retina cannot entirely explain the phenomenon [HYS20]. Researchers believe that this mechanism is not limited to V4, the parietal lobe, and the fusiform cortex, but may involve a more extensive network of brain regions [SNE+12].
2.2.5 Regulatory Factors
Regulatory factors shape the inducer-concurrent relationship [RR21]. This section elaborates on the similarities and differences of these relationships among synaesthetes.
Shared Codes
Researchers have attempted to identify potential similarities in the inducer-concurrent relationships in the perception of graphemes [Day04, WWE15]. Certain trends are observed, such as "B" often appearing blue, "C" predominantly yellow, and "I" and "O" frequently white. It has also been found that colour associations are influenced by language and culture [SWL+05, BK69].
Since most research has been conducted with English-speaking synaesthetes, it is noted
by Root et al. [RAM+21] that synaesthetic associations are influenced by linguistic and
prelinguistic factors. This spurs interest in how the inducer-concurrent relationships of
graphemes might vary across different languages. Root et al. [RRA+18] additionally
focus on the association between the letter "A" and the colour red across five languages.
Given the lack of a universal approach to colouring each grapheme in a way that aligns
with the experiences of all synaesthetes, the "Synesthesia Battery" [Dav24b, EKT+07]
was developed. This platform serves as a tool for individuals to test for synaesthesia and
as a resource for researchers, offering a comprehensive database of perceptions and their
associated colours. It includes a colour picker, allowing individuals to choose colours
for graphemes from a palette of 16.7 million options [BCG16] (see Figure 2.7). By
collecting this data, the developers aim to identify common colour associations among
synaesthetes. The aggregated data enables the authors to create graphs showing the
most frequent colours associated with each grapheme across all participating synaesthetes.
This platform has also been validated by other researchers [CDS+15].
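As an illustration of this kind of aggregation (a sketch, not the Synesthesia Battery's actual implementation), the most frequent colour per grapheme can be derived from collected choices as follows; the sample data is invented:

    from collections import Counter

    # Invented sample: one grapheme-to-colour mapping per participating synaesthete.
    choices = [
        {"A": "red", "B": "blue", "C": "yellow"},
        {"A": "red", "B": "blue", "C": "green"},
        {"A": "crimson", "B": "blue", "C": "yellow"},
    ]

    def modal_colours(choices):
        """Return the most frequently chosen colour for each grapheme."""
        counts = {}
        for person in choices:
            for grapheme, colour in person.items():
                counts.setdefault(grapheme, Counter())[colour] += 1
        return {g: c.most_common(1)[0][0] for g, c in counts.items()}

    print(modal_colours(choices))  # {'A': 'red', 'B': 'blue', 'C': 'yellow'}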
Additionally, there are publicly accessible tools through which synaesthetes can share their unique colour experiences, such as by adding a row to a shared Google Sheets file [Goond] (see Figure 2.5).
Figure 2.5: Screenshot of the publicly available shared Google Sheets file [Goond] for colouring cells in the experienced colour. (Reprinted from: [The21])
Different Appearances
Existing research shows the varied appearances of individual graphemes but has less often focused on the appearance of whole words and texts. Nevertheless, a range of manifestations exists in how words are perceived in colour. Blazej et al. [BCG16] mention that, for entire words and texts, perceiving each letter in its individual colour is uncommon; instead, the colour of a word is often driven by the colour of its initial letter. This is also stressed by Simner et al. [Sim07, SGM06], who note that the different colours of the letters compete and only one colour then dominates the colour of the entire word. For compound words, an additional colour can be contributed by the first letter of the second morpheme, indicating an interaction between linguistic structure and synaesthetic perception.
Similar findings are reported by Sidoroff-Dorso et al. [SDJDuLM20] via an interview:
[..] the meaning of a word is inseparable from its “color shell”. The first letter
(I heard this is often the case with synesthetes) gives the word a dominant
tone. For example, the word “trait” is vibrant, dark red, because the letter
“t” is painted in this colour. The remaining letters are superimposed on the
“background” tone like a mosaic, set by the first letter.
Another synaesthete, interviewed by Sidoroff-Dorso et al. [SDJDuLM20], highlights the
influence of when a word is learned:
The colour of other words, typically learned later, tended to be driven by the
colour of the first letter.
Additionally, an earlier study by Simner [Sim07] suggests the impact of initial vowels on
word colouring. This research proposes that word colour is influenced not only by the
first letter but also by the first or stressed vowel. Syllable stress is identified as a primary
factor, with letter position being secondary.
Blazej et al. [BCG16] deal with the appearance of whole words for people with grapheme-colour synaesthesia and find that, for their participant, most words are coloured in the colour of the first letter. Exceptions are words starting with "I" or "O", which appear white in isolation; such words are mostly shaded in a lighter shade of another letter appearing in the word (see Figure 2.6).
(a) Example of "improvise". (b) Example of "output".
Figure 2.6: Appearance of words starting with "i" or "o". Left: experienced word; right:
individual letter colours. (Source: [BCG16])
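These reported regularities can be cast into a simple, machine-implementable rule. The sketch below illustrates the first-letter rule with the "I"/"O" exception described by Blazej et al. [BCG16]; the letter colours are assumed for illustration and are not taken from any participant:

    def lighten(rgb, factor=0.5):
        """Shift an (r, g, b) colour towards white by the given factor."""
        return tuple(round(c + (255 - c) * factor) for c in rgb)

    def word_colour(word, letter_colours, white_letters=frozenset({"I", "O"})):
        """First-letter rule: a word takes its initial letter's colour, except
        when the initial letter is perceived as white; then a lighter shade of
        another letter in the word is used."""
        first = word[0].upper()
        if first not in white_letters:
            return letter_colours.get(first)
        for ch in word[1:].upper():               # first non-white letter wins
            if ch in letter_colours and ch not in white_letters:
                return lighten(letter_colours[ch])
        return (255, 255, 255)                    # all letters perceived as white

    colours = {"A": (220, 30, 30), "B": (30, 60, 200), "T": (90, 40, 10)}  # assumed
    print(word_colour("apple", colours))   # -> colour of "A"
    print(word_colour("output", colours))  # -> lightened colour of "T"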
2.3 Colouring Graphemes
This section looks at how graphemes are coloured. Different approaches to assigning colours to graphemes are investigated, and current research and applications are highlighted.
2.3.1 Colour Pickers
The developers of the "Synesthesia Battery" [Dav24a, EKT+07] use RGB values for the colour-picking task (see Figure 2.7), which is criticised by Blazej et al. [BCG16], who note that sensitivity could be improved by using other measures of colour difference. Instead of using RGB values, they convert the RGB values to the CIE L*a*b* colour space, which provides a more uniform colour space that better matches human perception.
Figure 2.7: Screenshot of the synaesthesia battery’s example grapheme-colour picker test.
(Source: [Dav24a])
In the work by Rothen et al. [RSWW13], the authors use colour pickers to let participants choose grapheme colour experiences from a huge colour palette on several occasions and then measure the consistency of these selected colours. They compare RGB and HSV colour representations with the CIE L*a*b* and CIE L*u*v* colour models and find that the latter can be used to maximise sensitivity and specificity relative to other currently used measures for assessing synaesthesia.
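To make the colour-space argument concrete, the sketch below converts 8-bit sRGB values to CIE L*a*b* (standard CIE formulas, D65 white point) and scores the consistency of two picks as the CIE76 distance ΔE*ab; the sample values are illustrative only:

    import math

    def srgb_to_lab(rgb):
        """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point)."""
        # 1) Undo the sRGB gamma.
        lin = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
               for c in (v / 255.0 for v in rgb)]
        # 2) Linear RGB -> XYZ (sRGB matrix, D65).
        x = 0.4124564 * lin[0] + 0.3575761 * lin[1] + 0.1804375 * lin[2]
        y = 0.2126729 * lin[0] + 0.7151522 * lin[1] + 0.0721750 * lin[2]
        z = 0.0193339 * lin[0] + 0.1191920 * lin[1] + 0.9503041 * lin[2]
        # 3) XYZ -> L*a*b*, normalised by the D65 white point.
        def f(t):
            return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
        fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
        return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

    def delta_e(rgb1, rgb2):
        """CIE76 colour difference: Euclidean distance in L*a*b* space."""
        return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))

    # Two picks for the same grapheme on different occasions (sample values):
    print(delta_e((220, 30, 30), (200, 40, 35)))  # small distance = consistent pick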
Hamada et al. [HYS20] conduct a user study in which synaesthetes first participate in a colour-selection task: the researchers show them character cards, and the participants choose the most appropriate colour according to their experience from the Munsell Book of Color, Matte Edition [Mun24]. The data is analysed in the CIE L*a*b* colour space and then transferred to the CIE L*u*v* colour space to match the Cambridge Colour Test technique.
In the work by Ásgeirsson et al. [ANS15], a custom-made circular colour palette with adjustable luminance is created and used to select the colours of individual letters and digits one by one according to the synaesthetic perception (see Figure 2.8).
Figure 2.8: Colour picker that allows the selection of a colour range and brightness
adjustment. (Reprinted from: [ANS15])
Kim et al. [KHL22] compare and contrast different colour picker options on different technological devices (desktop, mobile, VR) and discuss their respective characteristics. Additionally, they develop a three-dimensional RGB and HSV colour picker for virtual reality. They determine that, in the future, they will focus on a perceptually uniform colour model, such as CIE L*u*v* or CIE L*a*b*. This decision is based on the challenges they face when selecting colours in certain regions of RGB or HSV, such as at the cut-off boundaries or in the cone of HSV, given the vast number of colours that can be represented in such a limited space.
It is also relevant which colour picker is used on which device, since the colour selection could change and some colour representations are, in some situations, less suitable and convenient than others [KHL22].
In contrast to the aforementioned studies, the works by Spiller et al. [SHM+19], Simner et al. [SMS+06] and Meier et al. [MRW14] opt for an alternative approach: they pre-define a number of colour options prior to conducting the user study, so that only a limited range of predefined colours is available for selection (see Figure 2.9).
Figure 2.9: Colour picker that allows choosing from 13 colours or "no colour". (Reprinted from: [MRW14])
2.3.2 Studies and Applications
In two studies by Berger et al. [BHW+19, BHW+21], in which grapheme-colour synaesthetes use calculator software with personalised digit colours to perform arithmetic tasks, only marginal performance improvements are observed compared to displaying black digits. However, the participants' feedback reveals a strong preference for congruently coloured digits, with one subject considering it "life-changing". This shows that some effort has been put into the personal colouring of a calculator's digits. The software, called SYNCalc (see Figure 2.10), is now available for download from both the App Store and Google Play [syn24].
Figure 2.10: Screenshot of the SYNCalc application by Berger and Whittingham.
(Reprinted from: [The21])
In addition to this study, which focuses on natural synaesthesia, a similar study by Plouznikoff et al. [PPR05] induces artificial synaesthesia: using a wearable device, it provides synaesthetic digit-colour experiences to non-synaesthetes so that they can benefit from the advantageous features of this form of synaesthesia.
The interviews by Sidoroff-Dorso et al. [SDJDuLM20] offer a wealth of information from synaesthetes, scientists, and artists regarding various visualisations and associations of graphemes, such as the possibility that a word might be coloured based on its first letter, which gives the word as a whole a dominant tone, for example by visualising that colour as a background behind the rest of the word.
Google Chrome offers a number of extensions that provide users with coloured letters while browsing the web. When one of these extensions is enabled, the letters are coloured before the text is presented to the user, so each letter appears in its unique colour; however, some of them do not support user-defined colour schemes for individual graphemes (see examples in Figure 2.11 and Figure 2.12).
Figure 2.11: Screenshot of the "SeeSynesthete" Google Chrome extension for colouring
web page fonts. (Reprinted from: [Chr20])
Figure 2.12: Screenshot of the "Synesthesia" Google Chrome extension for colouring web
page fonts. (Reprinted from: [Chr19])
2.4 Data Augmentation and Visualisation
To recognise, analyse, and visualise modified text, several processes are necessary. The
first step, text detection and localisation, involves identifying regions of textual content
for further analysis. Following this, text extraction or recognition takes place, utilising
optical character recognition (OCR) to convert visual text into machine-readable formats.
Finally, the extracted and recognised text is displayed in user-friendly formats through
text visualisation, such as on screens or in programs for translation and various other
purposes. [TO17]
2.4.1 Text Detection and Recognition
Text detection and recognition can be implemented through various methods (see Table 2.1). Initially, the type of input source for text detection must be considered, e.g. whether it is based on images or videos. The choice of technology for executing text detection and recognition then depends on these differences; options include AR technology or deep learning (DL) techniques [OHSBHW21].
Text detection and AR/VR technologies have seen significant development in recent years. For example, due to poor GPS accuracy indoors, Pivavaruk et al. [PFC22] aim to create an indoor navigation app without relying on GPS. They develop a Unity application
that utilises the AR Foundation, smartphone photography, image processing, OpenCV's EAST DL approach, and the Tesseract OCR algorithm. This OCR algorithm runs in conjunction with Python on a separate server to determine the user's location based on the image of a door sign. They select this OCR algorithm for its fast and accurate performance and to offload processing to the server [PFC22, Smi07].

Visualisation device   Year   Reference    Technologies
Mobile Device          2022   [OHW22]      Vuforia ([Vuf24])
                       2022   [OBHW22b]    VGG-16, Vuforia
                       2022   [OBHW22a]    VGG-19, Vuforia
                       2022   [PFC22]      OpenCV Efficient and Accurate Scene Text Detection (EAST), Tesseract OCR ([Git24])
                       2021   [OHSBHW21]   Vuforia
                       2020   [OHHW20]     Vuforia
                       2018   [SH18]       Tesseract OCR
                       2017   [PMI17]      Tesseract OCR
HMD                    2023   [SGB+23]     Windows Runtime OCR API ([Mic24a]), Knowledge graph
                       2020   [Nik20]      OpenCV ([Ope24]), Tesseract OCR
Stationary Camera      2023   [CVRRS23]    OpenCV, Tesseract OCR
Table 2.1: Comparison of relevant related work approaches for text detection and recognition (Table adapted from: [OBHW22b])
Another study by Strecker et al. [SGB+23] seeks to enhance the transition between
non-digital and digital interactions in everyday settings by integrating AR with the
Windows Runtime OCR API, enabling real-time identification of printed text characters.
This project aims not to improve reading comprehension or perception, but to digitise
non-digital material and visualise it in AR. The paper also discusses OCR applications
in the AR domain extensively.
In order to develop a food menu translator app for translating Thai to Malay, Pu et
al. [PMI17] use a mobile device for image scanning, the Tesseract OCR engine for text
detection and recognition, and the Google translation service.
Meanwhile, Ouali et al. [OHW22] adopt a different approach to text detection by
developing their own AR-based Arabic text detection algorithm.
In another study, Nikolov [Nik20] explores the use of OCR algorithms with AR, focusing
on determining the most effective AR device based on the performance of integrated
cameras for text identification.
In some literature, DL techniques like the VGG-16 [OBHW22b] or VGG-19 [OBHW22a] models are used alongside AR, the Unity 3D engine, Vuforia, and Tesseract OCR to
create a text detection and magnification application.
On the market, various tools employing OCR algorithms are available. For instance, the
"Ctrl F" [Goo23b] app is an AR application readily accessible in the marketplace. It allows
users to essentially use the CTRL+F function on their smartphones to search through
non-digital content, such as a book page. The app’s use of underlying OCR methods
enables it to display the highlighted searched word on the user’s screen. "SearchCam"
[App21] is another application that utilises a similar approach.
2.4.2 Relevant Visualisation Techniques
Visualising text on a screen can be achieved in various ways. For example, some
applications provide a coloured text overlay layer based on an image for purposes like
proofreading (see ScribeOCR in Figure 2.13). In these cases, after the text is detected
and recognised, its font and size are adjusted for readability, ensuring that the overlay
aligns closely with the individual letters.
Figure 2.13: Three visualisation versions. Left: original document; center: document
with overlay; right: new OCR layer. (Reprinted from: [Scrnd])
Another approach involves combining OCR algorithms with AR visualisation. This
technique uses images captured by smartphone cameras as the input source, allowing for
the capture of the environment through either photographing the text or recording it as
a video. This method is often used in translation apps (see Table 2.1).
In general, most work in this direction is related to translation, for which this kind of visualisation is particularly useful and relevant. For instance, the study by Tatwany and Ouertani [TO17] addresses the general application of AR in translation tasks, reviews the studies and applications that are already available, and tabulates the
OCR technologies, tools, and programming languages used. The table shows that the surveyed studies employ Tesseract, Android, Vuforia, ABBYY, or commercial libraries for the OCR work, as well as technologies such as OpenCV, OpenGL, Matlab, or Eclipse.
Vortmann et al. [VWP22] compare several translator apps, discussing how they employ AR visuals: whether the original text is replaced, displayed as an overlay, or visualised separately. An example of visualising the translation separately beneath the original text is found in the research project by Chorfi and Tatwany [CT19], where real-time text identification and translation are provided within seconds.
Additional visualisation options include overlaying the translated text onto the image in
a specific text colour or, in some cases, overlaying the text with a background to ensure
sufficient contrast ratio for readability (see Figure 2.14).
(a) Translate Lens (b) Google Lens
Figure 2.14: Two different AR translator applications using overlay and replacement as
visualisation techniques. (Reprinted from: [Goo24] and [Goo23a])
2.5 Reading Performance
There has been extensive research on how to measure reading comprehension. In general, reading comprehension can be assessed at the word, sentence, and text level [LFJH19b]. Fletcher [Fle06] states that there are different elements of reading comprehension depending on the style of text presentation and the way the audience responds, and mentions the possibility of using multiple-choice tests, fill-in-the-blank (also called cloze) exams, retellings, or summaries. Inferences about a person's comprehension abilities can differ depending on the method of measurement; results depend on the reader and on the text as well as on the measurement technique [Fle06, CCLG20a]. Collins et al. [CCLG20b] evaluate various testing techniques and their respective influencing factors and come to the same conclusion: the "text, activity and reader" contribute "to variance in reading comprehension test scores", and using different response formats is suggested since these also contribute to this variance. The differences between the various methods of measuring reading comprehension can be interpreted as varying degrees of imperfection in identifying the latent variables that comprise reading comprehension. It is therefore essential to keep in mind that reading comprehension is a complicated construct that is impacted by a variety of elements and processes, making it challenging to fully capture with a single approach [FSA+06, Fle06]. All the above-mentioned methods can be combined with measuring reading speed.
Different techniques already exist for applying these methods, for instance a mobile app for assessing reading performance that uses a mixture of the cloze and multiple-choice methods [Sü21].
Nowadays, there are also standardised measures for assessing reading comprehension, such as SLS-Berlin [LFJH19b] by Lüdtke et al., the Salzburger LeseScreening [MW03] by Mayringer and Wimmer, PISA [OECnd], and TOEFL [TOE24], which implement a mixture of the aforementioned methods. The "Sentence Verification Technique (SVT)" [Roy05] by Royer assesses reading comprehension by focusing solely on the sentence level.
One can then examine the impact that coloured letters or coloured text have on reading performance. In this vein, the studies by Uccula et al. [UEM14] and Griffiths et al. [GTHB16] look at the effect of coloured overlays while reading. Smith et al. [SGMC14] as well as Gran Ekstrand and Nilsson Benfatto et al. [GENB21] use eye-tracking techniques to measure eye movements and saccades and evaluate reading performance on this basis. A less common technique for measuring reading performance relies on mental imagery [SL22].
It should also be kept in mind that reading performance may differ when reading from a screen (for instance, a desktop monitor or mobile phone) compared to reading from a piece of paper [SP22, KSZ18, SBRL+23, CL19, CCC+14, HE20, Par19].
CHAPTER 3
Design
To answer the research questions stated in Section 1.2, an AR system is proposed with
the purpose of creating a tool for recolouring real-world texts. The major goal of this
system is to allow users to view texts in customised colours instead of the traditional
black text on a white page.
3.1 Requirements
The following functional and non-functional criteria serve as a blueprint and provide a
direction for the development process.
3.1.1 Functional
Given the subjective nature of synaesthesia, it is essential to ensure that the approach
is highly customisable in order to accurately recreate the experience. This necessitates
the provision of a straightforward colour picker tool (see some potential examples in
Section 2.3.1) within the system, enabling users to select colours for each grapheme with
ease.
Moreover, it is essential to provide colouring rules at the word level to give users the option to personalise the appearance of colours on text. This ensures the
approach is useful for a diverse range of synaesthetes, given that each person perceives
this phenomenon in a unique manner. To achieve this, different rule-sets are selected
and evaluated (more on that in Section 3.3).
Text scanning, detection and recognition are critical to the method’s functionality. The
detection and recognition algorithms have to work accurately and seamlessly to ensure
that the graphemes are coloured in the appropriate colours in the colouring phase.
In order to facilitate the recolouring of real-world text in AR, it is essential to integrate text
scanning, detection and recognition functionalities. This encompasses the presentation of
text in AR and precise alignment and rendering in the user’s actual surroundings.
To ensure that the technique can be widely deployed and used for further research,
cross-platform compatibility must be a primary consideration. Consequently, SynVis is
designed with this objective in mind, offering a uniform UI across multiple platforms.
3.1.2 Non-Functional
Performance is a crucial non-functional criterion. The system must operate efficiently, performing text detection and recognition as well as colour application with minimal latency. This performance is vital for providing a seamless experience, thereby ensuring
the uninterrupted reading of the recoloured text. Consequently, benchmark tests are
conducted to evaluate the performance of SynVis.
Another crucial aspect is that of usability. The approach must possess an intuitive and
straightforward UI, thus clear instructions are integrated to guide users through the
process of selecting colours, establishing rules, scanning and reading recoloured text.
3.2 Design of the System
The system is designed and developed for usage on smartphones, taking advantage of their mobility and integrated camera capabilities.
Users establish colour preferences for each grapheme. This customisation is supported by
a simple colour picker tool, which allows users to choose precise colours for individual
letters and digits based on their needs and perceptions.
In addition to colouring individual graphemes, the system allows rule-based word colouring. Users select predefined rules that specify how words are coloured based on the colours assigned to each grapheme (see Section 3.3).
Once all of these parameters are set up, the system is ready to recolour texts in different
visualisation options, as shown in Figure 3.1. Users can scan a book page line by line, and the system displays the recoloured words as they are scanned. This feature enables users to read text in their preferred colours right from the
app, leading to a personalised reading experience.
Figure 3.1: System design sketch illustrating the scanning of real-world text and recolour-
ing via mobile AR.
3.3 Word Colouring Rules
The selection of word colouring rules for the application is based on an extensive literature
review of current research in the field of grapheme-colour synaesthesia.
Firstly, research indicates that the majority of grapheme-colour synaesthetes perceive
words in a single colour, which is primarily defined by either the initial letter, the initial
vowel, or a specific letter in the stressed or dominant syllable [The21, SGM06, JWA05].
The recolouring of words into a single colour, particularly the colour of the initial letter
(see at the top of Figure 3.2) or first vowel (see at the bottom of Figure 3.2), is a method
that is selected due to its prevalence in grapheme-colour synaesthesia, as well as its
simplicity and ease of implementation. This method does not require the use of complex
natural language processing (NLP) techniques, making it a feasible approach with basic
text recognition technology.
Figure 3.2: Single-colour word colouring rules selected based on literature review and
relevance to this thesis.
In addition to perceiving words in a single colour, there are also individuals who perceive
them in multiple colours. For instance, if the word in question is a compound word, each constituent word may be coloured according to the colour of its initial letter, as previously discussed, or according to other rules [BCG16]. Alternatively, more complex rules may be employed, whereby the word is coloured based on the colours of the vowels, with a colour gradient applied to the consonants between them. In addition, letters that occur prior
to the first vowel are represented in the colour associated with the first vowel, whereas
letters following the final vowel are represented in the colour associated with the final
vowel. It is less common for individuals with grapheme-colour synaesthesia to perceive
words in the individual letter colours, although this does occur [The21, BCG16, SGM06].
The aforementioned technique, vowel gradient colouring (see at the bottom of Figure 3.3),
is selected for its complexity and its particular relevance to this thesis. This type of
synaesthetic perception is experienced by a synaesthete who is interviewed after the
implementation of the prototype. They are also invited to test the prototype and provide
feedback, as well as evaluate the application’s effectiveness. The incorporation of this rule
allows for the comprehensive testing of the application’s capacity to manage sophisticated
colour transitions. While infrequent, the implementation of the rule for the colouring of
words in their individual letter colours (see at the top of Figure 3.3) is also included, as it
serves as an additional rule for colouring words in multiple colours, thereby encompassing
a broader range of individuals who perceive this form of synaesthesia.
Figure 3.3: Multi-colour word colouring rules selected based on literature review and
relevance to this thesis.
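To make the vowel gradient rule concrete, the following minimal C# sketch interpolates consonant colours between the surrounding vowel colours. It is an illustration only, not the thesis implementation; the palette is assumed to map uppercase graphemes to colours.

using System.Collections.Generic;
using UnityEngine;

public static class VowelGradientRule
{
    const string Vowels = "aeiouAEIOU";

    // One colour per letter: vowels keep their grapheme colour; consonants
    // between two vowels are linearly blended; letters before the first vowel
    // take the first vowel's colour, letters after the last vowel the last one's.
    public static Color[] Colourise(string word, Dictionary<char, Color> palette)
    {
        var colours = new Color[word.Length];
        var vowels = new List<int>();
        for (int i = 0; i < word.Length; i++)
            if (Vowels.IndexOf(word[i]) >= 0) vowels.Add(i);
        if (vowels.Count == 0) return colours; // no vowels: nothing to blend

        for (int i = 0; i < word.Length; i++)
        {
            int next = vowels.FindIndex(v => v >= i);
            if (next == -1)                          // after the last vowel
                colours[i] = palette[char.ToUpper(word[vowels[vowels.Count - 1]])];
            else if (vowels[next] == i || next == 0) // a vowel, or before the first vowel
                colours[i] = palette[char.ToUpper(word[vowels[next]])];
            else                                     // consonant between two vowels
            {
                int a = vowels[next - 1], b = vowels[next];
                float t = (i - a) / (float)(b - a);
                colours[i] = Color.Lerp(palette[char.ToUpper(word[a])],
                                        palette[char.ToUpper(word[b])], t);
            }
        }
        return colours;
    }
}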
3.4 Visualisation and Colouring Types
The AR application’s visualisation approaches are selected with the objective of achieving
a balance between user familiarity and control over the environmental elements.
One approach that is chosen is the direct colouring of the "paper", that is to say, the
surface of the scanned real-world text (see at the top of Figure 3.4). This entails displaying and manipulating each letter in its specific colour as if it were actually printed in that colour. This strategy is selected because it is consistent with the established
conventions that are used in text presentation. People are familiar with this style from
conventional print media, making it a convenient and straightforward alternative for
users. The method’s familiarity reduces the cognitive burden on the user.
An additional visualisation style is the addition of an extra layer for the presentation of
the text (see at the bottom of Figure 3.4). This method involves the placement of the
text on a controlled background, resulting in a uniform colour display. This style is of
particular significance given that the background colour may affect colour perception
[Ell15]. By isolating the text on a controlled background, the application ensures that
users see the colours as intended.
Figure 3.4: The two different visualisation types.
To address users' diverse interests, a variety of colouring types is included, in recognition of the fact that associative and projective synaesthetes may have differing requirements (see, for instance, Figure 2.3 in Chapter 2).
A direct approach to colouring the text is to paint the letters themselves, which simulates
writing the letters with a coloured pen.
Another option is colouring the background by applying colour to the area behind each
letter, while the letter itself is left black, similar to how a highlighter pen is used.
The creation of a coloured outline represents a more subtle option, whereby only the
letter’s outline is coloured, while the interior remains black. This approach is less
obtrusive, yet nevertheless serves to effectively draw attention to the text as well as to
the colour.
These different colouring types (see Figure 3.5) enable users to select the style that is
most aligned with their perception and preferences.
Figure 3.5: The three different colouring types.
3.5 Application Features
This section examines the main characteristics of the application, emphasising its features
and functionality.
3.5.1 Colour Definition for Each Grapheme
Users first create a customised palette that affects later text rendering by using a colour
picker to assign distinct hues to each grapheme (letters and digits).
3.5.2 Rule-Set Definition for Word Colouring
The application’s key feature is its ability to apply several colouring rules to the text.
The four selected rule-sets (single-colour: first letter, first vowel; multi-colour: individual,
vowel gradient) are incorporated into the application.
To help with the selection among the above-mentioned rule-sets, the application shows the term "Sample Text" on both black and white backgrounds (see Figure 3.6). This enables users to quickly determine the best colour scheme for different background settings, simplifying the decision and customisation process.
Figure 3.6: Details of a screenshot of the word-colouring screen. It displays the term
"Sample Text" for each rule-set as a preview in the following order: individual, first letter,
first vowel, vowel gradient.
3.5.3 Visualisation Style Definition
The application’s visualisation and colouring capabilities cover five different types (see
Figure 3.7).
Figure 3.7: Various options for displaying recoloured text, organised by visualisation
styles (columns) and colouring styles (rows).
The two defined visualisation styles are implemented as follows:
• Texture Direct: Directly alters the scanned image’s pixels, changing their colours
in accordance with the rule-set specified. This solution keeps the document’s
original texture while applying the new colour palette.
• Text Mesh Pro (TMP) Overlay: Generates a new Unity game object (a container for components that define an object's behaviour and properties) for each word. This game object consists of a Unity image object that acts as a background, with a TMP object placed on top, covering the complete bounding box of the word.
3.5.4 Text Scan Functionality
The application’s last major feature is its capacity to scan textual information with the
device camera and display the recoloured words based on user-defined choices on the
screen.
CHAPTER 4
SynVis Implementation
This chapter provides a comprehensive account of the implementation process for the
application outlined in the previous chapter. It covers the technology stack, the user flow
as well as the UI design, and specific algorithms and techniques.
4.1 Tech Stack
To implement the aforementioned system, a whole stack of technologies is used. Unity
(Version 2022.3.20f1) is chosen as the major development environment, and C# as the
programming language.
4.1.1 Unity Libraries
Several important Unity libraries are integrated into the project to provide robust
performance and a state-of-the-art user experience:
• Flexible Colour Picker: This tool allows users to conveniently select and apply
colours to each grapheme.
• Newtonsoft JSON: This library is used to efficiently serialise and deserialise JSON,
which is essential for preserving user data such as grapheme colours and word
colouring rules.
• AR Foundation: This package ensures compatibility between many devices and
makes it easier to construct cross-platform AR experiences.
• TextMesh Pro: This package improves text rendering and visualisation in Unity,
assuring high-quality text display. This high-quality text is critical for the in-app
texts on the screen, as well as the visualisation of the recoloured text.
4.1.2 Text Detection and Recognition
Several options for implementing OCR and visualisation operations were investigated,
with each bringing unique problems and capabilities.
Tesseract OCR was initially picked since it is open-source and has out-of-the-box capability
on Windows. Tesseract was adapted for macOS and Android to provide cross-platform
functionality. However, Tesseract turned out to support only the Mono scripting backend and not IL2CPP, which is required for iOS app development, resulting in its final rejection.
Another option considered was Vuforia. After discovering that the text recognition
functionality was no longer supported or available, an alternative technique was tried, which involved pre-scanning text and creating image targets for each word, with the aim of projecting recoloured words over the original text. However, this strategy proved unworkable because letters and words were not distinctive enough to serve as unique image targets, resulting in projection errors and the decision not to proceed with Vuforia.
Finally, the "Open Source Computer Vision Library" OpenCV for Unity [Uni24] was
selected due to its extensive capabilities and fit for the project’s requirements. OpenCV
for Unity uses the EAST method and Convolutional Recurrent Neural Network (CRNN)
for text detection and recognition. Despite being a paid product, it was chosen for its dependability and because the alternative, OpenCV plus Unity [Uni19], is outdated and only supports Windows and Android.
OpenCV for Unity was critical for creating numerous visualisation modalities, making
the AR application as convenient and pleasant as feasible. The visualisation sought
to closely mimic the original text without changing the font or background, instead recolouring only the words according to the predefined rules. This option provides
the capability required to meet the project’s objectives effectively.
4.1.3 Used Devices
With the determination of the requirements and capabilities for constructing the AR
application, the implementation procedure is carried out on a Mac computer running
macOS 14.5. The app is built with the Xcode application 15.2, which ensures compatibility
and best performance on iOS devices.
The application is developed and tested on an iPhone running iOS 17.5.1. This combina-
tion of devices and software guarantees that the development environment is up-to-date
with the newest tools and operating systems.
4.2 User Flow
The UI is designed to ensure a pleasant and straightforward user experience throughout
the entire system.
Once the application is opened, the UI flow begins with an introductory screen, which
prompts the user to provide their name. The user is sent to the menu screen after entering
their name. In the next step, they use a colour picker tool to assign a distinct colour to
each grapheme (letter and digit). After selecting the colours for the graphemes, the user
is prompted to pick a rule-set that determines how the words are coloured. Following this,
the user selects the visualisation mode. The final stage in the flow is to scan the text. To
scan a text with the application, the user aligns the text within a specified recognition
box. When the text is scanned, the application recolours it using the previously defined
grapheme colours and rule-set.
Figure 4.1: User flow diagram of the application, showing each step from starting the
app to closing the app.
4.3 User Interface
The following points are included to offer a positive user experience and simple navigation
of the application:
4.3.1 Intuitiveness and How-To Guidance
The AR app is designed with the goal of providing a straightforward user experience. Each
touchpoint in the app is designed to be self-explanatory, allowing users to readily grasp
and use the app without requiring additional instructions. In case the app’s operation is
unclear, a "how-to" instruction is provided (see Figure 4.2). This is especially critical for
operations such as text scanning, which requires the user to rotate the phone and scan
within a certain recognition area. In general, the software displays detailed information
on the screen to help users through the procedure. This instructional help guarantees
that users can comfortably utilise the app’s functions without becoming frustrated.
Figure 4.2: Detail of a screenshot of the how-to screen that is displayed prior to the
scanning process.
4.3.2 Simplicity and Mode Indication
Simplicity is an important aspect in the UI design. The interface is maintained clear
and uncomplicated to prevent distracting or confusing the user. This basic design (black
background with white text) allows users to focus on the app’s main capabilities without
distractions, making interaction simple and straightforward. Additionally, to improve
usability, the app displays the currently selected mode via visual hints on the buttons (see
Figure 4.3). When a mode is active, the corresponding button is highlighted, ensuring
that users are constantly aware of the app’s current state. This feature avoids confusion
and helps users understand and regulate their interactions with the app more efficiently.
Figure 4.3: Detail of a screenshot of the visualisation mode selection screen, which shows
that the direct + font option is currently selected.
4.3.3 Consistency and Colour Scheme
Consistency goes hand in hand with the aforementioned point. It is maintained across
all UI components to create a cohesive design. The backdrop is always black, while the
text is white, ensuring strong contrast and readability (see screens in Figure 4.4). This
decision is intended to reduce discomfort caused by viewing letters in "wrong" colours
while maintaining an appealing, clean appearance. The same idea is used for the app’s
logo, resulting in a streamlined visual identity. The text makes use of the same typeface
and employs a uniform font size, based on the type of text (heading, paragraph, etc.), thus
ensuring a consistent visual presentation. In addition, the colour blue (hex #308cea) is
utilised frequently to highlight menu elements and buttons for the user. This consistency
in button designs, font selections, and layout structures allows users to anticipate the
behaviour of various elements based on past interactions.
Figure 4.4: Screenshots of all application screens, labelled by name. User flow is indicated
by brown arrows.
4.3.4 User Data Persistence
When the user closes the app, the software saves their data under the name typed in when starting the app (see Figure 4.5). This feature allows users to resume
their activities from where they left off without losing any progress. For example, if a
user is in the middle of configuring colours, the app allows them to resume from the
same position when they return. Setting graphemes to the precise observed colour is one
activity where this feature is essential.
Figure 4.5: Detail of a screenshot of the introductory screen, which requires users to
enter their first and last names in order to save their data.
4.4 Algorithms and Techniques
The code is divided into components that include logic for colour definition, text detection, colouring, visualisation, storage, and UI testing for usability purposes. Because it is developed with design patterns in mind, such as the Factory Pattern, this structure allows for easy extension of the code, especially for the rule-sets, which are of great importance.
4.4.1 User Data Structure
The user data file is a JSON file that contains every configuration that a user defines, as shown in Listing 4.1. This consists of the colours for each grapheme, the word colouring rule, and the visualisation options (type and mode). The latter parameters are stored as enumerations, guaranteeing an organised and uniform representation.
{
    "graphemeColours": {
        "A": {
            "r": 1.0,
            "g": 0.0,
            "b": 0.7687535
        },
        "B": {
            "r": 0.1432643,
            "g": 0.0,
            "b": 1.0
        },
        "C": {
            "r": 0.0,
            "g": 0.973161459,
            "b": 1.0
        },
        ...
    },
    "graphemeMode": 1,
    "colouringType": 1,
    "colouringMode": 1
}
Listing 4.1: Excerpt of a User Data JSON file, displaying colours per grapheme, saved
word colouring rule (grapheme mode), visualisation style (colouring type), and colouring
style (colouring mode).
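A minimal sketch of how such a file can be read and written with the Newtonsoft JSON library from Section 4.1.1 is shown below; the class and member names are assumptions for illustration, not the actual thesis code.

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Assumed data classes mirroring the JSON structure of Listing 4.1.
public class GraphemeColour { public float r, g, b; }

public class UserData
{
    public Dictionary<string, GraphemeColour> graphemeColours = new();
    public int graphemeMode;   // word colouring rule (enumeration stored as int)
    public int colouringType;  // visualisation style
    public int colouringMode;  // colouring style
}

public static class UserDataStore
{
    public static void Save(UserData data, string path) =>
        File.WriteAllText(path, JsonConvert.SerializeObject(data, Formatting.Indented));

    public static UserData Load(string path) =>
        JsonConvert.DeserializeObject<UserData>(File.ReadAllText(path));
}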
4.4.2 Detection and Colourisation Procedure
Text detection and recognition is carried out using the TextDetectionAndRecognitionCRNN algorithm offered by OpenCV for Unity.
Whenever a new camera frame arrives, the text detection procedure begins. The algorithm analyses the frame for the presence of text, making use of the CRNN model's ability to reliably recognise letters and words within the image.
After the text is identified and recognised, the OverlayManager class performs word-by-word processing. This phase involves applying the desired visualisation settings, which might be texture direct or TMP overlay. The OverlayManager class renders the identified text based on these parameters.
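In outline, the per-frame pipeline can be sketched as follows, using the standard OpenCV dnn text API that OpenCV for Unity mirrors. The model file names are assumptions, and the model configuration (input parameters, vocabulary, decode type) is omitted for brevity; this is not the exact thesis code.

using System.Collections.Generic;
using OpenCVForUnity.CoreModule;
using OpenCVForUnity.DnnModule;
using OpenCVForUnity.ImgprocModule;

public class TextPipelineSketch
{
    // Detector locates word regions, recogniser transcribes them (CRNN).
    TextDetectionModel_DB detector = new TextDetectionModel_DB("text_detection.onnx");
    TextRecognitionModel recogniser = new TextRecognitionModel("crnn.onnx");

    public List<(string word, MatOfPoint box)> Process(Mat frame)
    {
        var results = new List<(string, MatOfPoint)>();
        var boxes = new List<MatOfPoint>();
        detector.detect(frame, boxes);             // locate word bounding boxes
        foreach (MatOfPoint box in boxes)
        {
            Rect roi = Imgproc.boundingRect(box);  // crop the word region
            using (Mat wordMat = new Mat(frame, roi))
                results.Add((recogniser.recognize(wordMat), box));
        }
        return results;
    }
}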
4.4.3 Texture Direct Pixel Manipulation
The texture direct manipulation visualisation option (see Figure 4.6) requires a thorough
method to ensure that each grapheme within a word is correctly recognised and coloured.
This approach starts by inspecting the full word’s bounding box and then processes each
grapheme independently.
Figure 4.6: Detail of a screenshot during the text scanning process in the texture direct
mode.
Initially, the bounding box of the word is examined to define the region of interest (ROI).
A variety of image processing algorithms are used to segregate each grapheme within the
bounding box.
In order to smooth out the image and minimise noise, the ROI is first processed by using
a Gaussian blur, see Listing 4.2 line number 1.
Next, the ROI is converted to greyscale to prepare for the next step, see Listing 4.2 line number 2. The black regions of the ROI are then extracted using a threshold function, which separates the text from the backdrop, see Listing 4.2 line number 3.
Then, a structural element is generated, and dilation, a morphological operation that
expands the boundaries of regions in a binary or greyscale image, is used to improve
the bounds of the text elements, see Listing 4.2 line numbers 4 and 5. This procedure
makes items larger and fills in small holes or gaps in the image, which is necessary for merging character parts, such as the body and dot of the letter "i".
Following the preparation of the text elements, each grapheme is distinguished by its
contours, see Listing 4.2 line number 6. The number of contours is then compared to
the number of letters in the recognised word to guarantee accuracy. If the counts are
equal, the contours are arranged in an array from the leftmost contour on the x-axis to
the rightmost contour.
After the contours have been successfully sorted, the user-defined word colouring rule
is retrieved. According to this rule, the colours for each letter are obtained from the
WordColouriser class. Each contour is then coloured using the appropriate colour from
the colour array, ensuring that each grapheme is presented in the correct colour as
indicated by the user’s preferences.
1 Imgproc.GaussianBlur(roiLetterMat, roiLetterMat, new Size(5, 5), 0);
2 Imgproc.cvtColor(roiLetterMat, roiLetterMat, Imgproc.COLOR_BGR2GRAY);
3 Imgproc.threshold(roiLetterMat, roiLetterMat, 100, 255, Imgproc.THRESH_BINARY_INV + Imgproc.THRESH_OTSU);
4 kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(2.5, 16));
5 Imgproc.dilate(roiLetterMat, roiLetterDilateMat, kernel);
6 Imgproc.findContours(roiLetterDilateMat, contours, hierarchy, Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
Listing 4.2: Code snippet showing the OpenCV Code for extracting individual graphemes
from the word’s region of interest.
4.4.4 TMP Overlay
The TMP overlay visualisation option (see Figure 4.7) starts by cloning a reusable, pre-configured game object called a prefab, which contains a TMP text object, an advanced text component that provides high-quality text rendering and rich text formatting. This text object is displayed on a white background, which is represented as an image object in Unity. This prefab serves as the basis for presenting the identified text.
Figure 4.7: Detail of a screenshot during the text scanning process in the TMP overlay
mode.
After cloning, the newly created game object is translated into on-screen space and
positioned exactly above the bounding box of the detected word, ensuring that it is
perfectly aligned with the position of the text in the camera frame.
The WordColouriser class then processes the detected word in accordance with the user's word colouring rule selection, see Listing 4.3 line numbers 1 and 3. The colourised word is then set as the prefab's text using TMP's rich text format, see Listing 4.3 line numbers 2 and 4. This format allows for extensive text formatting and colouring, ensuring that each letter is presented in the appropriate colour.
Finally, the text is set to automatically span the whole bounding box, ensuring that
the text appears as large as it can be within the available area. This assures optimum
readability.
1 if (userData.GetColouringMode() == COLOURING_MODE.FONT)
2     colouredWord += $"<color=#{hexColour}>{text[i]}</color>"; // hexColour: letter colour from the WordColouriser (name assumed)
3 else
4     colouredWord += $"<mark=#{hexColour}>{text[i]}</mark>";
Listing 4.3: Code snippet for applying the correct colours to letters in TMP.
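The final auto-span step can be achieved with TMP's built-in auto-sizing. The following minimal sketch shows one way to configure the cloned prefab; the object and variable names are assumptions.

using TMPro;
using UnityEngine;

public static class OverlayHelper
{
    // Configure a cloned prefab's TMP component so that the coloured word
    // fills its bounding box as large as possible.
    public static void FitWord(GameObject prefabInstance, string colouredWord)
    {
        TMP_Text label = prefabInstance.GetComponentInChildren<TMP_Text>();
        label.richText = true;           // required for the <color>/<mark> tags
        label.enableAutoSizing = true;   // TMP scales the font to fit the rect
        label.fontSizeMin = 1f;
        label.fontSizeMax = 200f;
        label.text = colouredWord;
    }
}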
CHAPTER 5
Testing and Evaluation Design
To evaluate the approach, a test strategy including both qualitative and quantitative
evaluation is developed.
Benchmark tests are conducted to evaluate the technical aspects of the application,
which includes for instance the time taken from scanning text to displaying the text
in a recoloured visualisation on the screen. User studies are conducted to evaluate the
readability and visualisation preference of users, as well as the usability of the whole
application. Expert interviews are conducted as a qualitative research method to obtain
in-depth knowledge and feedback about SynVis.
5.1 Benchmark Testing
This section explains how benchmark testing is conducted on the AR system’s two
visualisation modes, texture direct and TMP overlay. The tests focus on three major
performance indicators.
5.1.1 Performance Metrics
The benchmark test metrics are (1) frames per second (FPS) / frame time (FT), (2)
response rate (RR), and (3) error rate (ER).
In order to evaluate the system’s rendering performance, first the FPS / FT is used,
which gives a clear indication of how smoothly the visualisation is presented on screen.
Second, the RR measures the amount of time that passes between when the text is first
detected and when the visualisation is entirely displayed on the screen. This statistic is
essential for assessing how well the system processes and displays text in real time.
Third, the ER focuses on the accuracy of the actual visual outputs compared to expected
outputs, with the assumption that the OCR is fully functioning. Because the texture
direct and TMP overlay approaches use different visualisation algorithms, the ER for
this investigation is very relevant. The ER is calculated based on the correctness of the
visual representation.
5.1.2 Implementation of Testing Environment and Tools
A custom logger called "StatsLogger.cs" is created particularly for Unity to record FPS
(per frame) and RR (per word). Figure 5.1 shows how it is implemented to switch between
these two tracking modes. Furthermore, full-screen recordings are obtained using Apple's built-in screen capture feature in order to assess the ER of the visualisations and examine them later.
Figure 5.1: Checkbox added to the menu screen for selecting whether to log FPS or RR
during benchmark testing.
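A logger of this kind can be sketched as a small Unity MonoBehaviour, as shown below. This is an illustration of the idea only; the actual StatsLogger.cs is not reproduced here, and all member names are assumptions.

using System.IO;
using UnityEngine;

// Sketch of a per-frame (FT) / per-word (RR) logger in the spirit of StatsLogger.cs.
public class StatsLoggerSketch : MonoBehaviour
{
    public bool logResponseRate;   // toggled via the checkbox on the menu screen
    StreamWriter writer;
    float wordTimerStart;

    void Start() =>
        writer = new StreamWriter(Path.Combine(Application.persistentDataPath, "stats.csv"));

    void Update()
    {
        if (!logResponseRate)      // FT mode: log the frame time in ms
            writer.WriteLine($"FT;{Time.unscaledDeltaTime * 1000f:F3}");
    }

    public void OnWordDetected() => wordTimerStart = Time.realtimeSinceStartup;

    public void OnWordDisplayed() // RR mode: log detection-to-display time in ms
    {
        if (logResponseRate)
            writer.WriteLine($"RR;{(Time.realtimeSinceStartup - wordTimerStart) * 1000f:F3}");
    }

    void OnDestroy() => writer?.Dispose();
}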
5.1.3 Testing Procedure
Three sessions, each lasting two minutes, are used to assess each performance metric for
each visualisation approach. To prevent the logging of one metric from influencing another, only one performance statistic is recorded per session. A controlled test environment is set up with the identical device (Apple iPhone 11 Pro) and predefined text settings (font: Helvetica Regular, font size: 12, line spacing: 3). The text (see
Appendix 8) is scanned word by word, line by line, from left to right, throughout the
test.
5.2 Pilot Study
Before conducting the user study, a pilot study was set up to ensure all organisational
aspects are in order and to see if everything runs as intended. To give participants in the user study a clear indication of the expected duration, the pilot study also aimed to quantify how long each participant needs.
Three participants (2 males, 1 female), aged between 24 and 27 years, volunteered for this
study.
5.2.1 User Study Protocol
Before the study begins, all necessary equipment is prepared. This includes an iPhone
11 Pro with the SynVis app loaded, a text for the experiment (see Appendix 8), and a
questionnaire (see Appendix 8).
Upon arriving, each participant is given a brief introduction to the app. This introduction
includes subjects such as grapheme-colour synaesthesia, the app’s purpose, and the study’s
goals. This makes sure that before starting the tasks, participants understand the context
and goals clearly.
After the introduction and when the participant feels ready, they are given the smartphone
with the application installed. Participants are then asked to assign colours to each
grapheme (see Figure 5.2). Subsequently, they are required to select a specific word
colouring rule within the app.
Figure 5.2: Photograph depicting the process of selecting a colour for each grapheme,
with the current iteration featuring a shade of blue for the letter "g".
After that, participants are instructed to try out all the app’s visualisation options
independently (see Figure 5.3). This entails selecting each option and using it for
scanning and reading some text from the prepared text book. Through this stage,
participants are able to test and assess the various visualisation options offered by the
app.
Figure 5.3: Photograph depicting a participant engaged in the text scanning process,
utilising the texture direct + background visualisation option with colouring based on the
first letter colour rule.
After trying all the visualisation modes, participants are invited to complete a short
questionnaire (see Appendix 8 for the entire questionnaire) in their selected language
(German or English), which contains four sections:
• Demographics
• Rating the readability of the five different visualisation modes
• SUS
• Open-ended question for individual feedback and improvement ideas
After completing the questionnaire, participants are thanked for their time and effort,
which marks the end of their involvement in the study.
5.3 User Study
A user study was conducted to determine the app’s technological viability. In addition to
gathering usability input, the various visualisation styles were assessed. The methodology
employed was identical to that utilised in the pilot study, see Section 5.2.1.
5.3.1 Participants
This study included twelve volunteers (5 females and 7 males). They were invited through
word-of-mouth. The study ran from the end of June until the beginning of July in 2024.
None of the subjects experienced grapheme-colour synaesthesia. All subjects had normal
or corrected-to-normal vision, with the latter group wearing glasses or lenses.
The participants ranged in age from 23 to 60 years, with a mean of 38.333 and a standard deviation of 16.289.
Every participant needed roughly 20 to 30 minutes for the session.
5.3.2 Data Collection
Throughout the usage of the app, participants' remarks are collected by taking notes during think-aloud sessions, in which participants verbalise their thoughts and feelings as they interact with the app. During the try-out period, participants' behaviour and interactions
with the app are also observed. This observational data gives contextual information
about their experiences of and with the app. Data is also collected from completed
questionnaires.
5.4 Expert Interviews
As an essential component of the qualitative research process, expert interviews are
carried out to supplement the quantitative evaluation of the prototype and findings. The
purpose of these interviews is to get contextual insights and in-depth knowledge. This
method not only improves comprehension of the technical components, but it also aids
in providing a full picture that is required for holistic study results.
5.4.1 Selection of Experts
The research questions as well as the hypotheses were carefully considered in the selection
of participants for the expert interviews. This made sure that the knowledge acquired is
highly useful and relevant to the study’s findings.
A synaesthesia research expert was selected to mainly address the first research question,
namely the formalisation of synaesthetic perceptions into rule-sets. This participant’s
extensive scholarly experience is critical in validating the literature findings and inter-
pretations acquired throughout the thesis’ earlier stages. The major responsibility of
this expert was to conduct a critical examination of the identified concepts, assuring
their validity and relevance to current scientific understanding. Furthermore, the expert
was asked to evaluate the implemented prototype, providing feedback on its design and
functionality. This evaluation seeks to determine the prototype’s validity in reproducing
synaesthetic experiences, allowing conclusions for the research questions to be drawn from a combination of theoretical insight and practical assessment.
The second participant, a person with grapheme-colour synaesthesia, was selected to offer
a user-centric view on the prototype’s utility and success. Engaging with a potential user
who has direct experience with synaesthesia allows the research to integrate personal
input into how the prototype functions in real-world scenarios. This participant assessed
the prototype to see whether it can accurately reproduce their synaesthetic sensations and gave input on both its strengths and areas for improvement. Their input is essential to
understand the real-world use of the prototype and to meet the needs of the synaesthetic
community. This participant’s input helps to address both research questions by providing
an in-depth view of the user experience.
5.4.2 Interview Procedure
To invite experts for interviews, a one-page overview of the project is prepared and sent. This document outlines the goals of the project and also explains the format of the interview.
The interview with the expert in synaesthesia research is done via Zoom. In order to
enable a thorough discussion of the application’s functioning and research implications,
a demonstration of the application is given during this virtual session using video and
screen-sharing capabilities. The interviewing guideline is shown in Figure 5.4.
In contrast, the interview with the person experiencing grapheme-colour synaesthesia is
conducted in person. In this setting, the expert engages with the application directly
and gives immediate input on its efficacy and correctness based on their colour perceptions. This hands-on experience is crucial for getting real user insights and accurately analysing the application's potential for reproducing synaesthetic sensations.
The interviewing guideline is shown in Figure 5.5.
The interviews are conducted in a semi-structured manner that permits both guided questions and unstructured conversation. This strategy makes it possible to gather focused data
while giving experts the freedom to delve deeper into subjects that come up during the
discussion.
After the interviews, each participant receives a PDF file including a personalised, hand-
drawn thank-you note. The purpose of the gesture is to show a heartfelt appreciation for
their efforts and important contributions to the project.
Zoom is used to record the first semi-structured interview, while Photo Booth (a Mac
application for taking photos and videos using the built-in camera) is used for the second.
Figure 5.4: Interview guide for the semi-structured interview with the synaesthesia
researcher. The yellow background indicates that the questions are identical to those
posed in the second expert interview.
Figure 5.5: Interview guide for the semi-structured interview with the synaesthete. The yellow background indicates that the questions are identical to those posed in the first expert interview.
CHAPTER 6
Results
Addressing the research questions outlined in Section 1.2, this section presents the findings
of the testing phase.
To address RQ1 (How can we formalise and reproduce individual grapheme-colour
synaesthetic experiences on a digital screen?), a comprehensive literature review is
conducted, trying to find and formalise common patterns of rules, and challenging
these findings through the two different expert interviews (synaesthesia researcher and
synaesthete).
In order to answer RQ2 (What technical developments are necessary to align an AR
visualisation with the experience of grapheme-colour synaesthesia?), benchmark tests
for FT, RR and ER, a user study to evaluate the experience of the application and the
readability of the superimposed and recoloured text, and expert interviews, in particular
the try-out session with the synaesthete, are conducted.
6.1 Quantitative Analysis
The quantitative data analysis is carried out with Jeffreys’s Amazing Statistics Program
(JASP) [JAS18].
The following results are reported as statistically significant at p < 0.05.
6.1.1 Benchmark Tests
To address RQ2, it is necessary to evaluate the performance of the application. To this
end, the following hypothesis is tested:
H2-a: “The performance of the AR application, measured in terms of frame rate
and latency, will be within acceptable ranges (maintaining a frame rate
above 30 FPS and a latency below 100 milliseconds (ms)) when rendering
grapheme-colour synaesthetic visualisations.”
The descriptive statistics of the performance metrics defined in Section 5.1.1 are shown in
Table 6.1. The metrics are each divided into two sections, corresponding to the two visualisation modes, texture direct and TMP overlay.
                 Frame Time (FT)        Response Rate (RR)     Error Rate (ER)
                 Texture     TMP        Texture     TMP        Texture    TMP
Median           258.804     288.898    133.000     131.000    30.000     7.000
Mean             255.610     278.909    137.239     132.743    30.000     7.333
Std. Deviation   44.747      47.845     54.749      55.768     0.000      0.577
Range            197.041     208.601    336.000     404.000    0.000      1.000
Minimum          136.293     124.732    8.000       8.000      30.000     7.000
Maximum          333.333     333.333    344.000     412.000    30.000     8.000
Table 6.1: Descriptive statistics of benchmark values. FT and RR are reported in milliseconds (ms); ER is reported as an error count.
FPS values are converted to FT for the benchmark analysis because FT relates linearly to performance. As FPS grows, the effect of each additional frame on perceived performance diminishes; this non-linearity complicates statistical analysis and interpretation. FPS is inversely related to FT, which is the amount of time (measured in milliseconds) required to render a single frame. Converting FPS to FT yields a linear measure in which each unit of time has a consistent and interpretable impact on performance. The conversion is done as follows:

FT (in ms) = 1000 / FPS    (6.1)

For example, 30 FPS corresponds to a FT of 1000/30 ≈ 33.3 ms.
The normality tests, using the Shapiro-Wilk method, yield p-values less than 0.001 for
all measurements in both options, see Table 6.2. This means that none of the data distributions is normally distributed.
FT RR ER
Texture TMP Texture TMP Texture TMP
Shapiro-Wilk (W) 0.983 0.914 0.968 0.963 NaN 0.750
P-value of Shapiro-Wilk < .001 < .001 < .001 < .001 NaN < .001
Table 6.2: Normality tests of benchmark values (NaN means: all values are identical)
Given the divergence from normality, the metrics are evaluated using the non-parametric
paired-samples statistical test called the Wilcoxon signed-rank test.
This test is used to compare the performance of texture direct mode with TMP overlay
mode for the FT measure (see Table 6.3). The test shows a statistically significant
difference between the two modes (z = −7.139, p < 0.001), meaning that texture direct mode renders frames more efficiently, resulting in a smoother visual
experience. The high degree of significance indicates that the differences seen in the
descriptive statistics are unlikely to arise by coincidence. The rank-biserial correlation,
which measures the effect size, is calculated to be rrb = −0.452, with a standard error of
0.063 indicating a medium negative effect size according to Cohen’s conventions [Coh88].
Measure 1    Measure 2   W            z        p        Rank-Biserial Correlation   SE Rank-Biserial Correlation
FT-Texture   FT-TMP      15144.000    −7.139   < .001   −0.452                      0.063
RR-Texture   RR-TMP      555585.500   3.457    < .001   0.106                       0.031
Table 6.3: Wilcoxon signed-rank test - FT and RR
(a) Mean and standard deviation for the two different visualisation styles. (b) Raw data points, box plots, and distributions of the FT.
Figure 6.1: Figures illustrating details of the FT benchmark.
As previously conducted for the FT, the Wilcoxon signed-rank test is used to compare
the two different visualisation modes for the RR (see Table 6.3). The test shows a significantly better RR for the TMP overlay mode than for the texture direct mode (z = 3.457, p < 0.001), indicating a more responsive user experience, see
Figure 6.2. Measuring the rank-biserial correlation results in an effect size of rrb = 0.106,
with a standard error of 0.031 indicating a small positive effect size according to Cohen’s
conventions [Coh88].
(a) Mean and standard deviation for the two different visualisation styles. (b) Raw data points, box plots, and distributions of the RR.
Figure 6.2: Figures illustrating details of the RR benchmark.
The detailed examination of RRs reveals an interesting pattern. Specifically, there are
situations where response times are nearly zero milliseconds (see Figure 6.2b). This occurs
when a user moves the camera from the rightmost word on a line to the leftmost word
on the next line, as no words are detected and therefore the entire detection, colouring
and display algorithm does not need to be performed.
For each run, 167 words plus two digits are scanned to determine the ER. As Table 6.1 shows, the texture direct mode has a higher ER, averaging 17.75%, while the TMP overlay mode has a lower ER, averaging 4.34%, indicating a higher dependability.
Interestingly, across all criteria, texture direct mode has smaller standard deviations and
narrower ranges than TMP overlay mode. This means that texture direct mode has less
variability and is more consistent overall.
6.1.2 Pilot Study
The principal findings of the pilot study are as follows:
Input Field Adjustment
For ease of usage during the development process, a predetermined name was utilised in
the input field on the introduction screen. However, during the study, it became clear
that the pre-filled name box was inconvenient because participants had to erase the
pre-filled name before typing in their own. To solve this, the name field was replaced
with placeholder text, prompting users to enter their names directly, with no pre-filled
information. This change shortens the procedure and minimises any early friction for
participants.
Display Adjustments
The phone was equipped with a privacy glass screen from previous use, a protective layer that limits the viewing angle so that only the person directly in front of the display can see it, while others see a darkened screen. It was found, however, that its darkening effect interfered with visibility and the correct colour settings in the AR app. In order to provide participants with an accurate and distortion-free view of the AR content, this privacy screen protecting glass was removed for the user study.
Furthermore, to ensure optimal visibility of the AR information, the phone’s brightness
was adjusted to 100% throughout all experiments.
Focus Mode Activation
In response to the observation that users were distracted by notifications from other
applications, a dedicated "focus mode" was set up and enabled on the mobile device.
This mode disables all alerts and push notifications, thereby allowing users to interact
with the AR software without interruption.
Questionnaire Format
During the pilot study, it was observed that when the questionnaire was accessed in
portrait mode, users were not aware that they had to swipe left in order to see the full
1-5 Likert scale, resulting in only the first three values being visible. To address this issue,
participants are instructed to complete the questionnaire in landscape mode, thereby
enabling them to see and select from a comprehensive range of options.
These improvements are crucial in ensuring that the user study runs as smoothly as possible, offering a dependable and efficient experience for the participants.
6.1.3 User Study
In order to respond to RQ2, it is additionally necessary to evaluate two factors: firstly,
the user experience of the application and secondly, the readability of the visualised,
recoloured text. The following hypotheses are tested:
H2-b: “Users will find the AR application intuitive and easy to use, as indicated by
achieving a score of at least 80 on the SUS when visualising grapheme-colour
synaesthetic experiences.”
H2-c: “Combining text detection and recognition of printed text in AR with various
methods for visualising the reprinted text will successfully represent grapheme-
colour synaesthetic experiences in AR.”
Furthermore, H2-c is assessed from a psychological perspective through a trial session with a synaesthete (see Section 6.2).
The findings gathered from the user study, including the analysis of the SUS rating,
visualisation preferences, and the readability evaluation, are presented in this section.
System Usability Scale
Because the SUS questionnaire utilised was previously developed, refined, and verified, it
is interpreted in accordance with the guidelines provided by Brooke [Bro95].
The SUS scores are determined by scoring each participant's responses according to Brooke's scheme and multiplying the sum by 2.5. This gives a mean SUS score of 88.75, a standard deviation of 7.797, a range of 22.5, and an interquartile range of 10.625. These values are illustrated
in Figure 6.3. According to the Sauro-Lewis curved grading system [LUM15], this score
falls inside the A+ range (84.1-100), indicating excellent usability.
Figure 6.3: Distribution of the SUS scores of the 12 participants for SynVis.
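For reference, Brooke's scoring scheme can be sketched as follows; this is a minimal illustration, not the analysis code used for the study.

public static class Sus
{
    // Standard SUS scoring per Brooke: odd-numbered items contribute
    // (response - 1), even-numbered items contribute (5 - response);
    // the sum of the ten contributions is multiplied by 2.5 (range 0-100).
    public static float Score(int[] responses) // ten responses, each 1-5
    {
        int sum = 0;
        for (int i = 0; i < 10; i++)
            sum += (i % 2 == 0) ? responses[i] - 1   // items 1, 3, 5, 7, 9
                                : 5 - responses[i];  // items 2, 4, 6, 8, 10
        return sum * 2.5f;
    }
}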
Visualisation Preferences
Participants were asked to choose their preferred visualisation style from two options.
The majority (91.7%) picked texture direct visualisation. In the text field provided, they
gave the following reasons for the choice of this particular type:
• "Standard view mode", "closer to the original" and "more appealing": This mode
seems familiar and comfortable to participants, since the visualisation nearly
matches the original text format and looks better than in the other mode.
• "Colours show up better": Participants state that the colours seem more vibrant
and sharp.
• "Better readability": The content becomes simpler to read than with the other
mode, and the letters are clearly distinguishable, according to the participants.
• "Less cluttered background": Some say, that the background is not congested, which
improves legibility.
In comparison, 8.3% of respondents favoured the TMP overlay mode. The reason mentioned
is:
• Optimal contrast: "The contrast is optimal due to the background field, if the
background is not perfectly white."
Readability
Using a school grading system, participants assess the readability of the five different
visualisation options (texture direct + font, texture direct + outline, texture direct +
background, TMP + font, TMP + background) from 1 (very good) to 5 (very poor).
Looking at the descriptive statistics in Table 6.4 and the box plots in Figure 6.4b,
the texture direct + outline mode receives the best readability rating, with an average
grade of 1.5. It is closely followed by texture direct + font with an average grade of 1.583
and TMP + font with 1.75.
Rating            TD_F    TD_O    TD_B    TMP_F   TMP_B
Median            1.500   1.000   3.000   1.500   3.500
Mean              1.583   1.500   3.167   1.750   3.500
Std. Deviation    0.669   0.674   1.115   0.965   1.314
Table 6.4: Descriptive statistics of readability scores, all of which are reported in school
grades ranging from 1 (very good) to 5 (very poor).
(a) Mean and 95% confidence interval for the five different visualisation options.
(b) Box plots for the five different visualisation options.
Figure 6.4: Readability scores (on a scale from 1 to 5) where TD_F is texture direct +
font, TD_O is texture direct + outline, TD_B is texture direct + background, TMP_F is
TMP overlay + font and TMP_B is TMP overlay + background. Lower scores indicate
better readability.
Since Shapiro-Wilk tests for normality yield p-values below 0.05 for all five visualisation
options, the data deviate significantly from normality, see Table 6.5.
Rating                     TD_F    TD_O    TD_B    TMP_F   TMP_B
Shapiro-Wilk (W)           0.768   0.732   0.859   0.778   0.818
p-value of Shapiro-Wilk    0.004   0.002   0.048   0.005   0.015
Table 6.5: Normality tests of readability scores
The Friedman test indicates significant differences in ratings between the five visualisation
options, χ2(4) = 26.022, p < .001. Post-hoc pairwise comparisons using the Conover
test with Bonferroni correction reveal significant differences between texture direct + font
and texture direct + background (pbonf = 0.037), texture direct + font and TMP overlay
+ background (pbonf = 0.013), texture direct + outline and texture direct + background
(pbonf = 0.016), texture direct + outline and TMP overlay + background (pbonf = 0.006),
and TMP overlay + font and TMP overlay + background (pbonf = 0.037), see Table 6.6
and Figure 6.4a.
Comparison        T-Stat   df   Wi       Wj       p        pbonf
TD_F  vs. TD_O    0.292    44   27.000   25.000   0.772    1.000
TD_F  vs. TD_B    3.066    44   27.000   48.000   0.004    0.037
TD_F  vs. TMP_F   0.365    44   27.000   29.500   0.717    1.000
TD_F  vs. TMP_B   3.431    44   27.000   50.500   0.001    0.013
TD_O  vs. TD_B    3.358    44   25.000   48.000   0.002    0.016
TD_O  vs. TMP_F   0.657    44   25.000   29.500   0.515    1.000
TD_O  vs. TMP_B   3.723    44   25.000   50.500   < .001   0.006
TD_B  vs. TMP_F   2.701    44   48.000   29.500   0.010    0.098
TD_B  vs. TMP_B   0.365    44   48.000   50.500   0.717    1.000
TMP_F vs. TMP_B   3.066    44   29.500   50.500   0.004    0.037
Table 6.6: Conover's post hoc comparisons - visualisation option
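The analysis above can be reproduced with standard statistical libraries. The following
minimal sketch uses SciPy and scikit-posthocs; it is not the software used for this thesis,
and the randomly generated matrix merely stands in for the real 12 x 5 ratings
(participants x visualisation options).

# Sketch of the non-parametric pipeline: Shapiro-Wilk per condition,
# Friedman omnibus test, and Conover post hoc with Bonferroni correction.
import numpy as np
from scipy import stats
import scikit_posthocs as sp

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(12, 5))      # placeholder data only
options = ["TD_F", "TD_O", "TD_B", "TMP_F", "TMP_B"]

for name, col in zip(options, ratings.T):       # normality per option (Table 6.5)
    w, p = stats.shapiro(col)
    print(f"{name}: W = {w:.3f}, p = {p:.3f}")

chi2, p = stats.friedmanchisquare(*ratings.T)   # omnibus test across conditions
print(f"chi2(4) = {chi2:.3f}, p = {p:.3f}")

# Pairwise Conover comparisons with Bonferroni correction (Table 6.6).
print(sp.posthoc_conover_friedman(ratings, p_adjust="bonferroni"))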
6.2 Qualitative Analysis
In order to address RQ1, it is necessary to obtain qualitative feedback from both an
expert in synaesthesia research (R) and a synaesthete (S). During this process, the
following hypothesis is tested:
H1-a: “It is possible to identify consistent patterns in grapheme-colour synaesthetic
experiences for the majority of synaesthetes that suggest potential rules for
digital reproduction.”
The two semi-structured interviews with the synaesthesia researcher and the synaes-
thete are examined using the inductive thematic data analysis technique, involving the
six phases: “Familiarisation”, “Coding”, “Generating themes”, “Reviewing themes”,
“Defining and naming themes”, and “Writing up” [BC06].
In the first phase, the interview audio files are extracted from the video files and
transcribed using the online transcription tool of Microsoft Word, accessed via Google
Chrome [Mic24b].
The online tool "Miro" [Mir24] is used to work through the transcripts and place the
material on a shared board. Similar remarks are summarised and grouped on the board,
colour-coding the "codes" purple and the "statements and quotes from experts" yellow.
Once this is completed, the post-its are reorganised, see Figure 6.5, to make everything
easier to read and more approachable, which supports the subsequent phases of "generating
themes" and "reviewing themes".
Figure 6.5: Screenshot of the Miro board in progress during the iterative thematic data
analysis process.
6.2.1 Identified Themes
Four major themes are identified: “Rule-Set”, “Visualisation”, “Challenges” and “In-
teraction”, see Figure 6.6. In the following, each of these themes is discussed in more
detail.
Figure 6.6: Screenshot of the Miro board displaying the identified themes (Rule-Set,
Visualisation, Challenges, and Interaction) along with their grouped subtopics.
Rule-Set
The thematic analysis of the grapheme-colour synaesthesia rule-sets reveals numerous
patterns corresponding to different forms of synaesthetic experience, ranging from simple
single-colour associations up to complex multi-colour mappings.
The examination reveals that the words in grapheme-colour synaesthesia "would typically
appear in one colour" (R). In this context, the term "typical" refers to the majority of
individuals who experience grapheme-colour synaesthesia. This is based on the observation
that "the colour is usually determined, either by the first letter or the prevailing vowel"
(R).
In contrast to rule-sets that involve only a single colour, multi-colour associations are
linked to other forms of synaesthesia, such as ticker-tape synaesthesia. Notably, the
interviewed expert who experiences grapheme-colour synaesthesia has a multi-colour form
in which vowels carry colours and consonants do not, so "if you have a word with several
vowels, it will be cross-faded in the consonants between them" (S).
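To illustrate this rule, the following sketch linearly cross-fades consonant colours between
the colours of the neighbouring vowels; the vowel colour table and the linear interpolation
are illustrative assumptions, not the prototype's actual implementation.

# Illustrative sketch of the described vowel cross-fade: vowels carry fixed
# colours and the consonants between two vowels are linearly blended.
# The RGB values are invented examples.
VOWELS = "aeiou"
VOWEL_RGB = {"a": (0, 120, 255), "e": (255, 220, 0), "i": (255, 255, 255),
             "o": (20, 20, 20), "u": (90, 60, 30)}

def crossfade(word):
    word = word.lower()
    idx = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not idx:
        return [None] * len(word)  # no vowels: leave the word uncoloured
    colours = []
    for i, ch in enumerate(word):
        left = max((j for j in idx if j <= i), default=idx[0])
        right = min((j for j in idx if j >= i), default=idx[-1])
        if left == right:          # on a vowel, or outside the vowel span:
            colours.append(VOWEL_RGB[word[left]])  # nearest vowel colour
        else:                      # between two vowels: blend by position
            t = (i - left) / (right - left)
            a, b = VOWEL_RGB[word[left]], VOWEL_RGB[word[right]]
            colours.append(tuple(round(x + t * (y - x)) for x, y in zip(a, b)))
    return colours

print(crossfade("reading"))        # one RGB triple per letter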
Special words and sequences, such as weekdays and months, typically follow a specific
colour order. Furthermore, it is mentioned as a side note that the initial letters of male
names are frequently perceived in shades of blue, whilst the first letters of female names
are associated with pink.
Moreover, common themes among the synaesthetic experiences of many people are
discovered. For example, certain letters such as "A is red" (R), "B is blue" (R), and "C is
yellow" (R), as well as "0 and 1 are often perceived as black/white, since it is connected
with the binary system" (S), are reported in research and observed during the interviews. These
similarities underline the universal characteristics of synaesthetic perception ("I myself
can only confirm that these tendencies are generally the case" (S)).
Discussing long texts reveals an interesting component of synaesthetic perception. It is
found that the vividness of synaesthetic colours can fade when reading long texts, as the
attention shifts to the story or content rather than individual graphemes or words - "when
they read a book, for example, it gradually fades" (R). This observation bears similarities
to the diminishing perception of unique typefaces with extended reading, indicating a
potential adaptive function of perceptual attention in synaesthesia. In addition, the same
phenomenon is mentioned for numbers: if a number combines two dominant colours, its
overall colour fades somewhat. As an example, a violet 9 combined with a grass-green 6
means the number 96 is perceived in a violet shade, even though the green 6 is perceived
as more dominant, so one perceives "some washed-out sensory impressions" (S).
The technological possibility of recreating synaesthetic experiences via digital interfaces
using the stated rule-sets is addressed. The experts state that simple synaesthetic
experiences (such as those of the majority of people who experience grapheme-colour
synaesthesia) might be easily recreated, "for them, I believe the app
works really well" (S). "You can’t exactly put these thoughts into my head via the app,
but when it is precisely combined, it is useful. So, when I see exactly this colour while
reading the text, then it comes, I believe, relatively close" (S).
Last but not least, several researchers (on synaesthesia and neuroscience) and their papers
are mentioned in the interviews, in order to check whether their findings are in line with
the findings discussed here.
Visualisation
This theme includes a broad variety of topics. Firstly, during the demonstration phase of
the interviews, the outline method is praised for its non-intrusive nature, since "nothing
is actually changed in the text" (S), while providing a subtle enhancement that improves
legibility without overwhelming the reader. This method is described as offering a "hint
of supportive colouring" (S), which serves to subtly highlight text without altering it,
since "I think it’s important that you can still see the text somehow" (S).
The smooth execution of TMP overlay visualisation is appreciated for adding to the
reading experience’s fluidity. Longer texts are thought to be more suited for this approach,
since it improves readability without breaking the text's natural flow. Some disadvantages
are noted, such as the lack of punctuation and the restriction to lowercase characters,
"although, I don't find them so annoying now, because chat communication is now all
lower case anyway" (S). Another downside that is addressed is the difficulty of reading,
for example, yellow text on a white background. The suggestion is to colour the background
with the maximum possible contrast to the font colour, which could, on the other hand,
lead to a visually overwhelming display experience. Notwithstanding
these difficulties, the approach is praised for its capacity to clearly emphasise every word,
which is uncommon but appreciated.
The idea of pre-coloured texts sparks thought-provoking conversations on how these
characteristics affect perception and learning. "That’s where it gets really difficult, because
emojis, for instance, themselves always bring colour with them" (S). Research on the
effects of pre-coloured components, such as logos or symbols, on cognitive processes
is scarce, but early findings indicate that pre-coloured texts may help people develop
colour-word connections, even in those who are not synaesthetes [CMR12]. This could also
lead to a possibly complementary interplay between learnt and innate colour associations.
It is also mentioned in the interviews that non-synaesthetes can build colour-word or
colour-letter connections by exposure to pre-coloured books.
Challenges
Expert interviews show some general challenges in designing a grapheme-colour synaes-
thesia system. "Significant individual variability among synaesthetes" (R) is one of the
main issues. It is critical to develop rule-sets that are wide and flexible enough to account
for these variances. Furthermore, individual fine-tuning can be required to take into
consideration differences in perception brought on by things like the time of day or
personal circumstances. Providing different visualisation options is mentioned as a good
way of ensuring that the system can accommodate both associator and projector
synaesthetes.
Language variations add to the difficulty. While it is thought by the synaesthete that
German and English synaesthetic sensations are comparable, there is considerable doubt
concerning other languages, such as French, where "for example, vowels are pronounced
quite differently" (S). These variations may have an influence on the synaesthetic experi-
ence.
Similar to the previous challenge, the semantic meaning of words in a text or sentence
can impact the synaesthetic colour depiction, according to both the synaesthete and the
synaesthesia researcher. A word whose meaning varies with context may be seen in
different hues, necessitating multiple options within the system to accommodate
this variety. For example, the synaesthete mentions that happy words may be seen
differently from sad words, and words associated with the past may have different colours
to those associated with the future. Another mentioned example is "Monday, which could
be the first day of the working week, might have a different colour than if you say I was
born on a Monday or something, because the semantics are completely different" (S).
Furthermore, the prototype is presently intended for the Latin alphabet, and it is
uncertain how it will handle letters from other alphabets, such as those found in Asian
languages.
If colour perception varies so much that users have to constantly adjust their settings, a
significant difficulty arises. The synaesthete advises that in such instances, it could be
better to render the text in black on a white backdrop rather than risk an inaccurate
colour representation, which could lead to an even worse reading experience.
In addition, the synaesthete emphasises the necessity of allowing users to fine-tune the
colours for certain letters separately after going through the initial set-up phase. For
example, if a user incorrectly sets the colour for the letter "u", they should be able to
change it without resetting the entire alphabet. Screen settings and calibration can
also influence colour appearance, necessitating specific modifications to assure proper
representation - "E is really a sunshine yellow, and this one is more of an orange now"
(S), "the blue of the A, that is actually already a relatively light I would almost say sky
blue, and that is now getting so dark" (S). This is mentioned as being important not only
when technical devices do not display colour correctly, but also to ensure that the system
remains effective and accurate over the lifetime of the user.
Interaction
The expert interviews reveal important details about the colour space used. The prototype
uses the RGB colour model, but because this model is non-linear, it is found to be less
effective. The experts recommend using the HSV model instead, as it gives more emphasis
to the hue. The application's synaesthetic experience would probably be more accurate
and relevant if the HSV model were used.
The issue mentioned above goes hand in hand with the next finding, the design of the
colour picker. Experts note that a significant amount of the picker's screen space is
devoted to brightness changes, which are frequently imperceptible to the human eye - "the
hue value, that was just this horizontal bar, I would have needed more area, more detail
and less for the brightness" (S). Furthermore, usage is made harder by the colour picker
retaining the colour of the previously set letter, particularly when a very dark colour is
selected. To improve visibility, the experts recommend returning the picker to a basic
default colour, for instance white, after each selection.
It is observed that, rather than seeing one colour for each letter in a word, synaesthetes
may see many colours for a single letter. Given this perception, it is possible that the
neighbouring vowel on the left side of the grapheme exerts a greater influence on the
colours of the left side than the neighbouring vowel on the right - "left is 100% a right is
100% e" (S). In this form of grapheme-colour synaesthesia, the colour of each pixel can
vary, giving a more nuanced and complex experience than another.
Furthermore, it is discovered that the prototype colours numerals only in the "individual
letter colouring mode", and that the letter "y", which is considered a vowel in certain
languages, is not treated as a vowel at this point.
The experts make positive comments about the prototype, such as "very interesting
how it works, how pleasant it is to read the text" (S). They call it a prototype that is
really well done, "looks pretty cool" (S) and "is easy to navigate" (S). Its usefulness as a
proof-of-concept is acknowledged by professionals, who especially value "its great potential
as a research tool" (R) to learn more about these experiences on a personal level. If
the prototype were to enable specialists to read research articles more rapidly in the
future, they would be more likely to use it on a regular basis.
Besides the positive remarks, some room for improvement is noted. The prototype is "not
completely responsive" (S) on mobile devices and "heated up a bit" (S) when in use.
Experts advise outsourcing OCR processing to a server in order to boost speed, which
might increase the responsiveness and general smoothness of the application.
CHAPTER 7
Discussion of Results
This chapter contrasts and discusses the findings in relation to the hypotheses (see
Chapter 6) and research questions (see Section 1.2), based on the results reported in the
preceding chapter. This chapter also discusses the limitations and possible directions for
further research.
The hypothesis H1-a proposes that the inducer-concurrent relationship of grapheme-
colour synaesthesia can be formalised as rule-sets. This study’s findings support this
concept, which builds on earlier research and expert interviews.
Significant prior studies have been conducted to find patterns that can be used to
formalise rule-sets for grapheme-colour synaesthesia (see Section 2.2.5). This work
provides a framework for exploring different formalisation methods, but often encounters
difficulties due to differences in individual perceptions. According to the expert interviews
conducted for this study, it is possible to formalise distinct individual perceptions into
rule-sets.
However, it should be noted that colour adjustments for individual graphemes may be
needed in the future. As people age, their perception of colours may deteriorate, requiring
occasional recalibration of the grapheme colours to maintain accuracy.
In addition, when the semantic meaning of words in texts is taken into account, a flexible
approach to rule-sets is required for "special words" (e.g. weekdays). The expert interviews
emphasise the need to allow users to customise rule settings for individual words while
reading. This adaptability is essential for dealing with special words that change colour
depending on their context (see Section 6.2.1).
The synaesthesia researcher provides extremely useful insights. They agree that inducer-
concurrent relationships, particularly the "simple" ones, might be formalised into rule-sets.
Simple relationships are defined here as those that do not change depending on the time
of day, semantic meaning, or emotional state. This confirmation from someone with
first-hand knowledge highlights the feasibility of formalising these linkages and validates
the hypothesis.
One noteworthy finding is that it is not always required for people to assign colours to each
and every grapheme. Some synaesthetes exclusively see colours, for instance in vowels,
whereas consonants do not have inherent colours but are influenced by neighbouring
vowel colours. This selective perception emphasises the importance of an option for
flexible colour definition that can accommodate such individual differences. Thus, the
capacity to re-adjust and specify colours for certain graphemes is critical for effectively
representing the synaesthetic experience.
Based on these findings, an extensive diagram is created to illustrate the various ways in
which synaesthetes perceive and associate colours with graphemes (see Figure 7.1). This
diagram also serves to answer RQ1.
Figure 7.1: Diagram illustrating the formalisation of grapheme-colour experiences into
rule-sets.
According to the expert interview with the synaesthesia researcher, many grapheme-colour
synaesthetes experience words in a single colour. This colour is frequently determined by
the first letter or the vowel of the stressed syllable, as illustrated on the left side of the
figure. This is consistent with the prior research findings presented in Section 2.2.5,
which suggest that the first letter or stressed vowel has a substantial influence on the
perceived colour of the word.
The hypothesis H2-a anticipates that the application will run at a rate exceeding 30
FPS with a RR of less than 100 milliseconds (ms). Converting the average FTs back to
FPS (using the reformulated formula in Equation (6.1)), the benchmark analysis yields
3.912 FPS and 3.585 FPS for the two visualisation options. Unfortunately, this does not
reach the estimated performance requirement. Likewise, neither the texture direct nor
the TMP overlay option meets the sub-100 ms response-time requirement, with benchmark
results of 137.239 ms and 132.743 ms, respectively. It can be concluded that the data does
not support H2-a, and the null hypothesis is therefore not rejected. User comments support
this conclusion, noting that the app "stalled when too much text was scanned at once".
Throughout the development phase, usability was a top priority. The key goal was to
ensure that the app was both functional and user-friendly. The usability is of critical
importance, as a confusing or difficult-to-use interface may dissuade users, thereby
reducing overall effectiveness and adoption rates. Therefore, the hypothesis H2-b states
that a SUS score greater than 80 is obtained, which is associated with excellent
usability. The results support this hypothesis, with an average SUS score of 88.75.
This high score demonstrates the effectiveness of design decisions that prioritise user
experience.
According to the hypothesis H2-c, it is possible to successfully represent grapheme-
colour synaesthesia in AR via text detection and recognition of printed text. The expert
interviews and try-out session support this hypothesis.
Benchmark testing demonstrates that good text detection and recognition alone are
insufficient for successfully representing grapheme-colour synaesthesia in AR. The visual-
isation algorithm is equally important. In particular, the texture direct mode has a very
high ER, suggesting that the encoding of grapheme-colour connections is error-prone in
the absence of a robust visualisation algorithm.
One of the most significant concerns observed is the difficulty in accurately recognising
contours for letters that occur close together, such as double-f or double-t. Section 4.4.3
describes the algorithm for gathering contours and matching them to the number of
letters in the recognised word. The algorithm, however, finds it difficult to distinguish
accurately between letters that lie very close to one another. To solve this, increasing
the separation between character contours might allow the algorithm to recognise
individual letters more reliably, which would lower the ER.
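One conceivable heuristic, sketched below, assumes that merged glyphs such as "ff"
produce unusually wide contours and splits the widest bounding box until the number of
boxes matches the number of recognised letters. This is an illustrative toy version written
against OpenCV 4.x, not the thesis's actual contour-matching algorithm.

# Toy heuristic: split the widest letter bounding box whenever the contour
# stage finds fewer boxes than the OCR stage reports letters.
import cv2

def letter_boxes(binary_img, word):
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted([list(cv2.boundingRect(c)) for c in contours],
                   key=lambda b: b[0])         # left-to-right order
    while boxes and len(boxes) < len(word):    # merged glyphs, e.g. "ff"
        i = max(range(len(boxes)), key=lambda j: boxes[j][2])  # widest box
        x, y, w, h = boxes[i]
        boxes[i:i + 1] = [[x, y, w // 2, h],   # split it in half
                          [x + w // 2, y, w - w // 2, h]]
    return [tuple(b) for b in boxes]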
Punctuation marks are another major hurdle for the OCR system. Words followed
by punctuation (such as commas, periods, and semicolons) frequently produce an
additional contour, leading the OCR algorithm to misidentify the word. Furthermore,
punctuation symbols like commas and hyphens are occasionally misidentified as letters.
Regarding the visualisation options, it is found that most participants do not like the
background colouring options, either in texture direct or in the TMP overlay mode. Eight
participants mentioned that the black graphemes of words are hardly readable at all if
the grapheme colour is set to a dark colour.
Thus, based on all of the above findings, hypothesis H2-c is supported in theory, but
further technological developments are needed to fully realise its potential in practice.
In regard to RQ2, it is therefore necessary to take the findings from RQ1 and then focus
primarily on the OCR and visualisation algorithms during implementation. This is not
only to ensure accurate detection, recognition, and visualisation of the text, but also to
guarantee optimal performance and seamless operation of the entire application.
7.1 Limitations
Some limitations of this work give rise to opportunities for further research; these are
discussed in this section.
7.1.1 Colouring
Digit colouring is only realised for the "individual" word colouring rule. This means that
colouring schemes based on, for example, the first or most dominant digit in a sequence
cannot be used; in all other colouring settings, the digits remain black and lack any
colour.
7.1.2 Rule-Sets
Furthermore, the rule-set is restricted and does not account for individual words or
contextual changes. Also, the visualisation possibilities available in the system are
currently restricted to basic letter-by-letter colouring of the font, outline, and background;
no letter-level variants (i.e., various colours for a single letter) are offered.
7.1.3 Recognition and Detection
Because of the chosen OCR algorithm, the prototype only supports texts in English at
this time, which limits its applicability in multilingual settings or for users who engage
with texts primarily in other languages.
As noted in Section 6.2.1, vowel recognition does not extend to the letter "y". This
decision is consistent with the most frequent usage in English, but it may not fully reflect
its vocalic role in many words and settings, thereby compromising the aesthetic and
functional effects of the "vowel gradient colouring" mode.
Punctuation and capitalisation are not supported throughout the prototype due to the
chosen OCR algorithm. This affects only the TMP overlay visualisation modes, where
the recoloured words are superimposed over the original words, not the texture direct
visualisation options.
7.2 Potential Future Work
The system’s development and assessment revealed some promise for further study and
advancement. Addressing present constraints and creating more features may expand
the possibilities for utilising this prototype as a research tool.
7.2.1 Colouring
Future developments may include redesigning the colour picker interface to use an HSV
(hue, saturation, value) model, as suggested by a synaesthete, or even the CIE L*a*b* or
CIE L*u*v* colour models (as mentioned in Section 2.3.1), to ensure maximum sensitivity
and allow more accurate colour selection. This update could solve difficulties with the
existing picker, which devotes a large amount of screen area to brightness adjustment,
even though brightness is less relevant than hue.
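A minimal sketch of this direction, using Python's standard colorsys module, keeps the
picker state in HSV and converts to RGB only for display; the slider resolutions are
assumptions reflecting the suggestion to give hue the finest granularity.

# HSV-first picker mapping: discrete slider positions to an 8-bit RGB colour.
import colorsys

HUE_STEPS, SAT_STEPS, VAL_STEPS = 360, 20, 5   # assumed slider resolutions

def picker_to_rgb(hue_step, sat_step, val_step):
    h = hue_step / HUE_STEPS                   # hue in [0, 1)
    s = sat_step / (SAT_STEPS - 1)             # saturation in [0, 1]
    v = val_step / (VAL_STEPS - 1)             # value/brightness in [0, 1]
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))

print(picker_to_rgb(45, 19, 4))                # a saturated orange-yellow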
Furthermore, as proposed during the try-out session, resetting the colour picker to white
after a colour has been selected for each grapheme, to ensure the visibility of subsequent
selections, would be a useful feature to add next.
Third, additional implementations might enable users to modify grapheme colours after
the initial configuration. The need for this capability stems from the fact that colours
can appear different on screen, which is often not apparent to the user until they have
scanned a word for the first time.
Another interesting area for improvement is dynamic colour adjustment. Modifications
to the background and text colours in response to brightness and colour contrast could
improve readability. This approach is exemplified by SYNCalc [syn24], which employs
coloured digits on a high-contrast background. This may include automatically altering
text to a brighter hue against darker backgrounds and vice versa, although thorough
testing would be required to regulate the visual impact.
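As an illustrative starting point for such a mechanism (and not the approach used by
SYNCalc), the following sketch applies the WCAG 2.1 relative-luminance and contrast-ratio
formulas: a grapheme colour is kept only if it contrasts sufficiently with the background,
otherwise the text falls back to black or white. The 4.5:1 threshold is the WCAG AA value
for normal-sized text.

# Contrast-driven text colour selection based on the WCAG 2.1 formulas.
def relative_luminance(rgb):
    def channel(c):
        c /= 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)),
                    reverse=True)
    return (hi + 0.05) / (lo + 0.05)

def readable_text_colour(bg, preferred):
    if contrast_ratio(preferred, bg) >= 4.5:   # enough contrast: keep it
        return preferred
    black, white = (0, 0, 0), (255, 255, 255)  # otherwise pick the better
    return black if contrast_ratio(black, bg) > contrast_ratio(white, bg) else white

print(readable_text_colour(bg=(255, 255, 255), preferred=(255, 230, 0)))
# yellow on white lacks contrast -> falls back to (0, 0, 0)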
7.2.2 Rule-Sets
Using NLP techniques, more advanced rule-sets based on syllabic or morphemic structures
might be implemented, allowing for more subtle colouring schemes. These systems can
involve colouring words according to the first vowel of the first or stressed syllable.
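As an illustration of how such a rule could be prototyped, the sketch below uses the
Pyphen hyphenation library as a rough stand-in for proper syllabification; hyphenation
points only approximate syllable boundaries, and detecting the stressed syllable would
additionally require a pronunciation lexicon.

# Rough sketch of a syllable-based rule: colour a word by the first vowel
# of its first (approximate) syllable.
import pyphen  # pip install pyphen

VOWELS = "aeiou"  # "y" deliberately excluded, mirroring Section 6.2.1
dic = pyphen.Pyphen(lang="en_US")

def word_colour_key(word):
    first_syllable = dic.inserted(word.lower()).split("-")[0]
    # fall back to the first vowel anywhere in the word if the first
    # syllable contains none
    for chunk in (first_syllable, word.lower()):
        for ch in chunk:
            if ch in VOWELS:
                return ch
    return None

for w in ("synaesthesia", "reading", "colour"):
    print(w, "->", word_colour_key(w))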
7.2.3 Recognition and Detection
Extending the system to handle other languages would increase its worldwide applicability,
making it more valuable to a wider group of users.
Because text recognition is currently restricted to a limited recognition zone, enabling
recognition across wider areas or even the full screen would be a potential future
improvement. This may also solve the problem that a letter which is not fully inside the
recognition box is recognised as a different letter, e.g. "o" instead of "b", and appears in
the wrong colour until it fully enters the box.
7.2.4 Performance
A detailed examination of the pipeline reveals that the FT and RR are not optimal, which
can be attributed to the OCR algorithm used. Consequently, subsequent implementations
may consider integrating a mobile-specific OCR algorithm, such as Google OCR, to
enhance the frame rate, particularly on resource-constrained mobile devices.
Moreover, outsourcing computationally demanding tasks to a server or integrating text
tracking with the detection and recognition of text are potential avenues for consideration.
This approach can also be beneficial if the system is implemented for use with HMDs in
subsequent stages.
7.2.5 Further Research and Development
Conducting in-depth studies, possibly in partnership with a psychological research
department, could give useful insights into how colouring graphemes influences reading
ability. Eye-tracking technology could offer information about reading speed, recognition
rate, and understanding.
Expanding the system's functionality to incorporate HMD devices, such as AR headsets,
would allow for new applications in immersive learning and reading settings.
CHAPTER 8
Conclusion
This thesis expands on the idea of formalising the regulatory factors of the inducer-
concurrent relationship of grapheme-colour synaesthesia, combining data from previous
research with expert interviews, and assessing it directly with a grapheme-colour synaes-
thete.
As a proof-of-concept that these sensations may be represented using technology, notably
AR, a prototype is created to obtain greater insights into individual experiences of
grapheme-colour synaesthesia.
The formalisation of the various individual inducer-concurrent perceptions of grapheme-
colour synaesthesia into machine-readable rule-sets is demonstrated to be a feasible
undertaking. A diagram is created to illustrate the manner in which this can be achieved.
Additionally, this thesis indicates that it is possible to represent grapheme-colour synaes-
thesia through the use of AR. A grapheme-colour synaesthete corroborates this conclusion.
Furthermore, the findings indicate the potential for achieving this in AR with an excellent
user experience (SUS of 88.75).
The technical criteria for improving the prototype are defined, ensuring that subsequent
iterations are more efficient.
In terms of readability, the user study suggests that it is preferable to apply colour directly
to the pixels delineating the graphemes, as opposed to, for instance, colouring the
background of the graphemes. This is because, in certain instances, applying colour to
the entire background of a grapheme, particularly when the colour is dark and the
grapheme itself is black, can impede reading.
Considering the importance of reading in daily life, this thesis provides a solid foundation
for future studies pertaining to natural synaesthetic reading.
On top of that, this thesis enables non-synaesthetes to benefit from the potential advan-
tages of coloured texts while mitigating the disadvantages of achromatic text reading for
synaesthetes. It enables synaesthetes to share their colour experiences and perceptions,
and also provides an application that can be further improved and altered for usage with
HMDs, allowing for real-time colour representation of texts in any context.
Therefore, this project lays a solid foundation for further research, serving as a "useful
research-tool" (R).
List of Figures
1.1 Screenshot of the prototype showing the text recolouring feature, which allows
users to personalise their reading experience. . . . . . . . . . . . . . . . . 3
2.1 Seventy-five types of synaesthesia (Sean A. Day). Left column: inducers; top
row: concurrents. White: documented; red: unrecorded; black: not a type.
(Reprinted from: [Day22]) . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Illustration of semantic network activations in response to the letter "A":
synaesthete with red colour experience (left) vs. non-synaesthetic control
(right). (Reprinted from: [Mei13]) . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Differences in perception of letters, numbers, or words by projectors and
associators. Top: two projectors; bottom: three associators. (Reprinted from:
[The21] and [SLM09]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Representative synaesthete (A–C) and control (D–F) brains. Grapheme ROI
(light blue) and V4 ROI (dark blue) are shown. A and D: ROIs on non-inflated
cortical surfaces. B and E: ROIs on inflated brains; yellow box highlights
region in C and F. Synaesthetes showed activation in both grapheme (light
blue) and V4 (dark blue) ROIs when viewing achromatic letters and numbers
(C). Controls showed activation only in the grapheme ROI (light blue) (F).
(Reprinted from: [BHC+10]) . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Screenshot of the publicly available shared GoogleSheets file [Goond] for
colouring cells in the experienced colour. (Reprinted from: [The21]) . . . 15
2.6 Appearance of words starting with "i" or "o". Left: experienced word; right:
individual letter colours. (Source: [BCG16]) . . . . . . . . . . . . . . . . . 16
2.7 Screenshot of the synaesthesia battery’s example grapheme-colour picker test.
(Source: [Dav24a]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.8 Colour picker that allows the selection of a colour range and brightness
adjustment. (Reprinted from: [ANS15]) . . . . . . . . . . . . . . . . . . . 17
2.9 Colour picker that allows to choose from 13 colours or "no colour". (Reprinted
from: [MRW14]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.10 Screenshot of the SYNCalc application by Berger and Whittingham. (Reprinted
from: [The21]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.11 Screenshot of the "SeeSynesthete" Google Chrome extension for colouring web
page fonts. (Reprinted from: [Chr20]) . . . . . . . . . . . . . . . . . . . . 19
2.12 Screenshot of the "Synesthesia" Google Chrome extension for colouring web
page fonts. (Reprinted from: [Chr19]) . . . . . . . . . . . . . . . . . . . . 20
2.13 Three visualisation versions. Left: original document; center: document with
overlay; right: new OCR layer. (Reprinted from: [Scrnd]) . . . . . . . . . 22
2.14 Two different AR translator applications using overlay and replacement as
visualisation techniques. (Reprinted from: [Goo24] and [Goo23a]) . . . . . 23
3.1 System design sketch illustrating the scanning of real-world text and recolour-
ing via mobile AR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Single-colour word colouring rules selected based on literature review and
relevance to this thesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Multi-colour word colouring rules selected based on literature review and
relevance to this thesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4 The two different visualisation types. . . . . . . . . . . . . . . . . . . . . . 31
3.5 The three different colouring types. . . . . . . . . . . . . . . . . . . . . . . 31
3.6 Details of a screenshot of the word-colouring screen. It displays the term
"Sample Text" for each rule-set as a preview in the following order: individual,
first letter, first vowel, vowel gradient. . . . . . . . . . . . . . . . . . . . . 32
3.7 Various options for displaying recoloured text, organised by visualisation styles
(columns) and colouring styles (rows). . . . . . . . . . . . . . . . . . . . . 33
4.1 User flow diagram of the application, showing each step from starting the app
to closing the app. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Detail of a screenshot of the how-to screen that is displayed prior to the
scanning process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 Detail of a screenshot of the visualisation mode selection screen, which shows
that the direct + font option is currently selected. . . . . . . . . . . . . . 38
4.4 Screenshots of all application screens, labelled by name. User flow is indicated
by brown arrows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5 Detail of a screenshot of the introductory screen, which requires users to enter
their first and last names in order to save their data. . . . . . . . . . . . . 40
4.6 Detail of a screenshot during the text scanning process in the texture direct
mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.7 Detail of a screenshot during the text scanning process in the TMP overlay
mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1 Checkbox added to the menu screen for selecting whether to log FPS or RR
during benchmark testing. . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 Photograph depicting the process of selecting a colour for each grapheme,
with the current iteration featuring a shade of blue for the letter "g". . . . 47
5.3 Photograph depicting a participant engaged in the text scanning process,
utilising the texture direct + background visualisation option with colouring
based on the first letter colour rule. . . . . . . . . . . . . . . . . . . . . . . 48
5.4 Interview guide for the semi-structured interview with the synaesthesia re-
searcher. The yellow background indicates that the questions are identical to
those posed in the second expert interview. . . . . . . . . . . . . . . . . . 51
5.5 Interview guide for the semi-structured interview with the synaesthete. The
yellow background indicates that the questions are identical to those posed in
the second expert interview. . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.1 Figures illustrating details of the FT benchmark. . . . . . . . . . . . . . . 55
6.2 Figures illustrating details of the RR benchmark. . . . . . . . . . . . . . . 56
6.3 Distribution of the SUS scores of the 12 participants for SynVis. . . . . . 58
6.4 Readability scores (on a scale from 1 to 5) where TD_F is texture direct +
font, TD_O is texture direct + outline, TD_B is texture direct + background,
TMP_F is TMP overlay + font and TMP_B is TMP overlay + background.
Lower scores indicate better readability. . . . . . . . . . . . . . . . . . . . 59
6.5 Screenshot of the Miro board in progress during the iterative thematic data
analysis process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.6 Screenshot of the Miro board displaying the identified themes (Rule-Set,
Visualisation, Challenges, and Interaction) along with their grouped subtopics. 62
7.1 Diagram illustrating the formalisation of grapheme-colour experiences into
rule-sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
List of Tables
2.1 Comparison of relevant related work approaches for text detection and recog-
nition (Table adapted from: [OBHW22b]) . . . . . . . . . . . . . . . . . . 21
6.1 Descriptive statistics of benchmark values, all of which are reported in mil-
liseconds (ms) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.2 Normality tests of benchmark values (NaN means: all values are identical) 54
6.3 Wilcoxon signed-rank test - FT and RR . . . . . . . . . . . . . . . . . . . 55
6.4 Descriptive statistics of readability scores, all of which are reported in school
grades ranging from 1 (very good) to 5 (very poor). . . . . . . . . . . . . 59
6.5 Normality tests of readability scores . . . . . . . . . . . . . . . . . . . . . 60
6.6 Conover’s post hoc comparisons - visualisation option . . . . . . . . . . . 60
Acronyms
AR Augmented Reality. xi, xiii, 1–3, 20–23, 27–29, 35–37, 45, 53, 54, 57, 69, 72, 73, 76
CRNN Convolutional Recurrent Neural Network. 36, 41
DL Deep Learning. 20, 21
EAST Efficient and Accurate Scene Text Detection. 21, 36
ER Error Rate. xiii, 45, 46, 53, 54, 56, 69
FPS Frames Per Second. 45, 46, 54, 69, 76
FT Frame Time. xiii, 45, 53–55, 69, 72, 77, 79
HMD Head Mounted Display. 21, 72, 74
NLP Natural Language Processing. 29, 71
OCR Optical Character Recognition. 20–23, 36, 45, 66, 69–72, 76
RR Response Rate. xiii, 45, 46, 53–56, 69, 72, 76, 77, 79
SUS System Usability Scale. 1, 48, 57, 58, 69, 73, 77
TMP Text Mesh Pro. xvi, 33, 41, 43–46, 54–56, 59, 60, 63, 69–71, 76, 77
UI User Interface. 4, 28, 35–38, 40
VR Virtual Reality. 8, 17, 20
Bibliography
[ANS15] Árni Gunnar Ásgeirsson, Maria Nordfang, and Thomas Alrik Sørensen.
Components of attention in grapheme-color synesthesia: A modeling approach.
PLOS ONE, 10(8):1–19, 08 2015.
[App21] Apple App Store - Omer Faruk Ozturk. Searchcam - ctrl-f camera app,
2021. Accessed: August 5, 2024.
[BC06] Virginia Braun and Victoria Clarke. Using thematic analysis in psychology.
Qualitative Research in Psychology, 3:77–101, 01 2006.
[BCG16] Laura J. Blazej and Ariel M. Cohen-Goldberg. Multicolored words:
Uncovering the relationship between reading mechanisms and synesthesia.
Cortex, 75:160–179, 2016.
[BHC+10] David Brang, Edward Hubbard, Seana Coulson, Ming-Xiong Huang,
and Vilayanur Ramachandran. Magnetoencephalography reveals early
activation of v4 in grapheme-color synesthesia. NeuroImage, 53:268–74,
10 2010.
[BHW+19] Joshua Berger, Irina Harris, Karen Whittingham, Zoe Terpening, and
John Watson. Substantiating synesthesia: a novel aid in a case of
grapheme-colour synesthesia and concomitant dyscalculia. Neurocase,
26:1–7, 11 2019.
[BHW+21] Joshua Berger, Irina Harris, Karen Whittingham, Zoe Terpening, and
John Watson. Sharing the load: How a personally coloured calculator for
grapheme-colour synaesthetes can reduce processing costs. PLOS ONE,
16:e0257713, 09 2021.
[BK69] Brent Berlin and Paul Kay. Basic color terms: Their Universality and
Evolution. Berkeley, CA: University of California Press, 1969.
[BMB+23] Lucie Bouvet, Cynthia Magnen, Clara Bled, Julien Tardieu, and Nathalie
Ehrlé. “i have to translate the colors”: Description and implications of a
genuine case of phoneme color synaesthesia. Consciousness and Cognition,
111:103509, 2023.
[Brind] British Council. How humans evolved language, n.d. Accessed: August 5,
2024.
[Bro95] John Brooke. Sus: A quick and dirty usability scale. Usability Eval. Ind.,
189, 11 1995.
[CCC+14] Guang Chen, Wei Cheng, Tingwen Chang, Xiaoxia Zheng, and Ronghuai
Huang. A comparison of reading comprehension across paper, computer
screens, and tablets: Does tablet familiarity matter? Journal of Comput-
ers in Education, 1:213–225, 11 2014.
[CCLG20a] Alyson Collins, Donald Compton, Esther Lindström, and Jennifer Gilbert.
Performance variations across reading comprehension assessments: Ex-
amining the unique contributions of text, activity, and reader. Reading
and Writing, 33, 03 2020.
[CCLG20b] Alyson Collins, Donald Compton, Esther Lindström, and Jennifer Gilbert.
Performance variations across reading comprehension assessments: Ex-
amining the unique contributions of text, activity, and reader. Reading
and Writing, 33, 03 2020.
[CDS+15] D.A. Carmichael, M.P. Down, R.C. Shillcock, D.M. Eagleman, and J. Sim-
ner. Validating a standardised test battery for synesthesia: Does the
synesthesia battery reliably detect synesthesia? Consciousness and
Cognition, 33:375–385, 2015.
[Chr19] Chrome Web Store - mr.bearengineer. Synesthesia chrome extension,
2019. Accessed: August 5, 2024.
[Chr20] Chrome Web Store - emmawebdeveloper00. Seesynesthete chrome exten-
sion, 2020. Accessed: August 5, 2024.
[CL19] Virginia Clinton-Lisell. Reading from paper compared to screens: A
systematic review and meta-analysis. Journal of Research in Reading,
42:288–324, 05 2019.
[CMR12] Olympia Colizoli, Jaap M. J. Murre, and Romke Rouw. Pseudo-
synesthesia through reading books with colored letters. PLOS ONE,
7(6):1–10, 06 2012.
[CMSR17] Olympia Colizoli, Jaap Murre, H. Scholte, and Romke Rouw. Creating
colored letters: Familial markers of grapheme–color synesthesia in parietal
lobe activation and structure. Journal of Cognitive Neuroscience, 29:1–14,
02 2017.
[Coh88] J. Cohen. Statistical Power Analysis for the Behavioral Sciences. Lawrence
Erlbaum Associates, 1988.
[CR14] Rocco Chiou and Anina Rich. The role of conceptual knowledge in
understanding synaesthesia: Evaluating contemporary findings from a
“hub-and-spokes” perspective. Frontiers in Psychology, 5, 2014.
[CT19] Henda Chorfi and Lama Tatwany. Augmented reality based mobile
application for real-time arabic language translation. Communications in
Science and Technology, 4:30–37, 07 2019.
[CVRRS23] Abbineni Charishma, Alla Amrutha Vaishnavi, D Rajeswara Rao, and
Tirumalasetti Teja Sri. Smart reader for visually impaired. In 2023 9th
International Conference on Advanced Computing and Communication
Systems (ICACCS), volume 1, pages 349–352, 2023.
[CWT+15] Michael Cohen, Kathrin Weidacker, Judith Tankink, H. Scholte, and
Romke Rouw. Grapheme-color synesthesia subtypes: Stable individ-
ual differences reflected in posterior alpha-band oscillations. Cognitive
neuroscience, 6:1–12, 04 2015.
[Cyt89] Richard Cytowic. Synesthesia and mapping of subjective sensory dimensions.
Neurology, 39:849–850, 07 1989.
[Cyt95] Richard Cytowic. Synesthesia: Phenomenology and neuropsychology a
review of current knowledge. Psyche, 2, 01 1995.
[Dav24a] David Eagleman. Grapheme-colour picker test, 2005-2024. Accessed:
August 5, 2024.
[Dav24b] David Eagleman. The synesthesia battery, 2005-2024. Accessed: August
5, 2024.
[Day04] Sean A. Day. Trends in synesthetically colored graphemes and phonemes
– 2004 revision. 2004.
[Day13] Sean A. Day. 903Synesthesia: A First-Person Perspective. In Oxford
Handbook of Synesthesia. Oxford University Press, 12 2013.
[Day22] Sean A. Day. Types of synesthesia, 2022. Accessed: August 5, 2024.
[DS05] Mike J. Dixon and Daniel Smilek. The importance of individual differences
in grapheme-color synesthesia. Neuron, 45(6):821–823, 2005.
[Duf01] Patricia Lynne Duffy. Blue cats and chartreuse kittens. Macmillan, 11
2001.
[EKT+07] David Eagleman, Arielle Kagan, Steffie Tomson, Deepak Sagaram, and
Anand Sarma. A standardized test battery for the study of synesthesia.
Journal of neuroscience methods, 159:139–45, 02 2007.
[Ell15] Andrew J. Elliot. Color and psychological functioning: a review of
theoretical and empirical work. Frontiers in Psychology, 6, 2015.
[Fle06] Jack Fletcher. Measuring reading comprehension. Scientific Studies of
Reading - SCI STUD READ, 10, 07 2006.
[FSA+06] David Francis, Catherine Snow, Diane August, Coleen Carlson, Jon
Miller, and Aquiles Iglesias. Measures of reading comprehension: A latent
variable analysis of the diagnostic assessment of reading comprehension.
scientific studies of reading, 10 (3), 301-322. Scientific Studies of Reading,
10:301–322, 01 2006.
[GENB21] Anna Carin Gran Ekstrand, Mattias Nilsson Benfatto, and Gustaf Öqvist
Seimyr. Screening for reading difficulties: Comparing eye tracking out-
comes to neuropsychological assessments. Frontiers in Education, 6,
2021.
[Git24] GitHub. Tesseract ocr, 2024. Accessed: August 5, 2024.
[GNCHCG11] Veronica Gross, Sandy Neargarder, Catherine Caldwell-Harris, and Alice
Cronin-Golomb. Superior encoding enhances recall in color-graphemic
synesthesia. Perception, 40:196–208, 02 2011.
[Goo23a] Google Play Store - Google LLC. Google lens, 2023. Accessed: August 5,
2024.
[Goo23b] Google Play Store - StuckInBasement. Ctrl-f - search text in documents,
2023. Accessed: August 5, 2024.
[Goo24] Google Play Store - Dream Dijital. Translate lens: Photo & camera, 2024.
Accessed: August 5, 2024.
[Goond] Google Sheets. Our alphabets- add your colors on a new row at the
bottom, use "custom" under the "fill color" tool to find more colors, n.d.
Accessed: August 5, 2024.
[GRJ12] Bradley Gibson, Gabriel Radvansky, and Ann Johnson. Grapheme–color
synesthesia can enhance immediate memory without disrupting the en-
coding of relational cues. Psychonomic bulletin review, 19, 07 2012.
[GTHB16] Philip Griffiths, Robert Taylor, Lisa Henderson, and Brendan Barrett.
The effect of coloured overlays and lenses on reading: a systematic review
of the literature. Ophthalmic and Physiological Optics, 36:519–544, 09
2016.
[HARB05] Edward Hubbard, Andi Arman, Vilayanur Ramachandran, and Geoffrey
Boynton. Individual differences among grapheme-color synesthetes: Brain-
behavior correlations. Neuron, 45:975–85, 04 2005.
[HE20] Vered Halamish and Elisya Elbaz. Children’s reading comprehension and
metacomprehension on screen versus on paper. Computers Education,
145:103737, 2020.
[HYS20] Daisuke Hamada, Hiroki Yamamoto, and Jun Saiki. Association between
synesthetic colors and sensitivity to physical colors changed by type of
synesthetic experience in grapheme-color synesthesia. Consciousness and
Cognition, 83:102973, 2020.
[JAS18] JASP. Jasp - a fresh way to do statistics, 2018. Accessed: August 5,
2024.
[JDW09] Jörg Jewanski, Sean Day, and Jamie Ward. A colorful albino: The first
documented case of synaesthesia, by georg tobias ludwig sachs in 1812.
Journal of the history of the neurosciences, 18:293–303, 07 2009.
[JWA05] Jamie Ward, Julia Simner, and Vivian Auyeung. A comparison of lexical-
gustatory and grapheme-colour synaesthesia. Cognitive Neuropsychology,
22(1):28–41, 2005. PMID: 21038239.
[KHL22] Jieun Kim, Jae-In Hwang, and Jieun Lee. Vr color picker: Three-
dimensional color selection interfaces. IEEE Access, 10:65809–65824,
2022.
[KSZ18] Yiren Kong, Young Sik Seo, and Ling Zhai. Comparison of reading
performance on screen and on paper: A meta-analysis. Computers
Education, 123:138–149, 2018.
[LFJH19a] Jana Lüdtke, Eva Froehlich, Arthur M. Jacobs, and Florian Hutzler.
The sls-berlin: Validation of a german computer-based screening test to
measure reading proficiency in early and late adulthood. Frontiers in
Psychology, 10, 2019.
[LFJH19b] Jana Lüdtke, Eva Froehlich, Arthur M. Jacobs, and Florian Hutzler.
The sls-berlin: Validation of a german computer-based screening test to
measure reading proficiency in early and late adulthood. Frontiers in
Psychology, 10, 2019.
[LLFT22] David P. Luke, Laura Lungu, Ross Friday, and Devin B. Terhune. The
chemical induction of synaesthesia. Human Psychopharmacology: Clinical
and Experimental, 37(4):e2832, 2022.
[LM18] Katrin Lunke and Beat Meier. Creativity and involvement in art in
different types of synaesthesia. British Journal of Psychology, 110, 11
2018.
[LM20] Katrin Lunke and Beat Meier. A persistent memory advantage is specific
to grapheme-colour synaesthesia. Scientific Reports, 10, 02 2020.
[LUM15] James Lewis, Brian Utesch, and Deborah Maher. Measuring perceived
usability: The sus, umux-lite, and altusability. International Journal of
Human-Computer Interaction, 31:150625095336004, 06 2015.
[MC14] Dan McCarthy and Gideon Caplovitz. Color synesthesia improves color
but impairs motion perception. Trends in cognitive sciences, 18, 02 2014.
[Mei13] Beat Meier. Semantic representation of synaesthesia. Theoria et Historia
Scientiarum, 10, 12 2013.
[Mei22] Beat Meier. Synesthesia. In Sergio Della Sala, editor, Encyclopedia of
Behavioral Neuroscience, 2nd edition (Second Edition), pages 561–569.
Elsevier, Oxford, second edition edition, 2022.
[Mic24a] Microsoft Developer. Windows.media.ocr api, 2024. Accessed: August 5,
2024.
[Mic24b] Microsoft Support. Transcribe your recordings, 2024. Accessed: August
5, 2024.
[Mir24] Miro. Miro - online collaboration tool, 2024. Accessed: August 5, 2024.
[MR13a] Beat Meier and Nicolas Rothen. Grapheme-color synaesthesia is associated
with a distinct cognitive style. Frontiers in psychology, 4:632, 09 2013.
[MR13b] Myrto Mylopoulos and Tony Ro. Synesthesia: a colorful word with a
touching sound? Frontiers in Psychology, 4, 2013.
[MRW14] Beat Meier, Nicolas Rothen, and Stefan Walter. Developmental aspects of
synaesthesia across the adult lifespan. Frontiers in human neuroscience,
8:129, 03 2014.
[MS22] Thea Mannix and Thomas Sørensen. Face-processing differences present
in grapheme-color synesthetes. Cognitive Science, 46, 04 2022.
[Mun24] Munsell Color. Munsell book of color - matte edition, 2024. Accessed:
August 5, 2024.
[MW03] H. Mayringer and Heinz Wimmer. Salzburger lese-screening (sls) für die
klassenstufen. Göttingen: Hogrefe, pages 1–4, 01 2003.
[NCE11] Scott Novich, Sherry Cheng, and David Eagleman. Is synaesthesia one
condition or many? a large-scale analysis reveals subgroups. Journal of
neuropsychology, 5:353–71, 09 2011.
[Nik20] Ivan Nikolov. In Proceedings of the 23rd International Conference on
Academic Mindtrek, AcademicMindtrek ’20, page 153–156, New York,
NY, USA, 2020. Association for Computing Machinery.
[OBHW22a] Imene OUALI, Mohamed Ben Halima, and Ali WALI. Real-time applica-
tion for recognition and visualization of arabic words with vowels based
dl and ar. In 2022 International Wireless Communications and Mobile
Computing (IWCMC), pages 678–683, 2022.
[OBHW22b] Imene Ouali, Mohamed Ben Halima, and Ali Wali. Text Detection and
Recognition Using Augmented Reality and Deep Learning, pages 13–23.
03 2022.
[OECnd] OECD. Reading performance (pisa), n.d. Accessed: August 5, 2024.
[OHHW20] Imene OUALI, Mohamed Saifeddine HADJ SASSI, Mohamed BEN HAL-
IMA, and Ali WALI. A new architecture based ar for detection and
recognition of objects and text to enhance navigation of visually impaired
people. Procedia Computer Science, 176:602–611, 2020. Knowledge-Based
and Intelligent Information Engineering Systems: Proceedings of the
24th International Conference KES2020.
[OHSBHW21] Imene Ouali, Mohamed Saifeddine Hadj Sassi, Mohamed Ben Halima,
and Ali Wali. Architecture for real-time visualizing arabic words with
diacritics using augmented reality for visually impaired people. In Leonard
Barolli, Isaac Woungang, and Tomoya Enokido, editors, Advanced In-
formation Networking and Applications, pages 285–296, Cham, 2021.
Springer International Publishing.
[OHW22] Imene OUALI, Mohamed BEN HALIMA, and Ali WALI. Augmented
reality for scene text recognition, visualization and reading to assist vi-
sually impaired people. Procedia Computer Science, 207:158–167, 2022.
Knowledge-Based and Intelligent Information Engineering Systems: Pro-
ceedings of the 26th International Conference KES2022.
[Ope24] OpenCV. Opencv library, 2024. Accessed: August 5, 2024.
[Par19] Parlindungan Pardede. Print vs digital reading comprehension in efl. 5:77,
07 2019.
[PBM+02] Thomas Palmeri, Randolph Blake, Rene Marois, Marci Flanery, and
William Whetsell. The perceptual reality of synesthetic color. Proceedings
of the National Academy of Sciences of the United States of America,
99:4127–31, 04 2002.
[PFC22] Ilya Pivavaruk and Jorge Ramón Fonseca Cacho. Ocr enhanced augmented
reality indoor navigation. In 2022 IEEE International Conference on
Artificial Intelligence and Virtual Reality (AIVR), pages 186–192, 2022.
[PMI17] Muhammad Pu, Nazatul Majid, and Bahari Idrus. Framework based on
mobile augmented reality for translating food menu in thai language to
malay language. International Journal on Advanced Science, Engineering
and Information Technology, 7:153, 02 2017.
[PPR05] N. Plouznikoff, A. Plouznikoff, and J.-M. Robert. Artificial grapheme-
color synesthesia for wearable task support. In Ninth IEEE International
Symposium on Wearable Computers (ISWC’05), pages 108–111, 2005.
[PVdSN11] Chris Paffen, Maarten Van der Smagt, and Tanja Nijboer.
Colour–grapheme synesthesia affects binocular vision. Frontiers in Psy-
chology, 2, 2011.
[RA18] John H. Reif and Wadee Alhalabi. Advancing attention control using
vr-induced multimodal artificial synesthesia. Preprints.org, August 2018.
[RAM+21] Nicholas Root, Michiko Asano, Helena Melero, Chai-Youn Kim, Anton V. Sidoroff-Dorso, Argiro Vatakis, Kazuhiko Yokosawa, Vilayanur Ramachandran, and Romke Rouw. Do the colors of your letters depend on your language? Language-dependent and universal influences on grapheme-color synesthesia in seven languages. Consciousness and Cognition, 95:103192, 2021.
[RBM05] A.N. Rich, J.L. Bradshaw, and J.B. Mattingley. A systematic, large-scale
study of synaesthesia: implications for the role of early experience in
lexical-colour associations. Cognition, 98(1):53–84, 2005.
[RG19] Mariagrazia Ranzini and Luisa Girelli. Colours + numbers differs from colours of numbers: cognitive and visual illusions in grapheme-colour synaesthesia. Attention, Perception, & Psychophysics, 81, 03 2019.
[RH01] Vilayanur Ramachandran and Edward Hubbard. Psychophysical investigations into the neural basis of synaesthesia. Proceedings of the Royal Society B: Biological Sciences, 268:979–983, 06 2001.
[Roy05] James Royer. Uses for the sentence verification technique for measuring
language comprehension. 01 2005.
[RR21] Nicholas Root and Romke Rouw. A unifying model of grapheme-color
associations in synesthetes and controls. Annual Meeting of the Cognitive
Science Society, 43, 2021.
[RRA+18] Nicholas B. Root, Romke Rouw, Michiko Asano, Chai-Youn Kim, Helena Melero, Kazuhiko Yokosawa, and Vilayanur S. Ramachandran. Why is the synesthete’s “a” red? Using a five-language dataset to disentangle the effects of shape, sound, semantics, and ordinality on inducer–concurrent relationships in grapheme-color synesthesia. Cortex, 99:375–389, 2018.
[RSWW13] Nicolas Rothen, Anil K. Seth, Christoph Witzel, and Jamie Ward. Di-
agnosing synaesthesia with online colour pickers: Maximising sensitivity
and specificity. Journal of Neuroscience Methods, 215(1):156–160, 2013.
[Säu21] Andreas Säuberli. Measuring text comprehension for people with reading difficulties using a mobile application. In Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’21, New York, NY, USA, 2021. Association for Computing Machinery.
[SB13] Julia Simner and Angela Bain. A longitudinal study of grapheme-color
synesthesia in childhood: 6/7 years to 10/11 years. Frontiers in Human
Neuroscience, 7, 2013.
[SB17] Julia Simner and Angela Bain. Do children with grapheme-colour synaesthesia show cognitive benefits? British Journal of Psychology, 109, 03 2017.
[SBRL+23] Jennifer J. Stiegler-Balfour, Zoe S. Roberts, Abby S. LaChance, Aubrey M. Sahouria, and Emily D. Newborough. Is reading under print and digital conditions really equivalent? Differences in reading and recall of expository text for higher and lower ability comprehenders. International Journal of Human-Computer Studies, 176:103036, 2023.
[Scrnd] Scribe OCR. Scribe OCR documentation, n.d. Accessed: August 5, 2024.
[SDJDuLM20] A.V. Sidoroff-Dorso, J. Jewanski, S.A. Day, and Universitäts- und Landesbibliothek Münster. Synaesthesia: Opinions and Perspectives: 30 Interviews with Leading Scientists, Artists and Synaesthetes. Wissenschaftliche Schriften der WWU Münster / 8. 2020.
[SGB+23] Jannis Strecker, Kimberly García, Kenan Bektaş, Simon Mayer, and Ganesh Ramanathan. SOCRAR: Semantic OCR through augmented reality. In Proceedings of the 12th International Conference on the Internet of Things, IoT ’22, pages 25–32, New York, NY, USA, 2023. Association for Computing Machinery.
[SGM06] Julia Simner, Louise Glover, and Alice Mowat. Linguistic determinants of
word colouring in grapheme-colour synaesthesia. Cortex, 42(2):281–289,
2006.
[SGMC14] Nicholas Smith, Fiona Glen, Vera Mönter, and David Crabb. Using eye tracking to assess reading performance in patients with glaucoma: A within-person study. Journal of Ophthalmology, 2014:120528, 05 2014.
[SH18] Abdul Saudagar and Mohammed Habeebvulla. Augmented reality mobile application for Arabic text extraction, recognition and translation. Journal of Statistics and Management Systems, 21:617–629, 07 2018.
[SHC+08] Julia Simner, Jenny Harrold, Harriet Creed, Louise Monro, and Louise
Foulkes. Early detection of markers for synaesthesia in childhood popula-
tions. Brain, 132(1):57–64, 11 2008.
[SHCS19] Rebecca Smees, James Hughes, Duncan Carmichael, and Julia Simner.
Learning in colour: children with grapheme-colour synaesthesia show
cognitive benefits in vocabulary and self-evaluated reading. Philosophical
Transactions of the Royal Society B: Biological Sciences, 374:20180348,
10 2019.
[SHM+19] Mary Jane Spiller, Lee Harkry, Fintan McCullagh, Volker Thoma, and
Clare N. Jonas. Exploring the relationship between grapheme colour-
picking consistency and mental imagery. Philosophical Transactions of
the Royal Society B, 374, 2019.
[Sim07] Julia Simner. Beyond perception: synaesthesia as a psycholinguistic
phenomenon. Trends in Cognitive Sciences, 11(1):23–29, 2007.
[SL22] Sebastian Suggate and Wolfgang Lenhard. Mental imagery skill predicts
adults’ reading performance. Learning and Instruction, 80:101633, 2022.
[SLM09] Richard Skelton, Casimir Ludwig, and Christine Mohr. A novel, illustrated
questionnaire to distinguish projector and associator synaesthetes. Cortex,
45(6):721–729, 2009.
[Smi07] R. Smith. An overview of the Tesseract OCR engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), volume 2, pages 629–633, 2007.
[SMS+06] Julia Simner, Catherine Mulvenna, Noam Sagiv, Elias Tsakanikos, Sarah Witherby, Christine Fraser, Kirsten Scott, and Jamie Ward. Synaesthesia: The prevalence of atypical cross-modal experiences. Perception, 35:1024–1033, 02 2006.
[SNE+12] C. Sinke, J. Neufeld, H.M. Emrich, W. Dillo, S. Bleich, M. Zedler, and
G.R. Szycik. Inside a synesthete’s head: A functional connectivity analysis
with grapheme-color synesthetes. Neuropsychologia, 50(14):3363–3369,
2012.
[SNZ+14] Christopher Sinke, Janina Neufeld, Markus Zedler, Hinderk Emrich, Stefan Bleich, Thomas Münte, and Gregor Szycik. Reduced audiovisual integration in synesthesia - evidence from bimodal speech perception. Journal of Neuropsychology, 8:94–106, 03 2014.
[SOR+23] David J. Schwartzman, Ales Oblak, Nicolas Rothen, Daniel Bor, and Anil K. Seth. Extensive phenomenological overlap between training-induced and naturally-occurring synaesthetic experiences. Collabra: Psychology, 9(1):73832, 04 2023.
[SP22] Susanne Seifert and Lisa Paleczek. Comparing tablet and print mode of
a german reading comprehension test in grade 3: Influence of test order,
gender and language. International Journal of Educational Research,
113:101948, 2022.
[SPL+06] Julia M. Sperling, David Prvulovic, David E.J. Linden, Wolf Singer, and Aglaja Stirn. Neuronal correlates of colour-graphemic synaesthesia: A fMRI study. Cortex, 42(2):295–303, 2006.
[SS15] Avinoam Safran and Nicolae Sanda. Color synesthesia. Insight into perception, emotion, and consciousness. Current Opinion in Neurology, 28:36–44, 02 2015.
[Str35] J. Ridley Stroop. Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6):643, 1935.
[SWL+05] Julia Simner, Jamie Ward, Monika Lanz, Ashok Jansari, Krist Noonan, Louise Glover, and David Oakley. Nonrandom associations of graphemes to colours in synaesthetic and normal populations. Cognitive Neuropsychology, 22:1069–1085, 12 2005.
[syn24] SynCalc: Calculator for synesthetes, 2024. Accessed: August 7, 2024.
[The21] The Synesthesia Tree. Grapheme-colour synesthesia, 2021. Accessed:
August 5, 2024.
[TO17] Lamma Tatwany and Henda Chorfi Ouertani. A review on using aug-
mented reality in text translation. In 2017 6th International Conference
on Information and Communication Technology and Accessibility (ICTA),
pages 1–6, 2017.
[TOE24] TOEFL Resources by Michael Goodine. TOEFL reading section, 2024. Accessed: August 5, 2024.
[UAY21] Kyuto Uno, Michiko Asano, and Kazuhiko Yokosawa. Consistency of
synesthetic association varies with grapheme familiarity: A longitudi-
nal study of grapheme-color synesthesia. Consciousness and Cognition,
89:103090, 2021.
[UEM14] Arcangelo Uccula, Mauro Enna, and Claudio Mulatti. Colors, colored
overlays, and reading skills. Frontiers in Psychology, 5, 2014.
[Uni19] Unity Asset Store - Paper Plane Tools. OpenCV plus Unity, 2019. Accessed: August 5, 2024.
[Uni24] Unity Asset Store - Enox Software. OpenCV for Unity, 2024. Accessed: August 5, 2024.
[Vuf24] Vuforia Developer. Vuforia developer portal, 2024. Accessed: August 5,
2024.
[VWP22] Lisa-Marie Vortmann, Pascal Weidenbach, and Felix Putze. AtAwAR Translate: Attention-aware language translation application in augmented reality for mobile phones. Sensors, 22(16), 2022.
[War12] Jamie Ward. Synesthesia. Annual Review of Psychology, 64, 06 2012.
[WAS+14] Matthew R. Watson, Kathleen A. Akins, Charlotte Spiker, Lindsay
Crawford, and James T. Enns. Synesthesia and learning: a critical review
and novel theory. Frontiers in Human Neuroscience, 8:98, 2014.
[WJG+21] Ryan Joseph Ward, Fred Paul Mark Jjunju, Elias J. Griffith, Sophie M.
Wuerger, and Alan Marshall. Artificial odour-vision syneasthesia via
olfactory sensory argumentation. IEEE Sensors Journal, 21(5):6784–6792,
2021.
[WLSS07] Jamie Ward, Ryan Li, Shireen Salih, and Noam Sagiv. Varieties of
grapheme-colour synaesthesia: A new theory of phenomenological and
behavioural differences. Consciousness and Cognition, 16(4):913–931,
2007.
[WS20] Jamie Ward and Julia Simner. Chapter 13 - synesthesia: The current state
of the field. In K. Sathian and V.S. Ramachandran, editors, Multisensory
Perception, pages 283–300. Academic Press, 2020.
[WTLEK08] Jamie Ward, Daisy Thompson-Lake, Roxanne Ely, and Flora Kaminski. Synaesthesia, creativity and art: What is the link? British Journal of Psychology, 99:127–141, 03 2008.
[WW06] Nathan Witthoft and Jonathan Winawer. Synesthetic colors determined by having colored refrigerator magnets in childhood. Cortex, 42:175–183, 03 2006.
[WW13] Nathan Witthoft and Jonathan Winawer. Learning, memory, and synesthesia. Psychological Science, 24, 01 2013.
[WWE15] Nathan Witthoft, Jonathan Winawer, and David M. Eagleman. Preva-
lence of learned grapheme-color pairings in a large online sample of
synesthetes. PLOS ONE, 10(3):1–10, 03 2015.
Appendix
User Study Questionnaire
Texts for Testing
A C1-level English reading comprehension text titled "How humans evolved language" [Brind] was chosen as the sample text for both the benchmark test and the user study:
A
Thanks to the field of linguistics we know much about the development of the 5,000 plus
languages in existence today. We can describe their grammar and pronunciation and see
how their spoken and written forms have changed over time. For example, we understand
the origins of the Indo-European group of languages, which includes Norwegian, Hindi
and English, and can trace them back to tribes in eastern Europe in about 3000 BC.
So, we have mapped out a great deal of the history of language, but there are still areas
we know little about. Experts are beginning to look to the field of evolutionary biology
to find out how the human species developed to be able to use language. So far, there
are far more questions and half-theories than answers.
B
We know that human language is far more complex than that of even our nearest and
most intelligent relatives like chimpanzees. We can express complex thoughts, convey
subtle emotions and communicate about abstract concepts such as past and future. And
we do this following a set of structural rules, known as grammar. Do only humans use
an innate system of rules to govern the order of words? Perhaps not, as some research
may suggest dolphins share this capability because they are able to recognise when these
rules are broken.
C
If we want to know where our capability for complex language came from, we need to
look at how our brains are different from other animals. This relates to more than just
brain size; it is important what other things our brains can do and when and why they
evolved that way. And for this there are very few physical clues; artefacts left by our
ancestors don’t tell us what speech they were capable of making. One thing we can see
in the remains of early humans, however, is the development of the mouth, throat and
tongue. By about 100,000 years ago, humans had evolved the ability to create complex
sounds. Before that, evolutionary biologists can only guess whether or not early humans
communicated using more basic sounds.
D
Another question is, what is it about human brains that allowed language to evolve in a
way that it did not in other primates? At some point, our brains became able to make
our mouths produce vowel and consonant sounds, and we developed the capacity to
invent words to name things around us. These were the basic ingredients for complex
language. The next change would have been to put those words into sentences, similar
to the ’protolanguage’ children use when they first learn to speak. No one knows if the
next step – adding grammar to signal past, present and future, for example, or plurals
and relative clauses – required a further development in the human brain or was simply
a response to our increasingly civilised way of living together.
Between 100,000 and 50,000 years ago, though, we start to see the evidence of early
human civilisation, through cave paintings for example; no one knows the connection
between this and language. Brains didn’t suddenly get bigger, yet humans did become
more complex and more intelligent. Was it using language that caused their brains to
develop? Or did their more complex brains start producing language?
E
More questions lie in looking at the influence of genetics on brain and language devel-
opment. Are there genes that mutated and gave us language ability? Researchers have
found a gene mutation that occurred between 200,000 and 100,000 years ago, which
seems to have a connection with speaking and how our brains control our mouths and
face. Monkeys have a similar gene, but it did not undergo this mutation. It’s too early
to say how much influence genes have on language, but one day the answers might be
found in our DNA.