<div class="csl-bib-body">
<div class="csl-entry">Gander, A. (2021). <i>Text analysis using colexification networks</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.83049</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2021.83049
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/17862
-
dc.description
Title deviates from the original according to the author's translation
-
dc.description.abstract
The phenomenon of colexification describes occurrences in natural language in which two concepts are expressed by the same word in at least one language. We deploy this linguistic principle to construct a theory-driven text analysis method. Unlike many state-of-the-art natural language processing (NLP) models, this method is fully interpretable, allowing precise insights into the structure of the model. Such theory-driven approaches are increasingly in demand because, with large NLP models, it is difficult for developers to understand a model's dynamics and their implications. Furthermore, the proposed method is domain-independent because it is constructed on the language layer itself, whereas the majority of state-of-the-art methods are trained on large corpora of texts.
The text analysis method proposed here is based on a word similarity measure built on top of a colexification network, i.e. a network of concepts linked by occurrences of colexification. Inspired by similar approaches in other domains, we compute the word similarity measure as the stationary visiting distribution in each node and validate it using several of the most widely used word similarity datasets in NLP. The results show that the colexification-based method significantly outperforms other word and graph embedding approaches in the task of word similarity prediction. After validating the word similarity metric, we define a text similarity measure inspired by a state-of-the-art approach to the same task. Performing various experiments based on databases of English texts, we validate the measure by showing that it is able to distinguish text excerpts on the basis of their genre, author and text of origin with reasonable accuracy. We compare the results of the method with those of a standard NLP approach on the genre recognition task and find that the two models reach comparable performance.
The text analysis method developed in this work allows us to validate the hypothesis that colexification occurrences encode semantic relationships between concepts. Furthermore, we show that a colexification-based approach to NLP has significant merits in various text analysis tasks, leading to meaningful insights. For instance, we perform a historical analysis of American English fiction literature, showing that the style and content of fiction literature have become more diverse over time, with the rate of change increasing particularly sharply in recent decades. These insights can be linked to other findings in computational social science, suggesting that the flux of cultural content has been increasing during the last decades.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Kolexifikation
de
dc.subject
Textanalyse
de
dc.subject
Textähnlichkeit
de
dc.subject
Wortähnlichkeit
de
dc.subject
Computerwissenschaft
de
dc.subject
Computergestützte Sprachverarbeitung
de
dc.subject
Linguistik
de
dc.subject
Machine Learning
de
dc.subject
colexification
en
dc.subject
text analysis
en
dc.subject
text similarity
en
dc.subject
word similarity
en
dc.subject
computational science
en
dc.subject
nlp
en
dc.subject
linguistics
en
dc.subject
machine learning
en
dc.title
Text analysis using colexification networks
en
dc.title.alternative
Textanalyse mit colexification Netzwerken
de
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2021.83049
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Armin Gander
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering