Kaçuri, M. (2023). Explainability of hate speech classification for Albanian language using rule based systems and neural networks [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.105780
E194 - Institut für Information Systems Engineering
-
Date (published):
2023
-
Number of Pages:
64
-
Keywords:
Hate Speech Classification; Explainability; Albanian; Rationales; Rules; Deep Learning; Rule-based Systems
Abstract:
Offensive language is a growing problem in online communication. It can be used to bully, harass, or intimidate others, and it creates a hostile and unsafe environment. Offensive language classification is the task of automatically identifying offensive text. The focus of this master thesis is the interpretability of model predictions, not only their accuracy. Recent deep learning models achieve high accuracy but offer little explainability of their predictions, in contrast to rule-based systems, which are highly explainable. In this thesis we have extended the Shaj dataset, the abusive/offensive language detection dataset for the Albanian language, with human-annotated rationales that show why a sentence was classified as hate speech. We annotated all samples labelled as hate speech in the Shaj dataset; each sample was annotated by three people, and six people took part in the process overall. For this study an annotation interface was developed that allows human annotators to select rationales for each sample labelled as hate speech. We report an inter-annotator agreement of 0.47 for the rationales, calculated as the average pairwise Jaccard overlap of the rationales. This is comparable to HateXplain, which reports an agreement of 0.54. We have built a rule-based system capable of classifying hate speech on the Shaj dataset. The human-annotated rationales were of great use when defining the rules. We define the rules as regular expressions, since rationales lend themselves naturally to rule extraction from the dataset, and we use methods to create the regular expressions automatically from the rationales. The rule-based system performs well on the test set, but it has a shortcoming: when the test set introduces rationale words that were not seen when the rules were defined, the system cannot classify them as hate speech. We have also trained and evaluated deep learning models, following the approach introduced by HateXplain, and used the human-annotated rationales during training. We observe an increase in the accuracy of the BERT model when the rationales are used to supervise its attention. We measure the explainability of the BERT models using the ERASER benchmark: the BERT model trained with rationales performs slightly better in terms of comprehensiveness but is weaker in terms of sufficiency.
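
As an illustration of the agreement score, here is a minimal Python sketch of the average pairwise Jaccard overlap, assuming each annotator's rationale is represented as a set of token indices; the function names and toy data are hypothetical, not taken from the thesis code:

from itertools import combinations

def jaccard(a, b):
    # Jaccard overlap |a & b| / |a | b|; two empty selections count as full agreement.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def avg_pairwise_jaccard(rationales):
    # Mean Jaccard overlap over all annotator pairs for one sample.
    pairs = list(combinations(rationales, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three annotators selected these token indices as rationales (toy data):
annotations = [{2, 3, 4}, {3, 4}, {2, 3, 5}]
print(round(avg_pairwise_jaccard(annotations), 2))  # 0.47

The corpus-level score of 0.47 is the average of such per-sample values over the annotated dataset.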
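
A sketch of how rationale phrases can be turned into regular-expression rules, in the spirit of the rule-based system described above; the lower-casing, escaping, and word-boundary choices are assumptions, not the thesis implementation:

import re

def build_rules(rationale_phrases):
    # Compile rationale phrases into one alternation with word boundaries,
    # longest phrases first so multi-word rationales win over substrings.
    escaped = sorted({re.escape(p.lower()) for p in rationale_phrases},
                     key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b")

def classify(text, rules):
    # Flag the text as hate speech if any rule matches. A rationale word that
    # never occurred during rule definition can never match, which is exactly
    # the shortcoming on unseen test-set words noted above.
    return rules.search(text.lower()) is not None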
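
The attention supervision can be sketched as follows, loosely after HateXplain: the standard classification loss is combined with a divergence term that pushes the final-layer [CLS] attention toward the normalised rationale mask. The model checkpoint, the weight lam, and the variable names are illustrative assumptions:

import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2, output_attentions=True
)

def loss_with_rationales(input_ids, attention_mask, labels, rationale_mask, lam=0.1):
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    # Final-layer attention from the [CLS] token, averaged over heads:
    # (batch, heads, seq, seq) -> (batch, seq)
    cls_attn = out.attentions[-1].mean(dim=1)[:, 0, :]
    # Normalise the 0/1 rationale mask into a target distribution over tokens.
    target = rationale_mask / rationale_mask.sum(dim=1, keepdim=True).clamp(min=1)
    attn_loss = F.kl_div((cls_attn + 1e-12).log(), target, reduction="batchmean")
    return out.loss + lam * attn_loss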
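
Finally, the two ERASER faithfulness metrics mentioned at the end can be written down for a single example as follows; predict_proba is an assumed helper returning the model's probability for the predicted label, not part of the ERASER tooling itself:

def comprehensiveness(predict_proba, tokens, rationale_idx, label):
    # Probability drop when the rationale tokens are removed from the input;
    # a large drop means the rationale really carried the prediction.
    kept = [t for i, t in enumerate(tokens) if i not in rationale_idx]
    return predict_proba(tokens, label) - predict_proba(kept, label)

def sufficiency(predict_proba, tokens, rationale_idx, label):
    # Probability drop when only the rationale tokens are kept;
    # a small drop means the rationale alone suffices for the prediction.
    only = [t for i, t in enumerate(tokens) if i in rationale_idx]
    return predict_proba(tokens, label) - predict_proba(only, label)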