Gemes, K. A., Kovacs, A., & Recski, G. (2023). Offensive text detection across languages and datasets using rule-based and hybrid methods. In G. Drakopoulos & E. Kafeza (Eds.), CIKM-WS 2022. Proceedings of the CIKM 2022 Workshops. CEUR-WS.org. https://doi.org/10.34726/4341
E194-04 - Research Unit Data Science; E194 - Institute of Information Systems Engineering
Published in:
CIKM-WS 2022. Proceedings of the CIKM 2022 Workshops
Volume:
3318
Date (published):
8-Jan-2023
Event name:
CIKM’22: Advances in Interpretable Machine Learning and Artificial Intelligence Workshop
Event date:
17-Oct-2022 - 21-Oct-2022
Event place:
Atlanta, GA, United States of America
Number of Pages:
10
Publisher:
CEUR-WS.org
Peer reviewed:
Yes
Keywords:
offensive text; rule-based methods; human-in-the-loop learning
Abstract:
We investigate the potential of rule-based systems for the task of offensive text detection in English and German, and demonstrate their effectiveness in low-resource settings, as an alternative or addition to transfer learning across tasks and languages. Task definitions and annotation guidelines used by existing datasets show great variety, hence state-of-the-art machine learning models do not transfer well across datasets or languages. Furthermore, such systems lack explainability and pose a critical risk of unintended bias. We present simple rule systems based on semantic graphs for classifying offensive text in two languages and provide both quantitative and qualitative comparison of their performance with deep learning models on 5 datasets across multiple languages and shared tasks.