Offensive text detection across languages and datasets using rule-based and hybrid methods

Gemes, Kinga Andrea; Kovacs, Adam; Recski, Gábor

doi:10.34726/4341

Record citation link:

http://hdl.handle.net/20.500.12708/177653
https://doi.org/10.34726/4341

Citation:

Gemes, K. A., Kovacs, A., & Recski, G. (2023). Offensive text detection across languages and datasets using rule-based and hybrid methods. In G. Drakopoulos & E. Kafeza (Eds.), CIKM-WS 2022. Proceedings of the CIKM 2022 Workshops. CEUR-WS.org. https://doi.org/10.34726/4341

reposiTUm-DOI:

10.34726/4341

CatalogPlus:

AC17204584

Publication type:

Conference paper - Full-Paper Contribution

Language:

English

Authors:

Gemes, Kinga Andrea
Kovacs, Adam
Recski, Gábor

Organizational unit:

E194-04 - Research Unit Data Science
E194 - Institute of Information Systems Engineering

Published in:

CIKM-WS 2022. Proceedings of the CIKM 2022 Workshops

Volume:

3318

Date (published):

8-Jan-2023

Event name:

CIKM’22: Advances in Interpretable Machine Learning and Artificial Intelligence Workshop

Event period:

17-Oct-2022 - 21-Oct-2022

Event location:

Atlanta, US-GA, United States of America

Extent:

Publisher:

CEUR-WS.org

Peer Reviewed:

Keywords:

offensive text; rule-based methods; human in the loop learning

Abstract:

We investigate the potential of rule-based systems for the task of offensive text detection in English and German, and demonstrate their effectiveness in low-resource settings, as an alternative or an addition to transfer learning across tasks and languages. Task definitions and annotation guidelines used by existing datasets vary widely, so state-of-the-art machine learning models do not transfer well across datasets or languages. Furthermore, such systems lack explainability and pose a critical risk of unintended bias. We present simple rule systems based on semantic graphs for classifying offensive text in two languages and provide both a quantitative and a qualitative comparison of their performance with deep learning models on five datasets across multiple languages and shared tasks.

Link (external):

https://github.com/GKingA/offensive_text

Research focus:

Information Systems Engineering: 100%

Field of science:

1020 - Computer Sciences: 90%
5020 - Economics: 10%

License:

CC BY 4.0