Offensive text detection on English Twitter with deep learning models and rule-based systems

Gemes, Kinga Andrea; Kovacs, Adam; Reichel, Markus; Recski, Gábor

doi:10.34726/4342

Record link:

http://hdl.handle.net/20.500.12708/177654
https://doi.org/10.34726/4342

Title:

Offensive text detection on English Twitter with deep learning models and rule-based systems

Citation:

Gemes, K. A., Kovacs, A., Reichel, M., & Recski, G. (2021). Offensive text detection on English Twitter with deep learning models and rule-based systems. In P. Mehta, T. Mandl, P. Majumder, & M. Mitra (Eds.), FIRE-WN 2021 [FIRE 2021 Working Notes] (pp. 283–296). CEUR-WS.org. https://doi.org/10.34726/4342

reposiTUm DOI:

10.34726/4342

CatalogPlus:

AC17203138

Publication Type:

Inproceedings - Full-Paper Contribution

Language:

English

Authors:

Gemes, Kinga Andrea
Kovacs, Adam
Reichel, Markus
Recski, Gábor

Organisational Unit:

E194-04 - Research Unit Data Science
E194 - Institute of Information Systems Engineering

Published in:

FIRE-WN 2021 [FIRE 2021 Working Notes]

Volume:

3159

Date (published):

2021

Event name:

Forum for Information Retrieval Evaluation

Event date:

13-Dec-2021 - 17-Dec-2021

Event place:

Gandhinagar, India

Publisher:

CEUR-WS.org

Peer reviewed:

Yes

Keywords:

social media data; hate speech detection; rule-based methods; deep learning; text classification

Abstract:

This paper describes the systems that the TUW-Inf team submitted to the HASOC 2021 shared task on identifying offensive comments in social media. Besides a simple BERT-based classifier that achieved one of the highest F-scores on the binary classification task, we build a high-precision rule-based classifier using a custom framework for human-in-the-loop learning. We evaluate both approaches qualitatively through a manual analysis of 150 tweets, which also highlights possible controversies in the ground truth labels of the HASOC dataset.

Link (external):

https://github.com/GKingA/tuw-inf-hasoc2021

Research Areas:

Information Systems Engineering: 100%

Science Branch:

1020 - Computer Science: 90%
5020 - Economics: 10%

License:

CC BY 4.0