<div class="csl-bib-body">
<div class="csl-entry">Pachinger, P., Goldzycher, J., Planitzer, A. M., Neidhardt, J., & Hanbury, A. (2025). A Disaggregated Dataset on English Offensiveness Containing Spans. In G. Abercrombie, V. Basile, S. Frenda, S. Tonelli, & S. Dudy (Eds.), <i>Proceedings of the 4th Workshop on Perspectivist Approaches to NLP</i>. Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.nlperspectives-1.1</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/225535
-
dc.description.abstract
Toxicity labels at sub-document granularity and disaggregated labels lead to more nuanced and personalized toxicity classification and facilitate analysis. We re-annotate a subset of 1983 posts from the Jigsaw Toxic Comment Classification Challenge and provide disaggregated toxicity labels and spans that identify inappropriate language and targets of toxic statements. Manual analysis shows that five annotations per instance effectively capture meaningful disagreement patterns and allow for finer distinctions between genuine disagreement and disagreement arising from annotation error or inconsistency. Our main findings are: (1) disagreement often stems from divergent interpretations of edge-case toxicity; (2) disagreement is especially high in cases of toxic statements involving non-human targets; (3) disagreement on whether a passage consists of inappropriate language occurs not only on inherently questionable terms, but also on words that may be inappropriate in specific contexts while remaining acceptable in others; (4) Transformer-based models effectively learn from aggregated data that reduces false-negative classifications by being more sensitive to minority opinions that a post is toxic. We publish the new annotations under the CC BY 4.0 license.
en
dc.description.sponsorship
WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds
-
dc.description.sponsorship
Christian Doppler Forschungsgesellschaft
-
dc.language.iso
en
-
dc.subject
Offensiveness detection
en
dc.subject
Human label variation
en
dc.subject
dataset creation
en
dc.subject
text classification
en
dc.subject
perspectivism
en
dc.subject
in-context learning
en
dc.subject
Fine-Tuning
en
dc.title
A Disaggregated Dataset on English Offensiveness Containing Spans
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
University of Zurich, Switzerland
-
dc.contributor.affiliation
University of Vienna, Austria
-
dc.contributor.editoraffiliation
Heriot-Watt University, United Kingdom of Great Britain and Northern Ireland
-
dc.relation.isbn
979-8-89176-350-0
-
dc.relation.doi
10.18653/v1/2025.nlperspectives-1
-
dc.relation.grantno
ICT20-015
-
dc.relation.grantno
CDL Neidhardt
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
-
tuw.relation.publisher
Association for Computational Linguistics
-
tuw.project.title
Transparente Automatisierte Inhaltsmoderation
-
tuw.project.title
Christian Doppler Labor für Weiterentwicklung des State-of-the-Art von Recommender-Systemen in mehreren Domänen
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publication.orgunit
E056-23 - Fachbereich Innovative Combinations and Applications of AI and ML (iCAIML)
-
tuw.publisher.doi
10.18653/v1/2025.nlperspectives-1.1
-
dc.description.numberOfPages
14
-
tuw.author.orcid
0000-0002-0706-810X
-
tuw.author.orcid
0000-0001-8181-6615
-
tuw.author.orcid
0000-0001-7184-1841
-
tuw.author.orcid
0000-0002-7149-5843
-
tuw.editor.orcid
0000-0002-6546-3562
-
tuw.event.name
The 4th Workshop on Perspectivist Approaches to NLP
en
tuw.event.startdate
08-11-2025
-
tuw.event.enddate
08-11-2025
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Suzhou
-
tuw.event.country
CN
-
tuw.event.presenter
Pachinger, Pia
-
wb.sciencebranch
Informatik
-
wb.sciencebranch
Wirtschaftswissenschaften
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
5020
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.grantfulltext
none
-
item.fulltext
no Fulltext
-
item.cerifentitytype
Publications
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.languageiso639-1
en
-
item.openairetype
conference paper
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
University of Zurich, Switzerland
-
crisitem.author.dept
University of Vienna, Austria
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.orcid
0000-0002-0706-810X
-
crisitem.author.orcid
0000-0001-7184-1841
-
crisitem.author.orcid
0000-0002-7149-5843
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.project.funder
WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds