<div class="csl-bib-body">
<div class="csl-entry">Toborek, V., Busch, M., Boßert, M., Bauckhage, C., & Welke, P. (2023). A New Aligned Simple German Corpus. In <i>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</i> (pp. 11393–11412). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.638</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/188931
-
dc.description.abstract
“Leichte Sprache”, the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German – German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
en
dc.description.sponsorship
WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds
-
dc.language.iso
en
-
dc.subject
Text simplification
en
dc.subject
monolingual translation
en
dc.subject
dataset
en
dc.title
A New Aligned Simple German Corpus
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
University of Bonn, Germany
-
dc.contributor.affiliation
University of Bonn, Germany
-
dc.contributor.affiliation
University of Bonn, Germany
-
dc.contributor.affiliation
University of Bonn, Germany
-
dc.description.startpage
11393
-
dc.description.endpage
11412
-
dc.relation.grantno
ICT22-059
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
-
tuw.container.volume
1
-
tuw.relation.publisher
Association for Computational Linguistics
-
tuw.project.title
Structured Data Learning with Generalized Similarities
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E194-06 - Forschungsbereich Machine Learning
-
tuw.publisher.doi
10.18653/v1/2023.acl-long.638
-
dc.description.numberOfPages
20
-
tuw.author.orcid
0009-0009-8372-8251
-
tuw.author.orcid
0000-0001-6615-2128
-
tuw.event.name
61st Annual Meeting of the Association for Computational Linguistics
en
tuw.event.startdate
09-07-2023
-
tuw.event.enddate
15-07-2023
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Toronto
-
tuw.event.country
CA
-
tuw.event.presenter
Toborek, Vanessa
-
tuw.event.presenter
Welke, Pascal
-
tuw.event.track
Multi Track
-
wb.sciencebranch
Informatik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.value
100
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.languageiso639-1
en
-
item.fulltext
no Fulltext
-
item.grantfulltext
none
-
item.openairetype
conference paper
-
item.cerifentitytype
Publications
-
crisitem.project.funder
WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds
-
crisitem.project.grantno
ICT22-059
-
crisitem.author.dept
University of Bonn
-
crisitem.author.dept
University of Bonn
-
crisitem.author.dept
University of Bonn
-
crisitem.author.dept
University of Bonn
-
crisitem.author.dept
E194-06 - Forschungsbereich Machine Learning
-
crisitem.author.orcid
0009-0009-8372-8251
-
crisitem.author.orcid
0000-0001-6615-2128
-
crisitem.author.orcid
0000-0002-2123-3781
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering