Syntactically Rich Discriminative Training: An Effective Method for Open Information Extraction

Mtumbuka, Frank; Lukasiewicz, Thomas

doi:10.18653/v1/2022.emnlp-main.401

DC Element

Value

Language

dc.contributor.author

Mtumbuka, Frank

dc.contributor.author

Lukasiewicz, Thomas

dc.date.accessioned

2024-02-05T11:11:38Z

dc.date.available

2024-02-05T11:11:38Z

dc.date.issued

2022

dc.identifier.citation

Mtumbuka, F., & Lukasiewicz, T. (2022). Syntactically Rich Discriminative Training: An Effective Method for Open Information Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (pp. 5972–5987). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-main.401

dc.identifier.uri

http://hdl.handle.net/20.500.12708/193384

dc.description.abstract

Open information extraction (OIE) is the task of extracting facts "(Subject, Relation, Object)" from natural language text. We propose several new methods for training neural OIE models in this paper. First, we propose a novel method for computing syntactically rich text embeddings using the structure of dependency trees. Second, we propose a new discriminative training approach to OIE in which tokens in the generated fact are classified as "real" or "fake", i.e., those tokens that are in both the generated and gold tuples, and those that are only in the generated tuple but not in the gold tuple. We also address the issue of repetitive tokens in generated facts and improve the models' ability to generate implicit facts. Our approach reduces repetitive tokens by a factor of 23%. Finally, we present paraphrased versions of the CaRB, OIE2016, and LSOIE datasets, and show that the models' performance substantially improves when trained on augmented datasets. Our best model beats the SOTA of IMoJIE on the recent CaRB dataset, with an improvement of 39.63% in F1 score.
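The real/fake labelling described in the abstract can be illustrated with a minimal sketch. It assumes case-insensitive, whitespace-tokenized set membership against the gold tuple; the function name `label_generated_tokens` and the example sentence are hypothetical, and the paper's actual matching procedure may differ (for instance, it may be position-aware).

```python
def label_generated_tokens(generated, gold):
    """Label each token of a generated (subject, relation, object) tuple.

    A token is "real" if it also occurs anywhere in the gold tuple and
    "fake" otherwise. Assumption: matching is case-insensitive and ignores
    token position; this is an illustration, not the authors' procedure.
    """
    # Collect every lower-cased token that appears in the gold tuple.
    gold_vocab = {tok.lower() for part in gold for tok in part.split()}
    labels = []
    for part in generated:
        for tok in part.split():
            label = "real" if tok.lower() in gold_vocab else "fake"
            labels.append((tok, label))
    return labels


# Hypothetical example tuples, not taken from any dataset.
generated = ("Marie Curie", "won", "the Nobel Prize twice")
gold = ("Marie Curie", "won", "the Nobel Prize")
labels = label_generated_tokens(generated, gold)
```

In this example only the token "twice" is labelled "fake", since it appears in the generated tuple but not in the gold tuple.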

dc.language.iso

dc.rights.uri

http://creativecommons.org/licenses/by/4.0/

dc.subject

open information extraction

dc.subject

dependency trees

dc.subject

discriminative training approach

dc.title

Syntactically Rich Discriminative Training: An Effective Method for Open Information Extraction

dc.type

Inproceedings

dc.type

Conference contribution

dc.rights.license

Creative Commons Namensnennung 4.0 International

dc.rights.license

Creative Commons Attribution 4.0 International

dc.contributor.affiliation

University of Oxford, United Kingdom of Great Britain and Northern Ireland

dc.description.startpage

5972

dc.description.endpage

5987

dc.type.category

Full-Paper Contribution

tuw.booktitle

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

tuw.peerreviewed

true

tuw.relation.publisher

Association for Computational Linguistics

tuw.researchTopic.id

tuw.researchTopic.name

Information Systems Engineering

tuw.researchTopic.value

100

tuw.publication.orgunit

E192-07 - Forschungsbereich Artificial Intelligence Techniques

tuw.publisher.doi

10.18653/v1/2022.emnlp-main.401

dc.identifier.libraryid

AC17203068

dc.description.numberOfPages

dc.rights.identifier

CC BY 4.0

tuw.event.name

The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

tuw.event.startdate

2022-12-07

tuw.event.enddate

2022-12-11

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Abu Dhabi

tuw.event.country

tuw.event.presenter

Mtumbuka, Frank

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.mimetype

application/pdf

item.openairetype

conference paper

item.cerifentitytype

Publications

item.grantfulltext

open

item.languageiso639-1

item.openairecristype

http://purl.org/coar/resource_type/c_5794

item.openaccessfulltext

Open Access

item.fulltext

with Fulltext

crisitem.author.dept

University of Oxford

crisitem.author.dept

E192-07 - Forschungsbereich Artificial Intelligence Techniques

crisitem.author.parentorg

E192 - Institut für Logic and Computation

Appears in collections:

Conference Paper

Full text (Version of Record (published version))

Adobe PDF

(397.44 kB)

CC BY 4.0
