<div class="csl-bib-body">
<div class="csl-entry">Veyhe, B. E., Sagi, T., & Hose, K. (2023). Scientific Data Extraction from Oceanographic Papers. In Y. Ding, J. Tang, J. Sequeda, C. Castillo, & G.-J. Houben (Eds.), <i>WWW ’23 Companion: Companion Proceedings of the ACM Web Conference 2023</i> (pp. 800–804). Association for Computing Machinery. https://doi.org/10.1145/3543873.3587595</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/177649
-
dc.description.abstract
Scientific data collected in the oceanographic domain is invaluable to researchers performing meta-analyses and examining changes over time in oceanic environments. However, many of the data samples and subsequent analyses published by researchers are not uploaded to a repository, leaving the scientific paper as the only available source. Automated extraction of scientific data is, therefore, a valuable tool for such researchers. In particular, much of the most valuable data in scientific papers is structured as tables, making these a prime target for information extraction research. Using the data requires an additional step in which the concepts mentioned in the tables, such as names of measures, units, and biological species, are identified within a domain ontology. Unfortunately, state-of-the-art table extraction leaves much to be desired and has not been attempted at a large scale on oceanographic papers. Furthermore, while entity linking in the context of a full paragraph of text has been heavily researched, it still falls short on the harder task of linking single concepts. In this work, we present an annotated benchmark dataset of data tables from oceanographic papers. We further present the results of an evaluation, using current state-of-the-art tools, of extracting these tables and linking the entities they contain to domain-specific and general-purpose knowledge bases. We highlight the challenges and quantify the performance of current tools for table extraction and table-concept linking.
en
dc.language.iso
en
-
dc.subject
table extraction
en
dc.subject
scientific data
en
dc.subject
entity linking
en
dc.title
Scientific Data Extraction from Oceanographic Papers
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
Aalborg University, Denmark
-
dc.contributor.affiliation
Aalborg University, Denmark
-
dc.contributor.editoraffiliation
data.world
-
dc.relation.isbn
9781450394192
-
dc.description.startpage
800
-
dc.description.endpage
804
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023
-
tuw.peerreviewed
true
-
tuw.relation.publisher
Association for Computing Machinery
-
tuw.relation.publisherplace
New York
-
tuw.researchTopic.id
I1
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Logic and Computation
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
80
-
tuw.researchTopic.value
20
-
tuw.publication.orgunit
E192-02 - Forschungsbereich Databases and Artificial Intelligence
-
tuw.publication.orgunit
E192 - Institut für Logic and Computation
-
tuw.publisher.doi
10.1145/3543873.3587595
-
dc.description.numberOfPages
5
-
tuw.author.orcid
0000-0002-7497-5690
-
tuw.author.orcid
0000-0002-8916-0128
-
tuw.author.orcid
0000-0001-7025-8099
-
tuw.editor.orcid
0000-0002-3991-2539
-
tuw.editor.orcid
0000-0003-3112-9299
-
tuw.event.name
WWW '23: The ACM Web Conference 2023
en
dc.description.sponsorshipexternal
Israel PBC
-
dc.description.sponsorshipexternal
Independent Research Fund Denmark
-
dc.relation.grantnoexternal
100009443
-
dc.relation.grantnoexternal
8048-00051B
-
tuw.event.startdate
30-04-2023
-
tuw.event.enddate
05-05-2023
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Austin, Texas
-
tuw.event.country
US
-
tuw.event.presenter
Veyhe, Bartal Eyðfinsson
-
wb.sciencebranch
Computer Science
-
wb.sciencebranch
Mathematics
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
80
-
wb.sciencebranch.value
20
-
item.grantfulltext
none
-
item.fulltext
no Fulltext
-
item.openairetype
conference paper
-
item.languageiso639-1
en
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.cerifentitytype
Publications
-
crisitem.author.dept
Aalborg University, Denmark
-
crisitem.author.dept
E192-02 - Forschungsbereich Databases and Artificial Intelligence
-
crisitem.author.dept
E192-02 - Forschungsbereich Databases and Artificial Intelligence