<div class="csl-bib-body">
<div class="csl-entry">Konstantinou, N., Abel, E., Bellomarini, L., Bogatu, A. T., Civili, C., Irfanie, E., Köhler, M., Lacramioara, M., Sallinger, E., Fernandes, A. A. A., Gottlob, G., Keane, J. A., & Paton, N. W. (2019). VADA: an architecture for end user informed data preparation. <i>Journal of Big Data</i>, <i>6</i>(74). https://doi.org/10.1186/s40537-019-0237-9</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/143242
-
dc.description.abstract
Background: Data scientists spend considerable amounts of time preparing data for analysis. Data preparation is labour intensive because the data scientist typically takes fine-grained control over each aspect of each step in the process, motivating the development of techniques that seek to reduce this burden.
Results: This paper presents an architecture in which the data scientist need only describe the intended outcome of the data preparation process, leaving the software to determine how best to bring about the outcome. Key wrangling decisions on matching, mapping generation, mapping selection, format transformation and data repair are taken by the system, and the user need only provide: (i) the schema of the data target; (ii) partial representative instance data aligned with the target; (iii) criteria to be prioritised when populating the target; and (iv) feedback on candidate results. To support this, the proposed architecture dynamically orchestrates a collection of loosely coupled wrangling components, in which the orchestration is declaratively specified and includes self-tuning of component parameters.
Conclusion: This paper describes a data preparation architecture that has been designed to reduce the cost of data preparation through the provision of a central role for automation. An empirical evaluation with deep web and open government data investigates the quality and suitability of the wrangling result, the cost-effectiveness of the approach, the impact of self-tuning, and scalability with respect to the number of sources.
en
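The Results paragraph of the abstract describes the user inputs and the declarative orchestration only at a high level. The Python sketch below is a minimal illustration under stated assumptions, not the VADA implementation: it bundles the four user-supplied inputs into a hypothetical `UserContext` dataclass and expresses the orchestration as a hypothetical dependency map `PLAN`, which the standard-library `graphlib.TopologicalSorter` turns into an execution order. All names, the example target schema, and the linear ordering of the five wrangling steps are assumptions made for illustration; self-tuning of component parameters is not modelled.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class UserContext:
    """Hypothetical container for the four user inputs named in the abstract."""
    target_schema: list[str]                                   # (i) target attributes
    example_rows: list[dict] = field(default_factory=list)     # (ii) partial instance data
    criteria: dict[str, float] = field(default_factory=dict)   # (iii) criterion -> weight
    feedback: dict[int, bool] = field(default_factory=dict)    # (iv) result row -> correct?


State = dict
Component = Callable[[State], State]

# Declarative plan (assumption): each step maps to the set of steps it depends on.
# The linear order simply follows the list of decisions in the abstract.
PLAN: dict[str, set[str]] = {
    "matching": set(),
    "mapping_generation": {"matching"},
    "mapping_selection": {"mapping_generation"},
    "format_transformation": {"mapping_selection"},
    "data_repair": {"format_transformation"},
}


def orchestrate(plan: dict[str, set[str]], components: dict[str, Component],
                ctx: UserContext) -> State:
    """Run each component once, in an order consistent with the plan's dependencies."""
    state: State = {"context": ctx, "trace": []}
    for step in TopologicalSorter(plan).static_order():
        state = components[step](state)
        state["trace"].append(step)  # record which component ran, in order
    return state


if __name__ == "__main__":
    # Illustrative target schema and criteria weights (hypothetical values).
    ctx = UserContext(
        target_schema=["street", "city", "postcode", "price"],
        criteria={"completeness": 0.7, "accuracy": 0.3},
    )
    # Placeholder components that pass the state through unchanged.
    stubs = {name: (lambda s: s) for name in PLAN}
    print(orchestrate(PLAN, stubs, ctx)["trace"])
```

Swapping a stub for a real component, or changing a dependency in `PLAN`, changes what runs and in which order without touching `orchestrate`; keeping the plan as data rather than control flow is what "declaratively specified" suggests here.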
dc.relation.ispartof
Journal of Big Data
-
dc.subject
Hardware and Architecture
-
dc.subject
Information Systems
-
dc.subject
Data integration
-
dc.subject
Computer Networks and Communications
-
dc.subject
Data quality
-
dc.subject
Information Systems and Management
-
dc.subject
Data preparation
-
dc.title
VADA: an architecture for end user informed data preparation
-
dc.type
Article
en
dc.contributor.affiliation
University of Manchester, United Kingdom of Great Britain and Northern Ireland (the)
-
dc.contributor.affiliation
Samsung (United Kingdom), United Kingdom of Great Britain and Northern Ireland (the)
-
dc.type.category
Original Research Article
-
tuw.container.volume
6
-
tuw.container.issue
74
-
tuw.journal.peerreviewed
true
-
tuw.peerreviewed
true
-
wb.publication.intCoWork
International Co-publication
-
tuw.researchTopic.id
I1
-
tuw.researchTopic.name
Logic and Computation
-
tuw.researchTopic.value
100
-
dcterms.isPartOf.title
Journal of Big Data
-
tuw.publication.orgunit
E192-02 - Research Unit Databases and Artificial Intelligence
-
tuw.publisher.doi
10.1186/s40537-019-0237-9
-
dc.identifier.eissn
2196-1115
-
dc.description.numberOfPages
32
-
tuw.author.orcid
0000-0001-6863-0162
-
tuw.author.orcid
0000-0003-1604-5097
-
tuw.author.orcid
0000-0002-4357-3509
-
wb.sci
false
-
wb.sciencebranch
Computer Science
-
wb.sciencebranch.oefos
1020
-
wb.facultyfocus
Logic and Computation (LC)
en
wb.facultyfocus.faculty
E180
-
item.openairecristype
http://purl.org/coar/resource_type/c_2df8fbb1
-
item.openairetype
research article
-
item.fulltext
no Fulltext
-
item.grantfulltext
none
-
item.cerifentitytype
Publications
-
crisitem.author.dept
University of Manchester
-
crisitem.author.dept
Samsung (United Kingdom)
-
crisitem.author.dept
E192-02 - Research Unit Databases and Artificial Intelligence
-
crisitem.author.dept
E192-02 - Research Unit Databases and Artificial Intelligence