<div class="csl-bib-body">
<div class="csl-entry">Filzmoser, P., & Mazak-Huemer, A. (2023). Massive Data Sets – Is Data Quality Still an Issue? In B. Vogel-Heuser & M. Wimmer (Eds.), <i>Digital Transformation</i> (Vol. 1, pp. 269–279). Springer Vieweg. https://doi.org/10.1007/978-3-662-65004-2_11</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/175993
-
dc.description.abstract
The term “big data” has become a buzzword in recent years; it refers to the ability to collect and store huge amounts of information, resulting in large databases and data repositories. This also holds for industrial applications: in a production process, for instance, many sensors can be installed and data recorded at very high temporal resolution. The amount of information grows rapidly, but the insight into the production process does not necessarily grow with it. This is where machine learning, or statistics, needs to enter, because sophisticated algorithms are required to identify the relevant parameters that drive, for example, the quality of the product. However, is data quality still an issue? It is clear that with small amounts of data, single outliers or extreme values can affect the algorithms or statistical methods. Can “big data” overcome this problem? In this article we focus on some specific problems in the regression context and show that even if many parameters are measured, poor data quality can severely degrade the prediction performance of the methods.
en
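The abstract's central claim — that a single outlier can noticeably shift the predictions of a classical regression fit, while a robust estimator stays stable — can be illustrated with a minimal simulated sketch. This is not the authors' analysis; the data, the gross-error value, and the choice of the Theil–Sen estimator as the robust comparison are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated process data: one relevant parameter x, response y = 2x + noise
x = rng.uniform(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(0.0, 1.0, 50)

# A single gross error (e.g. a sensor glitch) in the response
y_bad = y.copy()
y_bad[0] = 200.0

# Classical least-squares fits on clean and contaminated data
b1_clean, b0_clean = np.polyfit(x, y, 1)
b1_bad, b0_bad = np.polyfit(x, y_bad, 1)

# The least-squares prediction at the mean of x shifts because of one point
x_new = x.mean()
pred_clean = b1_clean * x_new + b0_clean
pred_bad = b1_bad * x_new + b0_bad
print(f"LS prediction at mean x: clean={pred_clean:.2f}, contaminated={pred_bad:.2f}")

# A robust alternative: the Theil-Sen slope (median of all pairwise slopes)
# is barely affected, since one bad point contaminates only a small
# fraction of the pairwise slopes.
def theil_sen_slope(x, y):
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    return np.median(slopes)

print(f"Theil-Sen slope on contaminated data: {theil_sen_slope(x, y_bad):.2f}")
```

With 50 observations, one contaminated response still moves the least-squares prediction by several units, while the median-of-pairwise-slopes estimate remains close to the true slope of 2.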
dc.language.iso
en
-
dc.subject
Data Management
en
dc.subject
Data Analytics
en
dc.subject
Model Integration
en
dc.subject
Cloud Computing
en
dc.subject
Blockchain
en
dc.title
Massive Data Sets – Is Data Quality Still an Issue?