<div class="csl-bib-body">
<div class="csl-entry">Matys Grygar, T., Radojičić, U., Pavlu, I., Greven, S., Nešlehová, J. G., Tůmová, Š., & Hron, K. (2024). Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements. <i>Journal of Geochemical Exploration</i>, <i>259</i>, Article 107416. https://doi.org/10.1016/j.gexplo.2024.107416</div>
</div>
-
dc.identifier.issn
0375-6742
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/209198
-
dc.description.abstract
Geochemical mapping of risk element concentrations in soils is performed in many countries around the world. It results in numerous large datasets of high analytical quality, which can be used to identify soils that violate individual legislative limits for safe food production. However, there is a lack of advanced data mining tools that would be suitable for sensitive exploratory data analysis of big data while respecting the natural variability of soil composition. To distinguish anthropogenic contamination from natural variations, the analysis of the entire data distribution for smaller subareas is key. In this article, we propose a new data mining methodology for geochemical mapping data based on functional data analysis of probability densities in the framework of Bayes spaces after post-stratification of a big dataset to smaller districts. The tools we propose allow us to analyse the entire distribution, going well beyond a superficial detection of extreme concentration anomalies. We illustrate the proposed methodology on a dataset gathered according to the Czech national legislation (1990–2009), whose information content has not yet been fully exploited. Taking into account specific properties of probability density functions and recent results for orthogonal decomposition of multivariate densities enabled us to reveal real contamination patterns that were so far only suspected in Czech agricultural soils. We process the above Czech soil composition dataset for Cu, Pb, and Zn by first compartmentalizing it into spatial units, the so-called districts, and by subsequently clustering these districts according to diagnostic features of their uni- and multivariate distributions at high concentration levels. These clusters were seen to correspond to compartments that show known features of contamination, such as historical metallurgy of non-ferrous metals and iron and steel production. Comparison between compartments, notably neighbouring districts with similar natural factors controlling soil variability, is key to the reliable distinction of diffuse contamination. In this work, we used soil contamination by Cu-bearing pesticides as an example for empirical testing of the proposed data mining approach. In general, there are no natural and justifiable thresholds of risk element concentrations that would be valid for geographical areas with too much natural heterogeneity. Therefore, national (or larger) soil geochemistry datasets cannot be processed as a whole. As we demonstrate in this paper, empirical knowledge and careful tailoring of statistical tools for the characteristic types of soil contamination are essential for unequivocal identification of the anthropogenic component in real datasets.
en
dc.description.sponsorship
FWF - Österr. Wissenschaftsfonds
-
dc.language.iso
en
-
dc.publisher
ELSEVIER
-
dc.relation.ispartof
Journal of Geochemical Exploration
-
dc.subject
Bayes spaces
en
dc.subject
Compartmentalisation
en
dc.subject
Cu-bearing pesticides
en
dc.subject
FDA for geochemical maps
en
dc.subject
FDA of univariate and multivariate densities
en
dc.subject
Identification of Czech agricultural soil contamination
en
dc.title
Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements