Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements

Matys Grygar, Tomas; Radojičić, Una; Pavlu, Ivana; Greven, Sonja; Nešlehová, Johanna G.; Tůmová, Štěpánka; Hron, Karel

doi:10.1016/j.gexplo.2024.107416

DC Field

Value

Language

dc.contributor.author

Matys Grygar, Tomas

dc.contributor.author

Radojičić, Una

dc.contributor.author

Pavlu, Ivana

dc.contributor.author

Greven, Sonja

dc.contributor.author

Nešlehová, Johanna G.

dc.contributor.author

Tůmová, Štěpánka

dc.contributor.author

Hron, Karel

dc.date.accessioned

2025-01-21T13:09:49Z

dc.date.available

2025-01-21T13:09:49Z

dc.date.issued

2024-04

dc.identifier.citation

Matys Grygar, T., Radojičić, U., Pavlu, I., Greven, S., Nešlehová, J. G., Tůmová, Š., & Hron, K. (2024). Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements. Journal of Geochemical Exploration, 259, Article 107416. https://doi.org/10.1016/j.gexplo.2024.107416

dc.identifier.issn

0375-6742

dc.identifier.uri

http://hdl.handle.net/20.500.12708/209198

dc.description.abstract

Geochemical mapping of risk element concentrations in soils is performed in many countries around the world. It results in numerous large datasets of high analytical quality, which can be used to identify soils that violate individual legislative limits for safe food production. However, there is a lack of advanced data mining tools that would be suitable for sensitive exploratory data analysis of big data while respecting the natural variability of soil composition. To distinguish anthropogenic contamination from natural variations, the analysis of the entire data distribution for smaller subareas is key. In this article, we propose a new data mining methodology for geochemical mapping data based on functional data analysis of probability densities in the framework of Bayes spaces after post-stratification of a big dataset to smaller districts. The tools we propose allow us to analyse the entire distribution, going well beyond a superficial detection of extreme concentration anomalies. We illustrate the proposed methodology on a dataset gathered according to the Czech national legislation (1990–2009), whose information content has not yet been fully exploited. Taking into account specific properties of probability density functions and recent results for orthogonal decomposition of multivariate densities enabled us to reveal real contamination patterns that were so far only suspected in Czech agricultural soils. We process the above Czech soil composition dataset for Cu, Pb, and Zn by first compartmentalizing it into spatial units, the so-called districts, and by subsequently clustering these districts according to diagnostic features of their uni- and multivariate distributions at high concentration levels. These clusters were seen to correspond to compartments that show known features of contamination, such as historical metallurgy of non-ferrous metals and iron and steel production. 
Comparison between compartments, notably neighbouring districts with similar natural factors controlling soil variability, is key to the reliable distinction of diffuse contamination. In this work, we used soil contamination by Cu-bearing pesticides as an example for empirical testing of the proposed data mining approach. In general, there are no natural and justifiable thresholds of risk element concentrations that would be valid for geographical areas with too much natural heterogeneity. Therefore, national (or larger) soil geochemistry datasets cannot be processed as a whole. As we demonstrate in this paper, empirical knowledge and careful tailoring of statistical tools for the characteristic types of soil contamination are essential for unequivocal identification of the anthropogenic component in real datasets.

dc.description.sponsorship

FWF - Österr. Wissenschaftsfonds

dc.language.iso

dc.publisher

ELSEVIER

dc.relation.ispartof

Journal of Geochemical Exploration

dc.subject

Bayes spaces

dc.subject

Compartmentalisation

dc.subject

Cu-bearing pesticides

dc.subject

FDA for geochemical maps

dc.subject

FDA of univariate and multivariate densities

dc.subject

Identification of Czech agricultural soil contamination

dc.title

Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements

dc.type

Article

dc.type

Artikel

dc.identifier.scopus

2-s2.0-85184990454

dc.identifier.url

https://api.elsevier.com/content/abstract/scopus_id/85184990454

dc.contributor.affiliation

Czech Academy of Sciences, Institute of Inorganic Chemistry, Czechia

dc.contributor.affiliation

Palacký University Olomouc, Czechia

dc.contributor.affiliation

Humboldt-Universität zu Berlin, Germany

dc.contributor.affiliation

McGill University, Canada

dc.contributor.affiliation

Palacký University Olomouc, Czechia

dc.relation.grantno

I 5799-N

dc.type.category

Original Research Article

tuw.container.volume

259

tuw.journal.peerreviewed

true

tuw.peerreviewed

true

wb.publication.intCoWork

International Co-publication

tuw.project.title

Generalisierte relative Daten und Robustheit in Bayes Räumen

tuw.researchTopic.id

tuw.researchTopic.name

Mathematical and Algorithmic Foundations

tuw.researchTopic.value

100

dcterms.isPartOf.title

Journal of Geochemical Exploration

tuw.publication.orgunit

E105-06 - Forschungsbereich Computational Statistics

tuw.publisher.doi

10.1016/j.gexplo.2024.107416

dc.date.onlinefirst

2024

dc.identifier.articleid

107416

dc.identifier.eissn

1879-1689

tuw.author.orcid

0000-0003-0931-0390

tuw.author.orcid

0000-0003-0495-850X

tuw.author.orcid

0000-0001-9634-4796

tuw.author.orcid

0000-0002-6214-8786

tuw.author.orcid

0000-0002-1847-6598

wb.sci

true

wb.sciencebranch

Computer Science

wb.sciencebranch

Economics

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

5020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.openairetype

research article

item.cerifentitytype

Publications

item.grantfulltext

none

item.languageiso639-1

item.openairecristype

http://purl.org/coar/resource_type/c_2df8fbb1

item.fulltext

no Fulltext

crisitem.project.funder

FWF - Österr. Wissenschaftsfonds

crisitem.project.grantno

I 5799-N

crisitem.author.dept

Czech Academy of Sciences, Institute of Inorganic Chemistry

crisitem.author.dept

E105-06 - Forschungsbereich Computational Statistics

crisitem.author.dept

Palacký University Olomouc

crisitem.author.dept

Humboldt-Universität zu Berlin

crisitem.author.dept

McGill University

crisitem.author.dept

Palacký University Olomouc

crisitem.author.orcid

0000-0003-0931-0390

crisitem.author.orcid

0000-0003-0495-850X

crisitem.author.orcid

0000-0002-6214-8786

crisitem.author.orcid

0000-0002-1847-6598

crisitem.author.parentorg

E105 - Institut für Stochastik und Wirtschaftsmathematik

Appears in Collections:

Article
