Puchhammer, P., & Filzmoser, P. (2023). A spatially smoothed MRCD estimator for local outlier detection. In ICORS 2023 - Book of Abstracts (pp. 58–59).
International Conference on Robust Statistics (ICORS) - 2023
Event date:
22-May-2023 - 26-May-2023
Event place:
Toulouse, France
Number of Pages:
2
Keywords:
Local outlier detection; Multivariate data; Spatial data; MRCD estimation
Abstract:
Many methods are available for multivariate outlier detection, but so far only a handful have been developed for spatial data, where observations may differ from their neighbors and are then called local outliers. Although there are methods based on a pairwise Mahalanobis distance approach, there is no agreement yet on the type of covariance matrix to use. For example, Filzmoser et al. [2013] propose a global covariance matrix, while Ernst and Haesbroeck [2016] suggest a very local structure by estimating one covariance matrix per observation. To bridge the gap between the global and the local approach with a refined covariance structure, we develop spatially smoothed covariance matrices based on the MRCD estimator [Boudt et al., 2020] for pre-defined neighborhoods $a_1, \ldots, a_N$. As is well known from the MCD literature, a subset of observations, the so-called H-set, is obtained by optimizing an objective function. In our case we obtain a set of optimal H-sets $\mathcal{H} = (H_1, \ldots, H_N)$ by minimizing the objective function $f(\mathcal{H}) = \sum_{i=1}^{N} \det\big( (1-\lambda) K_i(\mathcal{H}) + \lambda \sum_{j=1, j \neq i}^{N} \omega_{ij} K_j(\mathcal{H}) \big)$. The weight matrix $W = (\omega_{ij})_{i,j=1,\ldots,N}$ represents the closeness of the neighborhoods, while the parameter $\lambda$ controls the degree of locality of the covariance matrices. The local covariance matrices $K_i(\mathcal{H})$ are based on the MRCD convex combination of the sample covariance matrix of an H-set of the neighborhood $a_i$ and a global target matrix. For the optimal set of H-sets $\mathcal{H}^* = (H_i^*)_{i=1,\ldots,N}$ of the objective function, the final covariance estimate for neighborhood $a_i$ is defined as $\Sigma_{\mathrm{SSM},i} = (1-\lambda) K_i(\mathcal{H}^*) + \lambda \sum_{j=1, j \neq i}^{N} \omega_{ij} K_j(\mathcal{H}^*)$. A heuristic algorithm based on the notion of a C-step is developed to find the optimal set of H-sets; it also shows stable convergence properties in general.
We demonstrate the applicability of the new covariance estimators, and the importance of a compromise between locality and globality for local outlier detection, on simulated and real-world data, and we compare their performance with other state-of-the-art methods from statistics and machine learning.
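The smoothing step and the objective function in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the H-sets are already given, so the C-step search is not shown. The function names and the mixing weight `rho` of the MRCD convex combination are hypothetical and used only for illustration. The diagonal of `W` is ignored because the sums run over j ≠ i.

```python
import numpy as np


def local_covariance(x_hset, target, rho):
    """K_i: convex combination of the H-set sample covariance and a target matrix.

    `rho` is a hypothetical name for the mixing weight (assumption).
    """
    sample_cov = np.cov(x_hset, rowvar=False)
    return rho * target + (1.0 - rho) * sample_cov


def smoothed_covariances(K, W, lam):
    """Return (1 - lam) * K_i + lam * sum_{j != i} W[i, j] * K_j for every i."""
    K = np.asarray(K, dtype=float)      # shape (N, p, p)
    W = np.array(W, dtype=float)        # copy, so the caller's W is untouched
    np.fill_diagonal(W, 0.0)            # enforce j != i
    neighbor_part = np.einsum("ij,jkl->ikl", W, K)
    return (1.0 - lam) * K + lam * neighbor_part


def objective(K, W, lam):
    """f = sum_i det(smoothed covariance of neighborhood i)."""
    return float(np.sum(np.linalg.det(smoothed_covariances(K, W, lam))))


# Two neighborhoods with local matrices I and 2I, each fully weighted to the other.
K_demo = np.stack([np.eye(2), 2.0 * np.eye(2)])
W_demo = np.array([[0.0, 1.0], [1.0, 0.0]])
f_local = objective(K_demo, W_demo, lam=0.0)   # purely local: det(I) + det(2I)
f_mixed = objective(K_demo, W_demo, lam=0.5)   # both matrices become 1.5 * I
```

With `lam=0` each neighborhood keeps its own matrix, and larger values of `lam` pull the estimates toward their neighbors, which is the locality trade-off the abstract describes.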