Braus, L. (2023). Local outlier detection for compositional data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.105504
E105 - Institut für Stochastik und Wirtschaftsmathematik
Date (published): 2023
Number of Pages: 62
Keywords: Robust statistics; Outliers
en
Abstract:
This master's thesis explores the application of outlier detection techniques to compositional data, focusing on local outlier detection methods. Compositional data, in which the relevant information is contained in the ratios between components, require specialized analysis approaches. The thesis begins with an introduction to compositional data and local outlier detection, providing a foundation for the subsequent chapters. Analyzing compositional data involves understanding their geometrical properties and working in the so-called Aitchison geometry on the simplex. Coordinate representations and preprocessing issues are also discussed, highlighting the challenges unique to compositional data analysis. The thesis then turns to outlier detection, covering classical and robust statistical techniques and emphasizing their significance in the context of compositional data. The main focus is on local outlier detection methods, specifically two robust methods and the Local Outlier Factor (LOF) technique. The practical application of these methods is demonstrated using spatially dependent geochemical data obtained from the Geological Survey of Finland. The thesis describes the data in detail and explains the necessary data preparation and cleaning steps. The four relevant methods are applied to identify outliers in the data, and the identified outliers are then analyzed and explained. The thesis concludes with an evaluation and comparison of the applied methods in terms of their overall effectiveness and performance. Parameters used in the analysis, including the parameter k for the k-nearest neighbors (kNN) method, are discussed to show how they affect the results.
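As a rough illustration of the ideas named in the abstract, the sketch below maps compositions to centred log-ratio (clr) coordinates, in which Euclidean distance coincides with the Aitchison distance, and then computes Local Outlier Factor scores from the k nearest neighbours. It is a minimal sketch and not the thesis's implementation: the function names `clr` and `lof_scores`, the choice of clr coordinates, the value k = 5, and the synthetic data are all assumptions made for illustration. The robust methods and the spatial neighbourhood structure mentioned in the abstract are not reproduced here.

```python
import numpy as np


def clr(X):
    """Centred log-ratio transform of strictly positive compositions (one per row)."""
    log_x = np.log(X)
    return log_x - log_x.mean(axis=1, keepdims=True)


def lof_scores(Z, k):
    """Plain Local Outlier Factor on the rows of Z, using Euclidean distance.

    Assumes no duplicate rows (duplicates would give zero reachability distances).
    """
    # Pairwise distances; a point is never its own neighbour.
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)

    nn = np.argsort(D, axis=1)[:, :k]            # indices of the k nearest neighbours
    nn_dist = np.take_along_axis(D, nn, axis=1)  # distances to those neighbours
    k_dist = nn_dist[:, -1]                      # distance to the k-th neighbour

    # reach-dist(p, o) = max(d(p, o), k-distance(o))
    reach = np.maximum(nn_dist, k_dist[nn])
    lrd = 1.0 / reach.mean(axis=1)               # local reachability density

    # LOF(p) = mean lrd of p's neighbours divided by lrd of p
    return lrd[nn].mean(axis=1) / lrd


# Synthetic 3-part compositions (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
base = np.array([0.2, 0.3, 0.5])
X = base * np.exp(rng.normal(0.0, 0.05, size=(30, 3)))
X = X / X.sum(axis=1, keepdims=True)             # close each row to sum 1
X = np.vstack([X, [0.9, 0.05, 0.05]])            # one atypical composition

scores = lof_scores(clr(X), k=5)
most_outlying = int(np.argmax(scores))           # row index with the largest LOF
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while clearly larger values flag candidates for local outliers. The result depends on k, which is why the choice of that parameter matters for this kind of analysis.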