Walach, J. (2019). Robust log-ratio methods for classifying high-dimensional metabolomics data [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.44448
The development of statistical methods that can deal with high-dimensional data is one of the major research activities in statistics. In many fields (e.g. chemometrics, genomics, metabolomics), advanced modern techniques make it easy to measure and store such data, so there are numerous real-world applications justifying these developments. One possible way to deal with such data comes from the log-ratio point of view. There is a whole branch of statistics devoted to log-ratios: Compositional Data Analysis. Compositional data represent a special type of multivariate data which describe parts of a whole. In this context, only relative information is important. Because of these special features of compositional data, the application of standard statistical methods could lead to invalid conclusions. The primary aim of the thesis is to introduce procedures for analysing high-dimensional data which originate from different groups. The main focus is on applications in the field of metabolomics, where the different data groups consist of observations related to different diseases. The new methods should not only make it possible to differentiate between the groups, but should also enable feature selection: only those features (variables) that discriminate between the groups should be identified. An important requirement for these methods is robustness against outlying observations, which are common in real data. Another interest of the thesis is the investigation of outliers in the data. We focus on both observational outliers and so-called cell outliers. The former refers to the situation where an observation deviates from the majority of a group in possibly all variables, while in the latter case only the values of some variables (cells) deviate for a given observation. This investigation contributes to a better insight into the data structure.
en
Additional information:
Title differs from the original according to the author's translation