Motivation: Understanding how bacterial species relate to clinical health indicators can reveal microbiome signatures of disease, offering insights into conditions such as obesity or liver disease. However, analyzing such data requires methods that address compositionality, high dimensionality, sparsity, and outliers. Results: We identify microbiome components linked to health indicators using a robust multivariate compositional regression model. The method accounts for the high dimensionality, sparsity, and compositional nature of microbiome data while controlling the false discovery rate (FDR). Incorporating outlier robustness and a derandomization step improves the stability and reproducibility of the results relative to existing techniques such as the Multi-Response Knockoff Filter (MRKF). In simulation studies, our method outperforms MRKF in terms of FDR control, power, and robustness. In real data applications, it yields biological insights, such as identifying microbial species associated with specific clinical parameters. Availability and implementation: R code, together with illustrative examples on synthetic data and comprehensive documentation, is available at https://github.com/giannamonti/RobMReg.
Project (external):
Ministry of Environment and Energy Security (MASE) through PNRR Mission 2, Component 2, Investment 3.5; European Union