reposiTUm: Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics

Record link:

http://hdl.handle.net/20.500.12708/154454

Title:

Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics

Citation:

Avalos Pacheco, A. (2023, February 3). Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics [Presentation]. Statistics Seminar, Mexico City, Mexico.

Publication Type:

Presentation - Presentation

Language:

English

Authors:

Avalos Pacheco, Alejandra

Organisational Unit:

E105-08 - Forschungsbereich Angewandte Statistik

Date (published):

3-Feb-2023

Event name:

Statistics Seminar

Event date:

3-Feb-2023

Event place:

Mexico City, Mexico

Keywords:

factor models; dimensionality reduction; high-dimensional inference; applied Bayesian statistical modelling; heterogeneous data integration

Abstract:

Integrating data from multiple studies can be key to understanding and gaining knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. In this talk I will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects, such as batch effects. MSFR models are extensions of FR that jointly estimate a covariance structure capturing the group-specific covariances in addition to the common component. I will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our approach provides a flexible methodology for sparse factor regression which is not limited to data with covariate effects. I will present several examples, with a focus on bioinformatics applications. We show the usefulness of our methods in two main tasks: (1) giving a visual representation of the latent factors of the data, i.e. an unsupervised dimension reduction task; and (2) providing (i) a supervised survival analysis, using the factors obtained by our method as predictors for the cancer genomic data, and (ii) a dietary pattern analysis, associating each factor with a measure of overall diet quality related to cardio-metabolic disease risk for a Hispanic community health nutritional-data study. Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality. The results of our analyses illustrate how failing to properly account for covariate effects can result in unreliable inference.

Link (external):

http://estadistica.itam.mx/sites/default/files/399.pdf

Research Areas:

Mathematical and Algorithmic Foundations: 40%
Computer Science Foundations: 20%
Modeling and Simulation: 40%

Science Branch:

3019 - Sonstige Medizinisch-theoretische Wissenschaften: 15%
1020 - Informatik: 15%
1010 - Mathematik: 70%

Appears in Collections:

Presentation
