Parzer, R., Filzmoser, P., & Vana Gür, L. (2025). Sparse data-driven random projection in regression for high-dimensional data. Journal of Data Science, Statistics, and Visualisation, 5(5). https://doi.org/10.52933/jdssv.v5i5.138
E105-06 - Forschungsbereich Computational Statistics
E056-23 - Fachbereich Innovative Combinations and Applications of AI and ML (iCAIML)
-
Journal:
Journal of Data Science, Statistics, and Visualisation
-
Date (published):
9-May-2025
-
Number of Pages:
36
-
Publisher:
International Association for Statistical Computing (IASC)
-
Peer reviewed:
No
-
Keywords:
High-dimensional regression; Dimension reduction; Random Projection
Abstract:
We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the degree of sparsity of the coefficients is unknown and can vary from sparse to dense. In this setting, we propose a combination of probabilistic variable screening with random projection tools as a computationally efficient approach. In particular, we introduce a new data-driven random projection for dimension reduction in linear regression, which is motivated by a theoretical bound on the gain in expected prediction error over conventional random projections when using information about the true coefficient. The variables to be included in the projection are screened by considering the correlation of the predictors. To reduce the dependence on fine-tuning choices, we aggregate over an ensemble of linear models. A threshold parameter is introduced to obtain a higher degree of sparsity, which can be chosen together with the number of models in the ensemble by cross-validation. In extensive simulations, we compare the proposed method with other random projection tools and with well-known methods, and show that it is competitive in terms of prediction in a variety of scenarios with different sparsity and predictor covariance settings, while most competitors are targeted at either sparse or dense settings. Finally, we illustrate the method on two data applications.
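The pipeline outlined in the abstract (probabilistic screening, a sparse data-driven projection, an ensemble of small linear models, and a sparsity threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `sparse_projection_ensemble`, its parameters, the use of marginal covariance with the response as the screening score, the one-nonzero-per-column projection layout, and thresholding each model's coefficients before averaging are all assumptions made for illustration only.

```python
import numpy as np

def sparse_projection_ensemble(X, y, n_models=20, n_screen=None, m=None,
                               threshold=0.0, seed=0):
    """Hypothetical sketch: screening + sparse projection + ensemble averaging."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_screen = n_screen or min(p, 2 * n)   # variables kept per model (assumed default)
    m = m or max(2, n // 4)                # reduced dimension (assumed default)

    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Screening score: marginal covariance with the response (an assumption,
    # standing in for whatever coefficient estimate the paper uses).
    score = Xc.T @ yc / n
    prob = np.abs(score) + 1e-12
    prob /= prob.sum()

    betas = np.zeros((n_models, p))
    for k in range(n_models):
        # Probabilistic screening: draw variables with probability
        # proportional to the absolute score.
        keep = rng.choice(p, size=n_screen, replace=False, p=prob)

        # Sparse projection with one nonzero entry per column; the entry is
        # the screening score rather than a random sign (data-driven part).
        rows = rng.integers(0, m, size=n_screen)
        Phi = np.zeros((m, n_screen))
        Phi[rows, np.arange(n_screen)] = score[keep]

        # Least squares in the reduced space, mapped back to the p variables.
        Z = Xc[:, keep] @ Phi.T
        gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)
        betas[k, keep] = Phi.T @ gamma

    # Thresholding small coefficients before averaging (assumed placement).
    betas[np.abs(betas) < threshold] = 0.0
    beta = betas.mean(axis=0)
    intercept = y_mean - x_mean @ beta
    return intercept, beta
```

Predictions for new data would then be `intercept + X_new @ beta`. In line with the abstract, `threshold` and `n_models` are the two quantities that would be chosen by cross-validation; the sketch leaves that selection loop out.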