<div class="csl-bib-body">
<div class="csl-entry">Parzer, R., Filzmoser, P., & Vana Gür, L. (2024). <i>Data-Driven Random Projection and Screening for High-Dimensional Generalized Linear Models</i>. arXiv. https://doi.org/10.34726/8079</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/207911
-
dc.identifier.uri
https://doi.org/10.34726/8079
-
dc.description.abstract
We address the challenge of correlated predictors in high-dimensional GLMs, where regression coefficients range from sparse to dense, by proposing a data-driven random projection method. This is particularly relevant for applications where the number of predictors is (much) larger than the number of observations and the underlying structure, whether sparse or dense, is unknown. We achieve this by using ridge-type estimates for variable screening and random projection, so that information about the response-predictor relationship is incorporated into the dimensionality reduction. We demonstrate that a ridge estimator with a small penalty is effective for random projection and screening, but the penalty value must be carefully selected. Unlike in linear regression, where penalties approaching zero work well, such near-zero penalties lead to overfitting in non-Gaussian families. Instead, we recommend a data-driven method for penalty selection. This data-driven random projection improves prediction performance over conventional random projections, even surpassing benchmarks such as the elastic net. Furthermore, an ensemble of multiple such random projections combined with probabilistic variable screening delivers the best aggregated results in prediction and variable ranking across varying sparsity levels in simulations, at a relatively low computational cost. Finally, three applications with count and binary responses demonstrate the method's advantages in interpretability and prediction accuracy.
en
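The abstract describes the pipeline only at a high level. As a rough illustration, the following Python sketch (NumPy only) fits a ridge-penalized logistic regression, samples predictors with probability proportional to the absolute ridge coefficients (probabilistic screening), builds a sparse random projection whose nonzero entries are the ridge coefficients themselves, fits a low-dimensional GLM on the projected predictors, and averages the back-mapped coefficients over an ensemble. It is not the authors' implementation. The binomial family, the Newton-based ridge solver, the fixed penalty `lam` (the paper instead recommends a data-driven choice), the one-nonzero-per-column projection, the small penalty in the reduced fit, coefficient averaging as the aggregation rule, and all function names (`ridge_logistic`, `probabilistic_screen`, `data_driven_projection`, `fit_ensemble`) are assumptions made for illustration.

```python
import numpy as np

# Hypothetical helpers; a fixed lam stands in for the paper's data-driven penalty choice.


def sigmoid(eta):
    # Numerically stable logistic function: 1 / (1 + exp(-eta)).
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def ridge_logistic(X, y, lam, n_iter=50, tol=1e-8):
    """Ridge-penalized logistic regression via damped Newton-Raphson (IRLS).

    The intercept is not penalized. Returns (intercept, slope coefficients).
    """
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])        # prepend an intercept column
    pen = lam * np.eye(p + 1)
    pen[0, 0] = 0.0                              # leave the intercept unpenalized

    def objective(theta):
        eta = Xa @ theta
        # Bernoulli log-likelihood (logit link) minus the ridge penalty
        return np.sum(y * eta - np.logaddexp(0.0, eta)) - 0.5 * theta @ pen @ theta

    theta = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = sigmoid(Xa @ theta)
        w = mu * (1.0 - mu)                      # IRLS weights
        grad = Xa.T @ (y - mu) - pen @ theta
        hess = Xa.T @ (Xa * w[:, None]) + pen
        step = np.linalg.solve(hess, grad)
        t, current = 1.0, objective(theta)
        while objective(theta + t * step) < current and t > 1e-8:
            t *= 0.5                             # backtrack until the objective does not decrease
        theta = theta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta[0], theta[1:]


def probabilistic_screen(beta, k, rng):
    # Sample k predictors without replacement, with probability proportional to |beta_j|.
    probs = np.abs(beta) / np.abs(beta).sum()
    return rng.choice(beta.size, size=k, replace=False, p=probs)


def data_driven_projection(beta, screened, m, rng):
    # Sparse embedding (assumed form): each screened predictor goes to one random
    # goal dimension, weighted by its ridge coefficient instead of a random +/-1 sign.
    phi = np.zeros((m, beta.size))
    goal = rng.integers(0, m, size=screened.size)
    phi[goal, screened] = beta[screened]
    return phi


def fit_ensemble(X, y, lam=1.0, k=40, m=5, n_models=20, seed=0):
    rng = np.random.default_rng(seed)
    _, beta = ridge_logistic(X, y, lam)          # one ridge fit drives screening and projection
    intercepts, coefs = [], []
    for _ in range(n_models):
        screened = probabilistic_screen(beta, k, rng)
        phi = data_driven_projection(beta, screened, m, rng)
        Z = X @ phi.T                            # n x m reduced design
        b0, gamma = ridge_logistic(Z, y, lam=1e-2)  # nearly unpenalized GLM on Z
        intercepts.append(b0)
        coefs.append(phi.T @ gamma)              # map back to the p original predictors
    return np.mean(intercepts), np.mean(coefs, axis=0)


# Usage on simulated binary data with p > n (10 active predictors out of 300).
rng = np.random.default_rng(1)
n, p = 100, 300
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:10] = 1.0
y = rng.binomial(1, sigmoid(X @ true_beta)).astype(float)

b0, beta_hat = fit_ensemble(X, y)
pred = (sigmoid(b0 + X @ beta_hat) > 0.5).astype(float)
print("training accuracy:", np.mean(pred == y))
print("top-ranked predictors:", np.argsort(-np.abs(beta_hat))[:10])
```

Because every ensemble member reuses the same ridge fit, the p-dimensional step runs only once in this sketch; each member then solves only an m-dimensional GLM. The printed accuracy is in-sample and says nothing about out-of-sample prediction.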
dc.description.sponsorship
FWF - Österr. Wissenschaftsfonds (Austrian Science Fund)
-
dc.language.iso
en
-
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
-
dc.subject
Generalized Linear Models
en
dc.subject
high-dimensional data
en
dc.subject
Predictive Modeling
en
dc.subject
Random Projection
en
dc.subject
Screening
en
dc.title
Data-Driven Random Projection and Screening for High-Dimensional Generalized Linear Models
en
dc.type
Preprint
en
dc.type
Preprint
de
dc.rights.license
Creative Commons Namensnennung 4.0 International
de
dc.rights.license
Creative Commons Attribution 4.0 International
en
dc.identifier.doi
10.34726/8079
-
dc.identifier.arxiv
2410.00971
-
dc.relation.grantno
ZK 35-G
-
tuw.project.title
Hochdimensionales statistisches Lernen: Neue Methoden zur Förderung der Wirtschafts- und Nachhaltigkeitspolitik (High-dimensional statistical learning: new methods to support economic and sustainability policy)