Schwendinger, B., Schwendinger, F., & Vana Gür, L. (2022, June 22). Holistic generalized linear models [Poster Presentation]. useR! 2022, United States of America.
E384-01 - Research Unit of Software-intensive Systems
E105-06 - Research Unit of Computational Statistics
Date (published):
22-Jun-2022
Event name:
useR! 2022
Event date:
20-Jun-2022 - 23-Jun-2022
Event place:
United States of America
Keywords:
Generalized linear models; algorithmic regression; best subset selection; conic programming; holistic constraints
Abstract:
Selecting a sensible model from the set of all reasonable models is an essential but typically time-consuming step in data analysis. To simplify this step, Bertsimas & King (2015) and Bertsimas & Li (2020) introduce the holistic linear model (HLM). The HLM is a constrained linear regression model whose constraints aim to automate model selection by means of quadratic mixed-integer optimization. The integer variables are used to place cardinality constraints on the linear regression model. Placing a cardinality constraint on the total number of variables allowed in the final model leads to the classical best subset selection problem (Miller 2002): $\min_{\beta} \tfrac{1}{2} \lVert y - X\beta \rVert_2^2 \quad \text{subject to} \quad \lVert \beta \rVert_0 \le k$.
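To make the best subset selection problem concrete, the following minimal Python sketch solves it by exhaustive enumeration on synthetic data. This is an illustration only: it is not the mixed-integer formulation of Bertsimas & King, nor the approach used by holiglm, and it is feasible only for a small number of predictors. The function name `best_subset` and the simulated data are assumptions made for this example.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, k):
    """Solve min 1/2 ||y - X beta||_2^2 s.t. ||beta||_0 <= k by enumeration.

    Hypothetical helper for illustration; the cost grows combinatorially in p.
    """
    n, p = X.shape
    best_support = ()
    best_beta = np.zeros(p)
    best_obj = 0.5 * float(y @ y)  # objective of the empty model (beta = 0)
    for size in range(1, min(k, p) + 1):
        for support in combinations(range(p), size):
            cols = list(support)
            # Ordinary least squares restricted to the candidate support.
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ coef
            obj = 0.5 * float(resid @ resid)
            if obj < best_obj:
                best_obj = obj
                best_support = support
                best_beta = np.zeros(p)
                best_beta[cols] = coef
    return best_support, best_beta, best_obj


# Synthetic data (assumption): only predictors 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
beta_true = np.array([2.0, 0.0, -3.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

support, beta_hat, obj = best_subset(X, y, k=2)
print(support, np.round(beta_hat, 2))
```

Enumeration visits every support of size at most k, which is why mixed-integer solvers are preferred as soon as the number of predictors grows.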
Adding cardinality constraints on user-defined groups of variables can be used to limit pairwise multicollinearity or to select the best (non-linear) transformation. In addition, the HLM supports constraints on global multicollinearity and linear constraints on the parameters.
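One common way to read the pairwise multicollinearity constraint is as a group of size two: if two predictors have an absolute sample correlation above a threshold, at most one of them may enter the model. The Python sketch below builds these conflicting pairs and checks whether a candidate support respects them. The threshold name `rho_max`, the helper names, and the simulated data are assumptions for this example; in a mixed-integer formulation the same rule would typically appear as a constraint of the form z_i + z_j <= 1 on binary selection variables.

```python
from itertools import combinations

import numpy as np


def conflicting_pairs(X, rho_max):
    """Return index pairs (i, j), i < j, with |cor(x_i, x_j)| > rho_max."""
    corr = np.corrcoef(X, rowvar=False)
    p = X.shape[1]
    return {
        (i, j)
        for i, j in combinations(range(p), 2)
        if abs(corr[i, j]) > rho_max
    }


def is_admissible(support, conflicts):
    """True if the support contains at most one member of every conflicting pair."""
    return not any(
        pair in conflicts for pair in combinations(sorted(support), 2)
    )


# Synthetic data (assumption): x1 is a noisy copy of x0, x2 is independent.
rng = np.random.default_rng(1)
x0 = rng.normal(size=200)
x1 = x0 + 0.05 * rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x0, x1, x2])

conflicts = conflicting_pairs(X, rho_max=0.8)
print(conflicts)
print(is_admissible((0, 2), conflicts), is_admissible((0, 1), conflicts))
```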
This work introduces holiglm, an R package for formulating and fitting holistic generalized linear models (HGLMs). To our knowledge, we are the first to suggest using conic optimization to extend the results presented for linear regression by Bertsimas et al. to the class of generalized linear models. The holiglm package provides a flexible infrastructure for automatically translating constrained generalized linear models into conic optimization problems. The optimization problems are solved via the R optimization infrastructure package ROI (Theußl, Schwendinger & Hornik 2020). Using ROI allows the user to choose from a wide range of commercial and open-source optimization solvers. Additionally, a high-level interface is provided, which can be used as a drop-in replacement for the stats::glm() function. Using conic optimization instead of iteratively reweighted least squares (IRLS) has three advantages: no starting values are needed, the results are more reliable (proven optimality), and the solvers are designed to handle constraints. These advantages come at the cost of a longer runtime. However, as shown by Schwendinger, Grün & Hornik (2021), for some GLMs the speed of the conic formulation is similar to that of the IRLS implementation.
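For readers unfamiliar with the IRLS baseline mentioned above, the following Python sketch implements a textbook IRLS iteration for unconstrained logistic regression. It is neither holiglm code nor the stats::glm() implementation; the function name `irls_logistic`, its defaults, and the simulated data are assumptions for this example. The `beta0` argument makes the dependence on starting values explicit, and the loop has no natural place for cardinality or linear constraints.

```python
import numpy as np


def irls_logistic(X, y, beta0=None, max_iter=25, tol=1e-8):
    """Textbook IRLS for logistic regression (illustrative sketch only)."""
    n, p = X.shape
    # IRLS needs a starting value; zeros are a common default choice.
    beta = np.zeros(p) if beta0 is None else np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        eta = X @ beta                      # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))     # fitted probabilities
        w = mu * (1.0 - mu)                 # IRLS weights
        z = eta + (y - mu) / w              # working response
        # Weighted least-squares step: solve (X' W X) beta = X' W z.
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Synthetic data (assumption): intercept plus one standard normal predictor.
rng = np.random.default_rng(2)
x = rng.normal(size=500)
X = np.column_stack([np.ones(500), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
y = (rng.random(500) < p_true).astype(float)

beta_hat = irls_logistic(X, y)
print(np.round(beta_hat, 3))
```

At convergence the score equations X'(y - mu) = 0 hold, which is the first-order optimality condition of the unconstrained maximum likelihood problem.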
Research Areas:
Mathematical Methods in Economics: 50%
Fundamental Mathematics Research: 50%