When developing LASSO logistic regression models using the OHDSI/PLP framework, we generally get a model with a coefficient value assigned to each candidate predictor. Some coefficients may be exactly zero due to LASSO shrinkage.
We were wondering whether reporting the 95% confidence interval of these coefficient values is actually a thing:
Is there functionality to compute the 95% confidence interval of coefficient values? I believe we are not doing this in the PLP package @jennareps? Is this information computed in Cyclops @msuchard?
Would there be any reason why we would want the 95% confidence interval for a regression coefficient? It appears common practice to report this CI for hazard ratios or odds ratios to then conclude which covariates are “good” risk factors, but translated to a prediction problem, I believe LASSO is already making this decision for us.
I would be interested to hear if anyone has experience with CIs for regression coefficients.
The short answer is that confidence intervals make perfect sense for regression coefficients if they arise from a regression model that you believe fits reality well (that is, one that describes the data-generating process, based on domain expertise) and that comes with one or more well-defined null hypotheses. That is not the case for lasso (or ridge or elastic net) models, where you squeeze as much information as possible out of as few covariates as possible for a prediction problem, which is something very different.
It is possible to compute (meaningful) CIs for variables that have been excluded from regularization. But in a typical prediction setting you’ll be applying regularization to all covariates.
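To make the unregularized case concrete, here is a minimal sketch of a standard Wald 95% CI for a plain, unpenalized logistic regression with one covariate. It is plain Python, not PLP or Cyclops code. The data are simulated, and the function names (`score_and_information`, `fit_logistic_wald`) and the "true" coefficients are made up purely for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and the 2x2 Fisher information."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def fit_logistic_wald(x, y, z_crit=1.96, max_iter=50, tol=1e-10):
    """Fit y ~ intercept + x by Newton-Raphson (no penalty) and return
    (estimate, lower, upper) Wald intervals on the log-odds scale."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} g, with the 2x2 inverse written out
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard errors come from the inverse information at the final estimate
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    se0 = math.sqrt(i11 / det)
    se1 = math.sqrt(i00 / det)
    return {
        "intercept": (b0, b0 - z_crit * se0, b0 + z_crit * se0),
        "slope": (b1, b1 - z_crit * se1, b1 + z_crit * se1),
    }


# Simulated data; the true coefficients are arbitrary illustration values.
random.seed(0)
TRUE_INTERCEPT, TRUE_SLOPE = -0.5, 1.0
x = [random.gauss(0.0, 1.0) for _ in range(500)]
y = [1 if random.random() < sigmoid(TRUE_INTERCEPT + TRUE_SLOPE * xi) else 0
     for xi in x]

ci = fit_logistic_wald(x, y)
est, lo, hi = ci["slope"]
print(f"log-odds:   {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
# Exponentiating the interval endpoints gives the CI for the odds ratio
print(f"odds ratio: {math.exp(est):.3f} "
      f"(95% CI {math.exp(lo):.3f} to {math.exp(hi):.3f})")
```

This interval relies on the maximum likelihood estimate being approximately normal around the true value. An L1 penalty breaks that assumption, because penalized estimates are shrunk toward zero and can be exactly zero, so the same recipe does not carry over to the regularized coefficients.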