95% confidence intervals for logistic regression coefficients


When developing LASSO logistic regression models using the OHDSI/PLP framework, we generally get as output a model with a coefficient value assigned to each candidate predictor. Some coefficients may be exactly zero due to LASSO shrinkage.

We were wondering whether reporting the 95% confidence interval of these coefficient values is actually a thing:

  • Is there functionality to compute the 95% confidence interval of coefficient values? I believe we are not doing this in the PLP package @jennareps? Is this information computed in Cyclops @msuchard?

  • Would there be any reason why we would want the 95% confidence interval for a regression coefficient? It appears common practice to report this CI for hazard ratios or odds ratios to then conclude which covariates are “good” risk factors, but translated to a prediction problem, I believe LASSO is already making this decision for us.

I would be interested if anyone has experience with CIs for regression coefficients :thinking:

The short answer is that confidence intervals make perfect sense for regression coefficients if they come from a regression model that you believe fits reality well (that is, one that describes the data-generating process, based on domain expertise) and that has one or more well-defined null hypotheses. That is not the case for lasso (or ridge or elastic net) models, where you squeeze as much information as possible out of as few covariates as possible for a prediction problem, which is something very different.

The book Computer Age Statistical Inference: Algorithms, Evidence and Data Science has a good chapter on lasso, ridge and elastic nets, and it is available for free online.
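
To make the shrinkage point concrete, here is a minimal sketch of the soft-thresholding operator, which is the closed-form lasso solution in the simple case of a linear model with orthonormal covariates. It is only an illustration of how an L1 penalty behaves, not how PLP or Cyclops fit their models, and the input values are made up. Small estimates are set to exactly zero and the surviving ones are pulled toward zero, so the usual standard-error-based interval around the reported coefficient does not have its usual interpretation.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink each value toward zero by lam; values in [-lam, lam] become 0."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical unpenalized estimates, made up for illustration only.
unpenalized = np.array([-3.0, -0.5, 0.2, 2.0])

# With lam = 1.0 the two small estimates are dropped and the two large
# ones are moved 1.0 closer to zero.
shrunk = soft_threshold(unpenalized, lam=1.0)
print(shrunk)
```

Note that the nonzero outputs are biased toward zero by construction, which is fine for prediction but is one reason they should not be read like odds ratios from an unpenalized model.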

It is possible to compute (meaningful) CIs for variables that have been excluded from regularization. But in a typical prediction setting you’ll be applying regularization to all.
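
One way to write this down, as a sketch in my own notation rather than anything taken from the Cyclops or PLP documentation: let \ell(\beta) be the logistic log-likelihood and let \mathcal{P} be the set of coefficients that are subject to the L1 penalty. The fit is then

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \Bigl\{ -\ell(\beta) \;+\; \lambda \sum_{j \in \mathcal{P}} \lvert \beta_j \rvert \Bigr\}
```

Coefficients with j \notin \mathcal{P} do not appear in the penalty term, so they are not shrunk directly, and those are the ones for which an interval can be meaningful. In a typical prediction model \mathcal{P} contains every covariate (usually all but the intercept), so no such coefficient is left.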

Right, thanks for the qualification, @schuemie.