Ancestors in Large Scale Propensity Score

When defining cohorts, we choose an ancestor_concept_id to capture the descendants of interest. I’m curious how LSPS handles this. With a nearly unlimited number of possible covariates in LSPS, ancestor searches would be prohibitively time-consuming for any researcher. Is the procedure, then, simply to use, for example, the condition concepts as they occur in the condition_occurrence table?

The only drawback to this would seem to be multicollinearity. My memory of the literature on PS is that multicollinearity is not an issue in estimation.

OHDSI’s LSPS implementation uses the FeatureExtraction package to construct the covariates used in the propensity model. We typically use the default set of features, which includes analyses such as useConditionGroupEraLongTerm (see here for all analyses). These analyses cover not only the verbatim concepts but also their ancestors. So, for example, if the concept in the database is ‘Acute myocardial infarction’, we also construct covariates for ‘Myocardial infarction’ and ‘Acute ischemic heart disease’. (If you’re interested in the nitty-gritty details, you can, for example, examine this SQL code.)
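
To make the ancestor expansion concrete, here is a minimal Python sketch of the idea. It is not the FeatureExtraction implementation (which does this in SQL against the vocabulary tables). The concept IDs 1, 2, and 3, the person IDs, and the helper names `build_ancestor_index` and `group_covariates` are all made up for illustration; only the three concept names come from the example above.

```python
from collections import defaultdict

# Hypothetical stand-in for rows of a concept_ancestor-style table:
# (ancestor_concept_id, descendant_concept_id). The IDs are invented.
CONCEPT_ANCESTOR = [
    (1, 1), (2, 2), (3, 3),  # each concept is listed as its own ancestor
    (2, 1),                  # 'Myocardial infarction' is an ancestor of concept 1
    (3, 1),                  # 'Acute ischemic heart disease' is an ancestor of concept 1
]

CONCEPT_NAME = {
    1: "Acute myocardial infarction",
    2: "Myocardial infarction",
    3: "Acute ischemic heart disease",
}


def build_ancestor_index(rows):
    """Map each descendant concept to the set of its ancestors."""
    index = defaultdict(set)
    for ancestor_id, descendant_id in rows:
        index[descendant_id].add(ancestor_id)
    return index


def group_covariates(person_concepts, ancestor_index):
    """For each person, return the verbatim concepts plus all their ancestors."""
    covariates = {}
    for person_id, concept_ids in person_concepts.items():
        expanded = set()
        for concept_id in concept_ids:
            expanded.add(concept_id)
            expanded |= ancestor_index.get(concept_id, set())
        covariates[person_id] = expanded
    return covariates


# Person 101 has the specific concept recorded; person 102 only the broader one.
person_concepts = {101: {1}, 102: {2}}
ancestor_index = build_ancestor_index(CONCEPT_ANCESTOR)
covariates = group_covariates(person_concepts, ancestor_index)

for person_id in sorted(covariates):
    print(person_id, sorted(CONCEPT_NAME[c] for c in covariates[person_id]))
```

In this sketch, person 101 gets three binary covariates from a single recorded concept, while person 102 gets only one. No manual ancestor search is involved: every recorded concept is expanded mechanically, which is why the covariate count grows and why the resulting columns are strongly correlated.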

You are right that this can potentially create a lot of covariates, with lots of collinearity. Luckily, our LASSO regression is not bothered by that at all, and we now have many methods experiments showing that it does not affect estimation.
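
As a rough illustration of why collinearity need not break an L1-penalized propensity model, the sketch below fits a toy LASSO logistic regression in which one covariate is an exact copy of another (as a verbatim concept and its ancestor would be if they always co-occurred). Everything here is an assumption for illustration: the simulated data, the penalty value, and the simple proximal-gradient solver, which is not the solver OHDSI uses.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink the value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, penalty, step=0.5, iterations=300):
    """Toy proximal-gradient fit of an L1-penalized logistic regression."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iterations):
        residuals = [
            sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - target
            for row, target in zip(X, y)
        ]
        gradient = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(p)]
        intercept -= step * sum(residuals) / n  # the intercept is not penalized
        beta = [soft_threshold(b - step * g, step * penalty) for b, g in zip(beta, gradient)]
    return intercept, beta


# Simulated data: column 1 duplicates column 0, column 2 is unrelated to treatment.
random.seed(0)
X, y = [], []
for _ in range(1000):
    verbatim = 1.0 if random.random() < 0.5 else 0.0
    ancestor = verbatim  # perfectly collinear with the verbatim concept
    unrelated = 1.0 if random.random() < 0.5 else 0.0
    X.append([verbatim, ancestor, unrelated])
    treated = random.random() < sigmoid(-1.0 + 2.0 * verbatim)
    y.append(1.0 if treated else 0.0)

intercept, beta = fit_l1_logistic(X, y, penalty=0.02)
propensity = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]

print("coefficients:", [round(b, 3) for b in beta])
print("propensity range:", round(min(propensity), 3), round(max(propensity), 3))
```

An unpenalized regression has no unique solution here because the design matrix is rank-deficient. With the L1 penalty the coefficients stay bounded. This particular solver happens to split the weight evenly between the two identical columns, because both start at zero and receive the same update; a different solver might put all the weight on one of them. Either way, the propensity score depends only on the combined linear predictor, not on how the weight is divided among collinear covariates.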

That’s great, @schuemie, but do we have that in a paper? Not pushing, just trying to find out, because this comes up all the time.

Large-scale propensity scores
Tian Y, Schuemie MJ, Suchard MA. Evaluating large-scale propensity score performance through real-world and synthetic data experiments. Int J Epidemiol 2018; 47:2005-14.
Zhang L, Wang Y, Schuemie M, Blei D, Hripcsak G. Adjusting for Indirectly Measured Confounding Using Large-Scale Propensity Score. Journal of Biomedical Informatics. 2022 Oct;134:104204. doi: 10.1016/j.jbi.2022.104204.

LSPS is superior to empirical confounder selection
Tian Y, Schuemie MJ, Suchard MA. Evaluating large-scale propensity score performance through real-world and synthetic data experiments. Int J Epidemiol 2018; 47:2005-14.

LSPS is superior to manual confounder selection (the 2017 paper is an early mention of LSPS heuristics and diagnostics)
Weinstein RB, Ryan P, Berlin JA, Matcho A, Schuemie M, Swerdel J, Patel K, Fife D. Channeling in the Use of Nonprescription Paracetamol and Ibuprofen in an Electronic Medical Records Database: Evidence and Implications. Drug Safety 2017; 40:1279-92.
Weinstein RB, Ryan PB, Berlin JA, Schuemie MJ, Swerdel J, Fife D. Channeling Bias in the Analysis of Risk of Myocardial Infarction, Stroke, Gastrointestinal Bleeding, and Acute Renal Failure with the Use of Paracetamol Compared with Ibuprofen. Drug Safety 2020; 43:927-42.

How LSPS adjusts for unmeasured covariates
Zhang L, Wang Y, Schuemie M, Blei D, Hripcsak G. Adjusting for Indirectly Measured Confounding Using Large-Scale Propensity Score. Journal of Biomedical Informatics. 2022 Oct;134:104204. doi: 10.1016/j.jbi.2022.104204.
Chen R, Schuemie M, Suchard M, Ostropolets A, Zhang L, Hripcsak G. Evaluation of large-scale propensity score modeling and covariate balance on potential unmeasured confounding in observational research (abstract). In Proceedings of the 2020 AMIA Symposium.
