@cyanover and I are developing methods to estimate how a trained prediction model will perform on external datasets when only limited statistical characteristics of those datasets are available. We will present this work at the upcoming symposium (Estimating Model Performance on External Datasets from Their Limited Statistical Characteristics: Application to 3-Year Surgery Risk in Ulcerative Colitis - OHDSI).
Briefly, we propose to reweight samples in the internal dataset so that they match the external dataset's statistics, as reported in a preceding publication ("Table 1") or a characterization study, and then to estimate performance on the external dataset using the reweighted internal sample. We validate our approach on a set of prediction models (e.g., for 3-year risk of surgery in ulcerative colitis patients).
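To make the reweighting idea concrete, here is a minimal sketch on synthetic data. It is not our actual algorithm: it swaps in plain exponential tilting (also known as entropy balancing) so that the weighted internal feature means match a set of reported external means, and then computes a weighted AUC. The data, the function names `reweight_to_match_means` and `weighted_auc`, the feature set, and the "external" means are all made up for illustration.

```python
import numpy as np

def reweight_to_match_means(X, target_means, n_iter=500, lr=0.5):
    """Exponential-tilting weights whose weighted feature means match target_means.

    Illustrative stand-in only: gradient descent on the convex dual
    log(sum_i exp((x_i - target) . lam)), whose gradient is the weighted
    mean of the centered features.
    """
    Z = X - target_means                      # center features at the external means
    lam = np.zeros(X.shape[1])
    for _ in range(n_iter):
        logits = Z @ lam
        w = np.exp(logits - logits.max())     # subtract max for numerical stability
        w /= w.sum()
        lam -= lr * (w @ Z)                   # step against the dual gradient
    logits = Z @ lam
    w = np.exp(logits - logits.max())
    return w / w.sum()

def weighted_auc(y, scores, w):
    """Weighted AUC: weighted share of (positive, negative) pairs ranked correctly."""
    pos, neg = y == 1, y == 0
    diff = scores[pos][:, None] - scores[neg][None, :]
    pair_w = w[pos][:, None] * w[neg][None, :]
    concordant = (diff > 0) + 0.5 * (diff == 0)   # ties count as half
    return float((pair_w * concordant).sum() / pair_w.sum())

# Synthetic "internal" cohort (hypothetical features: standardized age, binary sex).
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(0.0, 1.0, n)
sex = (rng.random(n) < 0.4).astype(float)
X = np.column_stack([age, sex])
p_outcome = 1.0 / (1.0 + np.exp(-(1.0 * age + 0.5 * sex - 1.0)))
y = (rng.random(n) < p_outcome).astype(int)

# Scores from a hypothetical, already-trained model that only uses age.
scores = 1.0 / (1.0 + np.exp(-0.8 * age))

# Made-up "Table 1" statistics of an external cohort: older, more often sex == 1.
external_means = np.array([0.3, 0.55])

w = reweight_to_match_means(X, external_means)
auc_internal = weighted_auc(y, scores, np.full(n, 1.0 / n))
auc_external_est = weighted_auc(y, scores, w)

print("weighted internal means:", w @ X)
print("internal AUC:", auc_internal)
print("estimated external AUC:", auc_external_est)
```

The sketch matches only feature means. Real "Table 1" summaries may also report variances or outcome-stratified statistics, which would need additional constraints, and the estimate is only as good as the assumption that the outcome model transfers between cohorts.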
Ultimately, such an approach may help identify models that perform well across multiple clinical settings and geographies, even when detailed test data from those settings are not available (subject, of course, to some limitations).
Interested in data-shift robust modeling? Want to join forces? Come visit our poster or contact us.