Implementing Existing Prediction Models with continuous risk factors

dramacloak · February 19, 2019, 11:34pm

My goal is to implement the EORTC bladder cancer risk score model within the OHDSI ecosystem. Several risk factors have been problematic so far because they are continuous variables in which specific ranges are worth different point values/weights (e.g., in the EORTC risk score model for recurrence, a single tumor is 0 points, 2-7 tumors is 3 points, and >=8 tumors is 6 points). I’ve found an appropriate concept for ‘number of tumors’ in the observation table, but I haven’t figured out how to use the available FeatureExtraction covariate settings (https://rdrr.io/github/OHDSI/FeatureExtraction/man/createCovariateSettings.html) to assign different points depending on the value stored in the observation record (I’m currently using useObservationLongTerm = TRUE).
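For reference, the range-to-points mapping described above can be written down as a small function. This is only a sketch in plain Python, independent of FeatureExtraction; the function name `tumor_count_points` is hypothetical, and the cut-points and point values are the ones quoted in this post.

```python
def tumor_count_points(n_tumors: int) -> int:
    """Map a tumor count to recurrence points as quoted in the post:
    1 tumor -> 0 points, 2-7 tumors -> 3 points, >=8 tumors -> 6 points."""
    if n_tumors < 1:
        # The scoring quoted in the post starts at a single tumor.
        raise ValueError("expected at least one tumor")
    if n_tumors == 1:
        return 0
    if n_tumors <= 7:
        return 3
    return 6


if __name__ == "__main__":
    for n in (1, 2, 7, 8, 12):
        print(n, tumor_count_points(n))
```

The point is that the score is a step function of the stored value, not a multiple of it, so a covariate that only flags the presence of the observation cannot reproduce it on its own.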

Similarly complicated is tumor size (split into <3 cm and >=3 cm), which I have mapped to concepts in the measurement table. There appear to be “useMeasurementValueAnyTimePrior” and “useMeasurementRangeGroupAnyTimePrior” options in the covariate settings, but I’m not sure how to configure them so that I can specify different weights/points at different thresholds/ranges.
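The thresholds-to-points question can be framed as a lookup over ascending cut-points. Below is a minimal sketch in plain Python using the standard-library `bisect` module; it does not use FeatureExtraction, the helper name `points_for_value` is hypothetical, and the point value used for the larger tumor-size bin is a placeholder to be replaced with the value from the published EORTC table.

```python
from bisect import bisect_right


def points_for_value(value, cutpoints, points):
    """Return the points for the bin that `value` falls into.

    cutpoints: ascending lower bounds of every bin after the first.
    points:    one entry per bin, so len(points) == len(cutpoints) + 1.
    A value equal to a cut-point belongs to the higher bin (i.e. ">=").
    """
    if len(points) != len(cutpoints) + 1:
        raise ValueError("need exactly one more points entry than cut-points")
    return points[bisect_right(cutpoints, value)]


# Tumor size in cm, split at 3 cm as in the post.
# PLACEHOLDER_POINTS is an assumption, not taken from this thread.
PLACEHOLDER_POINTS = 3
SIZE_CUTPOINTS = [3.0]
SIZE_POINTS = [0, PLACEHOLDER_POINTS]

if __name__ == "__main__":
    for size_cm in (1.5, 2.9, 3.0, 4.2):
        print(size_cm, points_for_value(size_cm, SIZE_CUTPOINTS, SIZE_POINTS))
```

The same helper covers the number-of-tumors case with `cutpoints=[2, 8]` and `points=[0, 3, 6]`.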

Slightly more complex is prior recurrence rate. We’ve stored recurrent cancer in the condition occurrence table, which inherently differentiates between primary and recurrent disease; however, this approach is insufficient to differentiate between <=1 and >1 recurrences/year. The useDistinctConditionCount[Long/Medium/Short]Term option might work as follows: set mediumTermStartDays to -365 (the past year) and set the weight to 2. Then no recurrence would get 0 points, 1 recurrence would get 2 points, and 2 recurrences would get 4 points. However, >2 recurrences would be problematic, since a linear weight would assign them more than 4 points. This also assumes that we only care about what happened in that past-year window, as this wouldn’t pick up a faster recurrence rate that occurred more than 1 year prior.
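The mismatch between a single weight on a count and the intended categories can be seen in a short comparison. This is a sketch in plain Python under the assumptions stated in this post (counts taken from the past-year window only, 0/2/4 points); both function names are hypothetical, and nothing here calls FeatureExtraction.

```python
def linear_points(n_recurrences_past_year: int, weight: int = 2) -> int:
    """What one weight applied to a count covariate yields: weight * count."""
    return weight * n_recurrences_past_year


def categorical_points(n_recurrences_past_year: int) -> int:
    """Intended scoring as described in the post:
    no recurrence -> 0, <=1 recurrence/year -> 2, >1 recurrence/year -> 4."""
    if n_recurrences_past_year <= 0:
        return 0
    if n_recurrences_past_year == 1:
        return 2
    return 4


if __name__ == "__main__":
    # The two columns agree for 0, 1, and 2 recurrences and diverge from 3 on.
    for n in range(5):
        print(n, linear_points(n), categorical_points(n))
```

Capping the count (or emitting one binary covariate per category) removes the divergence, but it does not address the separate limitation that only the past-year window is considered.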

Thank you in advance for your time and thoughts.

Christian_Reich · February 21, 2019, 3:58am

I would not recommend using the Observation table for this. It is sometimes called the “garbage can” because it contains anything somebody might have captured. The problem is that there is no convention that causes ETL developers to write such a record. It’s silly anyway: at what point in time do you have what number of tumors? And how would you determine that?

All these are cancer attributes we are currently trying to define in the Oncology Working Group. Please come and help define these, so you can reliably use them.

Same thing.

dramacloak · February 22, 2019, 8:23pm

Thanks. Any idea when the Oncology Working Group will be active again? I’d love to contribute to it.

SCYou · February 25, 2019, 11:03pm

Sorry, @dramacloak, I don’t know how to calculate the EORTC bladder risk score using the existing FeatureExtraction and PatientLevelPrediction (PLP) packages.

I’d recommend using a custom covariate builder in the FeatureExtraction package to build the EORTC bladder cancer risk score, as described here.

Or you can add the EORTC risk score to the FeatureExtraction package itself, as I did to calculate the hospital frailty risk score in this commit.