PLP: Restricting features that co-occur in the same encounter as the outcome

Evan_Minty · February 8, 2018, 4:12pm

Hi,

Many outcomes of interest in prediction might occur in the context of an inpatient encounter. Within the current ATLAS framework, is there a way to only use features that occur up to (but not including) the visit where the outcome of interest occurs? If not, I think this would be a useful configuration to add to the framework.

The issue is that (probably due to erroneous or insufficiently detailed timestamps) other procedures, diagnoses, etc. that co-occur in the same inpatient encounter as the outcome of interest can become predictors in the model. Invariably, some of these are directly connected to the occurrence of the outcome and distort the model.

For example, in the PLP tutorial in October, we looked at trying to predict ICU admission for pneumonia, and found that procedures (e.g. mechanical intubation) that describe the process of bringing someone to the ICU rise to the top as highly predictive features. This validates the great work that has been done in building the model framework, but obviously poses a problem.

That use case is difficult because you’d need features from that inpatient presentation to predict a patient’s deterioration - I don’t see a way around this until data timestamping improves.

In my current use case, I’m trying to create prediction models for a series of surgical outcomes. I’d like to use all features up to, but NOT including the inpatient encounter where the surgery occurs, as predictors to skirt this timestamp issue. I’ve tried various ways of implementing the target cohort (by trying to generate a cohort of preoperative patients for example), but ultimately when you configure the model and declare the outcome cohort of interest, I can’t think of a way of stopping features that co-occur in the same encounter as that outcome from being in the model (within the current ATLAS config options).
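To make the intended rule concrete, here is a minimal sketch in plain Python (not ATLAS or FeatureExtraction code). The `Visit` and `Event` classes and both function names are hypothetical, invented only for illustration. The assumption is that each outcome can be matched to the inpatient visit that contains it, and that only events dated before that visit's start date are allowed as predictors.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Visit:  # hypothetical stand-in for an inpatient visit record
    start: date
    end: date


@dataclass(frozen=True)
class Event:  # hypothetical stand-in for a diagnosis/procedure record
    concept: str
    day: date


def feature_cutoff(outcome_day: date, inpatient_visits: List[Visit]) -> date:
    """Return the last day whose events may be used as predictors."""
    for visit in inpatient_visits:
        if visit.start <= outcome_day <= visit.end:
            # Outcome falls inside this encounter: cut off the day before it starts.
            return visit.start - timedelta(days=1)
    # Assumption: with no enclosing visit, fall back to the day before the outcome.
    return outcome_day - timedelta(days=1)


def eligible_features(events: List[Event], outcome_day: date,
                      inpatient_visits: List[Visit]) -> List[Event]:
    """Keep only events dated on or before the cutoff day."""
    cutoff = feature_cutoff(outcome_day, inpatient_visits)
    return [e for e in events if e.day <= cutoff]


visits = [Visit(date(2017, 3, 10), date(2017, 3, 15))]
events = [
    Event("hypertension diagnosis", date(2017, 1, 5)),
    Event("preoperative clinic visit", date(2017, 3, 9)),
    Event("intubation", date(2017, 3, 10)),  # same encounter as the surgery
    Event("surgery", date(2017, 3, 11)),
]
kept = eligible_features(events, date(2017, 3, 11), visits)
print([e.concept for e in kept])
# ['hypertension diagnosis', 'preoperative clinic visit']
```

Anything recorded on or after the admission date is dropped, regardless of how coarse the within-encounter timestamps are.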

Any thoughts?

Rijnbeek · February 9, 2018, 8:47am

Hi Evan,

I think that by default the FeatureExtraction package does not include covariates recorded on the cohort start date.

See, for example, section 2.2:

“This redefines the long term window as 180 days prior up to (but not including) the cohort start date, and redefines the short term window as 14 days prior up to (but not including) the cohort start date.”
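The "up to (but not including)" wording in the quote describes a window that stops just short of the cohort start date. A minimal sketch of that check in plain Python follows; the function name `in_lookback_window` is hypothetical, and treating the earliest day of the window as included is an assumption, not something confirmed from the package source.

```python
from datetime import date, timedelta


def in_lookback_window(event_day: date, cohort_start: date, days_prior: int) -> bool:
    """True if event_day lies in [cohort_start - days_prior, cohort_start).

    The cohort start date itself is excluded; the earliest day is
    included (an assumption made for this sketch).
    """
    window_start = cohort_start - timedelta(days=days_prior)
    return window_start <= event_day < cohort_start


index = date(2018, 2, 1)
print(in_lookback_window(date(2018, 1, 31), index, 14))  # True: day before index
print(in_lookback_window(index, index, 14))              # False: index day excluded
print(in_lookback_window(date(2018, 1, 17), index, 14))  # False: 15 days prior
```

If the cohort start date is the procedure date rather than the admission date, this window still admits events from earlier days of the same encounter.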

Is your cohort start date indeed the visit?

Peter

Evan_Minty · February 9, 2018, 5:01pm

Thanks Peter,

Because I do want to have the index date as the cohort start date, I tried specifying the cohort definition that way. See the public ATLAS here

(1):
http://www.ohdsi.org/web/atlas/#/cohortdefinition/1729525
n on STRIDE ~ 6600

I found I had to create my own concept set for inpatient visit. Creating a visit criterion (which defaults to any visit) and then specifying inpatient visit as a criteria attribute didn’t seem to work.

But compare (1) with (2):
http://www.ohdsi.org/web/atlas/#/cohortdefinition/1729527
n~52000.

So there is a big difference in sample size; the two are not equivalent. But then I presume the cohort index date in (2) is that of the procedure occurrence, not the inpatient visit I’m looking for.

You raise another point though, and that’s the short term window. I had actually assumed that if the long term window was set, the medium and short term were redundant, i.e. that these are overlapping sets.

The way the params are organized suggests the same:
longTermDays = 365, mediumTermDays = 180, shortTermDays = 30, windowEndDays = 0

i.e. they all have the same windowEndDays argument.

So currently I have actually set the UseCovariateXXXXMedium and Short term options to false. I will dig into the documentation, but are these non-overlapping sets?

Patrick (in an email thread) suggested I change windowEndDays to -1. Have done that, still get suspiciously good performance. I will try to manually impose a longer washout, but obviously the preference here is a cohort definition that takes the visit start date as the index date, from which I can subtract one day to ensure a clean set of predictors.

Evan_Minty · March 5, 2018, 6:47pm

So for the benefit of the community thought I would add here:

Part of the confusion was generated by what I think is a bug in ATLAS. In copying cohorts, some ‘memory’ of the previous copy seems to get retained (occasionally?), i.e. in the links I posted above, look at the ‘export’ tab (both text and, e.g., JSON) relative to the definition. If this has not yet been reported, I’ll look to do so. Worth carefully examining the export tab.
In case others run into it (this is probably obvious for power users): currently, the ATLAS R code export uses params for an earlier version of FeatureExtraction (to which the documentation on GitHub doesn’t apply), i.e. the params for shortTermDays etc. have changed. Notably their sign has changed, which explained the continued ‘too good’ performance when I used ‘-1’ in FeatureExtraction v1.2.3.
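To illustrate why a sign change matters, here is a small plain-Python sketch of two hypothetical conventions for a window-end parameter. Neither function is taken from FeatureExtraction, and which package version follows which convention is not asserted here. The sketch only shows that the same value, -1, can mean "stop one day before the index date" under one convention and "extend one day past the index date" under the other.

```python
from datetime import date, timedelta


def window_end_as_offset(index_day: date, end_days: int) -> date:
    """Hypothetical convention A: the value is an offset added to the index date.

    Negative values fall before the index date.
    """
    return index_day + timedelta(days=end_days)


def window_end_as_days_before(index_day: date, days_before: int) -> date:
    """Hypothetical convention B: the value counts days *before* the index date.

    Negative values therefore fall after the index date.
    """
    return index_day - timedelta(days=days_before)


index = date(2018, 2, 1)
print(window_end_as_offset(index, -1))       # 2018-01-31: index day excluded
print(window_end_as_days_before(index, -1))  # 2018-02-02: window leaks past index
```

Under convention B, passing -1 widens the window to include the index date and the day after, which would let same-encounter features into the model.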

On the very bright side, FeatureExtraction 2.0 is an incredible piece of work, combining the params and SQL like it does.

Evan_Minty · March 14, 2018, 8:42pm

Kudos to @anthonysena and the ATLAS dev team - ATLAS 2.3 now exports params compatible with FeatureExtraction 2.0.