Generation of Continuous Measurement Features in FeatureExtraction

Mike_Van_Ness · March 14, 2023, 4:19am

Hi all, thanks in advance for the help.

I’m struggling to understand exactly how continuous features are generated in FeatureExtraction, specifically for measurement values. The createCovariateSettings function allows for creating covariates for measurements in short, medium, and long term ranges which can be specified, but its documentation never mentions how the measurement values are aggregated in these windows if there is more than one measurement value in the window. For example, we may set useMeasurementValueShortTerm = TRUE and shortTermStartDays = -14, but if there are multiple measurements in the last 14 days, what happens? Are these values averaged, or maybe the most recent value is taken?

I also do not understand how the continuous measurement values are generated for temporal covariates. For one, there could be the same issue of having multiple measurements in one time window in the past (e.g. between 14 and 7 days ago). Further, when I have tried generating such temporal covariates, I find that I get a lot of covariate values at the farthest-back time point and very few after it. For example, if I set useMeasurementValue = TRUE, temporalStartDays = seq(-10, -1, by = 1), temporalEndDays = seq(-9, 0, by = 1), I get almost only covariates generated at the 10-days-ago timeId.
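To make the windows in that example concrete, here is a small base R sketch of the time periods those two vectors describe. It does not call FeatureExtraction at all. It assumes that timeId is simply the position of each window in the vectors and that both bounds are inclusive; neither assumption is confirmed in this thread, and windowsForDay is a hypothetical helper used only for illustration.

```r
# Temporal windows implied by the settings in the example (base R only).
temporalStartDays <- seq(-10, -1, by = 1)
temporalEndDays <- seq(-9, 0, by = 1)

# Assumption: timeId is the position of the window in these vectors.
windows <- data.frame(
  timeId = seq_along(temporalStartDays),
  startDay = temporalStartDays,
  endDay = temporalEndDays
)

# Hypothetical helper. Assumption: both bounds are inclusive, so a
# measurement on a given day belongs to every window with
# startDay <= day <= endDay.
windowsForDay <- function(day) {
  windows$timeId[windows$startDay <= day & day <= windows$endDay]
}

print(windows)
windowsForDay(-9)   # time IDs of all windows that contain day -9
windowsForDay(-10)  # time IDs of all windows that contain day -10
```

Under those assumptions each window spans two days and neighbouring windows share a boundary day, so the same measurement could be eligible for more than one timeId. Setting temporalEndDays equal to temporalStartDays would give one-day, non-overlapping windows, which may make the output easier to interpret.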

Please let me know if anything is not clear, and looking forward to hearing back

-Mike

mbrand · May 4, 2023, 6:23pm

Hi Mike,

I have the same question, just wondering if you found out more about this?

Thanks

david_vizcaya · May 19, 2023, 8:37pm

+1! Also interested if someone has any insights.
I think measurements are a tremendous Achilles heel for ATLAS and other tools, especially when it comes to handling multiple measurements. I tried to build three mutually exclusive cohorts according to baseline (-365) UACR categories as defined in guidelines, but it is impossible to restrict the measurement to the last one before the index date in ATLAS, so any patient with two or more measurements in different categories will be double or triple counted. I created a topic elsewhere, but got no reactions.

Thomas_White · June 1, 2023, 7:02pm

@david_vizcaya and @Mike_Van_Ness , we have a similar desire to include measurement values in our PLE and PLP work (e.g. as features for propensity score matching).

@Patrick_Ryan , I heard you mention on the final VEGF SOS Challenge call that OHDSI doesn’t currently support using measurement values in propensity matching. Is anyone actively working on an approach for this?

I’m interested in prototyping this if no one else is already working on it. I presume it would be an addition to the FeatureExtraction module? However, I’d need guidance on where best to put it, and on naming standards. @Chris_Knoll , is that the right place to design such new features? If so, what is the right Community call to engage to discuss more details (e.g. the Atlas/WebAPI one or another)?

Methodologically, does the community have recommendations on which measurement value-related features to include (e.g. min, max, mean, median, P10, P25, P75, P90)? We’d also want earliest and latest values within the specified time window (e.g. to get first and last values). Any of those are amenable to standard SQL. What other measurement-value-related features is the community commonly including in their models? For example, are there preferred ways to look for trends (e.g. slope of measurement values over the course of the time window)?
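As a rough illustration of the candidate features listed above, here is a base R sketch that computes them for one person, one measurement concept, and one time window. The data and the summariseMeasurements helper are hypothetical, and the slope is just an ordinary least-squares fit of value on day, which is one possible way to express a trend rather than a community recommendation.

```r
# Hypothetical measurement values for one person, one concept, one window.
obs <- data.frame(
  day = c(-300, -200, -100, -10),  # days relative to the index date
  value = c(10, 20, 40, 30)
)

# Hypothetical helper: summary features for a single window.
summariseMeasurements <- function(obs) {
  obs <- obs[order(obs$day), ]  # oldest first
  q <- quantile(obs$value, probs = c(0.10, 0.25, 0.75, 0.90), names = FALSE)
  # Trend as an ordinary least-squares slope (value units per day);
  # undefined when there are fewer than two observations.
  slope <- if (nrow(obs) >= 2) {
    unname(coef(lm(value ~ day, data = obs))["day"])
  } else {
    NA_real_
  }
  c(
    min = min(obs$value),
    max = max(obs$value),
    mean = mean(obs$value),
    median = median(obs$value),
    p10 = q[1], p25 = q[2], p75 = q[3], p90 = q[4],
    earliest = obs$value[1],
    latest = obs$value[nrow(obs)],
    slope = slope
  )
}

feats <- summariseMeasurements(obs)
print(feats)
```

Each of these would need its own covariate definition per measurement concept and unit, so the number of features grows quickly with the number of statistics chosen.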

benskov · June 2, 2023, 6:33am

I’d be very keen on partaking in this work. We’re about to build an ICU CDM, and this would be very useful for our work.

Chris_Knoll · June 2, 2023, 1:53pm

I believe creating features from measurement values is implemented. You can find the analysis SQL script here.

Looking at the docs for createCovariateSettings(), the parameter to enable it is one of these:

  useMeasurementValueAnyTimePrior = FALSE,
  useMeasurementValueLongTerm = FALSE,
  useMeasurementValueMediumTerm = FALSE,
  useMeasurementValueShortTerm = FALSE,

You can follow the steps in the vignette, but just change the settings to this to see measurement features:

settings <- createCovariateSettings(useMeasurementValueAnyTimePrior = TRUE,
                                    useMeasurementValueLongTerm = TRUE,
                                    useMeasurementValueShortTerm = TRUE)

I believe this would create features from measurement values for use in a model…unless I’ve missed something. Is this what is being asked?

Chris_Knoll · June 2, 2023, 2:05pm

Apologies, I think I did miss it, you all have discussed the existing MeasurementValueLong/Medium/Short term options…

I did dig into this further, and it looks like only the latest measurement is being pulled into the result (see here). The way the covariateID is being defined, it’s a combination of measurement_concept_id + unit_concept_id and an analysis id, which defines the short/medium/long mode of the analysis (see here).
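To illustrate what "only the latest measurement" means in practice, here is a base R sketch of that behaviour on a toy table. This is not the FeatureExtraction SQL itself; the column names, concept IDs, and window bounds are placeholders chosen for the example, and it assumes the window bounds are inclusive.

```r
# Toy measurement table; all IDs and values are made-up placeholders.
measurements <- data.frame(
  personId = c(1, 1, 1, 2),
  measurementConceptId = c(111, 111, 111, 111),
  unitConceptId = c(9, 9, 9, 9),
  day = c(-12, -5, -2, -7),  # days relative to the index date
  value = c(15, 23, 45, 30)
)

# Short-term window, assumed inclusive on both ends.
startDay <- -14
endDay <- 0

inWindow <- measurements[
  measurements$day >= startDay & measurements$day <= endDay,
]

# Sort so the most recent row comes first within each
# person / concept / unit group, then keep the first row per group.
inWindow <- inWindow[order(
  inWindow$personId,
  inWindow$measurementConceptId,
  inWindow$unitConceptId,
  -inWindow$day
), ]
keys <- inWindow[, c("personId", "measurementConceptId", "unitConceptId")]
latest <- inWindow[!duplicated(keys), ]

print(latest)
```

Person 1 has three values in the window but contributes only the one from day -2. The sketch does not define what happens when two measurements fall on the same day; a real implementation would need an explicit tie-breaking rule.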

So is the ask to allow multiple measurements per person into the model? Or, as @Thomas_White described: allow putting some sort of quartile into the model? The challenge I see with it is that the model element must be uniquely identified, so if you have someone with a measurement value in (15,23,45) you’ll need a unique identifier for each of those 3 distinct values. Maybe for an age it makes sense because there are only so many distinct values, but for measurements that can take a broad range of values you’ll be creating a lot of variables. Maybe the idea of narrowing it down to a quartile within the population will reduce the number of variables needed to 4.
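A minimal base R sketch of that quartile idea, assuming one value per person has already been selected (for example the latest one). The values are made up, and this is only one way the binning could be done, not something FeatureExtraction is known to provide.

```r
# Hypothetical: one measurement value per person in the population.
values <- c(15, 23, 45, 30, 52, 18, 27, 39)

# Population quartile boundaries (minimum, Q1, median, Q3, maximum).
breaks <- quantile(values, probs = seq(0, 1, by = 0.25), names = FALSE)

# Assign each person to quartile 1..4. Note that cut() raises an error if
# the breaks are not unique, which can happen with heavily tied values.
quartile <- cut(values, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Four binary indicator columns instead of one variable per distinct value.
indicators <- outer(quartile, 1:4, "==") * 1L
colnames(indicators) <- paste0("quartile", 1:4)

print(data.frame(value = values, quartile = quartile))
print(indicators)
```

Each person ends up with exactly one indicator set to 1, so the number of variables per measurement concept and unit stays at four regardless of how many distinct values occur.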

I’m speaking a little bit out of my lane on the statistics part and I’d ask @schuemie or @anthonysena to chime in, but I hope the references to the code help digest what’s happening under the covers.