
Per patient characterization stats from Atlas

Bringing this chat with @anthonysena into Forums:

Is it possible with Atlas Characterization to get per patient statistics? I’m interested in seeing, for instance, index year per patient (rather than aggregated for the cohort). When looking at the results schema’s cc_results table, we only get 1 row per index year and a count of patients, which is fine for cohort level statistics, but doesn’t provide the per patient values.
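To make the distinction concrete, here is a minimal Python sketch on made-up data. It assumes cohort rows shaped like the OMOP `cohort` table (`subject_id`, `cohort_start_date`); the function names are hypothetical, and this is not how Atlas or WebAPI actually computes its results.

```python
from collections import Counter
from datetime import date

# Toy cohort rows: (subject_id, cohort_start_date). Invented data.
cohort = [
    (101, date(2015, 3, 2)),
    (102, date(2015, 11, 20)),
    (103, date(2017, 6, 9)),
]

def index_year_per_person(rows):
    """Per-patient view: one entry per subject."""
    return {subject_id: start.year for subject_id, start in rows}

def index_year_counts(rows):
    """Aggregated view: one entry per index year with a person count."""
    return Counter(start.year for _, start in rows)

per_person = index_year_per_person(cohort)
aggregated = index_year_counts(cohort)

print(per_person)        # what I would like to get
print(dict(aggregated))  # the shape of what the aggregated results give today
```

The aggregated view can always be derived from the per-patient one, but not the other way around.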

Ultimately, I’d like to get a bunch of stats per patient, such as risk scores (Charlson, DCSI, CHADS, etc), comorbidities, comedications, or anything else Characterization can do.

It seems like this isn’t currently possible within Atlas (just in FeatureExtraction) so I’m wondering if perhaps this idea has enough utility to be a new option in Atlas Characterization. Certainly, when running through PLE/PLP packages, these covariates are derived by those R packages via FeatureExtraction, but we’re finding we would like to have these per-patient statistics available even before we get to these packages.

Tagging @Chris_Knoll and @schuemie

Well, I feel that characterization is by definition an aggregate function of the data, but that doesn’t mean if you have a need for something it can’t happen just because it’s not ‘characterization’. I feel the concern with patient-level exposure of data has been something we’ve avoided for most parts of the application due to PHI concerns (the notable exception to this is patient profiles and samples, so this may have been softened over the years).

Am I understanding correctly that you’re looking more for ‘population features’ (à la FeatureExtraction) where you get a row per person and a column per variable?
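As a rough illustration of that "row per person, column per variable" shape, here is a small Python sketch that pivots sparse (person, covariate, value) triples into one record per person. The covariate names, the values, and the helper `pivot_to_wide` are all hypothetical and only meant to show the layout.

```python
def pivot_to_wide(person_ids, sparse_rows, covariate_names):
    """Build one dict per person with a key per covariate.

    Covariates with no sparse row for a person default to 0.
    """
    wide = {pid: {name: 0 for name in covariate_names} for pid in person_ids}
    for person_id, covariate, value in sparse_rows:
        wide[person_id][covariate] = value
    return wide

# Invented example data: sparse triples only list non-zero values.
person_ids = [101, 102, 103]
covariate_names = ["charlson_index", "diabetes", "metformin"]
sparse_rows = [
    (101, "charlson_index", 3),
    (101, "diabetes", 1),
    (101, "metformin", 1),
    (102, "charlson_index", 1),
]

wide = pivot_to_wide(person_ids, sparse_rows, covariate_names)
for pid in person_ids:
    print(pid, wide[pid])
```

Person 103 has no sparse rows, so it still gets a record with every column set to 0.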

Ah okay, yep, keeping PHI in Atlas to a minimum is certainly reasonable. But if it’s in the SQL tables and not the app UI, does that help reduce that concern?

You’ve got it though, population features is what I’m looking for. We can do this in FeatureExtraction, but as FE is already in CC to some degree, wondering if we could have CC generate both the population features and the aggregated stats. We’re still an immature R shop, so having this be generated via Atlas and then available in the results tables would be useful. But I’m not sure if we’re alone in this or if there’s other sites that would also be interested in this. Also what I’m trying to understand, are there performance / architecture consequences to adding such functionality?

Just off the top of my head:

There is an optimization in FE where you either get the features or get the ‘characteristics’ (aka: aggregated values). There could be a modification to FE execution in WebAPI where we generate the features and then generate the aggregates, but it would effectively be executing the same FE twice.

It might have been nice if FE did only the job of creating features, and then some other ‘characterization actor’ would build characterization from features, but the reason (I think) it was done all in one place was a matter of optimization: why create a billion records for features when you can aggregate those down to a few hundred thousand records. Doing this ‘on the fly’ means you don’t eat up resources creating temporary tables of features when you can just scan the raw tables and aggregate on the fly.
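A toy Python sketch of that trade-off, under heavy assumptions: features are binary, each (person, covariate) pair is produced at most once, and all names here are hypothetical. It is not how FE or WebAPI is implemented; it only contrasts aggregating during the scan with also keeping the per-person rows.

```python
from collections import Counter

def scan_features(raw_records):
    """Lazily yield (person_id, covariate) pairs from the raw records."""
    for person_id, covariate in raw_records:
        yield person_id, covariate

def aggregates_only(feature_iter):
    """Aggregate during the scan; no per-person rows are kept."""
    counts = Counter()
    for _person_id, covariate in feature_iter:
        counts[covariate] += 1
    return counts

def features_and_aggregates(feature_iter):
    """Single scan that keeps per-person rows AND builds the aggregates.

    Avoids running the extraction twice, at the cost of storing every row.
    """
    per_person = []
    counts = Counter()
    for person_id, covariate in feature_iter:
        per_person.append((person_id, covariate))
        counts[covariate] += 1
    return per_person, counts

# Invented raw data: (person_id, covariate)
raw = [(101, "diabetes"), (101, "metformin"), (102, "diabetes"), (103, "metformin")]

counts_a = aggregates_only(scan_features(raw))
per_person, counts_b = features_and_aggregates(scan_features(raw))
print(dict(counts_a))
print(len(per_person), "per-person rows stored")
```

Both paths give the same counts; the second one simply pays for storing one row per person per covariate, which is the cost the on-the-fly approach avoids.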

Even if there’s not advanced expertise in R in your org, I do feel like FE should allow people to do this without too much difficulty in programming. Of course, a web-based UI to do it for you is always nice. I wonder if this should be a separate function of cohort generation (like samples) where you can get the features of the population and keep characterization as its own specific function.

Disclaimer: this is just ramblings off the top of my head and probably after a day of mulling I may have a different perspective on what I just wrote :slight_smile:

Thanks @Chris_Knoll! These ramblings are coherent to me :slight_smile:

I’m going to see if we can leverage FE in a pipeline first, before trying to add more to Atlas.
