
Network study: Concept Prevalence

We want to announce a new network study: https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ConceptPrevalence

The full protocol can be found here: https://github.com/OHDSI/StudyProtocolSandbox/blob/master/ConceptPrevalence/extras/ConceptPrevalenceStudyProtocol_v0.1.docx

We want to study the usage patterns of Concepts across different OMOP CDM instances. This could be useful in itself for answering many questions, but we have a concrete reason: for any one medical entity, the granularity of the codes captured in a data source can vary greatly. For example, chronic kidney disease stage II can be coded as ICD9 585.2 (Chronic kidney disease, Stage II (mild)), as 585.9 (Chronic kidney disease, unspecified), or even as 586 (Renal failure, unspecified). Yet this level of detail is key for any cohort definition. Currently, researchers have no way of knowing whether a highly granular concept is even available for selection, or whether they have to use a generic concept combined with auxiliary information to define the cohort correctly. Each data source instance is a black box, and knowledge about the distribution of concepts is limited to the instance a researcher has access to. But OHDSI Network Studies depend on cohort definitions that work across the whole network.

In an ideal world, a cohort definition tool like ATLAS would have access to the distribution of all concepts in the community. We would like to make that a reality and collect counts for all:

  • Unique values in the *_concept_id fields

  • Unique values in the *_source_concept_id fields

  • Mappings between them
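To make the three kinds of counts concrete, here is a minimal in-memory sketch. It is not the study package (which runs SQL against the CDM); it assumes rows from a single clinical event table such as condition_occurrence, counts records rather than persons, and uses made-up concept IDs purely for illustration. The function name concept_prevalence is hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def concept_prevalence(
    rows: Iterable[Mapping[str, int]], domain: str
) -> Tuple[Counter, Counter, Counter]:
    """Hypothetical helper: count standard concepts, source concepts,
    and (source -> standard) mapping pairs in one CDM event table.

    `domain` is the table prefix, e.g. "condition", so the fields read are
    condition_concept_id and condition_source_concept_id.
    """
    std_field = f"{domain}_concept_id"
    src_field = f"{domain}_source_concept_id"
    standard, source, mappings = Counter(), Counter(), Counter()
    for row in rows:
        std, src = row[std_field], row[src_field]
        standard[std] += 1          # record count per *_concept_id value
        source[src] += 1            # record count per *_source_concept_id value
        mappings[(src, std)] += 1   # record count per source -> standard pair
    return standard, source, mappings


# Hypothetical condition_occurrence rows; the IDs are invented for illustration.
rows = [
    {"condition_concept_id": 101, "condition_source_concept_id": 9001},
    {"condition_concept_id": 101, "condition_source_concept_id": 9002},
    {"condition_concept_id": 102, "condition_source_concept_id": 9003},
    {"condition_concept_id": 101, "condition_source_concept_id": 9001},
]
standard, source, mappings = concept_prevalence(rows, "condition")
print(standard)               # Counter({101: 3, 102: 1})
print(mappings[(9001, 101)])  # 2
```

Keeping the mapping pairs separate from the two marginal counts is what would let one spot, for instance, a source code that maps to an unexpectedly generic standard concept at one site but not at another.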

As a side effect, we would also get a better understanding of the dynamics of that distribution over time, and we could draw conclusions about the impact of erroneous mappings.

I’d welcome everybody to participate; tagging some of the people that might be interested: @SCYou @Christian_Reich @Patrick_Ryan @jduke @Rijnbeek @rkboyce @Daniella_Meeker


Hi @aostropolets!

Colorado would like to participate! What’s the timeline?

Hi, I’m Ho-kyun working with SCYOU.
We would like to participate in the study too. :grinning:

Hi, I would be delighted to try and participate! Great idea.

Tufts would like to participate.

Hi @aostropolets! The IQVIA team would be happy to participate too. Is there a study package we could review?


Count us in at Columbia! Then again my post is redundant :slight_smile:

It’s such wonderful news that you all decided to participate! I’ve almost lost hope :slight_smile:
Timeline: the study itself doesn’t require a lot of preliminary work, so I’d say we expect to get some results in a couple of months.
The R package can be found here; I'm happy to answer any questions.

Hi @aostropolets, it looks like the SQL overlaps a lot with classic Achilles analyses and some new ones added by @AnthonyMolinaro. Specifically, characterizing concept ids (classic) and characterizing source concept ids (new ones).

Not suggesting you should change anything here, but there could be an opportunity to leverage Achilles results for the deliverable.

Hey @aostropolets, @Frank and I will work on this as well. :slight_smile:

Thanks, Ajit! I know that Achilles produces a lot of useful results, but I figured it may be more convenient to just run a package that produces the tables directly :slight_smile: Do you think Achilles would be more convenient?

Great to hear! I added my email to the readme file; will also duplicate it here: ao2671@cumc.columbia.edu
Please feel free to send the results to this email once they are ready.
@MPhilofsky, @Hokyun, @rkboyce, @Andrew, @krfeeney, @cukarthik, @AnthonyMolinaro thanks a lot for your interest!

I’m happy to see that we’ve been getting the first results! Thanks to @Hokyun and @mattspotnitz for pioneering :slight_smile:
As you all know, the deadline for OHDSI Symposium submissions is in three weeks, so we are planning to submit some preliminary results. It would be great if we could get more data next week to prepare the abstract, and then finalize the project over the summer :slight_smile:


Many thanks to all participants who generously supplied their data to our study. During our study-a-thon we used the aggregated frequencies across all databases to create comprehensive concept sets for our phenotypes (you can check out the record counts and descendant record counts here). This turned out to be very handy and allowed us to capture all the important concepts, especially those we hadn't thought of at the beginning.
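The two numbers mentioned above, network-wide record counts and descendant record counts, can be sketched as follows. This is an illustrative assumption of how such numbers are typically derived, not the study's actual code: per-site record counts are summed per concept, then rolled up a hierarchy given as (ancestor, descendant) pairs that, like OMOP's CONCEPT_ANCESTOR table, include each concept as its own ancestor. The function names and concept IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_sites(site_counts: Iterable[Dict[int, int]]) -> Dict[int, int]:
    """Hypothetical helper: sum per-site record counts into one
    network-wide record count per concept_id."""
    total: Dict[int, int] = defaultdict(int)
    for counts in site_counts:
        for concept_id, n in counts.items():
            total[concept_id] += n
    return dict(total)


def descendant_record_counts(
    record_counts: Dict[int, int], ancestor_pairs: Iterable[Tuple[int, int]]
) -> Dict[int, int]:
    """Hypothetical helper: roll record counts up the concept hierarchy.

    ancestor_pairs holds (ancestor_id, descendant_id) pairs and is assumed to
    include self-pairs (x, x), so a concept's own records are counted too.
    """
    rolled: Dict[int, int] = defaultdict(int)
    for ancestor, descendant in ancestor_pairs:
        rolled[ancestor] += record_counts.get(descendant, 0)
    return dict(rolled)


# Invented example: concept 1 is a parent of concepts 2 and 3.
site_a = {1: 5, 2: 10}
site_b = {2: 4, 3: 7}
total = aggregate_sites([site_a, site_b])
ancestors = [(1, 1), (2, 2), (3, 3), (1, 2), (1, 3)]
rolled = descendant_record_counts(total, ancestors)
print(total)   # {1: 5, 2: 14, 3: 7}
print(rolled)  # {1: 26, 2: 14, 3: 7}
```

A concept with a low record count but a high descendant record count is a hint that data sources tend to record its more specific children, which is exactly the kind of signal that helps when assembling a concept set.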
As the need for COVID studies emerged, some data partners (including us at Columbia) have been updating their datasets to capture new information. I'm asking all data partners who have re-run their ETL, and those who have COVID information, to kindly submit their data to the Concept Prevalence study. Looking forward to seeing how these results will inform our COVID studies and help patients around the world battle this disease!
Tagging @krfeeney, @Andrew, @Evan_Minty, @Frank , @mattspotnitz, @TengLiaw, @SCYou, @Rijnbeek, @edburn and everybody else interested!


Tagging @mgkahn and @ufuoma :slight_smile:


@aostropolets Again, I really appreciate you leading this invaluable project in OHDSI.

My two cents are:
Could you extend this to the ‘source concept ids’, too? Recently, KCD (Korean ICD) and EDI (Korean CPT or RxNorm) have been added to the OMOP vocabulary. It would be really interesting if we could compare the prevalence of source concept IDs, too.

Hi @aostropolets

I know I'm late, but I recently came across this study/package during the Study-a-thon and would like to try it at our end, where we have a T2DM cohort of 5K patients. However, I ran into an issue while installing the package, which I posted on GitHub here.

Should we just run it at our site and send you the results? Would our cohort's results be useful to you? We don't have COVID data, though.

You have it! We gather source concept ids, so they can be used in an analysis. Do you have anything specific in mind?

Yes, if you could run the package and send over the results, it would be very much appreciated!
Do you have any issues with the package now?

Hi @aostropolets,

Yes, when I try to install the package I get the error below, which is also posted on GitHub:

Downloading GitHub repo OHDSI/StudyProtocolSandbox@master
"C:\PROGRA~1\Git\cmd\git.exe" clone --depth 1 --no-hardlinks --recurse-submodules https://github.com/NEONKID/RCDM-ETL C:\Users\test\AppData\Local\Temp\RtmpoDDE3h\remotes5fe862d7185f/OHDSI-StudyProtocolSandbox-536420f/ConceptPrevalence/../RCDM-ETL
"C:\PROGRA~1\Git\cmd\git.exe" clone --depth 1 --no-hardlinks --recurse-submodules https://github.com/aostropolets/ConceptPrevalence.git C:\Users\test\AppData\Local\Temp\RtmpoDDE3h\remotes5fe862d7185f/OHDSI-StudyProtocolSandbox-536420f/ConceptPrevalence/../ConceptPrevalence
2020-04-05 17:29:57	running command '"C:\PROGRA~1\Git\cmd\git.exe" clone --depth 1 --no-hardlinks --recurse-submodules https://github.com/aostropolets/ConceptPrevalence.git C:\Users\test\AppData\Local\Temp\RtmpoDDE3h\remotes5fe862d7185f/OHDSI-StudyProtocolSandbox-536420f/ConceptPrevalence/../ConceptPrevalence' had status 128
Error: Failed to install 'ConceptPrevalence' from GitHub:
  Command failed (128)