We want to study the usage patterns of Concepts across different OMOP CDM instances. This information could in itself answer many questions, but we have a concrete reason: for any one medical entity, the granularity of codes captured in a data source can vary greatly. For example, Chronic Kidney Disease stage II can be coded as ICD9 code 585.2 (Chronic kidney disease, Stage II (mild)), 585.9 (Chronic kidney disease, unspecified), or even 586 (Renal failure, unspecified). Yet this level of detail is key for any cohort definition. Currently, researchers have no way of knowing whether a certain highly granular concept is even available for selection, or whether they have to use a generic concept in combination with some auxiliary information to define the cohort correctly. Each data source instance is a black box, and knowledge about the distribution of concepts is limited to the instances researchers have access to. But OHDSI Network Studies depend on cohort definitions that work across the network.
In an ideal world, a cohort definition tool like ATLAS would have access to the distribution of all concepts in the community. We would like to make that a reality and collect counts for all:
Unique values in the *_concept_id fields
Unique values in the *_source_concept_id fields
Mappings between them
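For readers who want a feel for what these three kinds of counts look like, here is a minimal Python sketch against a toy SQLite table. It is not the study's R package or its SQL; the helper name `concept_usage_counts` is hypothetical, the concept ids are made up, and it assumes the CDM naming convention where a domain table such as `condition_occurrence` has `condition_concept_id` and `condition_source_concept_id` columns.

```python
import sqlite3
from typing import Dict, Tuple


def concept_usage_counts(
    conn: sqlite3.Connection, table: str, prefix: str
) -> Tuple[Dict[int, int], Dict[int, int], Dict[Tuple[int, int], int]]:
    """Hypothetical helper: record counts per standard concept, per source
    concept, and per (source concept, standard concept) mapping in one table.

    `table` and `prefix` are interpolated into the SQL, so they must come from
    trusted constants (e.g. "condition_occurrence", "condition"), not user input.
    """
    concept_col = f"{prefix}_concept_id"
    source_col = f"{prefix}_source_concept_id"

    # Unique values in the *_concept_id field and how often each occurs.
    concept_counts = dict(conn.execute(
        f"SELECT {concept_col}, COUNT(*) FROM {table} "
        f"GROUP BY {concept_col} ORDER BY {concept_col}"
    ).fetchall())

    # Unique values in the *_source_concept_id field.
    source_counts = dict(conn.execute(
        f"SELECT {source_col}, COUNT(*) FROM {table} "
        f"GROUP BY {source_col} ORDER BY {source_col}"
    ).fetchall())

    # Mappings: how often each source concept ends up as each standard concept.
    mapping_counts = {
        (source, concept): n
        for source, concept, n in conn.execute(
            f"SELECT {source_col}, {concept_col}, COUNT(*) FROM {table} "
            f"GROUP BY {source_col}, {concept_col} "
            f"ORDER BY {source_col}, {concept_col}"
        ).fetchall()
    }
    return concept_counts, source_counts, mapping_counts


# Toy data with made-up ids: two source codes (10, 11) that both map to
# standard concept 1, and a third source code (12) that maps to concept 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence "
    "(condition_concept_id INTEGER, condition_source_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 12)],
)
concepts, sources, mappings = concept_usage_counts(
    conn, "condition_occurrence", "condition"
)
print(concepts)  # {1: 3, 2: 1}
print(sources)   # {10: 2, 11: 1, 12: 1}
print(mappings)  # {(10, 1): 2, (11, 1): 1, (12, 2): 1}
```

In a real CDM the same pattern would be repeated for each domain table; unmapped records, which carry concept id 0 by convention, would simply show up as their own key.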
As a side effect, we would also get a better understanding of the dynamics of that distribution over time, and we could draw conclusions about the impact of erroneous mappings.
It’s such wonderful news that you all decided to participate! I’d almost lost hope.
Timeline: the study itself doesn’t require a lot of preliminary work, so I expect we’ll have some results within a couple of months.
The R package can be found here; I’m happy to answer any questions.
Hi @aostropolets, it looks like the SQL overlaps a lot with the classic Achilles analyses and with some new ones added by @AnthonyMolinaro: the classic analyses characterize concept ids, and the new ones characterize source concept ids.
Not suggesting you should change anything here, but there could be an opportunity to leverage Achilles results for the deliverable.
Thanks, Ajit! I know that Achilles produces a lot of useful results, but I figured it may be more convenient to just run a package that readily spits out the tables. Do you feel Achilles is more convenient?
Great to hear! I added my email to the readme file; will also duplicate it here: ao2671@cumc.columbia.edu
Please feel free to send the results to this email once they are ready. @MPhilofsky, @Hokyun, @rkboyce, @Andrew, @krfeeney, @cukarthik, @AnthonyMolinaro thanks a lot for your interest!
I’m happy to see that the first results are coming in! Thanks to @Hokyun and @mattspotnitz for pioneering.
As you all know, the OHDSI Symposium submission deadline is in three weeks, and we are planning to submit some preliminary results. It would be great if we could get more data next week to prepare the abstract, and then finalize the project over the summer.
Many thanks to all participants who generously supplied their data to our study. During our study-a-thon we used the aggregated frequencies across all databases to create comprehensive concept sets for our phenotypes (you can check out the record counts and descendant record counts here). This turned out to be very handy and allowed us to capture all important concepts, especially those we hadn’t thought about in the beginning.
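The pooled record counts and descendant record counts mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the study's code: the helper names are hypothetical, the per-site dictionaries and concept ids are made up, and "descendant record count" is taken to mean a concept's own records plus the records of all its descendants, given a CONCEPT_ANCESTOR-style list of (ancestor, descendant) pairs.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def pool_record_counts(site_counts: Iterable[Dict[int, int]]) -> Counter:
    """Hypothetical helper: sum per-concept record counts across sites."""
    pooled: Counter = Counter()
    for counts in site_counts:
        pooled.update(counts)  # Counter.update adds counts instead of replacing
    return pooled


def descendant_record_counts(
    record_counts: Dict[int, int],
    ancestor_pairs: Iterable[Tuple[int, int]],
) -> Dict[int, int]:
    """Hypothetical helper: records for each concept plus all its descendants.

    Each concept is included in its own descendant set, so the result is the
    same whether or not the pairs contain self-references (ancestor == descendant).
    """
    descendants: Dict[int, Set[int]] = {}
    for ancestor, descendant in ancestor_pairs:
        descendants.setdefault(ancestor, {ancestor}).add(descendant)

    concepts = set(record_counts) | set(descendants)
    return {
        concept: sum(
            record_counts.get(d, 0) for d in descendants.get(concept, {concept})
        )
        for concept in concepts
    }


# Made-up example: parent concept 100 with children 101 and 102, two sites.
site_a = {100: 5, 101: 3}
site_b = {100: 2, 102: 4}
pooled = pool_record_counts([site_a, site_b])
drc = descendant_record_counts(pooled, [(100, 100), (100, 101), (100, 102)])
print(dict(pooled))                  # {100: 7, 101: 3, 102: 4}
print(drc[100], drc[101], drc[102])  # 14 3 4
```

Because counts are pooled before descendants are summed, a child concept used at only one site still contributes to its parent's total, which is one reason aggregating across databases can surface concepts a single site would not show.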
As the need for COVID studies emerged, some data partners (including us, Columbia) have been updating their datasets to capture new information. I’m asking all data partners who have re-run their ETL or who have COVID information to kindly submit their data to the Concept Prevalence study. Looking forward to seeing how these results will inform our COVID studies and help patients across the world battle this disease!
Tagging @krfeeney, @Andrew, @Evan_Minty, @Frank, @mattspotnitz, @TengLiaw, @SCYou, @Rijnbeek, @edburn and everybody else interested!
@aostropolets Again, I really appreciate you leading this invaluable project in OHDSI.
My two cents are:
Could you extend this to the ‘source concept ids’, too? Recently, KCD (Korean ICD) and EDI (Korean CPT or RxNorm) have been added to the OMOP vocabulary. It would be really interesting if we could compare the prevalence of source concept IDs, too.
I know I am late, but I recently came across this study/package during the Study-a-thon and would like to try it at our site, where we have a T2DM cohort of 5K patients. However, I ran into an issue while installing the package, which I’ve posted on GitHub here.
Should we just run it at our site and send you the results? Will our cohort results be useful to you? We don’t have COVID data, though.