We would like to announce a new network study: https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ConceptPrevalence
The full protocol can be found here: https://github.com/OHDSI/StudyProtocolSandbox/blob/master/ConceptPrevalence/extras/ConceptPrevalenceStudyProtocol_v0.1.docx
We want to study how Concepts are used across different OMOP CDM instances. This information could answer many questions on its own, but we have a concrete reason: for any one medical entity, the granularity of the codes captured in a data source can vary greatly. For example, chronic kidney disease stage II can be coded as ICD9 585.2 "Chronic kidney disease, Stage II (mild)", as 585.9 "Chronic kidney disease, unspecified", or even as 586 "Renal failure, unspecified". Yet this level of detail is key for any cohort definition.
Currently, researchers have no way of knowing whether a highly granular concept is even available for selection, or whether they have to use a generic concept combined with auxiliary information to define the cohort correctly. Each data source instance is a black box, and knowledge about the distribution of concepts is limited to the instances a researcher can access. OHDSI Network Studies, however, depend on cohort definitions that work across the entire network.
In an ideal world, a cohort definition tool like ATLAS would have access to the distribution of all concepts across the community's data sources. We would like to make that a reality by collecting counts of:
Unique values in the *_concept_id fields
Unique values in the *_source_concept_id fields
Mappings between them (pairs of source concept and standard concept)
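To make the three kinds of counts concrete, here is a minimal Python sketch. It is an illustration only, not the study package's implementation. It assumes that rows have already been extracted from one CDM domain table (here, hypothetically, condition_occurrence) as (condition_concept_id, condition_source_concept_id) pairs. The function name concept_prevalence and the concept IDs in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical row shape: (condition_concept_id, condition_source_concept_id),
# as it might be read from an OMOP CDM condition_occurrence table.
Row = Tuple[int, int]


def concept_prevalence(rows: Iterable[Row]) -> Tuple[Counter, Counter, Counter]:
    """Tally records per standard concept, per source concept, and per mapping pair."""
    standard_counts: Counter = Counter()  # records per *_concept_id value
    source_counts: Counter = Counter()    # records per *_source_concept_id value
    mapping_counts: Counter = Counter()   # records per (source, standard) pair
    for concept_id, source_concept_id in rows:
        standard_counts[concept_id] += 1
        source_counts[source_concept_id] += 1
        mapping_counts[(source_concept_id, concept_id)] += 1
    return standard_counts, source_counts, mapping_counts


if __name__ == "__main__":
    # Hypothetical IDs: two different source codes map to standard concept 101,
    # and source code 203 has no standard mapping (concept_id 0 in the CDM).
    rows = [(101, 201), (101, 201), (101, 202), (0, 203)]
    standard, source, mapping = concept_prevalence(rows)
    print(standard[101])       # 3
    print(mapping[(201, 101)]) # 2
    print(mapping[(203, 0)])   # 1
```

Counting mapping pairs as well as each field separately shows how many different source codes feed one standard concept. The pairs that land on concept_id 0, which the CDM uses for "no matching concept", also make unmapped source codes visible.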
As a side effect, we would also get a better understanding of the dynamics of that distribution over time, and we could draw conclusions about the impact of erroneous mappings.