While comparing cohort #1, pulled directly from our source using ICD9CM/ICD10CM codes, with cohort #2, pulled from the CDM using standard concept_ids, we came across a large difference in the number of Persons between the two cohorts.
Since we were replicating a study from the EHR, we used the standard concept_ids that the ICD9CM/ICD10CM codes map to. One of the inclusion criteria is that the Person must not be pregnant. The list of codes used to define a pregnancy Condition includes ‘O29.021’, Pressure collapse of lung due to anesthesia during pregnancy, first trimester. This code maps to 3 standard concept_ids, including Atelectasis, which is not a disease found only in pregnancy. Adding the Atelectasis concept_id to the "not pregnant" criterion changed the cohort’s definition: any Person with an Atelectasis record was treated as pregnant, so we eliminated many Persons from cohort #2.
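To make the fan-out concrete, here is a minimal sketch of how one might flag source concepts in a code list that map to more than one standard concept. It uses an in-memory SQLite table shaped like the CDM concept_relationship table. All concept_ids are made up for illustration (9001 stands in for O29.021, and 101/102/103 for its three standard targets), and the helper name fan_out is hypothetical. A real check would also need to filter on invalid_reason.

```python
import sqlite3

# Toy stand-in for the CDM concept_relationship table.
# All concept_ids below are hypothetical, not real vocabulary ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,  -- source concept (e.g. an ICD10CM code)
    concept_id_2    INTEGER,  -- standard concept it maps to
    relationship_id TEXT
);
INSERT INTO concept_relationship VALUES
    (9001, 101, 'Maps to'),  -- 9001 ~ O29.021, 101 ~ Atelectasis
    (9001, 102, 'Maps to'),
    (9001, 103, 'Maps to'),
    (9002, 201, 'Maps to');  -- a source code with a single target
""")


def fan_out(conn, source_concept_ids):
    """Return {source_concept_id: n_targets} for ids with more than one 'Maps to' target."""
    placeholders = ",".join("?" for _ in source_concept_ids)
    rows = conn.execute(
        f"""
        SELECT concept_id_1, COUNT(DISTINCT concept_id_2)
        FROM concept_relationship
        WHERE relationship_id = 'Maps to'
          AND concept_id_1 IN ({placeholders})
        GROUP BY concept_id_1
        HAVING COUNT(DISTINCT concept_id_2) > 1
        """,
        list(source_concept_ids),
    ).fetchall()
    return dict(rows)


multi_mapped = fan_out(conn, [9001, 9002])
print(multi_mapped)
```

Running a check like this over a study's code list before translating it to standard concepts would surface codes such as O29.021 for manual review.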
When a researcher wants to use ICD9CM/ICD10CM codes to define their concept set, should we query the table/domain of the standard concept_id, but pull data based on the source_concept_id? What do others do? Are there any other source vocabularies that map a source code to more than one standard concept_id in a way that changes the definition of the cohort?
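The difference between the two approaches can be sketched with a toy condition_occurrence table. This is only an illustration under assumed data: the concept_ids are invented (101 stands in for Atelectasis, 9001 for the O29.021 source concept, 9002 for some non-pregnancy atelectasis source code), and persons_without is a hypothetical helper, not an Atlas or CDM function.

```python
import sqlite3

# Toy stand-in for the CDM condition_occurrence table (only three columns kept).
# All ids are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (
    person_id                   INTEGER,
    condition_concept_id        INTEGER,  -- standard concept
    condition_source_concept_id INTEGER   -- source (ICD) concept
);
INSERT INTO condition_occurrence VALUES
    (1, 101, 9001),  -- Person 1: coded O29.021, which fans out to three rows
    (1, 102, 9001),
    (1, 103, 9001),
    (2, 101, 9002),  -- Person 2: atelectasis from a non-pregnancy source code
    (3, 300, 9003);  -- Person 3: unrelated condition
""")

ALLOWED_COLUMNS = {"condition_concept_id", "condition_source_concept_id"}


def persons_without(conn, column, concept_ids):
    """Persons with no condition row whose `column` is in concept_ids."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    placeholders = ",".join("?" for _ in concept_ids)
    rows = conn.execute(
        f"""
        SELECT DISTINCT person_id
        FROM condition_occurrence
        WHERE person_id NOT IN (
            SELECT person_id FROM condition_occurrence
            WHERE {column} IN ({placeholders})
        )
        ORDER BY person_id
        """,
        list(concept_ids),
    ).fetchall()
    return [r[0] for r in rows]


# "Not pregnant" defined via the standard concepts O29.021 maps to
by_standard = persons_without(conn, "condition_concept_id", [101, 102, 103])
# "Not pregnant" defined via the source concept itself
by_source = persons_without(conn, "condition_source_concept_id", [9001])

print(by_standard)  # Person 2 is dropped because of Atelectasis
print(by_source)    # Person 2 is kept
```

In this toy data the standard-concept definition loses Person 2, who has Atelectasis but no pregnancy code, while the source-concept definition keeps them. The rows still live in the table chosen by the standard concept's domain, so the query targets that table but filters on the source column.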
For Atlas use, I will direct researchers doing non-network research to define ICD9CM/ICD10CM codes as source concept_ids and not translate them to standard concept_ids.