Clarification requested: source concept id fields

Hello,

I’m trying to understand the intention of the source concept id field.

Is the source concept id designed to capture just the concept in the source data, or is it designed to capture both the clinical concept and terminology used natively within the source system to encode the data?

To be concrete: if columns 1-3 of the table below depict the source data (encoded using organization-specific codes), which codes are appropriate for the source concept id? Column 4, because only the source concept matters, or column 5, because we are making a claim about how the data is (or is not) natively encoded in the source system?

Thanks in advance for your insight.

The source_concept_id was added to appease those who wanted to be able to query records using the native vocabulary, such as ICD10CM, NDC, CPT4… Sometimes, for consistency's sake, a rule that makes sense for conditions or procedures gets applied to all tables. I would set the gender_source_concept_id equal to 0 and get on with more important ETL items.

But if you want a rule of thumb, the source_concept_id should be an appropriate concept whose concept_code is equal to the value in the source tables. So if your source tables use 1, 2, 3 for Female, Male and Unknown, then the gender_source_concept_id should be zero, because you will not find gender concepts where the concept_code IN (1, 2, 3). Now, if the source data has F, M, U, you can make an argument for setting the source_concept_id to concept ids other than zero because

SELECT *
FROM concept
WHERE concept_code IN ('F', 'M', 'U')
  AND vocabulary_id = 'Gender'
  AND invalid_reason IS NULL;

Will return values for F and M. Note that the gender concept for ‘U’ is deprecated and should not be used.
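For anyone who wants to apply that rule of thumb in an ETL script, here is a minimal Python sketch using the standard-library sqlite3 module. It is only an illustration, not an official OHDSI routine: the function names build_toy_vocabulary and lookup_source_concept_id are made up for this post, and the three rows in the toy concept table are a hand-typed sample (check the concept ids against your own vocabulary release before relying on them).

```python
import sqlite3


def build_toy_vocabulary():
    """Create an in-memory stand-in for the OMOP concept table.

    Assumption: only the columns needed for the lookup are modelled, and
    the rows are a small hand-typed sample, not a vocabulary download.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE concept (
            concept_id     INTEGER PRIMARY KEY,
            concept_name   TEXT,
            vocabulary_id  TEXT,
            concept_code   TEXT,
            invalid_reason TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
        [
            (8532, "FEMALE", "Gender", "F", None),
            (8507, "MALE", "Gender", "M", None),
            (8551, "UNKNOWN", "Gender", "U", "D"),  # deprecated concept
        ],
    )
    return conn


def lookup_source_concept_id(conn, source_value, vocabulary_id):
    """Return the valid concept whose concept_code equals the source value.

    Falls back to 0 when no valid concept in that vocabulary has a
    matching concept_code, which is the rule of thumb described above.
    """
    row = conn.execute(
        """
        SELECT concept_id
        FROM concept
        WHERE concept_code = ?
          AND vocabulary_id = ?
          AND invalid_reason IS NULL
        """,
        (str(source_value), vocabulary_id),
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = build_toy_vocabulary()
    for value in ["F", "M", "U", 1, 2, 3]:
        print(value, lookup_source_concept_id(conn, value, "Gender"))
```

With the toy rows above, F and M resolve to their gender concepts, while U (deprecated) and the local codes 1, 2, 3 all fall back to 0.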

Thank you @DTorok. That helps.

I see that I erroneously entered the SCT concept id rather than the OMOP concept id for the nonbinary line item.