
Adding value_source_concept_id field to OMOP CDM

Dear OHDSI Community,

Over time, we’ve encountered challenges with data organized as entity-attribute-value (EAV) records or surveys. Currently, survey/EAV data concepts in the CDM are fully accessible only through ‘standard’ fields (observation_concept_id, value_as_concept_id). This is why thousands of concepts, such as UK Biobank concepts, are kept standard in the OMOP vocabularies when they need not be.

Besides, if we look at the CDM specification, there is a *_source_concept_id field for every event field (procedure_source_concept_id, condition_source_concept_id, etc.), but the corresponding field for value_concept_id is missing.

Given these two reasons, @Alexdavv and @zhuk presented a new solution to the community. It was discussed with @clairblacketer during the CDM working group call and accepted into the development version of the CDM. We created a pull request, and now it’s time to revisit this topic on the forum to get more eyes on it and gather broader input.

These issues were described multiple times: here,

  1. Complexity of EAV Data Conversion
  2. Spread and Arbitrariness of Data
  3. Heterogeneous Survey Vocabularies
  4. Violation of Vocabulary Principles
  5. Need for Referring to Source Concepts

Our proposed solution is to add the value_source_concept_id field to the OMOP CDM. It will allow you to store concepts that appear as values in the source data and are present in the OMOP vocabularies.

Proposed example for the Observation table (the same applies to the Measurement table):

| Field in the Observation table | How it should be populated |
|---|---|
| observation_source_value | Including yourself, who in your family has had asthma? Select all that apply. |
| value_source_value | Daughter |
| observation_source_concept_id | 836815 (Including yourself, who in your family has had asthma? Select all that apply.) |
| value_source_concept_id | 43528434 (Daughter) |
| observation_concept_id | 4054433 (Family history with explicit context pertaining to daughter) |
| value_as_concept_id | 317009 (Asthma) |
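As a minimal sketch, the example row above can be held in a Python dataclass that keeps only the fields shown (other Observation columns such as person_id and observation_date are omitted). The `ObservationRow` class and its `source_pair` method are hypothetical names for illustration; value_source_concept_id is the proposed field, not part of a released CDM version.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ObservationRow:
    """Hypothetical subset of OMOP Observation columns used in the example."""
    observation_source_value: str
    value_source_value: str
    observation_source_concept_id: int
    value_source_concept_id: Optional[int]  # proposed new field
    observation_concept_id: int
    value_as_concept_id: Optional[int]

    def source_pair(self) -> Tuple[int, Optional[int]]:
        """Return the source question/answer concept pair."""
        return (self.observation_source_concept_id, self.value_source_concept_id)


row = ObservationRow(
    observation_source_value=(
        "Including yourself, who in your family has had asthma? Select all that apply."
    ),
    value_source_value="Daughter",
    observation_source_concept_id=836815,
    value_source_concept_id=43528434,
    observation_concept_id=4054433,
    value_as_concept_id=317009,
)

print(row.source_pair())  # (836815, 43528434)
```

The source pair (question and answer) and the standard pair (observation_concept_id and value_as_concept_id) live side by side on one row, so neither has to be sacrificed for the other.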

Additionally, we suggest updating conventions:

  1. Question-answer/variable-value pairs will be represented in the *_source_concept_id (observation_source_concept_id or measurement_source_concept_id) and value_source_concept_id fields.
  2. All types of values will be stored exclusively in the value_source_value field, with the value_source_concept_id field also populated when the value is a concept.
  3. When a source question-answer pair (Observation domain) is mapped to Condition or Procedure domain concepts, create two separate records, because the condition_occurrence and procedure_occurrence tables lack a value_source_concept_id field: one in the Observation table to represent the source, and another in the Condition/Procedure table to represent the OMOP standard (mapping). Link them to each other using the fact_relationship table or the observation_event_id/obs_event_field_concept_id fields.
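A minimal sketch of convention 3, assuming a source question-answer pair that maps to a Condition concept: one Observation record keeps the source pair, one Condition record carries the standard concept, and a FACT_RELATIONSHIP row links them. Only the columns relevant here are shown. The helper name, all record and concept IDs, and the placeholder relationship concept_id are hypothetical; the Domain concept_ids (27 for Observation, 19 for Condition) are assumed values to verify against your vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed Domain concept_ids; verify against the CONCEPT table you use.
OBSERVATION_DOMAIN_CONCEPT_ID = 27
CONDITION_DOMAIN_CONCEPT_ID = 19
# Hypothetical placeholder: replace with a real relationship concept_id in an ETL.
LINK_RELATIONSHIP_CONCEPT_ID = 0


@dataclass
class FactRelationship:
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


def split_source_and_standard(
    observation_id: int,
    condition_occurrence_id: int,
    question_concept_id: int,
    answer_concept_id: int,
    answer_source_value: str,
    condition_concept_id: int,
) -> Tuple[Dict[str, object], Dict[str, object], FactRelationship]:
    """Hypothetical helper: source Observation row, standard Condition row, and their link."""
    # Observation record preserves the source question/answer pair.
    observation = {
        "observation_id": observation_id,
        "observation_concept_id": 0,  # assumption: the standard fact lives in Condition
        "observation_source_concept_id": question_concept_id,
        "value_source_value": answer_source_value,
        "value_source_concept_id": answer_concept_id,  # proposed field
    }
    # Condition record carries the standard (mapped) concept.
    condition = {
        "condition_occurrence_id": condition_occurrence_id,
        "condition_concept_id": condition_concept_id,
    }
    # FACT_RELATIONSHIP row links the Observation fact to the Condition fact.
    link = FactRelationship(
        domain_concept_id_1=OBSERVATION_DOMAIN_CONCEPT_ID,
        fact_id_1=observation_id,
        domain_concept_id_2=CONDITION_DOMAIN_CONCEPT_ID,
        fact_id_2=condition_occurrence_id,
        relationship_concept_id=LINK_RELATIONSHIP_CONCEPT_ID,
    )
    return observation, condition, link


# Usage with hypothetical IDs only.
obs, cond, link = split_source_and_standard(
    observation_id=101,
    condition_occurrence_id=202,
    question_concept_id=111,
    answer_concept_id=222,
    answer_source_value="Yes",
    condition_concept_id=333,
)
print(link)
```

The same link could instead be expressed through observation_event_id/obs_event_field_concept_id on the Observation record; the sketch shows only the fact_relationship option.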

We’d love to hear your thoughts and feedback on these proposals.
Tagging the working group and people involved in discussions before:
@aostropolets @Chris_Knoll @Christian_Reich @clairblacketer @cmkerr @ColinOrr @Daniel_Prieto @Dave.Barman @Dymshyts @ellayoung @ericaVoss @gregk @Josh_R @kyriakosschwarz @lee_evans @linikujp @MaximMoinat @mcantor2 @mik @mmandal @MPhilofsky @Andy_Kanter

Tetiana on behalf of the vocabulary team.


Adding this single new column to two tables (Observation and Measurement) makes sense to me.

The idea of preserving the source data is part of the CDM philosophy.
Right now we are inconsistent, and the new column has value, as outlined in the links above.

Btw, there is a precedent. This is not the first time we realized that something is missing in the CDM.
We did the same when we added the missing unit_source_concept_id column.

  1. OHDSI/CommonDataModel Issue #259: [measurement table] missing unit_source_concept_id
  2. OHDSI/CommonDataModel Issue #193: Adding value_source_value in observation table

[Image: key slide from one of the linked discussions]

Dear Community, we have updated the pull request. The previous one was closed because the branches had become out of sync.
Please take a look at the changes here


Honestly, I don’t like this mapping. Yes, it is semantically correct. However, it breaks the rules and requirements of the CDM.

The *_source_value represents the data as it is displayed in the source system. Your example = Daughter

The *_source_concept_id = the concept_id for the source code. Your example = Daughter (in English)

The *_concept_id = the standard concept_id mapped from the *_source_concept_id. Your example = Asthma. I understand why it is “Asthma”, but, per the rules & requirements of the CDM this should be “Daughter”.
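The rule described above can be written as a small consistency check: each *_concept_id should equal the standard concept reached from its *_source_concept_id through a "Maps to" relationship. This is a minimal sketch, assuming a toy dictionary in place of the CONCEPT_RELATIONSHIP table; the function name and every concept_id below are hypothetical, not real OMOP identifiers.

```python
from typing import Dict, List

# (standard field, source field) pairs compared by the check.
FIELD_PAIRS = [
    ("observation_concept_id", "observation_source_concept_id"),
    ("value_as_concept_id", "value_source_concept_id"),
]


def rule_violations(row: Dict[str, int], maps_to: Dict[int, int]) -> List[str]:
    """Return standard fields that differ from the 'Maps to' target of their source field."""
    violations = []
    for standard_field, source_field in FIELD_PAIRS:
        expected = maps_to.get(row[source_field], 0)  # 0 means no mapping found
        if row[standard_field] != expected:
            violations.append(standard_field)
    return violations


# Hypothetical IDs: 10 -> 11 is an arbitrary question mapping,
# 20 = source "Daughter", 21 = standard "Daughter", 30 = standard "Asthma".
maps_to = {10: 11, 20: 21}
proposed_row = {
    "observation_source_concept_id": 10,
    "value_source_concept_id": 20,
    "observation_concept_id": 11,
    "value_as_concept_id": 30,
}
print(rule_violations(proposed_row, maps_to))  # ['value_as_concept_id']
```

Under this reading of the rule, a row where value_as_concept_id holds "Asthma" while value_source_concept_id holds "Daughter" is flagged, which is the inconsistency raised here.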

Also, a minor typo: unless you are adding another field to the table, your example has ‘value_concept_id’. Shouldn’t this be ‘value_as_concept_id’?