Dear OHDSI Community,
Over time, we’ve encountered challenges with data organized as entity-attribute-value (EAV) records or surveys. Currently, survey/EAV concepts are fully accessible in the CDM only through the ‘standard’ fields (observation_concept_id, value_as_concept_id). As a result, OMOP vocabularies keep thousands of concepts, such as the UK Biobank concepts, standard even though they don’t need to be.
Besides, the CDM specification has a _source_concept_id field for every event field (procedure_source_concept_id, condition_source_concept_id, etc.), but there is no corresponding source field for the value, i.e. no source counterpart to value_as_concept_id.
Given those two reasons, @Alexdavv and @zhuk presented a new solution to the community, which was discussed with @clairblacketer during the CDM working group call and accepted into the development version of the CDM. We created a pull request, and now it’s time to revisit this topic on the forum to get more eyes on it and gather broader input.
These issues have been described multiple times before:
- Complexity of EAV Data Conversion
- Spread and Arbitrariness of Data
- Heterogeneous Survey Vocabularies
- Violation of vocabulary principles
- Need for Referring to Source Concepts
Our proposed solution is to add the value_source_concept_id field to the OMOP CDM, which lets you reference concepts that appear as values in the source and are present in OMOP vocabularies.
Proposal example for Observation table (the same applies to the Measurement table):
Field in the Observation table | How it should be populated |
---|---|
observation_source_value | Including yourself, who in your family has had asthma? Select all that apply. |
value_source_value | Daughter |
observation_source_concept_id | 836815 Including yourself, who in your family has had asthma? Select all that apply. |
value_source_concept_id | 43528434 Daughter |
observation_concept_id | 4054433 Family history with explicit context pertaining to daughter |
value_as_concept_id | 317009 Asthma |
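For readers who prefer to see the example as a record, here is a minimal Python sketch of the same row. The `ObservationRecord` class is a hypothetical illustration that covers only the six fields from the table above, not the full Observation table; the concept IDs and strings are taken from the table.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ObservationRecord:
    """Hypothetical subset of Observation fields relevant to the proposal."""
    observation_source_value: str
    value_source_value: Optional[str]
    observation_source_concept_id: int
    value_source_concept_id: Optional[int]  # the proposed new field
    observation_concept_id: int
    value_as_concept_id: Optional[int]


# The asthma/daughter example from the table above.
asthma_daughter = ObservationRecord(
    observation_source_value=(
        "Including yourself, who in your family has had asthma? "
        "Select all that apply."
    ),
    value_source_value="Daughter",
    observation_source_concept_id=836815,   # source question
    value_source_concept_id=43528434,       # source answer: Daughter
    observation_concept_id=4054433,         # standard: family history, daughter
    value_as_concept_id=317009,             # standard: Asthma
)

row = asdict(asthma_daughter)
```

The source question-answer pair stays intact in the two source concept fields, while the standard fields carry the mapping.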
Additionally, we suggest updating conventions:
- Question-answer/variable-value pairs will be represented in the _source_concept_id and value_source_concept_id fields.
- All types of values will be stored exclusively in the value_source_value field, with the value_source_concept_id field populated when the value is a concept.
- When a source question-answer pair (Observation domain) maps to Condition/Procedure domain concepts, create two separate records, because the condition_occurrence and procedure_occurrence tables have no value_source_concept_id field: one in the Observation table to represent the source, and one in the Condition/Procedure table to represent the OMOP standard mapping. Link them to each other using the fact_relationship table or the observation_event_id/obs_event_field_concept_id fields.
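The conventions above could be sketched in Python roughly as follows, for the case where a source answer maps to the Condition domain. This is an illustration under stated assumptions, not an agreed ETL implementation: the function name `split_survey_answer`, the placeholder field concept `CONDITION_OCCURRENCE_ID_FIELD_CONCEPT_ID`, the choice of 0 for observation_concept_id on the source record, and all IDs in the usage lines are made up for the example.

```python
from typing import Any, Dict, Optional, Tuple

# Placeholder only: the real concept_id that identifies the
# condition_occurrence.condition_occurrence_id field must be looked up
# in the OMOP vocabularies.
CONDITION_OCCURRENCE_ID_FIELD_CONCEPT_ID = 0


def split_survey_answer(
    observation_id: int,
    condition_occurrence_id: int,
    person_id: int,
    question_text: str,
    answer_text: str,
    question_concept_id: int,
    answer_concept_id: Optional[int],
    standard_condition_concept_id: int,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build a source Observation record and a linked standard Condition record."""
    observation = {
        "observation_id": observation_id,
        "person_id": person_id,
        # Assumption: the source-only record carries no standard concept.
        "observation_concept_id": 0,
        "observation_source_value": question_text,
        "observation_source_concept_id": question_concept_id,
        # Every value is stored in value_source_value ...
        "value_source_value": answer_text,
        # ... and the concept field is filled only when the value is a concept.
        "value_source_concept_id": answer_concept_id,
        # Link to the Condition record via the event fields.
        "observation_event_id": condition_occurrence_id,
        "obs_event_field_concept_id": CONDITION_OCCURRENCE_ID_FIELD_CONCEPT_ID,
    }
    condition = {
        "condition_occurrence_id": condition_occurrence_id,
        "person_id": person_id,
        "condition_concept_id": standard_condition_concept_id,
    }
    return observation, condition


# Usage with made-up identifiers and concept IDs.
obs, cond = split_survey_answer(
    observation_id=1,
    condition_occurrence_id=10,
    person_id=100,
    question_text="Has a doctor ever told you that you have asthma?",
    answer_text="Yes",
    question_concept_id=1001,
    answer_concept_id=1002,
    standard_condition_concept_id=317009,
)
```

Linking through fact_relationship instead would mean emitting a third record that points at both rows; the event-field approach keeps the link on the Observation record itself.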
We’d love to hear your thoughts and feedback on these proposals.
Tagging the working group and people involved in discussions before:
@aostropolets @Chris_Knoll @Christian_Reich @clairblacketer @cmkerr @ColinOrr @Daniel_Prieto @Dave.Barman @Dymshyts @ellayoung @ericaVoss @gregk @Josh_R @kyriakosschwarz @lee_evans @linikujp @MaximMoinat @mcantor2 @mik @mmandal @MPhilofsky @Andy_Kanter
Tetiana on behalf of the vocabulary team.