Source data mapping - OMOP

(rashmi) #1

I am new to OMOP and have been working on mapping our source data to the OMOP data model.
I have a few questions about source data mapping.

  1. If the source data can be mapped to standard concepts, we look for the mapping in the concept_relationship table. If no mapping exists there, the steps are to add an entry to the concept table (with a concept_id > 2000000000) and a “Maps to” entry pointing to the standard concept in the concept_relationship table. The concept_id_1 field from concept_relationship then goes into the source concept id field, and concept_id_2 goes into the concept id field of the clinical table. Is this understanding correct?
  2. If the source data cannot be mapped to a standard vocabulary, do we make a new entry in the concept table? What goes in the concept id field and the source concept id field of the clinical table? I am very confused about this.
  3. The source data has multiple ethnicity values, whereas OMOP has only Hispanic and non-Hispanic as standard concepts. For other ethnicity values in the source data, such as Costa Rican, is there an existing mapping to Hispanic or non-Hispanic? What is the general approach for such data?


(Chetan) #2

Can you elaborate on what you are trying to achieve? A few examples would help.

(Roger Carlson) #3

Well, I’m relatively new to this as well, but since it’s been a couple of days, I’ll take a swing at answering. If I’m wrong, please anyone, feel free to correct me.

  1. At present, I’m not modifying the Concept table at all. I’m not sure what value there is in doing so. There are a lot of relationships built into the Standardized Vocabulary tables (concept, concept_relationship, concept_ancestor, etc.), too many for me to feel comfortable mucking around in.

  2. If the concept cannot be mapped, there is another table called Source_to_concept_map, which allows you to do your own mapping. There is a downloadable tool called USAGI that can help with that.
    Using USAGI, you can map your codes to standard concepts. Concept_id would be the mapped standard concept; Source_concept_id would remain 0 (zero) because there is no source concept.

  3. Ethnicity is too vague and undefined worldwide to try to map with much granularity. For now, Hispanic and Non-Hispanic are the most useful for research purposes. There’s no particular use (in a clinical research sense) in knowing that a patient is Costa Rican.

  • My source data has a race field, which may or may not include an ethnicity (i.e., white, white-hispanic, white-nonhispanic, black, black-hispanic, etc.). We also have an ethnicity field, which includes hispanic, non-hispanic, and others. I map all the “white” values (for instance) to the white race, ignoring the ethnicity portion of the Race field, and map the ethnicity based only on the Ethnicity field (and likewise for all the other races). I only map to Hispanic if the patient is definitely identified as hispanic, and likewise map to non-hispanic only if they are identified as non-hispanic. Everyone else gets a 0 (zero) as unknown.

  • This is another use for USAGI and creating your own mapping in Source_to_concept_map.
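
A minimal Python sketch of the race and ethnicity handling described above. The source values and the function names are hypothetical, and the concept IDs are the ones I believe the standard vocabulary uses for these categories, so check them against your own vocabulary download before relying on them.

```python
# Concept IDs believed to be the standard OMOP race/ethnicity concepts;
# verify them against your own vocabulary tables (assumption).
HISPANIC = 38003563      # "Hispanic or Latino"
NOT_HISPANIC = 38003564  # "Not Hispanic or Latino"
WHITE = 8527
BLACK = 8516
UNKNOWN = 0

# Hypothetical source values; a real site would list all of its own.
RACE_TO_CONCEPT = {"white": WHITE, "black": BLACK}
ETHNICITY_TO_CONCEPT = {"hispanic": HISPANIC, "non-hispanic": NOT_HISPANIC}


def map_race(source_race):
    """Map only the race portion of values such as 'white-hispanic'."""
    if not source_race:
        return UNKNOWN
    # Drop anything after the first hyphen (the embedded ethnicity part).
    race_part = source_race.strip().lower().split("-", 1)[0]
    return RACE_TO_CONCEPT.get(race_part, UNKNOWN)


def map_ethnicity(source_ethnicity):
    """Map only definite hispanic / non-hispanic values; all else is 0."""
    if not source_ethnicity:
        return UNKNOWN
    return ETHNICITY_TO_CONCEPT.get(source_ethnicity.strip().lower(), UNKNOWN)
```

With these tables, `map_race("white-nonhispanic")` resolves to the white concept, while `map_ethnicity("Costa Rican")` falls through to 0, matching the "everyone else gets a 0" rule.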

Hope this helps.

(Melanie Philofsky) #4


@roger.carlson is spot on with his answers :slight_smile:

Here is a link to a forum discussion with different points of view on “why” some sites choose to modify the concept table. I’m not going to sugarcoat the 2-billion concepts: they are a LOT of work.

(Roger Carlson) #5

@MPhilofsky: Thanks for the link. It is a wealth of information.

(rashmi) #6

Thanks @roger.carlson for the information.
For concepts that cannot be mapped, I think we need to refer to the concept_relationship table, as the source_to_concept_map table is no longer recommended by OHDSI.

(Roger Carlson) #7

From the current OMOP standard, concerning the Source_to_concept_map table:

The source to concept map table is a legacy data structure within the OMOP Common Data Model, recommended for use in ETL processes to maintain local source codes which are not available as Concepts in the Standardized Vocabularies…

…Convention Description

  1. This table is no longer used to distribute mapping information between source codes and Standard Concepts for the Standard Vocabularies. Instead, the CONCEPT_RELATIONSHIP table is used for this purpose, using the relationship_id=‘Maps to’.
  2. However, this table can still be used for the translation of local source codes into Standard Concepts.
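
To make the second convention concrete, here is a rough sketch of a local-code lookup against Source_to_concept_map, using an in-memory SQLite database. The column names follow the CDM table definition as I understand it. The `LOCAL_ETHNICITY` vocabulary, the `CR` code, the decision to map it to Hispanic (concept 38003563), and the `lookup` helper are all hypothetical and only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE source_to_concept_map (
        source_code TEXT NOT NULL,
        source_concept_id INTEGER NOT NULL,
        source_vocabulary_id TEXT NOT NULL,
        source_code_description TEXT,
        target_concept_id INTEGER NOT NULL,
        target_vocabulary_id TEXT NOT NULL,
        valid_start_date TEXT NOT NULL,
        valid_end_date TEXT NOT NULL,
        invalid_reason TEXT
    )
""")
# Hypothetical local mapping row, e.g. produced with a mapping tool.
conn.execute(
    "INSERT INTO source_to_concept_map VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("CR", 0, "LOCAL_ETHNICITY", "Costa Rican",
     38003563, "Ethnicity", "1970-01-01", "2099-12-31", None),
)


def lookup(conn, source_code, source_vocabulary_id):
    """Return the values to write into the clinical table for one code."""
    row = conn.execute(
        "SELECT target_concept_id, source_concept_id "
        "FROM source_to_concept_map "
        "WHERE source_code = ? AND source_vocabulary_id = ? "
        "AND invalid_reason IS NULL",
        (source_code, source_vocabulary_id),
    ).fetchone()
    if row is None:
        # Unmapped code: both ids stay 0, the raw value is still kept.
        return {"concept_id": 0, "source_concept_id": 0,
                "source_value": source_code}
    return {"concept_id": row[0], "source_concept_id": row[1],
            "source_value": source_code}
```

Note that source_concept_id stays 0 here because the local code has no row of its own in the concept table; only the source value and the mapped standard concept are recorded.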

(Christian Reich) #8


You can use the STCM table, and currently there are no plans to scrap it. But we have found a disadvantage to its use: Atlas does not see it, and you can’t find codes buried in it. We therefore prefer to create concepts for local codes (or even free text artifacts) in the >2B range.
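
For comparison, a rough sketch of the >2B approach using an in-memory SQLite database. The tables carry only a reduced subset of the real concept and concept_relationship columns, and the local vocabulary, codes, concept IDs above 2,000,000,000, and the `resolve` helper are all hypothetical. The handling of the unmapped case (concept_id 0 with the local concept kept as source_concept_id) is my assumption of the usual convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        concept_code TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER,
        concept_id_2 INTEGER,
        relationship_id TEXT
    );
""")
# Hypothetical local concepts in the >2B range (non-standard, so NULL flag).
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (2000000001, "Costa Rican", "LOCAL_ETHNICITY", "CR", None),
        (2000000002, "Unmappable local value", "LOCAL_ETHNICITY", "ZZ", None),
    ],
)
# 'Maps to' from the local concept to a standard concept (illustrative).
conn.execute(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    (2000000001, 38003563, "Maps to"),
)


def resolve(conn, source_code, vocabulary_id):
    """Return concept_id and source_concept_id for one local code."""
    row = conn.execute(
        """
        SELECT c.concept_id, cr.concept_id_2
        FROM concept c
        LEFT JOIN concept_relationship cr
          ON cr.concept_id_1 = c.concept_id
         AND cr.relationship_id = 'Maps to'
        WHERE c.concept_code = ? AND c.vocabulary_id = ?
        """,
        (source_code, vocabulary_id),
    ).fetchone()
    if row is None:
        # No local concept exists for this code at all.
        return {"concept_id": 0, "source_concept_id": 0}
    source_concept_id, standard_concept_id = row
    # A local concept without a 'Maps to' row yields concept_id 0.
    return {"concept_id": standard_concept_id or 0,
            "source_concept_id": source_concept_id}
```

Here concept_id_1 ends up in the source concept id field and concept_id_2 in the concept id field, which is the flow described in the first question of the thread.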