
Race and Ethnicity in the OMOP CDM

You laid out a use case of combining US and OUS data on race and ethnicity.
+1 from me.

@Christian_Reich I think the current standard has problems with the domain (Person vs Observation), the definition (relationship to race), and the granularity (currently limited to Hispanic / non-Hispanic). These prevent investigation of the social determinants of health when data holders have better, more granular data. Those are the dimensions of the problem space where the current standard is insufficient.

Investigation of the health impacts of self-identified ethnicity requires a more granular representation. @AsiyahFDA, thanks for looking at the EO and assessing its relevance and readiness for use. Do you think it is worth trying to build from it?

@Andrew Regarding extending the EO, please see my assessment above: the EO is aligned to UK use. Without textual definitions, people may use an ontology's terms in different ways, which introduces heterogeneity.
If OHDSI wants to extend ethnic groups, given its international participation, it may be worth looking at SNOMED first to see whether anything suitable for international use is available.


Looks like @AsiyahFDA’s link is broken, but in Athena you can find them here. Not sure there is a useful hierarchy, but happy to switch over from OMB if folks find it more appropriate to what they are doing.


There are actually a whole ton of ethnicity-related ontologies listed in the BioPortal: Search | NCBO BioPortal. Why don't you check them out and tell us which one you like, so we have a path forward? @esholle, @Andrew, @andrea, @AsiyahFDA, @linikujp, @y7g2p, @roger.carlson, @SCYou, @Doc_Ed, @Vimala_Jacob? Is anybody up for doing the homework so we can put this subject to rest (otherwise it will keep coming back like whack-a-mole every 6 months)?


How about this one?


Like @linikujp said, it may be possible to just add my subgroups here.

@Andrea, @Christian_Reich,
I didn't look through all the terms, but many of them do not have child terms for "ethnic group". However, I found a big list of Chinese ethnic terms under NCIT's "ethnic group". Please check all the child terms under Ethnic Group - National Cancer Institute Thesaurus (NCIT).

The best option is to bring these into the vocabulary.

And I think the race and ethnic group terms arranged in NCIT deserve a good look. They may be reusable.

I have checked all the items. It has 56 ethnic groups, and each of them is one I need. Thanks @linikujp!


Thank you for the discussion on this topic. We in the CDM working group would like to use this forum post to come to a decision, if possible, on the best way to represent this data in the CDM. Some potential solutions that have been proposed:

  • Keep race in PERSON, move ethnicity to OBSERVATION
    • observation_concept_id is “ethnicity of person”, value_as_concept_id is actual ethnicity
  • Keep both in PERSON but update the ontology in the vocabulary. I believe the SNOMED ethnicities in the link sent out by @AsiyahFDA are already in the vocabulary, but the NCI ethnicities need to be added
  • Keep either race or ethnicity and remove the other to reduce confusion

Anything I missed?

@clairblacketer you may want to consider adding the NCIT resource from the conversation above.

Hi @linikujp you’re right, we would need to add these. I’ll add that to my list.

Hi, has there been any solution to this issue? In addition to the ontologies shared by @Christian_Reich, other consortia are working on this very same issue. For example, the ClinGen consortium has a dedicated working group (Ancestry and Diversity - ClinGen | Clinical Genome Resource). I am wondering whether OHDSI should start a group devoted only to this issue. I think we should consider not only race and ethnicity, but also ancestry, religion, nationality, etc.

Is there a reason that race and ethnicity are in the person table and not in the observation table? I ask because: 1/ these are usually patient-reported, and patients may be asked on admission even if they already have a record. 2/ people are increasingly multi-racial and multi-ethnic, which is obvious to anyone who has ever seen a 23andMe result. 3/ we see a number of patients change their ethnicity at different visits. We are still exploring why, but the working hypothesis is that multi-ethnic patients choose whatever they see as most beneficial for that visit. If they are admitted through the emergency department, minorities might not want to be slowed by biased triage and declare themselves as white, but when admitted directly to a ward they may declare their minority group to get more financial aid. 4/ similarly, there is insurance fraud, when the patient has no legitimate claim to the ethnicity they state.

If these were observations, I would feel more comfortable using non-standard concepts.

Would it violate any CDM rules to leave the ethnicity and race as 0 and create observations? Instead of 0 we could also put the first observed ethnicity and race in the person table and still add the observations.
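A minimal Python sketch of the second option in the question above, assuming every self-reported ethnicity arrives as a dated, visit-level record: each record becomes its own OBSERVATION row (so changes between visits are kept), and the earliest one is copied into PERSON, with 0 when nothing was recorded. The `EthnicityDeclaration` type and both function names are hypothetical, the value concept IDs are assumed to be already mapped to standard concepts, and only a subset of OBSERVATION columns is filled. The observation concept 44803968 (Ethnicity) is the one listed later in this thread.

```python
from dataclasses import dataclass
from datetime import date

# "Ethnicity" observation concept cited in this thread; verify in Athena before use.
ETHNICITY_OBS_CONCEPT_ID = 44803968


@dataclass(frozen=True)
class EthnicityDeclaration:
    """Hypothetical source record: one self-reported ethnicity at one visit."""
    person_id: int
    declared_on: date
    ethnicity_concept_id: int  # assumed already mapped to a standard concept


def person_ethnicity(decls: list[EthnicityDeclaration]) -> int:
    """Value for PERSON.ethnicity_concept_id: the earliest declaration, or 0 if none."""
    if not decls:
        return 0  # 0 = no matching concept
    return min(decls, key=lambda d: d.declared_on).ethnicity_concept_id


def ethnicity_observations(decls: list[EthnicityDeclaration]) -> list[dict]:
    """One OBSERVATION row per declaration, ordered by date."""
    return [
        {
            "person_id": d.person_id,
            "observation_concept_id": ETHNICITY_OBS_CONCEPT_ID,
            "observation_date": d.declared_on,
            "value_as_concept_id": d.ethnicity_concept_id,
        }
        for d in sorted(decls, key=lambda d: d.declared_on)
    ]
```

Choosing the earliest declaration for PERSON is only one possible rule; the most recent or most frequent value would be equally defensible, and the full history stays in OBSERVATION either way.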

We have client data where some patients have more than one race. We put one of the values into the Person table and the remaining race values into the Observation table. This does not violate OMOP rules, as there are standard observation concepts for race and ethnicity:

  • 4013886 Race
  • 44803968 Ethnicity

These concepts are loaded into the observation_concept_id column, and the actual race / ethnicity values are put into the value_as_concept_id field of the Observation table.
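The split described above can be sketched in Python, assuming the source lists several races for one person at one point in time and those values are already mapped to standard race concepts. The first distinct value goes to PERSON.race_concept_id and each remaining one becomes an OBSERVATION row with observation_concept_id 4013886 (Race). The function name `split_races` is hypothetical, and only a subset of OBSERVATION columns is filled.

```python
from datetime import date

# Standard "Race" observation concept listed in the post above.
RACE_OBS_CONCEPT_ID = 4013886


def split_races(
    person_id: int, race_concept_ids: list[int], recorded_on: date
) -> tuple[int, list[dict]]:
    """Return (PERSON.race_concept_id, OBSERVATION rows for the other races).

    race_concept_ids is assumed to be mapped to standard race concepts
    and ordered as the source system lists them.
    """
    unique = list(dict.fromkeys(race_concept_ids))  # drop repeats, keep order
    if not unique:
        return 0, []  # nothing recorded: race_concept_id = 0, no observations
    primary, *others = unique
    rows = [
        {
            "person_id": person_id,
            "observation_concept_id": RACE_OBS_CONCEPT_ID,
            "observation_date": recorded_on,
            "value_as_concept_id": concept_id,
        }
        for concept_id in others
    ]
    return primary, rows
```

The post does not say how the value kept in PERSON was chosen; taking the first one in source order is just an assumption here, and a site might prefer another rule.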


@QI_omop: We need to ratify this, by the way. We need to tell @clairblacketer.

The whole thing violates the OMOP idea: Creating a standard that everybody adheres to, so that data no longer need the context in which they were generated to be correctly analyzed. This standard should be objectively defined. Race and ethnicity however are not objectively definable. Worse, they are self-assigned, which means they are not even defined within a data asset.

So, I think the solution is what you guys laid out: Standard simple self-assignment in PERSON, and details (3/7th of an Inuit) into the OBSERVATION table. We do a similar distinction between crude and detailed with Location.

Has this been established? Where can we go to learn current standard?

We have just begun to OMOP our data at my institution, and the race/ethnicity problem immediately confounded us. We have mapped our data (we are in the US) to the CDC standard codes. Are the CDC codes represented in the OMOP standard vocabulary? If this is not the right forum for these questions, please direct me to the appropriate place.

Thank you!

There isn’t an established convention for adding more than one race or ethnicity to the CDM.

I will mark this thread as a Themis issue and create an issue in the Themis GitHub. Stay tuned!

Edit to add link to GitHub issue.

We have a request at our institution to bring in the patient's ethnic background variable (from the Clarity zc_ethnic_bkgrnd table). The concepts mostly line up with those described here, but, as described in the posts above, we too have some patients with more than two ethnicities. Should we just pick one race for the person table and add the others to the observation table?
Is there still no resolution on the Themis GitHub issue?

I'm also returning to this conversation because there is interest in adding races (such as in this post). Apologies if these questions have already been resolved, but:

  1. Do we have a consensus on how to store multiple races?
  2. Do we have a consensus on whether races within the context of a country should be represented as different entities or as one entity? Such as White - US, White - British, White - Australian and so on?

I’m specifically interested in the latter question (Vocabularies perspective). Couldn’t find a convention but may be missing something.

There is an open Themis issue here, but it lacks a sponsor. As an open source community, we rely on community members to contribute and drive the evolution of the standards, methods and research. The Themis process is described on the Themis GitHub home page here. Who would like to sponsor this topic?