
Conventions of Race and Ethnicity

Dear all,

In the person table, both race and ethnicity are mandatory fields. There are now standard mappings to the 2001 UK census ethnicity values, which help greatly (from a UK perspective, ethnicity is actually “race” in the CDM).
What is the convention for “race” in non-US implementations of the CDM? We don’t have a notion of “hispanic or non-hispanic”.

Many thanks in advance,



A couple of things:

  • Yes, they are mandatory, but if you don’t have the information, set RACE_CONCEPT_ID to 0.
  • Use Concepts where DOMAIN_ID=‘Race’ only. They are not SNOMED or Read, but based on CDC Race codes.
  • The census codes in Read and SNOMED need to be mapped to these. This mapping doesn’t exist today, so you would have to do that yourself.
  • The Domain assignment of the SNOMED Race concepts needs to be fixed (will do that).
  • Don’t use ETHNICITY_CONCEPT_ID outside the US and maybe Canada. Set it to 0.
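As a rough illustration of the second bullet, the sketch below selects candidate target concepts by filtering on DOMAIN_ID and on the standard flag. It assumes the usual CONCEPT table columns (concept_id, concept_name, domain_id, vocabulary_id, standard_concept) and runs against a toy in-memory SQLite table. The rows, names, and concept IDs are placeholders invented for the example, not real vocabulary content, and `standard_race_concepts` is a hypothetical helper.

```python
import sqlite3

# Toy stand-in for the CDM CONCEPT table. All rows and concept IDs below are
# placeholders for illustration only; they are not real vocabulary entries.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE concept (
        concept_id       INTEGER PRIMARY KEY,
        concept_name     TEXT,
        domain_id        TEXT,
        vocabulary_id    TEXT,
        standard_concept TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (900001, "Example race A", "Race", "Race", "S"),
        (900002, "Example race B", "Race", "Race", "S"),
        # A census-style source concept that sits in another domain and is
        # not standard, so it must not be used as a RACE_CONCEPT_ID.
        (900003, "Example census category", "Condition", "SNOMED", None),
    ],
)


def standard_race_concepts(connection):
    """Return (concept_id, concept_name) for standard Race-domain concepts."""
    return connection.execute(
        "SELECT concept_id, concept_name FROM concept "
        "WHERE domain_id = 'Race' AND standard_concept = 'S' "
        "ORDER BY concept_id"
    ).fetchall()


print(standard_race_concepts(conn))
```

Against a real vocabulary the same WHERE clause would apply; only the connection and the table contents differ.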

The subject of Race and how it is correctly addressed keeps coming up, despite the lack of actual use cases or studies. The best solution is probably simple and pragmatic.

Thanks for this extremely informative response, Christian.

@Christian_Reich, further to your points above: am I right in thinking that the following two assumptions should be followed in OHDSI?

  1. everything must be mapped to a “standard” code
  2. race should/must be mapped to a concept that falls in the “race” domain.
    At present there are only 50 codes that are standard race codes (the ones from the CDC). Unfortunately, it isn’t possible to create a useful mapping to these in the UK. Most would have to be mapped to “European”, which wouldn’t be granular enough for our needs. I note that the full CDC code set (much of which is non-standard) would probably suffice. It might also be of interest to know that some of the CDC terms are a little culturally insensitive for us here in the UK.

If you were to change the SNOMED UK census 2011 codes from “condition” to “race”, I assume they would still not be considered “standard”? If so, and the CDC codes remain standard, how best do you think we should proceed for a UK OHDSI instance? I suspect it would be best to map to standard codes where we can, and use 0 where a direct mapping isn’t possible, and then rely on race_source_value for analytical needs?
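The fallback described here could look roughly like the sketch below: map to a standard Race-domain concept where a mapping exists, use 0 otherwise, always keep the original text in race_source_value, and leave the ethnicity concept at 0 for a non-US instance. `RACE_MAP` and `person_race_fields` are hypothetical names, and the source categories and concept IDs are placeholders rather than real vocabulary entries.

```python
from typing import Dict, Optional

# Hypothetical local mapping from source strings to standard Race-domain
# concept IDs. Keys and IDs are placeholders for illustration only.
RACE_MAP: Dict[str, int] = {
    "source category A": 900001,
    "source category B": 900002,
}


def person_race_fields(
    source_value: Optional[str],
    race_map: Dict[str, int],
) -> Dict[str, object]:
    """Build the race/ethnicity fields for one PERSON row."""
    key = (source_value or "").strip()
    return {
        # Mapped standard concept where one exists, otherwise 0.
        "race_concept_id": race_map.get(key, 0),
        # The original value is always kept for later analysis.
        "race_source_value": source_value,
        # Not used outside the US (and maybe Canada), so set to 0.
        "ethnicity_concept_id": 0,
    }


print(person_race_fields("source category A", RACE_MAP))
print(person_race_fields("some unmapped category", RACE_MAP))
```

Analyses that need the finer local categories would then group on race_source_value rather than on race_concept_id.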

Thank you in advance for your help as I get my hands dirty with OHDSI here in the UK!

  1. Yes.
  2. Yes. The problem is we only extended to the 2nd hierarchical level. We could go deeper and then you would have all your Irish, Scottish, Welsh and English you need: https://www.cdc.gov/nchs/data/dvs/Race_Ethnicity_CodeSet.pdf. Apart from wanting to preserve the data given to you - do you have a use case why you need to have those ethnicities?

The English/Scottish/Welsh/Irish divide isn’t particularly important to us. The UK has a fairly diverse ethnic makeup, with major groups historically from, amongst other areas, the Caribbean and the Indian Subcontinent (around 10% of our total population). I suppose part of my reasoning is just wanting to maintain an accurate database (and perhaps I need to be more pragmatic here). However, there are good epidemiological reasons for wanting to maintain some granularity that is relevant to the region (health outcomes are thought to be worse in these groups).

There’s a lot of open Public Health and Health Outcome data available in the UK which is aligned to socio-demographic data (e.g. Public Health England’s Fingertips profiles), so I fully understand @Doc_Ed’s point around wanting to maintain accuracy with codes such as ethnicity and race.

Following this thread as I’m really interested in this subject :)

Let’s continue here.