
Mapping of NAACCR (Race) Meas Value concepts to OMOP Race standard concepts

I have been exploring the concept_relationship table to find mappings of the NAACCR Race1 values to standard OMOP Race concepts. For example:
OMOP Race: 8527 - White to NAACCR Meas Value: 35911604 - White or 40198884 - White.

I assume I simply need a pointer to the relevant documentation on this topic.

Thank you in advance and Happy New Year,

p.s. @mgurley good to see your great work again.

There is no mapping to OHDSI concept ID 8527 “White” from any of the NAACCR concepts. The relevant documentation is Mapping Relationships. You can see all the mappings to the standard race concept ‘White’ by looking at the non-standard to OHDSI standard concept mappings for race, White. Or, starting with NAACCR Meas Value “White” (concept id 35911604), you can see that it has no relationship to the OHDSI race vocabulary.

Thank you @DTorok for confirming that there is no mapping. I looked at the reference documentation but could not find any rationale as to why this particular mapping doesn’t exist. I am trying to find out whether I am “asking the wrong question”, i.e. whether there is a good reason why that mapping does not exist, and I would like to understand it. At first glance, at least, the mapping looks rather straightforward. Given the tremendous work that went into the Oncology extension, including substantial additions to the vocabulary, I wonder whether the omission was deliberate.


The reason is, in part, that the NAACCR code for ‘White’ only refers to race when used in combination with the NAACCR code for Race 2. That is, ‘White’ is an answer to Race 2. But it looks like the code for White also has a number of ‘Value to Schema’ relationships with other NAACCR codes. So you cannot map the NAACCR code for White to OHDSI race White without understanding how the NAACCR code ‘White’ is being used.

Thank you. The field I am sourcing the CDM data from is NAACCR Race 1 (‘White’). I assume the correct value is the one listed in Athena. If so, would the following be correct for a PERSON table mapping?



We are going to publish the NAACCR mapping soon. But because of the variable-value problem in the source (sometimes even variable-variable-value), the simple “Maps to” relationship does not work. We need the Wide Mapping table. Stay tuned, please.

Yes, race concept_id 8527 is the correct mapping for NAACCR Race (White).

This query will give you the possible NAACCR race values.
-- List the answer concepts attached to a NAACCR race item
SELECT c1.concept_id, c1.concept_code, c1.concept_name,
       cr.relationship_id,
       c2.concept_id, c2.concept_code, c2.concept_name
FROM concept_relationship cr
JOIN concept c1 ON c1.concept_id = cr.concept_id_1
JOIN concept c2 ON c2.concept_id = cr.concept_id_2
WHERE cr.concept_id_1 = 35917100  -- NAACCR Race 2 item; Race 1 is 35917103
  AND cr.relationship_id = 'Has Answer';  -- straight quotes, not curly ones
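To try the query without a full vocabulary load, here is a minimal sketch that runs a trimmed version of it with Python's built-in sqlite3 module. It assumes toy CONCEPT and CONCEPT_RELATIONSHIP tables holding only the columns the query touches; the rows are illustrative (the concept_code values are placeholders, not real NAACCR codes), and `naaccr_answers` is a hypothetical helper. The item concept is a bound parameter, so Race 1 (35917103) and Race 2 (35917100) can be compared without editing the SQL.

```python
import sqlite3

# Toy stand-ins for the OMOP CONCEPT and CONCEPT_RELATIONSHIP tables with only
# the columns the query uses. Rows are illustrative, not a vocabulary extract;
# the concept_code values are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_code TEXT,
    concept_name TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
INSERT INTO concept VALUES
    (35917103, 'placeholder-race1', 'Race 1'),
    (35917100, 'placeholder-race2', 'Race 2'),
    (719369, 'placeholder-a', 'White'),
    (719372, 'placeholder-b', 'Black'),
    (35911604, 'placeholder-c', 'White');
INSERT INTO concept_relationship VALUES
    (35917103, 719369, 'Has Answer'),
    (35917103, 719372, 'Has Answer'),
    (35917100, 35911604, 'Has Answer');
""")

# Trimmed version of the query above; the item concept is a bound parameter.
ANSWERS_SQL = """
SELECT c2.concept_id, c2.concept_code, c2.concept_name
FROM concept_relationship AS cr
JOIN concept AS c1 ON c1.concept_id = cr.concept_id_1
JOIN concept AS c2 ON c2.concept_id = cr.concept_id_2
WHERE cr.concept_id_1 = ?
  AND cr.relationship_id = 'Has Answer'
ORDER BY c2.concept_id
"""


def naaccr_answers(item_concept_id):
    """Return (concept_id, concept_code, concept_name) for each answer."""
    return conn.execute(ANSWERS_SQL, (item_concept_id,)).fetchall()


print(naaccr_answers(35917103))  # Race 1 answers
print(naaccr_answers(35917100))  # Race 2 answers
```

Against a real CDM vocabulary schema the same SQL text should work unchanged; only the connection would differ.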

And this is the query to get Standard OHDSI concepts for race.
SELECT *
FROM concept
WHERE domain_id = 'Race'
  AND standard_concept = 'S'
  AND invalid_reason IS NULL;

You will have to determine what the best mapping from NAACCR to OHDSI is.
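One way to start that determination is to let exact name matches through and send everything else to a human. The sketch below is a swapped-in heuristic (case-insensitive exact name matching), not how the vocabulary team builds mappings, and `propose_candidates` is a hypothetical helper. Its sample pairs are taken from the preliminary race table posted further down this thread.

```python
def propose_candidates(naaccr_answers, standard_races):
    """Split NAACCR answers into exact-name matches and ones needing review.

    Both arguments are lists of (concept_id, concept_name) pairs. A match is
    accepted only when exactly one standard concept has the same name,
    compared case-insensitively after trimming whitespace.
    """
    by_name = {}
    for concept_id, name in standard_races:
        by_name.setdefault(name.strip().lower(), []).append(concept_id)

    matched, needs_review = {}, []
    for concept_id, name in naaccr_answers:
        hits = by_name.get(name.strip().lower(), [])
        if len(hits) == 1:
            matched[concept_id] = hits[0]
        else:
            needs_review.append((concept_id, name))
    return matched, needs_review


# Sample (concept_id, name) pairs from the preliminary table in this thread.
answers = [(719369, "White"), (719376, "Chinese"),
           (35940912, "Kampuchean (Cambodian)"), (719374, "Hawaiian")]
standard = [(8527, "White"), (38003579, "Chinese"), (38003578, "Cambodian"),
            (8557, "Native Hawaiian or Other Pacific Islander")]

matched, needs_review = propose_candidates(answers, standard)
print(matched)       # exact-name hits
print(needs_review)  # left for a human to decide
```

"Kampuchean (Cambodian)" and "Hawaiian" land in the review list even though their intended targets are obvious to a person, which is exactly why name matching can only be a first pass.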

@Christian_Reich Thank you. I figured a mapping would be forthcoming. I will work on a temporary solution and update once the official solution is available.

@DTorok Thank you. I got a very similar query with the difference of constraining to 35917103 since I am sourcing from NAACCR Race 1. According to Athena 35917100 is for Race 2.

Happy New Year to both of you! I truly appreciate your support over the Holidays. Amazing!

Here is a preliminary mapping for the NAACCR Race vocabulary that I came up with. Maybe it can contribute to the effort @Christian_Reich hinted at (despite the flaws it inevitably has). It covers the ‘Has Answer’ values of NAACCR Race 1 (as listed in Athena).

NAACCR_Concept_ID | NAACCR_Concept_Name | Standard_Concept_ID | Standard_Concept_Name
--- | --- | --- | ---
719371 | American Indian, Aleutian, or Eskimo (includes all indigenous populations of the Western hemisphere) | 8657 | American Indian or Alaska Native
35943624 | Asian Indian | 38003574 | Asian Indian
35941279 | Asian Indian or Pakistani, NOS (code 09 prior to Version 12) | 38003574 | Asian Indian
719372 | Black | 38003598 | Black
35940300 | Chamorro/Chamoru | 38003611 | Micronesian
719376 | Chinese | 38003579 | Chinese
35911258 | Fiji Islander | 38003610 | Polynesian
719375 | Filipino | 38003581 | Filipino
35940822 | Guamanian, NOS | 38003611 | Micronesian
719374 | Hawaiian | 8557 | Native Hawaiian or Other Pacific Islander
35941363 | Hmong | 38003582 | Hmong
719370 | Japanese | 38003584 | Japanese
35940912 | Kampuchean (Cambodian) | 38003578 | Cambodian
719373 | Korean | 38003585 | Korean
35940871 | Laotian | 38003586 | Laotian
35940633 | Melanesian, NOS | 38003612 | Melanesian
35941269 | Micronesian, NOS | 38003611 | Micronesian
35941199 | New Guinean | 38003612 | Melanesian
35940933 | Other | 0 | Other
35941596 | Other Asian, including Asian, NOS and Oriental, NOS | 8515 | Asian
35942018 | Pacific Islander, NOS | 38003613 | Other Pacific Islander
35942949 | Pakistani | 38003589 | Pakistani
35941057 | Polynesian, NOS | 38003610 | Polynesian
35941155 | Samoan | 38003610 | Polynesian
35940478 | Tahitian | 38003610 | Polynesian
35941206 | Thai | 38003591 | Thai
35941597 | Tongan | 38003610 | Polynesian
35912439 | Unknown | 0 | Unknown
35940990 | Vietnamese | 38003592 | Vietnamese
719369 | White | 8527 | White
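Applied to a PERSON row, a table like this becomes a lookup. The sketch below is a minimal illustration, assuming a hypothetical `map_race` helper and a subset of the rows above; it fills the CDM fields race_concept_id, race_source_concept_id and race_source_value, falls back to 0 for codes not in the table, and always keeps the NAACCR concept as the source concept. The source value string "White" is an assumption; use whatever raw value your registry extract carries.

```python
from dataclasses import dataclass

# A subset of the preliminary NAACCR Race 1 table above:
# NAACCR answer concept_id -> race_concept_id.
RACE_MAP = {
    719369: 8527,      # White -> White
    719372: 38003598,  # Black -> Black
    719376: 38003579,  # Chinese -> Chinese
    35940933: 0,       # Other -> 0
    35912439: 0,       # Unknown -> 0
}


@dataclass
class PersonRace:
    race_concept_id: int
    race_source_concept_id: int
    race_source_value: str


def map_race(naaccr_concept_id: int, source_value: str) -> PersonRace:
    """Fill the three race fields of a PERSON row from a NAACCR Race 1 answer.

    Codes missing from RACE_MAP fall back to 0; the NAACCR concept is kept
    in race_source_concept_id either way.
    """
    return PersonRace(
        race_concept_id=RACE_MAP.get(naaccr_concept_id, 0),
        race_source_concept_id=naaccr_concept_id,
        race_source_value=source_value,
    )


print(map_race(719369, "White"))
```

Keeping the NAACCR concept in race_source_concept_id means the registry's original detail stays queryable even if the target concepts are later changed.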

Yes, that looks like a good list.

Except: we have to make the decision at the community level, but I am inclined to abolish all the ethnic “races” and leave only the standard 5. Ethnicities are impossible to figure out, like in this case. Hierarchical relationships to races are even more ridiculous. As a consequence, studies with ethnicities will have to use source concepts, like NAACCR’s.
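A minimal sketch of what "the standard 5 only" could look like in an ETL, under the assumption that detailed targets are rolled up to the five OMB race concepts (8657, 8515, 8516, 8557, 8527) while the NAACCR concept stays in race_source_concept_id. The `ROLLUP` entries and the `rollup_race` helper are hypothetical illustrations built from a few rows of the table above, not a community decision.

```python
# The five OMB race categories as standard OMOP race concepts.
OMB_RACES = {
    8657: "American Indian or Alaska Native",
    8515: "Asian",
    8516: "Black or African American",
    8557: "Native Hawaiian or Other Pacific Islander",
    8527: "White",
}

# Hypothetical roll-up of some detailed targets from the table above.
ROLLUP = {
    38003574: 8515,  # Asian Indian -> Asian
    38003578: 8515,  # Cambodian -> Asian
    38003579: 8515,  # Chinese -> Asian
    38003598: 8516,  # Black -> Black or African American
    38003610: 8557,  # Polynesian -> Native Hawaiian or Other Pacific Islander
    38003611: 8557,  # Micronesian -> Native Hawaiian or Other Pacific Islander
}


def rollup_race(race_concept_id: int) -> int:
    """Return one of the five OMB race concepts, or 0 if no roll-up is defined."""
    if race_concept_id in OMB_RACES:
        return race_concept_id
    return ROLLUP.get(race_concept_id, 0)


print(rollup_race(38003579), rollup_race(8527), rollup_race(38003612))
```

Melanesian (38003612) is deliberately left out of the sample roll-up, so it falls back to 0; a real roll-up table would need an explicit decision for every detailed target.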

@Christian_Reich I agree and I am just trying to deal with the data reality at my disposal.

Unnecessary complexity should be avoided, and granularity should be limited to a level where values can be objectively determined with reasonable means. Ethnicity might just be an inferior proxy for cultural, socio-economic, genetic, etc. determinants of health. It will take a community with much richer experience than mine to come up with something useful.

I have done the same for ethnicity and gender; maybe these are less flawed than the mapping for race.
I am posting them here in case they are useful for somebody else, and of course also to get help in case the mapping is wrong. Thank you in advance.


naaccr_concept_id | naaccr_concept_name | standard_concept_id | standard_concept_name
--- | --- | --- | ---
35914498 | Unknown whether Spanish or not | 0 | Unknown
35940262 | Dominican Republic | 38003563 | Hispanic or Latino
35940448 | Mexican (includes Chicano) | 38003563 | Hispanic or Latino
35940906 | Spanish, NOS | 38003563 | Hispanic or Latino
35941390 | Other specified Spanish/Hispanic origin (includes European; excludes Dominican Republic) | 38003563 | Hispanic or Latino
35941643 | Non-Spanish; non-Hispanic | 38003564 | Not Hispanic or Latino
35941814 | Puerto Rican | 38003563 | Hispanic or Latino
35941961 | Cuban | 38003563 | Hispanic or Latino
35941988 | South or Central American (except Brazil) | 38003563 | Hispanic or Latino
35943612 | Spanish surname only (Code 7 is ordinarily for central registry use only, hospital registrars may use code 7 if using a list of Hispanic surnames provided by their central registry; otherwise, code 9 ‘unknown whether Spanish or not’ should be used.) The | 38003564 | Not Hispanic or Latino


naaccr_concept_id | naaccr_concept_name | standard_concept_id | standard_concept_name
--- | --- | --- | ---
35919260 | Female | 8532 | FEMALE
35919299 | Transsexual, NOS | 0 | Unknown
35919434 | Not stated/Unknown | 0 | Unknown
35919502 | Other (Hermaphrodite) | 0 | Unknown
35919663 | Transsexual, natal male | 8507 | MALE
35919733 | Transsexual, natal female | 8532 | FEMALE
35919842 | Male | 8507 | MALE

Hang on a second. This is trickier. The ethnicities we have imported from the OMB are part of the race_concept_id. And second, ethnicity as in ethnicity_concept_id right now follows the US system, in which it just means Latino or not (because the Latinos have the same or similar race composition as the non-Latinos). So, being a US citizen or a Mexican citizen does not specify if you are Latino or not. However, either may have an ethnicity as part of the race_concept_id.
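If ethnicity_concept_id only distinguishes Latino from non-Latino, a mapping can be checked mechanically before loading. A minimal sketch, assuming the allowed set is Hispanic or Latino (38003563), Not Hispanic or Latino (38003564) and 0; `invalid_targets` is a hypothetical helper, and the map is a subset of the ethnicity table posted earlier in the thread. A detailed race-vocabulary target placed in the ethnicity field would be flagged.

```python
# Targets allowed in PERSON.ethnicity_concept_id under the US-style definition
# described above: Hispanic or Latino, Not Hispanic or Latino, or 0.
ALLOWED_ETHNICITY = {38003563, 38003564, 0}

# A subset of the NAACCR ethnicity table posted earlier in the thread.
ETHNICITY_MAP = {
    35914498: 0,         # Unknown whether Spanish or not
    35940448: 38003563,  # Mexican (includes Chicano)
    35941643: 38003564,  # Non-Spanish; non-Hispanic
    35941961: 38003563,  # Cuban
}


def invalid_targets(mapping: dict, allowed: set) -> list:
    """Return the source concept_ids whose target is outside the allowed set."""
    return sorted(src for src, tgt in mapping.items() if tgt not in allowed)


print(invalid_targets(ETHNICITY_MAP, ALLOWED_ETHNICITY))
# A detailed race target such as 38003579 (Chinese) would be flagged:
print(invalid_targets({719376: 38003579}, ALLOWED_ETHNICITY))
```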

I know. We need to change that system. And soon. People don’t get it in the US, but they certainly don’t get it outside the US.

The sex concepts I think you got right.

Possible, but I am getting the feeling that there isn’t anything. It would have materialized by now. The race/ethnicity/socio-economic-genetic-cultural backgrounds are not precisely defined. They are wishy-washy concepts. You cannot unequivocally define them using criteria. Instead, they are self-defined, without clear criteria. As such, it is very difficult to use them for unbiased research.

Sure thing (race and ethnicity are highly subjective measures tainted by a host of motivations). It’s time we moved on to more objective measures in the age of precision medicine.
Maybe I should not try to map them at all?

Hi @hannes, in the upcoming call of the Vocabulary Subgroup (a separate meeting, but listed as part of the Common Data Model Workgroup) on Jan 18th, @Jake will present his findings around Race and Ethnicity, and I expect to see a summary as well as maybe some new perspectives on this topic.
Cheers ~ Mik

Thank you. I will see whether I can join.

Is this the OMOP CDM Oncology WG – CDM/Vocabulary Subgroup Meeting? If not, could you please point me to the correct meeting information. Thank you.

Sorry, Hannes, I missed that reply. Check out the Common Data Model Workgroup and find the Vocabulary Subgroup. If you have an account in OHDSI Teams, I can add you to the invite.

Thank you. I found the information in Teams. Much appreciated.

Hi @hannes,

We’re planning on using the above mappings you’ve created, possibly with modifications. I’ll take a look at the CDM subgroup from 2022 to see what the group decided regarding race/ethnicity. In the meantime, since you’ve implemented the mappings, how have they worked for the use case you were trying to solve (possibly for end users)?

I’m planning to ingest these (or similar) mappings as custom concept_relationship entries, and just wanted to see if you found the ingestion useful.
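For reference, a minimal sketch of turning a {source concept_id: target concept_id} mapping into rows shaped like the CONCEPT_RELATIONSHIP table (concept_id_1, concept_id_2, relationship_id, valid_start_date, valid_end_date, invalid_reason). It emits each ‘Maps to’ row with its reverse ‘Mapped from’ row and skips targets of 0, since 0 means "no matching concept" rather than a real target. The helper name and the default dates are hypothetical; before loading, it is worth checking whether these source concepts already carry ‘Maps to’ rows, and the SOURCE_TO_CONCEPT_MAP table is an alternative place for site-specific mappings.

```python
from datetime import date


def concept_relationship_rows(mapping, valid_start=date(2022, 1, 1),
                              valid_end=date(2099, 12, 31)):
    """Build CONCEPT_RELATIONSHIP-shaped dicts from {source_id: target_id}.

    Each real mapping yields a 'Maps to' row and its reverse 'Mapped from'
    row. Targets of 0 ("No matching concept") are skipped rather than
    written as relationships.
    """
    rows = []
    for source_id, target_id in sorted(mapping.items()):
        if target_id == 0:
            continue
        for c1, c2, rel in ((source_id, target_id, "Maps to"),
                            (target_id, source_id, "Mapped from")):
            rows.append({
                "concept_id_1": c1,
                "concept_id_2": c2,
                "relationship_id": rel,
                "valid_start_date": valid_start.isoformat(),
                "valid_end_date": valid_end.isoformat(),
                "invalid_reason": None,
            })
    return rows


for row in concept_relationship_rows({719369: 8527, 35912439: 0}):
    print(row)
```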