Source of Race vocabulary in OMOP CDM

(Anita Umesh) #1

I would like to know the source of the “race” vocabulary used in the OMOP CDM. Is it based on the US Census Bureau? Is there a place where this kind of information can be easily found?

Thank you

(Christian Reich) #2

You can find the source of all vocabularies in the VOCABULARY reference table:

select * from vocabulary where vocabulary_id='Race';

The vocabulary_reference field tells you: http://www.cdc.gov/nchs/data/dvs/Race_Ethnicity_CodeSet.pdf
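
As a rough illustration of that lookup, here is a minimal Python sketch that runs the same kind of query against an in-memory SQLite stand-in for the VOCABULARY table. The one-row table is a toy sample (the vocabulary_name value is a placeholder assumption, and the helper name vocabulary_source is hypothetical); a real CDM would be queried on whatever database hosts it.

```python
import sqlite3

# Toy stand-in for the OMOP VOCABULARY table; only three columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vocabulary ("
    " vocabulary_id TEXT PRIMARY KEY,"
    " vocabulary_name TEXT,"
    " vocabulary_reference TEXT)"
)
conn.execute(
    "INSERT INTO vocabulary VALUES (?, ?, ?)",
    (
        "Race",
        "Race (placeholder name)",  # assumption: illustrative value only
        "http://www.cdc.gov/nchs/data/dvs/Race_Ethnicity_CodeSet.pdf",
    ),
)


def vocabulary_source(conn, vocabulary_id):
    """Return the vocabulary_reference for a vocabulary, or None if absent."""
    row = conn.execute(
        "SELECT vocabulary_reference FROM vocabulary WHERE vocabulary_id = ?",
        (vocabulary_id,),
    ).fetchone()
    return row[0] if row else None


print(vocabulary_source(conn, "Race"))
```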

(Anita Umesh) #3

Thank you @Christian_Reich! :smiley:

(Piper Ranallo) #4

Hi Christian,
Has there been any discussion about adding the CDC extended race and detailed ethnicity codes to the concept table and mapping them to OMOP standard concepts?

(Christian Reich) #5

Is that not the one mentioned above, @piper?

(Piper Ranallo) #6

Hi @Christian_Reich,

Yes, it appears that the PDF contains the complete set of race and ethnicity values. However, when I look at the concept table it doesn’t appear that all values are included. Where concept values are present, the concept code is one I’m not familiar with - it’s not the code used natively in the CDC published value set.

I was anticipating that we’d see all of the race and ethnicity codes and terms in the concept table along with the code that appears natively in the CDC value set, e.g., code 1004-1 for race concept ‘American Indian’.

I am new to OHDSI so may not be fully understanding the nuances of the vocabulary and mapping tables!


(Christian Reich) #7

No. Only the first two hierarchical levels. For example, 8527 “White” and 38003614 “European”. “Italian” is already missing. 38003572 “American Indian” is also second hierarchical level.

The concept_code is identical to the field “Hierarchical Code” in the table, except the R is missing.
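
A minimal sketch of that relationship, assuming hierarchical codes take the form of an "R" prefix followed by dot-separated levels (for example R5.01). The function name and the sample codes below are illustrative assumptions, so check the real values against the CDC PDF and your own concept table. Keeping only the first two levels reflects the point above that deeper levels are not in the vocabulary.

```python
def to_omop_race_code(hierarchical_code, max_levels=2):
    """Drop the leading 'R' and keep at most `max_levels` hierarchy levels."""
    code = hierarchical_code.strip()
    if not code.upper().startswith("R"):
        raise ValueError(f"not a race hierarchical code: {hierarchical_code!r}")
    levels = code[1:].split(".")
    return ".".join(levels[:max_levels])


# Sample inputs are assumed formats, not values copied from the code set.
print(to_omop_race_code("R5"))         # first level, unchanged apart from the R
print(to_omop_race_code("R5.01"))      # second level, unchanged apart from the R
print(to_omop_race_code("R5.01.001"))  # third level, rolled up to the second
```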

Generally, we want to get away from this system where ethnicities are descendants of races. There are so many things wrong with this notion. We should put ethnicities into the field ethnicity_concept_id, and races into race_concept_id. The vocabulary should be simple and straightforward.

What’s your use case? What are you trying to achieve?

(Piper Ranallo) #8

Thanks @Christian_Reich. This is helpful!

Agree regarding apparent conflation of race and ethnicity in the CDC code set (and in many EHRs). In fact, the ethnicity values in the source data we’re currently working with map better to CDC race values than to ethnicity values. However, b/c they are captured as ethnicities, we are not mapping them to race values. For those ethnicities that map to the second level ethnicity codes in the CDC detailed ethnicity value set, we’re rolling them up to the parent ethnicity so they can be mapped to an OMOP concept id. This is the use case for having at least the 2nd level ethnicity codes in the table.

The value proposition of adding the CDC codes to OMOP would just be to facilitate mapping when the CDC code or descriptor appears in the source data.

One more quick question - the source data we have allows for multiple race records per patient. Wondering how other folks have dealt with this?

Thanks in advance for your insight,

(Ajit Londhe) #9

I’m curious about multiple race source values for a patient, as well. The person table is meant to be 1:1, 1 record per patient, so I guess you’d have to throw in the other values in fact_relationship?

(Christian Reich) #10

Why don’t you just truncate the source codes and be done with it?

Also, are you just trying to capture the information, or do you have some research in mind?

Urgh. Let’s not go there. Races are already a wishy-washy, mostly social construct with no objective criteria. If you start doing arithmetic with them (3/8th Black), it completely breaks down. There are no reasonable use cases.

(Piper Ranallo) #11

Ah, I see what you’re saying. This is the first time I’m seeing the hierarchical code. The code that typically appears in source data is the code listed as the unique identifier rather than the hierarchical code. The unique identifier is also the code referenced in the HL7 standard here.


Yes, at this point just making sure that what’s in OMOP is a complete and accurate representation of what’s in the source system. Race and ethnicity will be important for health disparities research down the road.

(Piper Ranallo) #12

@Ajit_Londhe, how have you approached the issue?

It’s unclear how one would select a single race when a person reports that they are both African American and Native American Indian.

(Ajit Londhe) #13

I don’t have this use case in my data. Might be good to check in with the CDM Working Group about this question. All I can think of is to pick the race value that you believe to be most clinically important, and then perhaps store the other value in observation?

(Tim Quinn) #14

I work for Mount Sinai Health System here in New York City, where our patient population spans the entire range of race and ethnicity codes (truly!). Like @piper, our researchers are working on health disparities projects, especially for at-risk populations.

We are doing exactly what @Ajit_Londhe suggested: loading a “primary” race to OMOP’s person.race_concept_id column and putting the others in OMOP’s observation table.

(Piper Ranallo) #15

@quinnt, I like the approach of capturing all race and ethnicity data in the observation table.

What’s your process for determining which race is primary?

(Tim Quinn) #16

Our EHR system assigns line numbers for a patient’s race categories, so we’re simply selecting the race from the first line (line number = 1).

Obviously, this is unscientific and somewhat arbitrary, but we have no other information in structured fields to go on.
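
The approach described here can be sketched roughly as follows, assuming each source row carries a person identifier, the EHR's line number, and an already-mapped race concept ID. The row layout, the function name, and the sample rows are hypothetical, and the sketch takes the lowest line number as primary rather than requiring it to be exactly 1. How the extra races are represented in the observation table is left open, since the thread does not specify it.

```python
from collections import defaultdict


def split_primary_race(rows):
    """Split (person_id, line_number, race_concept_id) rows.

    Returns (primary, extras): primary maps person_id to the concept on the
    lowest line number; extras maps person_id to the remaining concepts,
    ordered by line number.
    """
    by_person = defaultdict(list)
    for person_id, line_number, race_concept_id in rows:
        by_person[person_id].append((line_number, race_concept_id))

    primary, extras = {}, {}
    for person_id, entries in by_person.items():
        entries.sort()  # lowest line number first
        primary[person_id] = entries[0][1]  # candidate for person.race_concept_id
        extras[person_id] = [concept for _, concept in entries[1:]]  # candidates for observation
    return primary, extras


# Hypothetical sample: person 1 has two race lines, person 2 has one.
rows = [
    (1, 2, 8527),       # 8527 "White"
    (1, 1, 38003572),   # 38003572 "American Indian"
    (2, 1, 8527),
]
primary, extras = split_primary_race(rows)
print(primary)
print(extras)
```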

(Alexander Davydov) #17

Why would you need it in the Observation table? Why not store it in the right place - the race_source_value and ethnicity_source_value fields?

(Piper Ranallo) #18

@Alexdavv, the PERSON table can store only one race and one ethnicity value. The observation table can store the remaining values.