Race values - Not found in Athena


Our source data has patients with race and religion values such as Malay, Sikhism, Islam, Free thinker and Others.

I can find Malay in Athena under other domains, but not under the Race domain. Should we map it to zero?

While there is no specific domain called Religion, I assume I can pick Observation domain concept ids to represent religion terms.

Should I put in a formal request with the vocabulary team to have Malay and Caucasian (outdated race?) added as race values?

Please note that Malaysian and Malay are different.

Any inputs from experienced ETL’ers would be really helpful.


The race discussion comes up often in the Forum (e.g. here, here and here). Let’s discuss it there.

I am still waiting for somebody who is actually doing the research and needs these data, rather than just trying not to lose what’s in the source data. Do you have that?

Hi @Christian_Reich, one really important use case is assessing disparities in health. Granular race and ethnicity data is critical for doing this. What are your thoughts about exploring architectural options that would support capture not only of multiple race and ethnicity values for a given person, but potentially more detailed data elements related to a person’s race and ethnicity (e.g., first-generation, etc.)?

@Akshay – For what it’s worth, here is what we’re doing:

We adhere to the guidance (or rule?) provided in the OMOP data dictionary to use only standard concepts from OHDSI’s “Race” vocabulary when populating the column person.race_concept_id.

As Christian states, this approach is a “lossy” exercise, because OHDSI’s “Race” vocabulary doesn’t include everything that we might want.

Fortunately, there are no constraints on person.race_source_concept_id (according to the current data dictionary), so we populate it with the concept (or closest concept) that we can find in any vocabulary in Athena. Ideally, these non-standard “source concepts” would be assigned to the Race domain by OHDSI, but they might not be.
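For what it’s worth, here is a minimal Python sketch of that approach, not our actual ETL code. The names `STANDARD_RACE`, `SOURCE_CONCEPTS`, `RaceMapping` and `map_race` are made up for illustration. The standard ids shown are the Race-vocabulary ids as I recall them and should be checked in Athena, and the source concept id is a pure placeholder for whatever concept you find in Athena.

```python
from dataclasses import dataclass

# Assumption: source values that have a standard concept in the Race vocabulary.
# Verify these ids in Athena before using them.
STANDARD_RACE = {"asian": 8515, "white": 8527}

# Assumption: closest concept found in any Athena vocabulary for each source value.
# 2000000001 is a placeholder, not a real Athena concept id.
SOURCE_CONCEPTS = {"malay": 2000000001}


@dataclass
class RaceMapping:
    race_concept_id: int         # standard Race concept, or 0 when none exists
    race_source_value: str       # verbatim value from the source data
    race_source_concept_id: int  # closest concept found, or 0


def map_race(source_value: str) -> RaceMapping:
    """Fill the three race columns of PERSON for one source value."""
    key = source_value.strip().lower()
    return RaceMapping(
        race_concept_id=STANDARD_RACE.get(key, 0),
        race_source_value=source_value,
        race_source_concept_id=SOURCE_CONCEPTS.get(key, 0),
    )
```

With these lookup tables, `map_race("Malay")` yields race_concept_id 0 while keeping the source value and the source concept, so the lossy part is confined to the standard column.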


“Architectural options” - we have the CDM. If we want to change it, we need to go to the CDM WG.

Not sure we need that. Right now, we are planning to fix the hierarchy in such a way that the top consists of the 5 (?) concepts in the Race Domain, and the bottom contains the Ethnicity Domain concepts. The former would go to race_concept_id, the latter to ethnicity_concept_id. Happy to extend the hierarchy, so we have more ethnicities.

Regarding the double races, and the whole race fraction arithmetic (5/8th of a Native American) - I am very skeptical that we have the necessary data and, if they do exist, that the data are that useful. After all, races are not factual (we have no races in mankind), but self-declared socioeconomic determinants with some loose phenotypical characteristics.

Do you have any concrete plans for studies or analytics you want to run with that? We need that. Without it, we won’t be able to make any change to the system.
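To illustrate how such a hierarchy could be used, here is a rough Python sketch under my own assumptions, not the final design. The concept ids and the `PARENT` and `TOP_LEVEL_RACE` structures are invented for the example. In a real CDM the roll-up would come from the vocabulary tables.

```python
# Assumption: child -> parent edges of the planned hierarchy (invented ids).
PARENT = {
    9000002: 9000001,  # a granular ethnicity under a top-level race
    9000003: 9000002,  # an even more granular ethnicity
}

# Assumption: the handful of top-level concepts in the Race Domain (invented id).
TOP_LEVEL_RACE = {9000001}


def roll_up_to_race(concept_id: int) -> int:
    """Walk up the hierarchy until a top-level Race concept is reached.

    Returns 0 when the concept has no path to a top-level Race concept.
    """
    seen = set()
    current = concept_id
    while current not in TOP_LEVEL_RACE:
        if current in seen or current not in PARENT:
            return 0  # unknown concept or a cycle in the edges
        seen.add(current)
        current = PARENT[current]
    return current
```

The granular concept itself would go to ethnicity_concept_id, and the result of `roll_up_to_race` to race_concept_id.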


Let’s close this stream. Right now, race and ethnicity representation is being debated here, with a final proposal on the table after years of good debate. Unless folks come up with a significant objection detailing how the proposal will not allow a scientific use case, we will adopt it. If you have such an objection, let us know the use case that would not work.