Race concepts for skin colors

One of our Brazilian datasets contains race classifications only by skin color. Is this information too vague for populating race in the CDM? We do have standard concept ids for ‘White’ and ‘Black’. Can we consider adding ‘Yellow’ and ‘Brown’ to the vocabulary?


Sounds like these are the pre-PC synonyms of the main races:

White = Caucasian
Black = Black
Yellow = Asian
Brown = Native American (sometimes also denoted as “red”)

Can you find out?

Within the context of the Brazilian census, these colors have very particular meanings, and citizens are asked to self-identify as one of five races (much as OMB has races defined for US citizens): branco, pardo, preto, amarelo, or indigena. One could make the argument that while branco = white, preto = black, amarelo = Asian, and indigena = Native American, pardo is better parsed as multiracial or mixed. These categories are descended from the colonial casta system and do not map neatly onto American conceptualizations of race and ethnicity.

Appreciate the replies. This is useful information and we’ll double check with our data providers as well. That being said, the closest mapping for “indigena” is American Indian and I’m not sure if that’s an accurate mapping. And since we don’t have a standard concept for multiracial we’d end up leaving that unmapped too. That would mean we would have no way to differentiate between indigena, pardo, and unknown race in Brazilian data (without looking at the source value fields).

Considering these are the 5 races used by the Brazilian census, wouldn’t it be best to have mappings available for them?

Indigena/indigenous (specifically for Brazil/South America)

Just out of curiosity, is the concept_id below for race meant to be for indigenous people in general? While working on Brazil we see a source category that would fit into an indigenous people category, but historically American Indian or Native American refers specifically to North America. So this concept doesn’t feel like the best place to put it. To step outside of the Americas, where would the indigenous people of Australia go? If they would go in this concept_id, I would suggest changing it to be more generic, just to help clear up at least some of my confusion. I will admit it’s minor, but I would like it to be clear. Also, if this is not the best place, can there be a place for generic indigenous people? Sorry if this needs to be a separate topic; I’d be happy to make it.

8657 - American Indian or Alaska Native


When it comes to the race of indigenous people of different regions, it’s actually difficult to assign them to a common racial category such as white, black, or Asian. In my opinion, indigenous groups are of more interest to researchers in the local context, i.e. the indigenous people of Brazil are different from those of North America and from Australian Aboriginals, and each group is usually only discussed in the context of its own region. In Australia, we would consider indigenous groups to be ethnic groups: we created concepts for the combinations of Aboriginal and/or Torres Strait Islander and left their race unmapped.

@MichaelWichers as you said, choices for indigenous are limited to the US, so apparently you wouldn’t use them. On the other hand, a generic “Native” race wouldn’t make sense if you still want to be able to distinguish natives from different parts of the world. So I’d either leave it as 0, as @guanguo suggested, or create a standard concept for the race of interest on your local instance with a concept_id of 2 billion+.

Out of curiosity, what would be the driver for the decision to create local standard concepts vs. pushing for standardization of existing concepts? SNOMED already has codes for all of the race concepts mentioned in this thread - 4218447 works for Brazilian indigenous peoples and 4183595 works for Australian aboriginal peoples, no? Related to that, what is the rubric for deciding whether a concept is standard/non-standard? Apologies if this is clearly documented elsewhere - please feel free to tell me to go dig for my answer.

Basically, we have a standard vocabulary for races and it’s not SNOMED :smile: The most accurate way would be to add these concepts to that vocabulary, but since it’s an external vocabulary that doesn’t belong to us, all we can do is politely ask the owners and wait…


What I don’t understand is why this subject keeps popping up as stubbornly as it does. I haven’t heard much of a use case other than “I want to stratify XYZ by race”, and when that’s done folks stare at the percentages and learn - nothing.

The vast, vast majority of our genetic material, which defines our susceptibility to disease, is not distributed along racial lines. The couple of genes that switch melanin production high or low, or shape the eyes and nose length, appear important to our visual pattern recognition, but have nothing to do with outcomes. Tay-Sachs disease in Ashkenazi Jews and Quebecers, and gastric cancer in East Asians, are extremely rare exceptions in the overall distribution of things. When you really need to compare these racial subpopulations, please use the source values. If you are in fact after socioeconomic factors, there are better surrogates than that.

I guess I have an aversion against this insistence on racial differences.

I think it keeps popping up because people are interested in investigating the impact of race as sociological construct on people’s health outcomes.

It’s a well-established fact that race is not a biophysical concept - as you point out, there is very little substantial genetic difference between races. That said, race is still a very real sociological construct, as recent events can well attest. People are treated differently by any number of different entities with very real power based on their perceived race. This can have all sorts of interesting impacts on their health. For example, Claude Steele has written elegantly on the role of “stereotype threat,” and others have tied this into higher incidence of hypertension in African-Americans. I’m sure you know better than I do that genetics are just one component of the stew that determines who gets sick in what way - knowing what race people are (and whether it’s self-identified, determined by an ADT clerk, etc.) allows us to do interesting work describing the impact of being seen as X (and living in a society where people seen as X are treated differently in Y ways). For example, some areas in Australia have implemented restrictions on gasoline designed to target inhalant abuse in aboriginal populations - knowing who is and isn’t aboriginal affects our ability to assess the impact of this policy change.

I would frame it not so much as an insistence on racial differences as an insistence on the importance of being able to accurately assess the extent to which, and the mechanisms by which, structural racism creates and maintains the well-documented disparities in health between groups.

Fair enough.

But then you guys need to figure out how you want to conceptualize it, because each time this comes up, somebody doesn’t like the way it’s currently laid out in the vocabulary (and neither do I). That is, if a general categorization is even possible. What we have now is what Congress mandates for race capture in activities of the Federal Government of the USA. But if races are social rather than biological constructs, there is no way that the American version would work anywhere else where you have visibly different races in the population.

Any idea what we should do?

Then race concepts belong in the Observation table along with other social history concepts, e.g. education level, recreational drug use, marital status, etc.

By the way, my genetic composition just got updated last week. I am now 65% Scandinavian, up from 48% the week before!

As @aostropolets suggested above, seems to me that it’s best to either leave it as 0 in race_concept_id and put the source value in race_source_value or create local 2 billion+ concepts and map it manually.
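For anyone wanting to see what those two options look like in an ETL, here is a minimal Python sketch, not an official OHDSI mapping. Several things in it are assumptions: the standard concept_ids 8527 (White), 8516 (Black or African American) and 8515 (Asian) should be verified against your own vocabulary release; the two local concept_ids in the 2 billion+ range are made up for illustration; and the names `map_race` and `RaceFields` are hypothetical. Mapping amarelo to Asian follows the reading suggested earlier in this thread and should be confirmed with your data provider.

```python
import unicodedata
from typing import NamedTuple, Optional

# Standard race concept_ids (assumed here; verify against your vocabulary release).
WHITE = 8527
BLACK = 8516
ASIAN = 8515

# Hypothetical local concepts in the 2 billion+ range, for illustration only.
LOCAL_PARDO = 2_000_000_001
LOCAL_INDIGENA = 2_000_000_002

STANDARD_MAP = {"branco": WHITE, "preto": BLACK, "amarelo": ASIAN}
LOCAL_MAP = {"pardo": LOCAL_PARDO, "indigena": LOCAL_INDIGENA}


class RaceFields(NamedTuple):
    race_concept_id: int
    race_source_value: Optional[str]


def _normalize(value: str) -> str:
    """Lower-case, trim, and strip accents so 'Indígena' matches 'indigena'."""
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def map_race(source_value: Optional[str], use_local_concepts: bool = False) -> RaceFields:
    """Return race_concept_id plus the untouched source value.

    Categories without a standard concept get 0, unless local concepts are enabled.
    """
    if source_value is None or not source_value.strip():
        return RaceFields(0, source_value)
    key = _normalize(source_value)
    if key in STANDARD_MAP:
        return RaceFields(STANDARD_MAP[key], source_value)
    if use_local_concepts and key in LOCAL_MAP:
        return RaceFields(LOCAL_MAP[key], source_value)
    return RaceFields(0, source_value)
```

With the default settings, pardo and indigena both come back with race_concept_id 0 and can only be told apart through race_source_value; with `use_local_concepts=True` they get distinct ids, at the cost of those ids meaning nothing outside your own instance.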