Dealing with multiple races and other exceptions

Pulver · September 30, 2023, 5:19pm

Perhaps I would better understand your perspective if you explained, using the current schema, how you would populate person.race_concept_id and ethnicity_concept_id in a longitudinal database, using domain-consistent standard concepts, keeping in mind that, since you can’t foresee the questions future users will ask, you wish to minimize loss of information.

Situations we may face on large-scale USA-based studies:

Bi-racial person identifies as Asian and African American?
Person identifies as Spanish?
Person identifies as Chamorra?
Person identifies as Chicano?
Person says “I don’t know”?
Person explicitly refuses/declines to answer?
Person writes in “Australian Aborigine”?
Person identifies as African American early in the study, but years later identifies as bi-racial?

Hypothetical cases that I expect our colleagues overseas may be likely to encounter:

Bi-racial person identifies as Australian Aborigine and English?
Person identifies as Spanish?
Person identifies as New Zealand Maori?
Person identifies as Zulu?
Person identifies as Han?

Would you code the five “overseas” scenarios differently if you were in Europe, Australia, Asia, or Africa, and if so, how?

[It could be helpful if folks from outside the US joined this discussion!]

Gerry

Christian_Reich · September 30, 2023, 5:41pm

@Pulver:

You could have a concept “Asian-African American” in either race_concept_id or ethnicity_concept_id, depending where you think that should be.
You could have a concept “Spanish” in either race_concept_id or ethnicity_concept_id.
You could have a concept “Chamorra” in either race_concept_id or ethnicity_concept_id.
etc.
You have a record with both fields=0.
You have a record with both fields=0.
You have a concept “Australian Aborigine” in either …
I don’t understand that. We are not running prospective studies in OMOP. Whatever is in the current instance of the database is the fact.
You could have a concept “English-Australian Aborigine” in either…
etc.

But you still haven’t actually provided a use case. How would an “English-Australian Aborigine” be used in a query? Give me all persons who are “English”, or “Half English”, or “Quarter English”, or “Three Eighths English”? Is that a cohort you would study?

If this sounds sarcastic, it is not. I don’t know how to create scientifically valid questions that would give us evidence out of this mosaic information.

Pulver · October 3, 2023, 10:56pm

Concerning your apparent suggestion that OMOP is useful solely for looking at characteristics at a fixed point in time:

In a longitudinal study we may look at observations occurring prior to and during the period of data collection. Of course, by the time we receive incremental batches of data recorded during the study, they will be retrospective observations. As @Wilson_Pace explained, the demographic characteristics of a patient followed over a period of years can not be assumed to remain static.

You say that we could have all sorts of combination codes. However, as we presently do not have standard domain-compliant concepts for most of the world’s racial/ethnic identities, let alone combinations of them, this does not address my point that it can not be done today using the current US-centric scheme.

Considering your suggestion as an alternative to mine:

I would prefer to record a combined race/ethnicity using multiple codes from “n” concepts, rather than have n! concepts to cover all possible combinations of ethnicities/races. To be comprehensive, globally, “n” would be a multi-digit number, which, history suggests, will grow over time.
Sure, most combinations may be extremely unlikely; would a committee be charged with selecting the “likely” combinations and promptly updating the list when an exception arises?
Once we accept that race/ethnicity is dynamic, I don’t want to deny future consumers of the data the ability to track changes over time.
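
To put rough numbers on the trade-off in the list above, here is a minimal Python sketch. It assumes a "combination" means an unordered, non-empty set of base identities, which gives 2^n - 1 pre-coordinated concepts; that counting rule is an assumption of this sketch (the post quotes n!, which grows faster still). The function names are hypothetical.

```python
from math import factorial


def postcoordinated_concepts(n: int) -> int:
    """Concepts needed when a person may carry several codes: just the n base concepts."""
    return n


def precoordinated_concepts(n: int) -> int:
    """Concepts needed when every non-empty unordered subset of the n base
    identities gets its own combined concept (assumption: order is ignored)."""
    return 2 ** n - 1


if __name__ == "__main__":
    # Compare the growth of each approach, with n! shown for reference.
    for n in (5, 10, 20):
        print(n, postcoordinated_concepts(n), precoordinated_concepts(n), factorial(n))
```

Either way of counting makes the same point: the multiple-code approach grows linearly with n, while a one-concept-per-combination approach grows at least exponentially.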

P.S. While I look forward to your reply, I am going to step back from the discussion for now to see how others feel, particularly folks from outside the USA and others who deal with records of subjects whose ethnicity doesn’t neatly fall into the American Indian / Alaskan Native | African American / Black | Asian | Native Hawaiian / Pacific Islander | White & Hispanic structure.

MPhilofsky · October 3, 2023, 10:06pm

I wasn’t going to reply with a solution, but then I was asked for my opinion.

If I were to design a solution, I would remove race and ethnicity from the Person table. I would create a convention to direct ETLers to put all races & ethnicities in the Observation table with observation.observation_concept_id = 3050381, “Race or ethnicity” (honestly doesn’t matter what the concept_id is as long as it means race/ethnicity) and all the individual races and ethnicities would be ETL’d into the observation.value_as_concept_id field. Then I would gather every race/ethnicity/cultural/skin identifier from every source I could find in the world, divide them into singular concepts if needed, de-duplicate them on exact/almost exact text string match and make them all standard concepts with domain_id = ‘Race/Ethnicity’. There would not be a hierarchy because it does not exist in nature in this era of time. The ETLer would have to split up any combos they have in their data and ETL each as a separate row. So, “black hispanic” becomes a row for “black” and a row for “hispanic”. The researchers can then group them however they want. This will require an initial push by the Vocab team, then it will be low maintenance after. Even better, it might be a good use case for the “community contribution” vocabulary project Anna Ostropolets unveiled. Then the community can continue to contribute as new races/ethnicities are discovered/unveiled.
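
A minimal sketch of the ETL convention described above, assuming a hand-maintained table of combo terms and placeholder concept_ids. Only 3050381 and the Observation field names come from the post and the CDM; `HYPOTHETICAL_SPLITS`, `HYPOTHETICAL_VALUE_CONCEPTS`, and the 9000001-style ids are invented for illustration.

```python
from datetime import date

RACE_OR_ETHNICITY_CONCEPT_ID = 3050381  # "Race or ethnicity", as quoted in the post

# Hypothetical: how a site might list the combo terms found in its source data.
HYPOTHETICAL_SPLITS = {"black hispanic": ["black", "hispanic"]}

# Hypothetical placeholder ids; real ones would come from the vocabulary.
HYPOTHETICAL_VALUE_CONCEPTS = {"black": 9000001, "hispanic": 9000002}


def to_observation_rows(person_id, source_value, observation_date):
    """Split a combo term into singular terms and emit one Observation row per term."""
    key = " ".join(source_value.lower().split())
    terms = HYPOTHETICAL_SPLITS.get(key, [key])
    return [
        {
            "person_id": person_id,
            "observation_concept_id": RACE_OR_ETHNICITY_CONCEPT_ID,
            # 0 marks a term that has no mapped concept yet.
            "value_as_concept_id": HYPOTHETICAL_VALUE_CONCEPTS.get(term, 0),
            "observation_date": observation_date,
            "observation_source_value": source_value,
        }
        for term in terms
    ]


rows = to_observation_rows(42, "Black Hispanic", date(2023, 10, 3))
```

Here "Black Hispanic" yields two rows, one per singular term, and both keep the original string in observation_source_value so the combination can be reconstructed later.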

Pulver · October 3, 2023, 10:53pm

As your proposal largely coincides with ours apart from using “Race or Ethnicity” as observation_concept_id and shifting the various racial identifiers to value_as_concept_id, which, I think, is better, I am happy to give it a thumbs-up.

I wonder, though, how (or if) the current race_concept_id, ethnicity_concept_id, and related fields would be populated. Though I assume that they would be dropped from the CDM in a future major release, until then, I expect that the two conventions for handling race/ethnicity would need to be supported in parallel. Should the grandfathered fields be left unpopulated (NULL), or should a flag value of some sort be inserted, to clue analysts to look to Observation for the race/ethnicity? Perhaps there is a convention for handling this predicament.

MPhilofsky · October 4, 2023, 12:58am

A flag would be needed. concept_id = 0 could work, but would also include the flavors of null. So, it’s not the best option. “Other” would work for those who identify as one of the non-standard or unlisted races or ethnicities, including multi-racial or multi-ethnic. However, I do realize “other” is considered a valid response to the “What is your race?” question when the responder is only given a handful of choices, as @Pulver pointed out:

How about a concept_id = “see all observation.value_as_concept_id values for this person where observation_concept_id = 3050381”? I’m joking, but only slightly. I need others’ input on this.

Correct, but we can’t denormalize the OMOP CDM. So, having a concept_id for “other” and then pointing folks to the Observation table is one option. Again, we need input on this.

Correct for parallel support. Possibly for dropping the race & ethnicity data from the Person table. “If it’s not here, then look there” is not ideal. We need to make the CDM analytic ready. But we also need to take into account the downstream consequences of moving a domain from a “static” table to a “clinical event” table. This is where the use cases really come into play. We need the questions you are asking of the data to correctly model the data. And as you pointed out, we need folks in other countries to chime in on this along with the researchers who have the questions. @Christian_Reich is taking notes on use cases and what is in your source data. Please post them here. We need to know so we can fully model these data and close this issue with a solution that will stand the test of time.
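
For anyone weighing what parallel support would mean for analysts, here is a rough Python/sqlite3 sketch of the "if it's not here, then look there" lookup. It sidesteps the undecided flag value by preferring Observation rows whenever they exist and falling back to the Person field otherwise. The table contents, the 9000001-style ids, and the `races_for` helper are hypothetical; 8527 is assumed here to be the standard "White" concept_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, race_concept_id INTEGER);
    CREATE TABLE observation (
        person_id INTEGER,
        observation_concept_id INTEGER,
        value_as_concept_id INTEGER
    );
    -- Person 1 uses only the legacy field; person 2 has rows in Observation.
    INSERT INTO person VALUES (1, 8527), (2, 0);
    INSERT INTO observation VALUES (2, 3050381, 9000001), (2, 3050381, 9000003);
    """
)


def races_for(person_id):
    """Return Observation-based race/ethnicity ids if present, else the legacy Person value."""
    rows = conn.execute(
        "SELECT value_as_concept_id FROM observation "
        "WHERE person_id = ? AND observation_concept_id = 3050381 "
        "ORDER BY value_as_concept_id",
        (person_id,),
    ).fetchall()
    if rows:
        return [r[0] for r in rows]
    legacy = conn.execute(
        "SELECT race_concept_id FROM person WHERE person_id = ?", (person_id,)
    ).fetchone()
    return [legacy[0]] if legacy else []
```

Every analysis would need a wrapper like this during the transition, which is exactly the overhead the post calls "not ideal".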

Andrew · October 26, 2023, 9:29pm

Re use cases
Christian, I added several use cases illustrating the need for multiple races to Jake’s proposed changes to race and ethnicity quite a while ago. There are many cases like these where there is a driving need to distinguish among people who self-identify as more than one race because they have distinct health outcomes.

Re what is in our source data
Jared can quantify it exactly, but we have a significant number of people we currently assign a “0” to because they report multiple races. That practice is currently creating confusion in our use of OMOP for reporting for projects that are building OMOP-shaped datasets. Workarounds can be found, but it’s a pain.

Re Race is a construct
As I’ve noted elsewhere, many medical diagnoses don’t mean the exact same thing across cultures. Mental health diagnoses in particular, but others also have a different meaning and are ascertained using different methods by people with different training. Since those standards are not and cannot be applied to other OMOP concepts for things like conditions, it seems unwise, or at least inconsistent, to single out race/ethnicity as the set of concepts where those standards have to be met.

Christian_Reich · October 27, 2023, 2:51am

Why post-coordinate? Why wouldn’t a race as an observation be self-explanatory and go into observation_concept_id?

This hardly appears possible, because races have no definition, and neither have ethnicities. They are geographical/social/biological trait-based/ideological categories that people either place other people in, or put themselves in, or a little bit of both. There is no test that would determine whether a Black person in the US or in Nigeria or in the UK is the same race. Or even Black. There is no way to decide whether my Grandmother’s ethnicity was Austrian, Czechoslovakian, Czech, East German or German (she held all passports at some point). We cannot deduplicate, except those concepts that are clearly derived from the same source (e.g. the US OMB race categories, which, btw, have changed over the years). We cannot even distinguish races from ethnicities (think “Indian” in Singapore, which is thought of as a race there). That means we cannot create standard concepts (which are required to be deduped), unless we allow everybody’s concept as an independent declaration of some category. Which is totally fine with me.

Please bring them on. I haven’t heard much except some kind of regression between racial or ethnic categories and co-morbidities. @rimma was working on a project studying healthcare access.

That’s the one I am strongly supporting.

That is not correct. It is true that especially in mental disease, but also in cancer, there are different schools of categorization, and they often happen to be divided along country or continent lines. But not because somehow different ethnicities go crazy in different ways, or have different malignancies. We are all the same creatures biologically, with very very rare exceptions (a handful of mostly monogenetic diseases). It’s just the way science develops. In fact, the very vast majority of conditions are in full consensus internationally, and passing a final exam in Med School in one country enables you to do the same thing in another country hands down (but do it fast before you start forgetting this massive corpus of encyclopedic information). I know that.

MPhilofsky · October 27, 2023, 4:04pm

I spoke with the Health Equity WG on Symposium Saturday outlining my proposal for race & ethnicity data in the CDM. The WG meeting wasn’t recorded, but the slides are posted here for anyone who is interested. I’d be happy to review this proposal with all interested persons.

Many reasons:

We do it for the “history of” concepts and it works very well for the ETLer and the end user. The data are easily inserted and retrieved. win-win
Stratifying by “Asian” when the term “Asian” is found in >1000 concepts might be tedious for end users.
Race, ethnicity, indigenous status and other terms used to describe, phenotypically or otherwise, a person’s recent or distant country of origin are ambiguous and poorly defined at the global level. These terms are continuously evolving as science evolves, the vocabularies for these terms evolve and more source data are revealed at the global level. The pre-coordinated list of concepts would be extraordinarily long due to the lack of a global definition for race, ethnicity, indigenous status, etc. And the list of concepts will continue to grow over time, creating an unwieldy set of terms needing to be pre-coordinated.

Happy we are in agreement. We split combo terms, “Black African”, “Black Caribbean”, etc. into two separate terms, then ETL. And we de-dupe on exact or close to exact string match. Everyone contributes. Stanford’s “Native Hawaiian” and Colorado’s “natives hawaiian” are the same except for the typo and casing of the words. We keep “Native Hawaiian” and Colorado maps “natives hawaiian” to “Native Hawaiian”. We keep it super simple and when in doubt keep both.
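
The "exact or close to exact string match" de-duplication could look roughly like the Python sketch below, which uses the standard library's difflib. The 0.95 threshold and the `dedupe` helper are assumptions made for illustration; no threshold is proposed in this thread. A high threshold is one way to express "when in doubt keep both".

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lower-case and collapse whitespace so casing differences do not matter."""
    return " ".join(term.lower().split())


def dedupe(terms, threshold=0.95):
    """Map each term to the first previously kept term that is a near-exact match.

    Terms with no sufficiently close match are kept as their own canonical form.
    """
    kept = []
    mapping = {}
    for term in terms:
        norm = normalize(term)
        for canonical in kept:
            if SequenceMatcher(None, norm, normalize(canonical)).ratio() >= threshold:
                mapping[term] = canonical
                break
        else:
            kept.append(term)
            mapping[term] = term
    return mapping


mapping = dedupe(["Native Hawaiian", "natives hawaiian", "Black", "Black Caribbean"])
```

In this example "natives hawaiian" maps to "Native Hawaiian", while "Black" and "Black Caribbean" stay separate. The sketch compares strings only; it knows nothing about which source or country a term came from.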

I reviewed the proposal.

It is not global enough. See Georgie’s use case here on August 1, 2023.
Leaving these data in the Person table precludes us from identifying the date of this observation and the provenance of these data. And this list will continue to expand.

At this time, the proposal doesn’t cover the following:

Flavors of NULL - flavors of null are not allowed in the CDM, with very few exceptions
Negative values - statements of “it did not occur, it’s not part of the record, negative values, etc”. With a good use case, I can be persuaded to add “non-____” terms.
Hierarchies - Hierarchies are tough for these data. There isn’t a globally inclusive hierarchy at this time. Note - not including hierarchies in the Concept Ancestor table does not preclude users from using their own hierarchies.

Keep it simple and pragmatic for our first implementation. Let’s use the data in the new table with the new concept_ids, run it through some use cases and research, find areas for improvement, regroup, and make a plan for future iterations. This solution is scalable.

This is a HUGE change and will affect many in the OHDSI community from the ETL through the pipeline to the researchers, including the following OHDSI groups: Vocabulary, Themis, CDM, DQD, Phenotype Library, Methods & others. This is an open source community; we need folks to volunteer to help with this effort or it will continue to be a topic in the forums, on GitHub and in the minds of everyone, but NOT an implementation in the CDM. Who’s joining @Christian_Reich to move this proposal forward?

Dave.Barman · October 27, 2023, 6:11pm

Just chiming in to say I really like this solution and believe it makes more sense in terms of OMOP and ETLing.

Andrew · October 27, 2023, 7:45pm

Christian, I know this is the proposal you are championing. I was responding to the call to post for your notes the linked supporting use cases and our relevant source data issues.

Re our differences on whether the biomedical model is universal, we can agree to disagree. I think there are clearly culture-bound syndromes in mental health. I’m a bit surprised you don’t seem to think so, but it’s mostly a side issue in this thread. I brought it up here to help clarify whether temporal stability and international universality are applied as criteria to all other data representations, especially in light of points made about driving use cases in Australia.

Melanie, I’m happy to join Christian in moving this proposal forward if he’ll have me.

Christian_Reich · October 28, 2023, 3:05am

I would claim none of them are. But it doesn’t matter. It’s either

Your solution of a “has race/ethnicity” Observation concept with concepts of the “Race/Ethnicity” domain in value_as_concept_id. This is a big surgery, as you pointed out.
Jake’s solution of race and ethnicity fields with those same concepts in it. It’s a little easier on the ETLers for its backward compatibility.

The difference really is only in nuances. I was thinking we wait a little to see if there are more opinions, and then we make a choice between the two.

Unfortunately no. You cannot do that. They depend on the context. The context is usually a country. A Black African? Such an idea only exists outside Africa. In South Africa for example, there is a big distinction between “Black” and “Coloured”. Go figure.

We cannot dedup for lack of definition. Lexical similarity is insufficient. Concepts only can be construed as the same if they share the original source. So, yes, we will end up with quite a big bucket of races and ethnicities. It’s up to the analysts to make sense of that. I wish them luck.

Thanks.

Sure they are. So, we need to agree to agree. What I am disputing is that making a definition is bound by the culture/identity/race/ethnicity/whatever of a standardizing body. Definitions are created by professional societies, and if there are disagreements they are debated in public and tend to be resolved over time. Sometimes a long time. The fact that the differences seem to depend on the country is because that’s how the societies are compartmentalized. There is no such thing as “definitions for my people only”.

Alexdavv · October 28, 2023, 3:42am

Actually it does. We had a long discussion around the vocabulary principles so we better stick to them in any new implementation.
We do pre-coordinate and don’t post-coordinate unless there’s a strong reason like explosion or inability to represent the facts as self-sufficient entities.

Since @MPhilofsky’s proposal doesn’t have any flavors of information reporting like “current race indicated by the patient” or “race my parents told me I had at birth”, there’s no reason to post-coordinate:

It works very well either way.
Users will limit their reporting to the Race domain, so useless concepts will be filtered out.
If the new race concepts are self-sufficient, it doesn’t matter where you put them. The concepts are essentially the same and the maintenance effort is the same. The only difference is that they will include a small piece of additional information, which is “it’s the race or ethnicity”. We can even skip it, and have a convention that the Race domain implies this.

Christian_Reich · October 30, 2023, 12:39am

Looks like you actually agree with me and reject the reasons.

But again. We have two proposals. Both would work with much wider race/ethnicity domain concepts we’d have to put together (with some formal deduping). Melanie’s solution allows time stamps for those pieces of information, Jake’s does not. Melanie’s solution allows multiple concepts in parallel, Jake’s solves that problem with concepts indicating mixtures. Melanie’s solution is the larger surgery, Jake’s is backward compatible.
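
To make the time-stamp difference concrete, here is a small Python sketch of an "as of" lookup over Observation-style rows, using the earlier scenario of a person who identifies as African American at first and as bi-racial years later. The sample rows, the dates, and the rule that the most recent observation date wins are all assumptions of this sketch, not conventions agreed in this thread.

```python
from datetime import date

# Hypothetical rows: one person reports one identity in 2015 and two in 2022.
observations = [
    {"person_id": 1, "value": "African American", "observation_date": date(2015, 3, 1)},
    {"person_id": 1, "value": "African American", "observation_date": date(2022, 6, 1)},
    {"person_id": 1, "value": "Asian", "observation_date": date(2022, 6, 1)},
]


def identifiers_as_of(rows, person_id, as_of):
    """Return the set of identities recorded on the latest observation date on or before as_of."""
    eligible = [
        r for r in rows
        if r["person_id"] == person_id and r["observation_date"] <= as_of
    ]
    if not eligible:
        return set()
    latest = max(r["observation_date"] for r in eligible)
    return {r["value"] for r in eligible if r["observation_date"] == latest}
```

With a single value in the Person table, only the state at load time is available; with dated rows, an analyst can ask what was recorded as of any index date.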

Let’s duke it out in one of the upcoming Vocabulary WG sessions and be done with it.

gkennos · October 30, 2023, 2:02am

Hi all,

May I request that the specific Vocabulary working group session in which you will tackle this issue be scheduled at a time when representatives from the Australian chapter are able to be present and participate in the discussion?

Per conversations with Melanie and others at the symposium, it is the consensus within the Australian chapter that:

There is a well-established practice of capturing Indigenous status in Australia
There would be almost universal uptake within Australia of a specific observation type covering this established best practice, and there is no expectation that this observation would have relevance in any other setting and thus does not affect or overload any of the discussions here
Due to practices around data linkage that are (again) well-adhered to within an Australian setting, this CANNOT go in the person table (see slides posted in the health-equity channel in teams for references and full description if interested)
We have engaged with Indigenous data leaders in Australia to submit a fulsome community contribution that will be distinct from the race/ethnicity domain entirely, and include full documentation of conventions around the submission - this should not affect any changes made to race and ethnicity as discussed in this thread, as they will be standalone and distinct
This will include references to the Australian national standard on ethical conduct in research, and therefore is likely to stop anyone not using appropriate vocab items from receiving HREC (~IRB) approval + AH&MRC approval (additional oversight required for research questions pertaining specifically to Indigenous populations), so adding these concepts to race and ethnicity domain is actually counterproductive
The SNOMED codes for Australian Indigenous status are out of date and do not represent best practice in this setting, please do not elevate them to standard concepts
There is a concept ‘Neither Aboriginal nor Torres Strait Islander’ included in this typical practice which will be included - it is not considered a negative concept, rather a positive assertion of a negative fact (similar to a negative test result) and therefore must be supported

Details will be forthcoming, pending input from the Indigenous reference group.

As such, please do not conflate Australian Indigenous status with these updates - this will be submitted separately and handled at a chapter level.

Thanks,
Georgie

liujie · October 30, 2023, 8:29pm

Hi all
I am still new; I tried to map multiple races and failed.
I found some concepts, like “Mixed - White and Black African” (700389) or “Mixed - Any other mixed background” (700391), on the Athena website under domain = Race and class = Race. But I cannot use them as the race_concept_id in the person table because these concepts do not exist in the concept table of my downloaded vocabulary. The foreign key stopped me. Shouldn’t I use them? Or does my concept table (downloaded several weeks ago) need an update?
Can anybody help me please?

Mark · October 30, 2023, 8:47pm

If you read the thread, you will see it is (and has been for a while) under discussion on what to do about multiple races.

liujie · October 30, 2023, 8:56pm

Thank you! @Mark
The discussion is all about how to make changes. My question is about using the current system. I can see these mixed-race items in Athena, but the concept table in my downloaded vocabulary did not include them.

Mark · October 30, 2023, 9:07pm

Both the codes you quoted are non-standard, so they are not valid concepts to put into OMOP. Both of those map to White if you do a non-standard to standard mapping.

Also, both are NHS codes; you may not have end-user licensing for that set. I am American, so I have no idea, as we do not use them.

In Athena, if you select the standard concept, you will see there are no current valid mixed-race concepts.
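
The non-standard to standard lookup mentioned above can be sketched with Python and sqlite3 against a toy copy of the concept and concept_relationship tables. The rows below are hand-entered for illustration rather than taken from a vocabulary download, 8527 is assumed to be the standard "White" concept_id, and `standard_targets` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER,
        concept_id_2 INTEGER,
        relationship_id TEXT
    );
    -- Illustrative rows only; 'S' marks a standard concept.
    INSERT INTO concept VALUES
        (8527, 'White', 'S'),
        (700389, 'Mixed - White and Black African', NULL),
        (700391, 'Mixed - Any other mixed background', NULL);
    INSERT INTO concept_relationship VALUES
        (700389, 8527, 'Maps to'),
        (700391, 8527, 'Maps to');
    """
)


def standard_targets(source_concept_id):
    """Follow 'Maps to' from a source concept and return the standard concept_ids it maps to."""
    rows = conn.execute(
        "SELECT c.concept_id FROM concept_relationship cr "
        "JOIN concept c ON c.concept_id = cr.concept_id_2 "
        "WHERE cr.concept_id_1 = ? AND cr.relationship_id = 'Maps to' "
        "AND c.standard_concept = 'S' "
        "ORDER BY c.concept_id",
        (source_concept_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

An empty result means there is no standard target in the loaded vocabulary, which is also what you would see if the source vocabulary was never included in the download.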

liujie · October 30, 2023, 9:10pm

@Mark
NHS is probably the reason why the concept table in my downloaded vocabulary did not include them, because I am in the United States too. The other non-standard race concepts, like 8522 (Other Race), are all in the concept table.
Thank you.