
Dealing with multiple races and other exceptions

A flag would be needed. concept_id = 0 could work, but would also include the flavors of null. So, it’s not the best option. “Other” would work for those who identify as one of the non-standard or unlisted races or ethnicities including multi-racial or multi-ethnic. However, I do realize “other” is considered a valid response to What is your race? question when the responder is only given a handful of choices, as @Pulver pointed out:

How about a concept_id = “see all observation.value_as_concept_id values for this person where observation_concept_id = 3050381”? I’m joking, but only slightly. I need others’ input on this :slight_smile:

Correct, but we can’t denormalize the OMOP CDM. So, having a concept_id for “other” and then pointing folks to the Observation table is one option. Again, we need input on this.

Correct for parallel support. Possibly for dropping the race & ethnicity data from the Person table. “If it’s not here, then look there” is not ideal. We need to make the CDM analytic-ready. But we also need to take into account the downstream consequences of moving a domain from a “static” table to a “clinical event” table. This is where the use cases really come into play. We need the questions you are asking of the data to correctly model the data. And as you pointed out, we need folks in other countries to chime in on this along with the researchers who have the questions. @Christian_Reich is taking notes on use cases and what is in your source data. Please post them here. We need to know so we can fully model these data and close this issue with a solution that will stand the test of time.
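To make the “if it’s not here, then look there” pattern concrete, here is a minimal Python sketch of the lookup an analyst would have to repeat in every query. It assumes in-memory rows shaped loosely like the PERSON and OBSERVATION tables, a hypothetical OTHER_RACE placeholder concept ID, and reuses 3050381 as the observation_concept_id quoted earlier in this thread; none of these values is a vocabulary recommendation.

```python
from dataclasses import dataclass

# Hypothetical placeholder: an "Other" race concept stored in PERSON.race_concept_id.
OTHER_RACE = 999_001
# Observation concept quoted earlier in the thread for race observations.
RACE_OBSERVATION = 3050381


@dataclass
class Person:
    person_id: int
    race_concept_id: int


@dataclass
class Observation:
    person_id: int
    observation_concept_id: int
    value_as_concept_id: int


def races_for(person: Person, observations: list[Observation]) -> list[int]:
    """Return race concepts for a person, falling back to OBSERVATION rows."""
    if person.race_concept_id != OTHER_RACE:
        # The single PERSON value is used as-is.
        return [person.race_concept_id]
    # "If it's not here, then look there": every analysis must repeat this branch.
    return [
        o.value_as_concept_id
        for o in observations
        if o.person_id == person.person_id
        and o.observation_concept_id == RACE_OBSERVATION
    ]
```

The two-table branch is the analytic cost being weighed: it is easy to write once, but easy to forget in a cohort definition or a network study package.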

Re use cases
Christian, I added several use cases illustrating the need for multiple races to Jake’s proposed changes to race and ethnicity quite a while ago. There are many cases like these where there is a driving need to distinguish among people who self-identify as more than one race because they have distinct health outcomes.

Re what is in our source data
Jared can quantify it exactly, but we have a significant number of people we currently assign a “0” to because they report multiple races. That practice is currently creating confusion in our use of OMOP for reporting for projects that are building OMOP-shaped datasets. Workarounds can be found, but it’s a pain.
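A minimal sketch of the information loss described here, assuming source races have already been mapped to concept IDs (the IDs are arbitrary examples). With a single race_concept_id per person, a person reporting several races and a person with no usable race value both end up as concept 0, OMOP's “No matching concept”.

```python
# concept_id 0 is OMOP's "No matching concept", also used for unmapped values.
NO_MATCHING_CONCEPT = 0


def person_race_concept(reported: list[int]) -> int:
    """Collapse a person's reported race concepts into one PERSON value."""
    distinct = sorted(set(reported))
    if len(distinct) == 1:
        return distinct[0]
    # No reported race and multiple reported races both collapse to 0,
    # so the distinction is lost once the PERSON row is written.
    return NO_MATCHING_CONCEPT
```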

Re Race is a construct
As I’ve noted elsewhere, many medical diagnoses don’t mean exactly the same thing across cultures. This is especially true of mental health diagnoses, but other diagnoses also carry different meanings and are ascertained using different methods by people with different training. Since those standards are not, and cannot be, applied to other OMOP concepts for things like conditions, it seems unwise, or at least inconsistent, to single out race/ethnicity as the set of concepts where those standards have to be met.


Why post-coordinate? Why wouldn’t a race as an observation be self-explanatory and go into observation_concept_id?

This appears hardly possible, because races have no definition, and neither do ethnicities. They are geographical/social/biological trait-based/ideological categories that people either place other people in, or put themselves in, or a little bit of both. There is no test that would determine whether a Black person in the US or in Nigeria or in the UK is the same race. Or even Black. There is no way to decide whether my Grandmother’s ethnicity was Austrian, Czechoslovakian, Czech, East German or German (she held all passports at some point). We cannot deduplicate, except for concepts that are clearly derived from the same source (e.g. the US OMB race categories, which, btw, have changed over the years). We cannot even distinguish races from ethnicities (think “Indian” in Singapore, which is thought of as a race there). That means we cannot create standard concepts (which must be deduplicated), unless we allow everybody’s concept as an independent declaration of some category. Which is totally fine with me.

Please bring them on. I haven’t heard much except some kind of regression between racial or ethnic categories and co-morbidities. @rimma was working on a project studying healthcare access.

That’s the one I am strongly supporting.

That is not correct. It is true that, especially in mental disease but also in cancer, there are different schools of categorization, and they often happen to be divided along country or continent lines. But that is not because different ethnicities somehow go crazy in different ways, or have different malignancies. We are all the same creatures biologically, with very, very rare exceptions (a handful of mostly monogenic diseases). It’s just the way science develops. In fact, the vast majority of conditions are in full consensus internationally, and passing a final exam in med school in one country enables you to do the same in another country hands down (but do it fast before you start forgetting this massive corpus of encyclopedic information). I know that.

I spoke with the Health Equity WG on Symposium Saturday outlining my proposal for race & ethnicity data in the CDM. The WG meeting wasn’t recorded, but the slides are posted here for anyone who is interested. I’d be happy to review this proposal with all interested persons.

Many reasons:

  1. We do it for the “history of” concepts and it works very well for the ETLer and the end user. The data are easily inserted and retrieved. win-win :slight_smile:

  2. Stratifying by “Asian” when the term “Asian” is found in >1000 concepts might be tedious for end users.

  3. Race, ethnicity, indigenous status and other terms used to phenotypically or otherwise describe a person’s recent or distant country of origin are ambiguous and poorly defined at the global level. These terms continuously evolve as science evolves, as the vocabularies for these terms evolve, and as more source data are revealed at the global level. The pre-coordinated list of concepts would be extraordinarily long due to the lack of a global definition for race, ethnicity, indigenous status, etc. And the list will continue to grow over time, creating an unwieldy set of terms needing to be pre-coordinated.

Happy we are in agreement :slight_smile: We split combination terms such as “Black African” and “Black Caribbean” into two separate terms, then ETL them. And we de-dupe on exact or near-exact string matches. Everyone contributes. Stanford’s “Native Hawaiian” and Colorado’s “natives hawaiian” are the same except for the typo and the casing. We keep “Native Hawaiian” and Colorado maps “natives hawaiian” to “Native Hawaiian”. We keep it super simple, and when in doubt, we keep both.
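The splitting and de-duplication steps described here might look roughly like the following Python sketch. The split table and the 0.9 similarity threshold are assumptions chosen for illustration, not part of the proposal; `difflib.SequenceMatcher` is from the standard library. A high threshold follows the “when in doubt, keep both” rule. Whether lexical similarity is sufficient evidence of sameness at all is contested in this thread.

```python
from difflib import SequenceMatcher

# Hypothetical curated table of combination terms that get split before ETL.
SPLITS = {
    "Black African": ["Black", "African"],
    "Black Caribbean": ["Black", "Caribbean"],
}
# Assumed threshold; kept high so that, when in doubt, both terms survive.
THRESHOLD = 0.9


def split_terms(raw: list[str]) -> list[str]:
    """Replace each listed combination term with its component terms."""
    out: list[str] = []
    for term in raw:
        out.extend(SPLITS.get(term, [term]))
    return out


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so casing differences do not matter."""
    return " ".join(term.lower().split())


def dedupe(terms: list[str]) -> dict[str, str]:
    """Map every term to the first kept term it matches (near-)exactly."""
    kept: list[str] = []
    mapping: dict[str, str] = {}
    for term in terms:
        match = None
        for candidate in kept:
            score = SequenceMatcher(None, normalize(term), normalize(candidate)).ratio()
            if score >= THRESHOLD:
                match = candidate
                break
        if match is None:
            kept.append(term)  # a new, distinct term
            match = term
        mapping[term] = match
    return mapping
```

With these settings, “natives hawaiian” scores about 0.97 against “Native Hawaiian” and is folded into it, while unrelated terms stay separate.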

I reviewed the proposal.

  1. It is not global enough. See Georgie’s use case here on August 1, 2023.

  2. Leaving these data in the Person table precludes us from identifying the date of this observation and the provenance of these data. And this list will continue to expand.

At this time, the proposal doesn’t cover the following:

  • Flavors of NULL - flavors of null are not allowed in the CDM, with very few exceptions
  • Negative values - statements of “it did not occur, it’s not part of the record, negative values, etc”. With a good use case, I can be persuaded to add “non-____” terms.
  • Hierarchies - Hierarchies are tough for these data. There isn’t a globally inclusive hierarchy at this time. Note - not including hierarchies in the Concept Ancestor table does not preclude users from using their own hierarchies.
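Under this proposal, a flavor of null simply produces no row. A minimal ETL sketch of that rule, in which the source strings, the source-to-concept lookup, and the row layout are all hypothetical and are not the proposed table definition. Real but unmapped values fall back to concept 0, following the usual OMOP convention.

```python
from datetime import date

# Hypothetical source values treated as flavors of null.
FLAVORS_OF_NULL = {"unknown", "declined", "not reported", "refused", ""}

# Hypothetical mapping of normalized source strings to race/ethnicity concept IDs.
SOURCE_TO_CONCEPT = {"native hawaiian": 301, "samoan": 302}


def race_rows(person_id: int, source_values: list[str], recorded: date) -> list[dict]:
    """Build one dated row per meaningful source value; skip flavors of null."""
    rows = []
    for raw in source_values:
        key = raw.strip().lower()
        if key in FLAVORS_OF_NULL:
            continue  # flavors of null are not written to the CDM
        rows.append({
            "person_id": person_id,
            # Unmapped real values fall back to 0 ("No matching concept").
            "value_as_concept_id": SOURCE_TO_CONCEPT.get(key, 0),
            "observation_date": recorded,
            "source_value": raw,
        })
    return rows
```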

Keep it simple and pragmatic for our first implementation. Let’s use the data in the new table with the new concept_ids, run it through some use cases and research, find areas for improvement, regroup, and make a plan for future iterations. This solution is scalable.

This is a HUGE change and will affect many in the OHDSI community, from the ETL through the pipeline to the researchers, including the following OHDSI groups: Vocabulary, Themis, CDM, DQD, Phenotype Library, Methods & others. This is an open source community. We need folks to volunteer to help with this effort, or it will continue to be a topic in the forums, on GitHub and in the minds of everyone, but NOT an implementation in the CDM. Who’s joining @Christian_Reich to move this proposal forward?


Just chiming in to say I really like this solution and believe it makes more sense in terms of OMOP and ETLing.


Christian, I know this is the proposal you are championing. I was responding to the call to post, for your notes, the linked supporting use cases and our relevant source data issues.

Re our differences on whether the biomedical model is universal, we can agree to disagree. I think there are clearly culture-bound syndromes in mental health. I’m a bit surprised you don’t seem to think so, but it’s mostly a side issue in this thread. I brought it up here to help clarify whether temporal stability and international universality are applied as criteria to all other data representations, especially in light of the points made about driving use cases in Australia.

Melanie, I’m happy to join Christian in moving this proposal forward if he’ll have me :smiley:.


I would claim none of them are. :slight_smile: But it doesn’t matter. It’s either

  • Your solution of a “has race/ethnicity” Observation concept with concepts of the “Race/Ethnicity” domain in value_as_concept_id. This is a big surgery, as you pointed out.
  • Jake’s solution of race and ethnicity fields with those same concepts in them. It’s a little easier on the ETLers because it is backward compatible.

The difference really is only in nuances. I was thinking we wait a little to see if there are more opinions, and then we make a choice between the two.

Unfortunately no. You cannot do that. They depend on the context. The context is usually a country. A Black African? Such an idea only exists outside Africa. In South Africa for example, there is a big distinction between “Black” and “Coloured”. Go figure.

We cannot dedup for lack of definition. Lexical similarity is insufficient. Concepts only can be construed as the same if they share the original source. So, yes, we will end up with quite a big bucket of races and ethnicities. It’s up to the analysts to make sense of that. I wish them luck.

Thanks.

Sure they are. So, we need to agree to agree. I am disputing that making a definition is bound by the culture/identity/race/ethnicity/whatever of a standardizing body. Definitions are created by professional societies, and if there are disagreements, they are debated in public and tend to be resolved over time. Sometimes a long time. The fact that the differences seem to depend on the country is because that’s how the societies are compartmentalized. There is no such thing as “definitions for my people only”.

Actually it does. We had a long discussion around the vocabulary principles so we better stick to them in any new implementation.
We do pre-coordinate and don’t post-coordinate unless there’s a strong reason, like a combinatorial explosion or an inability to represent the facts as self-sufficient entities.

Since @MPhilofsky’s proposal doesn’t have any flavors of information reporting like “current race indicated by the patient” or “race my parents told me I had at birth”, there’s no reason to post-coordinate:

  1. It works very well either way.
  2. Users will limit their reporting to the Race domain, so useless concepts will be filtered out.
  3. If the new race concepts are self-sufficient, it doesn’t matter where you put them. The concepts are essentially the same and the maintenance effort is the same. The only difference is that they will include a small piece of additional information, which is “it’s the race or ethnicity”. We can even skip it, and have a convention that the Race domain implies this.
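Point 2 relies on the Race domain acting as the filter. A minimal sketch over in-memory CONCEPT-like records: the field names mirror CONCEPT table columns (concept_id, concept_name, domain_id, standard_concept), but the records used with it are invented for illustration.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    """Subset of CONCEPT table columns; example records are invented."""
    concept_id: int
    concept_name: str
    domain_id: str
    standard_concept: Optional[str]  # "S" = standard, None = non-standard


def race_domain(concepts: list[Concept]) -> list[Concept]:
    """Keep only standard concepts assigned to the Race domain."""
    return [
        c for c in concepts
        if c.domain_id == "Race" and c.standard_concept == "S"
    ]
```

Because the domain already says “this is a race or ethnicity”, the filter needs no extra “has race” qualifier, which is the convention suggested in point 3.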

Looks like you actually agree with me and reject the reasons.

But again. We have two proposals. Both would work with much wider race/ethnicity domain concepts we’d have to put together (with some formal deduping). Melanie’s solution allows time stamps for those pieces of information, Jake’s does not. Melanie’s solution allows multiple concepts in parallel, Jake’s solves that problem with concepts indicating mixtures. Melanie’s solution is the larger surgery, Jake’s is backward compatible.
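The trade-offs summarized here show up directly in the data shapes. The sketch below is hypothetical: it models Jake’s approach as one PERSON-level concept per person, using an invented mixture concept, and Melanie’s as dated OBSERVATION-style rows with one atomic concept each. All concept IDs, including the “has race/ethnicity” observation concept, are placeholders.

```python
from datetime import date

# Placeholder concept IDs, invented for illustration.
HAS_RACE_ETHNICITY = 900_000  # hypothetical "has race/ethnicity" observation concept
SAMOAN, WHITE = 301, 302
SAMOAN_AND_WHITE = 401  # hypothetical pre-coordinated mixture concept

# Jake's shape: one concept per person, no date; mixtures need their own concept.
person_race = {1: SAMOAN_AND_WHITE, 2: WHITE}
MIXTURE_PARTS = {SAMOAN_AND_WHITE: {SAMOAN, WHITE}}

# Melanie's shape: parallel, dated rows per person.
observations = [
    (1, HAS_RACE_ETHNICITY, SAMOAN, date(2020, 1, 5)),
    (1, HAS_RACE_ETHNICITY, WHITE, date(2020, 1, 5)),
    (2, HAS_RACE_ETHNICITY, WHITE, date(2021, 6, 1)),
]


def persons_with_race_jake(race: int) -> set[int]:
    """Find persons whose single concept is, or contains, the given race."""
    # Mixture concepts must be expanded to find every person with a component.
    return {
        pid for pid, cid in person_race.items()
        if cid == race or race in MIXTURE_PARTS.get(cid, set())
    }


def persons_with_race_melanie(race: int) -> dict[int, date]:
    """Find persons with a matching row, keeping the date it was recorded."""
    return {
        pid: recorded
        for pid, concept, value, recorded in observations
        if concept == HAS_RACE_ETHNICITY and value == race
    }
```

Both shapes answer “who is White?”, but only the second keeps the date, and the first needs a growing table of mixture components to do it.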

Let’s duke it out in one of the upcoming Vocabulary WG sessions and be done with it.

Hi all,

May I request that the Vocabulary working group session in which you tackle this issue be scheduled at a time when representatives from the Australian chapter are able to be present and participate in the discussion.

Per conversations with Melanie and others at the symposium, it is the consensus within the Australian chapter that:

  • There is a well-established practice of capturing Indigenous status in Australia
  • There would be almost universal uptake within Australia of a specific observation type covering this established best practice, and there is no expectation that this observation would have relevance in any other setting and thus does not affect or overload any of the discussions here
  • Due to practices around data linkage that are (again) well-adhered to within an Australian setting, this CANNOT go in the person table (see slides posted in the health-equity channel in teams for references and full description if interested)
  • We have engaged with Indigenous data leaders in Australia to submit a fulsome community contribution that will be distinct from the race/ethnicity domain entirely, and include full documentation of conventions around the submission - this should not affect any changes made to race and ethnicity as discussed in this thread, as they will be standalone and distinct
  • This will include references to the Australian national standard on ethical conduct in research, and therefore is likely to stop anyone not using appropriate vocab items from receiving HREC (~IRB) approval + AH&MRC approval (additional oversight required for research questions pertaining specifically to Indigenous populations), so adding these concepts to race and ethnicity domain is actually counterproductive
  • The SNOMED codes for Australian Indigenous status are out of date and do not represent best practice in this setting, please do not elevate them to standard concepts
  • There is a concept ‘Neither Aboriginal nor Torres Strait Islander’ included in this typical practice which will be included - it is not considered a negative concept, rather a positive assertion of a negative fact (similar to a negative test result) and therefore must be supported

Details will be forthcoming, pending input from the Indigenous reference group.

As such, please do not conflate Australian Indigenous status with these updates - this will be submitted separately and handled at a chapter level.

Thanks,
Georgie

Hi all
I am still new, and I tried to map multiple races and failed.
I found some concepts like “Mixed - White and Black African” (700389) and “Mixed - Any other mixed background” (700391) on the Athena website under domain = Race and class = Race. But I cannot use them as the race_concept_id in the person table because these concepts do not exist in my downloaded CONCEPT table. The foreign key stopped me. Should I not use them? Or does my concept table (downloaded several weeks ago) need an update?
Can anybody help me please?

If you read the thread, you will see it is (and has been for a while) under discussion on what to do about multiple races.

Thank you! @Mark
The discussion is all about how to make changes. My question is about using the current system. I can see these mixed-race items through Athena, but my downloaded CONCEPT table did not include them.

Both the codes you quoted are non-standard, so they are not valid values for race_concept_id in OMOP. Both of them map to White if you do a non-standard to standard mapping.

Also, both are NHS codes, and you may not have an end-user license for that set. I am American, so I have no idea, as we do not use them.

In Athena, if you filter for standard concepts, you will see there are currently no valid mixed-race concepts.
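A minimal sketch of the check described in this exchange, using in-memory stand-ins for the CONCEPT and CONCEPT_RELATIONSHIP (“Maps to”) tables. The IDs 700389 and 700391 and their mapping to White come from the posts above; treating 8527 as the standard White concept is an assumption to confirm in Athena. Resolution fails loudly when the concept is absent from the local vocabulary download, which is what the foreign key reported.

```python
from typing import Optional


def to_standard(
    concept_id: int,
    concept: dict[int, Optional[str]],
    maps_to: dict[int, int],
) -> int:
    """Return a standard concept_id suitable for PERSON.race_concept_id.

    concept: stand-in for the local CONCEPT table (concept_id -> standard_concept).
    maps_to: stand-in for CONCEPT_RELATIONSHIP rows with relationship_id 'Maps to'.
    """
    if concept_id not in concept:
        # A foreign key would reject this: the concept is not in the local CONCEPT table.
        raise LookupError(f"concept {concept_id} not in local vocabulary")
    if concept[concept_id] == "S":
        return concept_id  # already standard
    target = maps_to.get(concept_id)
    if target is None or concept.get(target) != "S":
        return 0  # no standard mapping: fall back to "No matching concept"
    return target
```

The non-standard source concept itself would still belong in race_source_concept_id, and only if it exists in the local vocabulary.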


@Mark
NHS is probably the reason my downloaded CONCEPT table did not include them, because I am in the United States too. The other non-standard race concepts, like 8522 - Other Race, are all in the concept table.
Thank you.

Friends:

Right now, we don’t have a viable solution outside the US. So, no need to keep bringing this up. We have two proposals: Melanie’s and Jake’s (see above). We should discuss them and make a decision. Either one will solve the international and the mixed race situations.

@aostropolets will make this the agenda of one of the upcoming WG meetings. It will start at 9 am Eastern, so our Australian and Singaporean friends can participate.

The assumption that it only seems to is based on an assertion that apparent cultural differences in syndromes are illusory. A different assumption is that mental health conditions are influenced by, and characterized by, more than biology, i.e. that those differences are real rather than illusory and due in part to the real impact of culture.

The question, I think, is whether there are useful definitions that reflect the influence of culture in addition to biology. I think there are. If there are, a culture-bound entity could have a persistent identifier that uniquely resolves to a meaning that includes that cultural component. That definition isn’t for a subset of people. It is for everyone. It just includes the cultural context as a component of the definition. But it is defined in a way that everyone understands and can use to refer unambiguously to a culture-bound entity.

I do not disagree with you, but how does one do this? Let me give you an example of what our organization faces:

Our headquarters is in East Tennessee, which culturally is Appalachian, and we have clinics in Memphis, which culturally is southern. The culture our headquarters is in shares more with Nova Scotia than it does with our Memphis clinics, yet if you ask the common person here in the headquarters area (I do this often, as I am a very curious person), they will tell you that they are southern.

In this case, self-reporting would skew the data set, yet how many providers have the training, time and patience to ask the right questions to make this determination?

I think this ask goes beyond the auspices of OMOP.

Let me repeat: The culture can have all the effect it wants. I am not debating that. The question is whether the definition of a condition is dependent on the culture. “Hispanic”, for example, as derived from Spanish or Portuguese culture and heritage, has a very different implication in Europe and the US. In conditions, there may be debates over what conditions there are and how they should be defined, and the different schools fall on different sides of country borders, but once you declare one definition, you are clean globally.

Maybe you have an example in mind that would illustrate your idea. But honestly, I sincerely hope not, because our global OHDSI network depends on the ability to refer to clinical facts in an unambiguous way. If diseases became as wishy washy as races and ethnicities we’d deprive ourselves of the very substrate that lets us generate evidence.

Great! I need ~10 minutes to present my solution. Let’s get this on the agenda. This topic needs a conclusion.
