Dealing with multiple races and other exceptions

Andrew · November 1, 2023, 3:10pm

The assumption that it only seems to is based on an assertion that apparent cultural differences in syndromes are illusory. A different assumption is that mental health conditions are influenced by and characterized by more than biology, i.e. that those differences are real rather than illusory and due in part to the real impact of culture.

The question, I think, is whether there are useful definitions that reflect the influence of culture in addition to biology. I think there are. If there are, a culture-bound entity could have a persistent identifier that uniquely resolves to a meaning that includes that cultural component. That definition isn’t for a subset of people. It is for everyone. It just includes the cultural context as a component of the definition. But it is defined in a way that everyone understands and can use to refer unambiguously to a culture-bound entity.

Mark · November 1, 2023, 3:35pm

I do not disagree with you, but how does one do this? Let me give you an example of what our organization faces:

Our headquarters is in East Tennessee, which culturally is Appalachian, and we have clinics in Memphis, which culturally is southern. The culture that our headquarters is in shares more with Nova Scotia than it does with our Memphis clinics, yet if you ask the common person here in the headquarters area (I do this often, as I am a very curious person), they will tell you that they are southern.

In this case, self-reporting would skew the data set, yet how many providers have the training, time, and patience to ask the correct questions to make this determination?

I think this ask goes beyond the auspices of OMOP.

Christian_Reich · November 2, 2023, 1:55am

Let me repeat: The culture can have all the effect it wants. I am not debating that. The question is whether the definition of a condition is dependent on the culture. Like “Hispanic”, as derived from Spanish or Portuguese culture and heritage, has a very different implication in Europe and the US. In conditions, there may be debates over what conditions there are, and how they should be defined, and the different schools fall on different sides of country borders, but once you declare one definition you are clean globally.

Maybe you have an example in mind that would illustrate your idea. But honestly, I sincerely hope not, because our global OHDSI network depends on the ability to refer to clinical facts in an unambiguous way. If diseases became as wishy washy as races and ethnicities we’d deprive ourselves of the very substrate that lets us generate evidence.

MPhilofsky · November 3, 2023, 4:03pm

Great! I need ~10 minutes to present my solution. Let’s get this on the agenda. This topic needs a conclusion.

Mark · November 3, 2023, 5:59pm

I have looked at the proposals briefly, but without a deep reading. Would someone do a simple TLDR; of the differences of the two proposals, please?

DaveraG · November 3, 2023, 7:16pm

HL7 FHIR uses the “CDCREC” (CDC Race & Ethnicity Codesystem) as the value set that would meet the requirement @MPhilofsky describes above. Today … (there’s another version on the way) that can be found here: Code System Details. I highly suggest leveraging this content, as this is what users of FHIR will have at the ready, and it would be super tedious to have to chase down any codes in that set that are used in a FHIR implementation and not represented in the OMOP Vocabs. Please consider using this content.

Christian_Reich · November 5, 2023, 1:00am

No question, @DaveraG. But the problem we have is not getting one good value set. The problem is we have many, since they are projections to the societies they are made for, in this case, the USA. As I tried to explain before, seemingly identical races and ethnicities have very different implications with respect to access to resources and participation in different countries, and are therefore not the same. We need a solution that will work in the US, UK, South Africa and Australia. And all other places.

Love it. Didn’t you know that pre-canned industrialized food is bad for you, and you should eat FIBERS?

Ok. The ultrashort version: We:

Create a union of all race and ethnicity concepts anybody brings up, and slap a combined “Race/Ethnicity” domain_id on them.
Don’t deduplicate, unless they are from the same source.
Don’t allow flavors of null (unknown, don’t want to tell, etc.) or negatives.
Build no hierarchy.

Now the split. In the “Jake” proposal, we:

Allow mixed race and ethnicity concepts, again, based on demand (rather than a cartesian product).
Put the concepts into the PERSON table, one each into the existing race_concept_id and ethnicity_concept_id, according to the preference of the source data.
Collect no timing information. Races and ethnicities are static (even though recorded at different times).

In the “Melanie” proposal, we:

Don’t allow mixed race and ethnicity concepts (we might have to split some existing mixed concepts up).
Place concepts into the OBSERVATION table as value_as_concept_id, with the observation_concept_id = “Has race/ethnicity”, allowing multiple records (for mixed races or people changing their mind), each with a time stamp.
Remove the race_concept_id and ethnicity_concept_id fields from PERSON.

Did I get that right?
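
To make the split concrete, here is a minimal sketch of how one person might be stored under each proposal. It is an illustration only: the concept_id numbers are made-up placeholders (not real OMOP vocabulary values), the row classes carry only the fields mentioned above, and the helper names `jake_style` and `melanie_style` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# Placeholder concept_ids for illustration only; NOT real OMOP vocabulary values.
HAS_RACE_ETHNICITY = 9000  # hypothetical "Has race/ethnicity" observation concept
CONCEPT_A = 9001           # hypothetical single race/ethnicity concept
CONCEPT_B = 9002           # hypothetical single race/ethnicity concept
CONCEPT_A_AND_B = 9003     # hypothetical mixed concept (only allowed in "Jake")


@dataclass
class PersonRow:
    """Subset of PERSON fields relevant to the "Jake" proposal."""
    person_id: int
    race_concept_id: int
    ethnicity_concept_id: int


@dataclass
class ObservationRow:
    """Subset of OBSERVATION fields relevant to the "Melanie" proposal."""
    person_id: int
    observation_concept_id: int
    value_as_concept_id: int
    observation_date: date


def jake_style(person_id: int, race: int, ethnicity: int) -> PersonRow:
    # One static row per person, no timing information.
    return PersonRow(person_id, race, ethnicity)


def melanie_style(person_id: int,
                  recorded: List[Tuple[int, date]]) -> List[ObservationRow]:
    # One time-stamped row per recorded concept; multiple rows are allowed.
    return [
        ObservationRow(person_id, HAS_RACE_ETHNICITY, concept_id, recorded_on)
        for concept_id, recorded_on in recorded
    ]


# The same person under each proposal.
person = jake_style(1, CONCEPT_A_AND_B, CONCEPT_B)
observations = melanie_style(
    1, [(CONCEPT_A, date(2020, 1, 1)), (CONCEPT_B, date(2022, 5, 1))]
)
```

In the first form a mixed background needs its own combined concept; in the second it is expressed as two separate rows, each with its own date.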

Heidi_Schmidt1 · November 6, 2023, 3:33pm

What I am finding is that there are multiple entries for a patient’s race, as the encounter (visit) lends itself to cataloging each time there is an encounter (visit occurrence).
I had to create a “winning record” of each patient’s race to normalize across encounters.
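
As an illustration of what a “winning record” rule could look like, here is a minimal sketch. The rule itself is an assumption, not the one actually used here: it picks the most frequently recorded value and breaks ties by the most recent recording. The function name `winning_race` and the input shape are hypothetical.

```python
from collections import Counter
from datetime import date


def winning_race(records):
    """Pick one race value for a patient from (visit_date, race_value) pairs.

    Assumed rule: most frequent value wins; ties go to the value
    that was recorded most recently. Returns None for no records.
    """
    if not records:
        return None
    counts = Counter(value for _, value in records)
    latest = {}
    for visit_date, value in records:
        # Track the most recent date on which each value was recorded.
        if value not in latest or visit_date > latest[value]:
            latest[value] = visit_date
    return max(counts, key=lambda value: (counts[value], latest[value]))


# Three encounters for one patient, with two different recorded values.
encounters = [
    (date(2023, 1, 5), "A"),
    (date(2023, 3, 1), "B"),
    (date(2023, 6, 9), "A"),
]
winner = winning_race(encounters)
```

Whatever rule is chosen, it needs to be documented, since an analyst cannot recover it from the resulting single value.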

My question pivots off to ask –
Why is the table OMOP PERSON not a dimension look up table?
The way I read it - is that for each person - if there is a different location and care site for a person then there are multiple entries. One for each person, location, care site combination.

Some unpacking of the above -
Because the location and care_site related column values are added to an entry for PERSON, it means, at least from our data, that we can have many locations per person within one care_site, because they can move from one department (location) to another within the hospital (care site).

Example: If a patient has been in three locations within the hospital (ER to Dept 1, then transferred to Dept 2), then the OMOP person table will have 3 duplicate entries, with the only difference being location, where location is a department within the care site hospital.
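
A quick way to test that reading against an actual PERSON extract is to count rows per person_id; the OMOP CDM specifies person_id as the primary key of PERSON, so any count above one points at the ETL rather than the model. This is a minimal sketch: the rows, the location values, and the function name `duplicated_person_ids` are made up for illustration.

```python
from collections import Counter


def duplicated_person_ids(person_rows):
    """Return {person_id: row_count} for every person_id seen more than once."""
    counts = Counter(row["person_id"] for row in person_rows)
    return {pid: n for pid, n in counts.items() if n > 1}


# Hypothetical extract: patient 1 appears once per department visited.
person_rows = [
    {"person_id": 1, "location_id": 10, "care_site_id": 100},  # ER
    {"person_id": 1, "location_id": 11, "care_site_id": 100},  # Dept 1
    {"person_id": 1, "location_id": 12, "care_site_id": 100},  # Dept 2
    {"person_id": 2, "location_id": 10, "care_site_id": 100},
]
duplicates = duplicated_person_ids(person_rows)
```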

Mark · November 6, 2023, 3:54pm

Someone should tell the Maasai… no evil fiber in their diet and look how healthy they are… hmmm .

This was close to what I thought, but I wanted to make sure I understood this before I commented.

I hate the idea of race/ethnicity in the observation table. One either has a race and ethnicity or one doesn’t; there is no middle ground. Either put it in the person table or drop race and ethnicity from OMOP altogether.

Chris_Knoll · November 6, 2023, 4:08pm

Sorry to interject but I would like to ask:

Can we separate race and ethnicity, as they seem to be 2 very different concepts? I don’t know if this is an authoritative source, but it seems that race is something you are born with and is immutable, while ethnicity is described as “cultural identity, chosen or learned from your culture and family”. So the former seems to be something you’d attach to the Person, as it is immutable and comes from your biology (such as your birth date), while ethnicity is something that is learned and potentially changes over time…so wouldn’t the learned/change-over-time thing be more appropriate to put in observation, while the thing that you are born as goes in person?

Alternatively, is it possible to just drop race entirely if we can perform our analytical use cases using ethnicity (or race over ethnicity?)

aostropolets · November 6, 2023, 6:24pm

Sounds like a good discussion for a Vocab WG (among other places). How should we go about it? I’m looking at, say, Dec 12, but our regular time (noon EST) may not work for everybody.

@gkennos what’s your time preference (I think others are US based but correct me if I’m wrong)? We can do an off-cycle call

Mark · November 6, 2023, 6:24pm

This seems to be the same as Andrew Williams’ request above.

I would argue, as I believe epigeneticists and most behaviorists would agree, that by the time of adulthood, these characteristics are set. I am neither, so I say believe; I just read a lot.

Edit: Even in my terse reply, I am finding problems with it. What is adulthood? That varies from source to source and culture to culture. I do not say there is no validity to studying what effect culture has on health, but OMOP is not the correct vehicle for that.

DaveraG · November 6, 2023, 9:38pm

Re: selection of a content set… as you know both SNOMED and HL7 deal with this by declaration of “realms / realm-specific content” (HL7) or “editions” (SNOMED). As such, it is not inconceivable to parse context of use by geographic or political boundaries when curating standards and vocabularies.

I do not differ with your assertion:

“We need a solution that will work in the US, UK, South Africa and Australia. And all other places”

… but you are arguing with yourself in that you also state that to do so is impossible since these locales have different meanings & use cases. This latter clarification is indicative of a need to declare the region / realm etc. as HL7 and SNOMED do.

Also: I was asking to please leverage the work in HL7 and INCLUDE the CDCREC. I am sorry if I was not being clear and it appeared that I was indicating that was the all-singing-all-dancing value set. That is a very easy go-to for the (in HL7 parlance:) US realm.

Christian_Reich · November 6, 2023, 9:46pm

Friends:

Whether or not race or ethnicity is static or changes over time is a good debate, but not one we should have here. There are even people who claim that those things don’t exist at all and shouldn’t be emphasized. Since there is no consensus, and since there are no objective definitions for any of the concepts, we may want to create a model that can accommodate everything. Melanie’s proposal does that.

On the other hand, we need the data to be available for use cases. I haven’t seen use cases that demand these things be flexible over time. A static model like Jake’s may provide what we need, plus it is computationally more favorable and backwards compatible. (The latter is very important if you remember the fate of CDM Version 6.0 and its death due to its incompatibility with the existing tech stack and its inefficiency.)

Come to the meeting and we will decide. We can’t really vote due to lack of the denominator, but we always find compromise.

DaveraG · November 6, 2023, 9:54pm

I’d like to step out of the topic into process for just a moment and make a comment about “moving discussions” between the live WG meetings and onto the forum & vice versa. In last week’s Vocab meeting and other calls, we have been asked (for lack of time) to move our discussions to the forums, and here we are being asked to move our discussion back into a live one hour meeting…

I respect that in these cases there is an acknowledgement when one format or another has exceeded its usefulness, and thus redirection. This is very good leadership practice, imo. What is lacking when there is (too) much discussion is a clear-cut means to move forward and remain in partnership with an engaged (if also large) community.

Could it be that the OHDSI community has outgrown live discussion & forums for topics where many voices wish to be heard? Is there an alternative to shuffling between live meetings and the forums (& back) for those times when neither result in consensus?

aostropolets · November 6, 2023, 10:49pm

That’s a very good question. Do you have any tips from the FHIR community?

I see live discussions as more productive and comprehensive, especially when we can achieve a better quorum (last time we had a whopping 56 people attending the call). It is a good way to spread the information and gather initial opinions.

Next, all the decisions are documented in the notes so that people can revisit and comment some more (this is where we see a drop in participation - very few if any tend to continue the discussions beyond the WG). This is where Forums step in as they allow for asynchronous conversations.

Finally, we get together again if more discussion is needed (just a reminder: we will come back for remaining SNOMED items on the next WG call). We issue final notes and proceed with them as our small “conventions”.

This is a big step toward increasing transparency of Vocabulary operations.

What makes it harder is that Vocabularies potentially impact a lot of users in the community, so we want to make sure that as many as possible know about the processes/changes and can voice their opinion AND that the process doesn’t take months, as we cannot afford to stop Vocabularies maintenance. I wonder if @clairblacketer, @MPhilofsky, @Patrick_Ryan or others have any thoughts on the topic.

MPhilofsky · November 7, 2023, 12:59am

Mostly

You forgot to mention my solution allows for provenance of the records via the observation_type_concept_id field.

And now is a good time to mention my implementation plan, found on slides 8 - 12.

Before we introduce breaking changes to the CDM and remove the race & ethnicity concept_ids from the Person table, I suggest we make a convention to encourage and allow the use of observation_concept_id = “Has race/ethnicity” in the Observation table, so that these data can co-exist in both tables until the next major/breaking CDM release. Yes, this will denormalize the CDM; however, it will give us some time to test drive this solution and update cohort definitions before going all in with removal of these data from the Person table. I spoke to @Chris_Knoll at the Symposium and he doesn’t have any concerns about this change for Atlas. Chris suggested I talk to @schuemie, so I pitched it to him. Cohort definitions will have to be updated. Clear and concise documentation on how to ETL the data and how to use the data will be given by Themis & the CDM WG.
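
While the data live in both tables, one thing an ETLer might want is a consistency check between them. The following is a minimal sketch under assumptions: rows are plain dicts holding only the fields named above, the concept_id numbers are made-up placeholders, concept_id 0 is treated as unmapped (the usual OMOP convention), and the function name `inconsistent_persons` is hypothetical.

```python
def inconsistent_persons(person_rows, observation_rows, has_race_ethnicity_id):
    """List person_ids whose PERSON race is missing from their OBSERVATION rows."""
    observed = {}
    for obs in observation_rows:
        # Only "Has race/ethnicity" observations are relevant here.
        if obs["observation_concept_id"] == has_race_ethnicity_id:
            observed.setdefault(obs["person_id"], set()).add(
                obs["value_as_concept_id"]
            )
    return sorted(
        person["person_id"]
        for person in person_rows
        # Skip unmapped PERSON values (concept_id 0); there is nothing to compare.
        if person["race_concept_id"] != 0
        and person["race_concept_id"]
        not in observed.get(person["person_id"], set())
    )


HAS_RACE_ETHNICITY = 9000  # placeholder, not a real concept_id

person_rows = [
    {"person_id": 1, "race_concept_id": 9001},
    {"person_id": 2, "race_concept_id": 9002},
    {"person_id": 3, "race_concept_id": 0},
]
observation_rows = [
    {"person_id": 1, "observation_concept_id": 9000, "value_as_concept_id": 9001},
    {"person_id": 2, "observation_concept_id": 9000, "value_as_concept_id": 9001},
]
mismatches = inconsistent_persons(person_rows, observation_rows, HAS_RACE_ETHNICITY)
```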

Once ETLers have implemented this change, we will need feedback from them on 1. Were the instructions on ETLing these data clear? 2. What are your pre & post change mapping rates? 3. What’s still not mapping? Next, we’re going to need feedback along the same vein from the analysts: 1. Which use cases now work? 2. Which use cases don’t work? 3. What’s missing?
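
For the mapping-rate questions, a minimal sketch of how the numbers could be computed is below. It assumes a record counts as mapped when its concept_id is non-zero (the usual OMOP convention for "no matching concept"); the sample values and the function names `mapping_rate` and `unmapped_source_values` are made up for illustration.

```python
from collections import Counter


def mapping_rate(concept_ids):
    """Fraction of records whose concept_id is non-zero (i.e. mapped)."""
    ids = list(concept_ids)
    if not ids:
        return 0.0
    return sum(1 for concept_id in ids if concept_id != 0) / len(ids)


def unmapped_source_values(rows):
    """Count source values that still map to concept_id 0.

    rows: iterable of (source_value, concept_id) pairs.
    """
    return Counter(source for source, concept_id in rows if concept_id == 0)


# Hypothetical concept_ids for the same four source records, before and after.
pre_change = [9001, 0, 0, 9002]
post_change = [9001, 9003, 0, 9002]
pre_rate = mapping_rate(pre_change)
post_rate = mapping_rate(post_change)

still_unmapped = unmapped_source_values(
    [("A/B", 0), ("A", 9001), ("A/B", 0), ("B", 9002)]
)
```

Reporting the unmapped source values alongside the two rates answers "What’s still not mapping?" directly.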

I am coming at this from the Themis point of view with a strong Health System Interest Group influence. I’d like all of us to keep Themis’ mission statement in mind as we discuss this topic, “Themis makes decisions for the good of the whole community. We must compromise. We can always revisit and modify the convention. Don’t let perfect be the enemy of great. And interoperability between different OMOP CDMs is great!”. I’ll admit, it’s a little cheesy, but we really need the community to follow the standards. You can always add additional fields to your CDM, but you need to populate the CDM as expected or we can’t do federated research. And we must compromise, agree to disagree, and move forward. The race topic has been going in circles and infinite loops for years.

With this in mind, I propose we defer the flavors of NULL (unknown, not answered, etc.), hierarchies, and negative values to a future iteration unless there is a strong use case. These items will be easy to add in later, if needed. Let’s use the data with the new concept_ids, run it through some use cases and research, identify areas needing improvement, regroup after running it through the rounds, and then make a plan. Let’s keep it simple and pragmatic for our first implementation.

To echo what @aostropolets said, regardless of which proposal or combination of proposals the community adopts, we need to broadcast to all including: OHDSI chapter leads/WGs, those about to ETL their data, those using the CDM including secondary research groups N3C, All of Us, etc.

Since this is such a huge change and will affect many in the OHDSI community from the ETL through the pipeline to the researchers and the tools used, once a decision has been made, I suggest we form a sub-working group to document and implement the change requested by the community.

Agnes_Wojciechowski · November 7, 2023, 2:36pm

I want to voice my support for Melanie’s proposal.

Keeping race and ethnicity on Person seems to cause problems with interpretation:

The ETLer will need to decide which race to favor (first recorded in our period, most recent, most frequent?).
The analyst looking at OMOP needs to know what convention was used.

If the research focus is on race, not singling out one of many entries gives more flexibility in the methodology used.

Adopting the proposal will lessen the burden of custom mapping race combinations.
The tools impact consideration is important and needs to be addressed.

Mark · November 13, 2023, 1:57pm

At the danger of angering Christian (I probably will not be able to attend the workgroup), if we are going to have to track demographics, then make a demographics table. It isn’t that hard and would run faster than trying to pull the data out of observations.
For those of us that are using certain EHRs, all that moving it to observations does is make the ETL much harder with no gain in functionality. We have ZERO history of demographics of any sort. I am sure that the billing dept. does, but that data is not accessible to us. Kill and fill means that we always lose any demographics that have changed.

Christian_Reich · November 13, 2023, 2:06pm

@Mark:

No anger! This is the good debate we are having.

We have a demographics table, it’s called PERSON. It does have the necessary fields, and they lack timing. Sounds to me like you are a proponent of Jake’s proposal. Make sure you come to the WG session.