
Dealing with multiple races and other exceptions

The assumption that it only seems to is based on an assertion that apparent cultural differences in syndromes are illusory. A different assumption is that mental health conditions are influenced by and characterized by more than biology, i.e., that those differences are real rather than illusory and due in part to the real impact of culture.

The question, I think, is whether there are useful definitions that reflect the influence of culture in addition to biology. I think there are. If there are, a culture-bound entity could have a persistent identifier that uniquely resolves to a meaning that includes that cultural component. That definition isn’t for a subset of people. It is for everyone. It just includes the cultural context as a component of the definition. But it is defined in a way that everyone understands and can use to refer unambiguously to a culture-bound entity.

I do not disagree with you, but how does one do this? Let me give you an example of what our organization faces:

Our headquarters is in East Tennessee, which culturally is Appalachian, and we have clinics in Memphis, which culturally is Southern. The culture of our headquarters area shares more with Nova Scotia than it does with our Memphis clinics, yet if you ask the common person in the headquarters area (I do this often, as I am a very curious person), they will tell you that they are Southern.

In this case, self-reporting would skew the data set, yet how many providers have the training, time and patience to ask the correct questions to make this determination?

I think this ask goes beyond the auspices of OMOP.

Let me repeat: The culture can have all the effect it wants. I am not debating that. The question is whether the definition of a condition is dependent on the culture. “Hispanic”, for example, as derived from Spanish or Portuguese culture and heritage, has very different implications in Europe and the US. With conditions, there may be debates over which conditions exist and how they should be defined, and the different schools may fall on different sides of country borders, but once you declare one definition you are clean globally.

Maybe you have an example in mind that would illustrate your idea. But honestly, I sincerely hope not, because our global OHDSI network depends on the ability to refer to clinical facts in an unambiguous way. If diseases became as wishy-washy as races and ethnicities, we’d deprive ourselves of the very substrate that lets us generate evidence.

Great! I need ~10 minutes to present my solution. Let’s get this on the agenda. This topic needs a conclusion.


I have looked at the proposals briefly, but without a deep reading. Would someone do a simple TL;DR of the differences between the two proposals, please?

HL7 FHIR uses the “CDCREC” (CDC Race & Ethnicity Codesystem) as the value set that would meet the requirement @MPhilofsky describes above. Today’s version (there’s another version on the way) can be found here: Code System Details. I highly suggest leveraging this content, as this is what users of FHIR will have at the ready, and it would be super tedious to have to chase down any codes in that set that are used in a FHIR implementation and not represented in the OMOP Vocabs. Please consider using this content.


No question, @DaveraG. But the problem we have is not getting one good value set. The problem is that we have many, since they are projections onto the societies they are made for, in this case the USA. As I tried to explain before, seemingly identical races and ethnicities have very different implications with respect to access to resources and participation in different countries, and are therefore not the same. We need a solution that will work in the US, UK, South Africa and Australia. And all other places.

Love it. Didn’t you know that pre-canned industrialized food is bad for you, and you should eat FIBERS? :slight_smile:

Ok. The ultrashort version: We:

  • Create a union of all race and ethnicity concepts anybody brings up, and slap a combined “Race/Ethnicity” domain_id on them.
  • Don’t deduplicate, unless they are from the same source.
  • Don’t allow flavors of null (unknown, don’t want to tell, etc.) or negatives.
  • Build no hierarchy.

Now the split. In the “Jake” proposal, we:

  • Allow mixed race and ethnicity concepts, again, based on demand (rather than a Cartesian product).
  • Put the concepts into the PERSON table, one each into the existing race_concept_id and ethnicity_concept_id, according to the preference of the source data.
  • Collect no timing information. Races and ethnicities are static (even though recorded at different times).

In the “Melanie” proposal, we:

  • Don’t allow mixed race and ethnicity concepts (we might have to split some existing mixed concepts up).
  • Place concepts into the OBSERVATION table as value_as_concept_id, with the observation_concept_id = “Has race/ethnicity”, allowing multiple records (for mixed races or people changing their mind), each with a timestamp.
  • Remove the race_concept_id and ethnicity_concept_id fields from PERSON.

Did I get that right?
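
To make the comparison concrete, here is a minimal Python sketch of how one person who reports two races might be stored under each proposal. It is an illustration only, not an agreed convention: every concept id is a made-up placeholder (including the “Has race/ethnicity” observation concept, which is proposed in this thread but not defined), and the row classes carry only the fields named in the summary above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List, Tuple

# All ids below are made-up placeholders, NOT real OMOP concept ids.
HAS_RACE_ETHNICITY = 900000  # hypothetical "Has race/ethnicity" concept
RACE_A = 900001              # hypothetical single-race concept
RACE_B = 900002              # hypothetical single-race concept
RACE_A_AND_B = 900003        # hypothetical mixed concept, created on demand
ETHNICITY_X = 900010         # hypothetical ethnicity concept

# "Jake" style needs a lookup from a set of reported races to one mixed concept.
MIXED_CONCEPTS: Dict[FrozenSet[int], int] = {
    frozenset({RACE_A, RACE_B}): RACE_A_AND_B,
}


@dataclass
class PersonRow:
    """One static row per person; no timing information."""
    person_id: int
    race_concept_id: int
    ethnicity_concept_id: int


@dataclass
class ObservationRow:
    """One dated record per reported value."""
    person_id: int
    observation_concept_id: int
    value_as_concept_id: int
    observation_date: date


def jake_style(person_id: int, races: List[int], ethnicity: int) -> PersonRow:
    """Collapse the reported races into a single (possibly mixed) concept."""
    unique = frozenset(races)
    if len(unique) == 1:
        race = next(iter(unique))
    else:
        race = MIXED_CONCEPTS[unique]  # KeyError if no mixed concept exists yet
    return PersonRow(person_id, race, ethnicity)


def melanie_style(
    person_id: int, reports: List[Tuple[int, date]]
) -> List[ObservationRow]:
    """Store each reported concept as its own time-stamped OBSERVATION row."""
    return [
        ObservationRow(person_id, HAS_RACE_ETHNICITY, concept_id, reported_on)
        for concept_id, reported_on in reports
    ]


person = jake_style(1, [RACE_A, RACE_B], ETHNICITY_X)
observations = melanie_style(
    1,
    [
        (RACE_A, date(2020, 1, 5)),
        (RACE_B, date(2020, 1, 5)),
        (ETHNICITY_X, date(2021, 7, 19)),
    ],
)
print(person)
print(len(observations), "observation rows")
```

The sketch shows the trade-off in miniature: the first style needs a mixed concept to exist before the row can be written, while the second needs no mixed concepts but produces several rows that every analysis has to interpret.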

What I am finding is that there are multiple entries for a patient’s race, as the encounter (visit) lends itself to cataloging race each time there is an encounter (visit occurrence).
I had to create a “winning record” of each patient’s race to normalize across encounters.

My question pivots off to ask –
Why is the OMOP PERSON table not a dimension lookup table?
The way I read it, if there is a different location and care site for a person, then there are multiple entries: one for each person, location, care site combination.

Some unpacking of the above –
Because the location- and care_site-related column values are added to a PERSON entry, it means, at least for our data, that we can have many locations per person within one care_site, because patients can move from one department (location) to another within the hospital (care site).

Example: If a patient has been in three locations within the hospital (ER to Dept 1, then transferred to Dept 2), then the OMOP person table will have 3 duplicate entries, with the only difference being location, where location is a department within the care site hospital.

Someone should tell the Maasai… :no_mouth: no evil fiber in their diet and look how healthy they are… hmmm :wink:.

This was close to what I thought, but I wanted to make sure I understood this before I commented.

I hate the idea of race/ethnicity in the observation table. One either has a race and ethnicity or one doesn’t; there is no middle ground. Either put it in the person table or drop race and ethnicity from OMOP altogether.


Sorry to interject but I would like to ask:

Can we separate race and ethnicity, as they seem to be two very different concepts? I don’t know if this is an authoritative source, but it seems that race is something you are born with and is immutable, while ethnicity is described as “cultural identity, chosen or learned from your culture and family”. So the former seems to be something you’d attach to the Person, as it is immutable and comes from your biology (like your birth date), while ethnicity is something that is learned and can potentially change over time… so wouldn’t the learned/change-over-time thing be more appropriate to put in observation, while the thing that you are born as goes in person?

Alternatively, is it possible to just drop race entirely if we can perform our analytical use cases using ethnicity (or race over ethnicity)?

Sounds like a good discussion for a Vocab WG (among other places). How should we go about it? I’m looking at, say, Dec 12, but our regular time (noon EST) may not work for everybody.

@gkennos what’s your time preference (I think others are US based but correct me if I’m wrong)? We can do an off-cycle call :slight_smile:

This seems to be the same as Andrew Williams’s request above.

I would argue, as I believe epigeneticists and most behaviorists would agree, that by adulthood these characteristics are set. I am neither, so I say “believe”; I just read a lot.

Edit: Even in my terse reply, I am finding problems with it. What is adulthood? That varies from source to source and culture to culture. I do not say there is no validity to studying the effect culture has on health, but OMOP is not the correct vehicle for that.

Re: selection of a content set… as you know, both SNOMED and HL7 deal with this by declaration of “realms / realm-specific content” (HL7) or “editions” (SNOMED). As such, it is not inconceivable to parse context of use by geographic or political boundaries when curating standards and vocabularies.

I do not differ with your assertion:

“We need a solution that will work in the US, UK, South Africa and Australia. And all other places”

… but you are arguing with yourself, in that you also state that to do so is impossible since these locales have different meanings and use cases. This latter clarification is indicative of a need to declare the region / realm etc., as HL7 and SNOMED do.

Also: I was asking to please leverage the work in HL7 and INCLUDE the CDCREC. I am sorry if I was not being clear and it appeared that I was indicating that it was the all-singing-all-dancing value set. It is a very easy go-to for the (in HL7 parlance) US realm.

Friends:

Whether or not race or ethnicity is static or changes over time is a good debate, but not one we should have here. There are even people who claim that those things don’t exist at all and shouldn’t be emphasized. Since there is no consensus, and since there are no objective definitions for any of the concepts, we may want to create a model that can accommodate everything. Melanie’s proposal does that.

On the other hand, we need the data to be available for use cases. I haven’t seen use cases that demand these things be flexible over time. A static model like Jake’s may provide what we need, plus it is computationally more favorable and backwards compatible. (The latter is very important if you remember the fate of CDM Version 6.0 and its death due to its incompatibility with the existing tech stack and its inefficiency.)

Come to the meeting and we will decide. We can’t really vote due to lack of the denominator, but we always find compromise.


I’d like to step out of the topic into process for just a moment and make a comment about “moving discussions” between the live WG meetings and the forum, and vice versa. In last week’s Vocab meeting and other calls, we have been asked (for lack of time) to move our discussions to the forums, and here we are being asked to move our discussion back into a live one-hour meeting…

I respect that in these cases there is an acknowledgement when one format or another has exceeded its usefulness, and thus redirection. This is very good leadership practice, imo. What is lacking when there is (too) much discussion is a clear-cut means to move forward and remain in partnership with an engaged (if also large) community.

Could it be that the OHDSI community has outgrown live discussion and forums for topics where many voices wish to be heard? Is there an alternative to shuffling between live meetings and the forums (and back) for those times when neither results in consensus?


That’s a very good question. Do you have any tips from the FHIR community?

I see live discussions as more productive and comprehensive, especially when we can achieve a better quorum (last time we had a whopping 56 people attending the call). It is a good way to spread the information and gather initial opinions.

Next, all the decisions are documented in the notes so that people can revisit and comment some more (this is where we see a drop in participation - very few if any tend to continue the discussions beyond the WG). This is where Forums step in as they allow for asynchronous conversations.

Finally, we get together again if more discussion is needed (just a reminder: we will come back for remaining SNOMED items on the next WG call). We issue final notes and proceed with them as our small “conventions”.

This is a big step toward increasing the transparency of Vocabulary operations.

What makes it harder is that the Vocabularies potentially impact a lot of users in the community, so we want to make sure that as many as possible know about the processes/changes and can voice their opinion, AND that the process doesn’t take months, as we cannot afford to stop Vocabularies maintenance. I wonder if @clairblacketer, @MPhilofsky, @Patrick_Ryan or others have any thoughts on the topic.

Mostly :slight_smile:

You forgot to mention my solution allows for provenance of the records via the observation_type_concept_id field.

And now is a good time to mention my implementation plan, found on slides 8 - 12.

Before we introduce breaking changes to the CDM and remove the race & ethnicity concept_ids from the Person table, I suggest we make a convention to encourage and allow the use of observation_concept_id = “Has race/ethnicity” in the Observation table, so that these data can co-exist in both tables until the next major/breaking-change CDM release. Yes, this will denormalize the CDM; however, it will give us some time to test drive this solution and update cohort definitions before going all in with removal of these data from the Person table. I spoke to @Chris_Knoll at the Symposium and he doesn’t have any concerns about this change for Atlas. Chris suggested I talk to @schuemie, so I pitched it to him. Cohort definitions will have to be updated. Clear and concise documentation on how to ETL the data and how to use the data will be given by Themis & the CDM WG.
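
As a rough illustration of what the co-existence period could look like for an ETL, the sketch below writes the same person to both tables in an in-memory SQLite database. The tables are cut down to a handful of columns and are not the full CDM DDL, and all concept ids (including the “Has race/ethnicity” concept and the type concept used for provenance) are hypothetical placeholders.

```python
import sqlite3

# Made-up placeholder ids, NOT real OMOP concept ids.
HAS_RACE_ETHNICITY = 900000  # hypothetical "Has race/ethnicity" concept
RACE_A = 900001              # hypothetical race concept
RACE_B = 900002              # hypothetical race concept
TYPE_EHR = 900020            # hypothetical observation_type_concept_id (provenance)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        race_concept_id INTEGER NOT NULL,
        ethnicity_concept_id INTEGER NOT NULL
    );
    CREATE TABLE observation (
        observation_id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL,
        observation_concept_id INTEGER NOT NULL,
        value_as_concept_id INTEGER,
        observation_date TEXT NOT NULL,
        observation_type_concept_id INTEGER NOT NULL
    );
    """
)


def load_person(person_id, race_concept_id, ethnicity_concept_id, reports):
    """Write the legacy PERSON fields and one OBSERVATION row per report.

    `reports` is a list of (value_as_concept_id, ISO date string) tuples.
    """
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?)",
        (person_id, race_concept_id, ethnicity_concept_id),
    )
    conn.executemany(
        "INSERT INTO observation (person_id, observation_concept_id, "
        "value_as_concept_id, observation_date, observation_type_concept_id) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (person_id, HAS_RACE_ETHNICITY, value, day, TYPE_EHR)
            for value, day in reports
        ],
    )


# Ethnicity 0 stands for "no matching concept" in this example.
load_person(1, RACE_A, 0, [(RACE_A, "2020-01-05"), (RACE_B, "2022-03-14")])

# Both representations can now be queried side by side.
person_race = conn.execute(
    "SELECT race_concept_id FROM person WHERE person_id = 1"
).fetchone()[0]
observed = [
    row[0]
    for row in conn.execute(
        "SELECT value_as_concept_id FROM observation "
        "WHERE person_id = 1 AND observation_concept_id = ? "
        "ORDER BY observation_date",
        (HAS_RACE_ETHNICITY,),
    )
]
print(person_race, observed)
```

Existing queries against PERSON keep working during the trial, while new cohort logic can read the dated rows and their provenance from OBSERVATION.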

Once ETLers have implemented this change, we will need feedback from them on: 1. Were the instructions on ETLing these data clear? 2. What are your pre- and post-change mapping rates? 3. What’s still not mapping? Next, we’re going to need feedback in the same vein from the analysts: 1. Which use cases now work? 2. Which use cases don’t work? 3. What’s missing?

I am coming at this from the Themis point of view with a strong Health System Interest Group influence. I’d like all of us to keep Themis’ mission statement in mind as we discuss this topic: “Themis makes decisions for the good of the whole community. We must compromise. We can always revisit and modify the convention. Don’t let perfect be the enemy of great. And interoperability between different OMOP CDMs is great!”. I’ll admit, it’s a little cheesy, but we really need the community to follow the standards. You can always add additional fields to your CDM, but you need to populate the CDM as expected or we can’t do federated research. And we must compromise, agree to disagree, and move forward. The race topic has been going in circles and infinite loops for years.

With this in mind, I propose we defer the flavors of NULL (unknown, not answered, etc.), hierarchies, and negative values to a future iteration unless there is a strong use case. These items will be easy to add in later, if needed. Let’s use the data with the new concept_ids, run it through some use cases and research, identify areas needing improvement, regroup after running it through the rounds, and then make a plan. Let’s keep it simple and pragmatic for our first implementation.
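
For the mapping-rate question, a small sketch of one way to compute the pre- and post-change numbers is shown here. It assumes the usual OMOP convention that an unmapped record carries concept_id 0; the concept ids and the ten sample records are invented for illustration.

```python
def mapping_rate(concept_ids):
    """Share of records mapped to a concept (concept_id != 0)."""
    if not concept_ids:
        return 0.0
    mapped = sum(1 for cid in concept_ids if cid != 0)
    return mapped / len(concept_ids)


# Hypothetical concept ids assigned to the same ten source records
# before and after the vocabulary change (0 = unmapped).
pre_change = [900001, 0, 0, 900002, 0, 900001, 0, 0, 900002, 0]
post_change = [900001, 900003, 0, 900002, 900003,
               900001, 900004, 0, 900002, 900003]

pre_rate = mapping_rate(pre_change)
post_rate = mapping_rate(post_change)

# Row positions that still need attention after the change.
still_unmapped = [i for i, cid in enumerate(post_change) if cid == 0]

print(f"pre: {pre_rate:.0%}, post: {post_rate:.0%}, unmapped rows: {still_unmapped}")
```

Reporting the list of still-unmapped source values alongside the two rates would answer questions 2 and 3 in one pass.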

To echo what @aostropolets said, regardless of which proposal or combination of proposals the community adopts, we need to broadcast to all including: OHDSI chapter leads/WGs, those about to ETL their data, those using the CDM including secondary research groups N3C, All of Us, etc.

Since this is such a huge change and will affect many in the OHDSI community from the ETL through the pipeline to the researchers and the tools used, once a decision has been made, I suggest we form a sub-working group to document and implement the change requested by the community.


I want to voice my support for Melanie’s proposal.

Keeping race and ethnicity on Person seems to cause problems with interpretation:

  1. The ETL’er will need to decide which race to favor (first recorded in our period, most recent, most frequent?).
  2. The analyst looking at OMOP needs to know what convention was used.
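
A short sketch of why the convention matters: the three candidate rules named in point 1 can each pick a different “winner” from the same history. The records and the single-letter race values below are invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical race records for one person: (date recorded, source value).
records = [
    (date(2019, 3, 1), "A"),
    (date(2020, 6, 15), "B"),
    (date(2021, 1, 10), "B"),
    (date(2022, 9, 5), "C"),
]


def first_recorded(recs):
    """Value from the earliest record."""
    return min(recs, key=lambda r: r[0])[1]


def most_recent(recs):
    """Value from the latest record."""
    return max(recs, key=lambda r: r[0])[1]


def most_frequent(recs):
    """Most common value; ties go to the value seen first in the input."""
    counts = Counter(value for _, value in recs)
    return counts.most_common(1)[0][0]


winners = {
    "first recorded": first_recorded(records),
    "most recent": most_recent(records),
    "most frequent": most_frequent(records),
}
print(winners)
```

With this history the three rules return three different values, so an analyst who does not know which rule the ETL used cannot tell what the single PERSON value represents.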

If the research focus is on race, not singling out one of many entries gives more flexibility in the methodology used.

Adopting the proposal will lessen the burden of custom-mapping race combinations.
The impact on tools is an important consideration and needs to be addressed.


At the risk of angering Christian (I probably will not be able to attend the workgroup), if we are going to have to track demographics, then make a demographics table. It isn’t that hard and would run faster than trying to pull the data out of observations.
For those of us who are using certain EHRs, all that moving it to observations does is make the ETL much harder with no gain in functionality. We have ZERO history of demographics of any sort. I am sure that the billing dept. does, but that data is not accessible to us. Kill and fill means that we always lose any demographics that have changed.


@Mark:

No anger! This is the good debate we are having.

We have a demographics table, it’s called PERSON. It does have the necessary fields, and they lack timing. Sounds to me like you are a proponent of Jake’s proposal. Make sure you come to the WG session.
