Dealing with multiple races and other exceptions

We need a THEMIS rule on how to select the race that goes into the PERSON table when a patient has multiple races.
I heard two approaches:

  1. Melanie proposed to simply put ‘Multiracial’ there.
    I liked this approach, but it requires vocab team involvement.
    Is there consensus on it, and is it going to be added to the vocabularies as a new concept?

  2. Put the latest race - I guess the latest that maps to standard.
    With the most recent value, the race will change more quickly than if we took the mode, and accidental errors will be more pronounced. On the flip side, it is easier to implement and it will capture the latest trends. Somebody noted that this approach is consistent with the other rules for person fields (the latest value is assumed to be an intentional correction).
    Is there consensus on the ‘the latest race’ approach, or is it still in the discussion/approval stage?
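
To make the trade-off between the two selection rules concrete, here is a minimal sketch in Python. It is not an agreed convention: the row layout `(race_value, recorded_on)`, the sample data, and the function names are all hypothetical, and the tie-break in `mode_race` is just one possible choice.

```python
from collections import Counter
from datetime import date

# Hypothetical source rows: (race_value, recorded_on). Illustrative data only.
records = [
    ("White", date(2015, 3, 1)),
    ("White", date(2017, 6, 9)),
    ("Asian", date(2021, 1, 20)),
]


def latest_race(rows):
    """Return the race from the most recently dated row."""
    return max(rows, key=lambda row: row[1])[0]


def mode_race(rows):
    """Return the most frequent race; ties go to the more recently recorded value."""
    counts = Counter(race for race, _ in rows)
    last_seen = {}
    for race, recorded_on in rows:
        last_seen[race] = max(last_seen.get(race, recorded_on), recorded_on)
    return max(counts, key=lambda race: (counts[race], last_seen[race]))
```

On the sample rows the two rules disagree: the latest rule follows the single most recent entry, while the mode rule stays with the value recorded most often, which is exactly the sensitivity to one-off entries described above.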

At least we do not have to worry about messing with the demographics table as all the non primaries will go into the junk drawer, as someone calls it.

We only have demographics info, so we will be doing neither, but we do have an orthogonal listing of what the patient declares is the race/eth of said patient, so we know what is to be put in the primary (person) listing. My question is, when do we date the other listings?
What we have is demographics data, there is no date associated with it. As far as our EHR is concerned, all demographics have no associated date but internally are used with the entry date or birth date; that is totally unrealistic for longitudinal data (which is one reason I fought so hard against what was done).

Isn’t that the rule for Observations? The date is the date of the observation record?

Sorry, I meant the entry date, i.e. the time the demographics entry was created, which is usually the patient’s first encounter in our system. We have no history of any demographics field: we don’t know when a field was entered or modified, just when the initial record was created and the last-modification timestamp of the record, and again, we don’t know what was modified. Many times it is a background process that triggers the modify timestamp, which is useless information for studies but valid information for internal reporting.

Demographics is a stateless table for us. I will put a fake timestamp on it, just tell me what fake timestamp is wanted.

Edit: from the CDM
Depending on the structure of the source data, this may have to be determined based on dates. If an OBSERVATION_DATE occurs within the start and end date of a Visit it is a valid ETL choice to choose the VISIT_OCCURRENCE_ID from the visit that subsumes it, even if not explicitly stated in the data. While not required, an attempt should be made to locate the VISIT_OCCURRENCE_ID of the observation record. If an observation is related to a visit explicitly in the source data, it is possible that the result date of the Observation falls outside of the bounds of the Visit dates.
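
A minimal sketch of the date-based lookup described in that CDM text, in Python. The `Visit` tuple and the function name are hypothetical, and the tie-break for overlapping visits (prefer the shortest visit) is my own assumption, not something the CDM prescribes.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence


class Visit(NamedTuple):
    """Hypothetical in-memory stand-in for a VISIT_OCCURRENCE row."""
    visit_occurrence_id: int
    visit_start_date: date
    visit_end_date: date


def find_subsuming_visit(observation_date: date, visits: Sequence[Visit]) -> Optional[int]:
    """Return the id of a visit whose date range contains observation_date, or None."""
    matches = [
        v for v in visits
        if v.visit_start_date <= observation_date <= v.visit_end_date
    ]
    if not matches:
        return None
    # Assumption: when several visits overlap the date, prefer the shortest one,
    # then the lowest id so the result is deterministic.
    best = min(
        matches,
        key=lambda v: (v.visit_end_date - v.visit_start_date, v.visit_occurrence_id),
    )
    return best.visit_occurrence_id
```

With a fake date such as 1900-01-01 this lookup returns `None`, because no visit range contains it.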

I can choose:
A. the very first visit date,
B. the very last visit date
C. The birth date of the person
D. 1900-01-01

I like D as it would allow tools to know that there is no match in system to keep from creating false positive linkage.

This is a flavor of null, which I think we should avoid.

It is not a flavor of null; null tells one nothing.
I can just leave this out of my mapping; that is much better than putting false data into the system.

Mark:

If you don’t have a time stamp for this type of information, which is probably going to be very common, I would put in A, or even better, the observation_period_start_date where this thing belongs. Is that date useful? No, but that’s the nature of the problem. For many people, race and ethnicity are static pieces of information. I personally don’t expect to see any changes in the rest of my life. That is the flip side of supporting the use cases where people find it dynamic.

I hate to re-hash old discussions, but this strikes me the same way as how ‘history of’ information is put into the CDM. Some people say to put the ‘history of’ observations at the start of the observation period. My personal opinion is that the CDM maintains a record of when things were observed, not when they actually happened. For some things the two dates are very close (like drug exposures and procedures), but for other things they are not. My question is: is the CDM supposed to make a best guess about when the thing actually happened, or just record the observations as they are known?

I would lean to the latter, because medical decisions are made based on the information available at the time. You might have someone who had a latent disease for a very long time that was only discovered on a certain date; I would argue that the date it was observed is when different medical decisions may be made, so we should reflect that in the model. On the other hand, I can understand the argument for describing the actual date of existence to get more accurate results. But I’m not sure we can always define the actual date, whereas we can definitely get the date when something was observed, so recording the observed date seems like the more consistent way to go.

So, apologies for derailing the convo on this, but it feels like the same recurring problem: what are we trying to represent, and can we represent all information at the same fidelity (actual vs. observed in this case)? In the case of other demographics, there seems to be a split between things that can be observed over time (ethnicity) and things that are fixed for all time (your birth date). I was a little disappointed that race and ethnicity were lumped into the same bucket, because I feel like one is considered fixed while the other is possibly considered a social context that can change over time, which means 2 different solutions for 2 different sorts of data problems. I respect that the time for that debate is over, but the theme of this type of challenge remains.

@Chris_Knoll:

I think you are spot-on with this. But the question is not whether to change the observation_start_date from when it was observed to some arbitrary date, but what date to put in if the source data has no date at all for some fact, which will very often be the case with race and ethnicity data. In that case you could use a heuristic for when that information was probably collected. In the case of claims, that is probably the beginning of enrollment, which is the beginning of the Observation Period.
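
A minimal sketch of that heuristic in Python, assuming the ETL already knows the person’s observation_period_start_date. The function name and the returned `imputed` flag are hypothetical, and this is only one of the options discussed in this thread, not a ratified convention.

```python
from datetime import date
from typing import Optional, Tuple


def demographic_observation_date(
    source_date: Optional[date],
    observation_period_start_date: date,
) -> Tuple[date, bool]:
    """Return (date_to_store, imputed).

    If the source carries a real date, keep it. Otherwise fall back to the
    start of the observation period and flag the value as imputed so the
    choice can be documented in the ETL specs.
    """
    if source_date is not None:
        return source_date, False
    return observation_period_start_date, True
```

Keeping the `imputed` flag around during the ETL makes it easy to report how many demographic records received a heuristic date.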

I think (may be mistaken) that the next step with conventions is to pass it over to the Themis WG for discussion/ratification. @MPhilofsky could you please advise on the process?

Yes, @aostropolets, Themis can give input. I am a little confused. I thought this issue was already voted on during the special Vocab meeting in December 2023. However, you state above you have voted on putting forth a proposal.

So, is this a proposal or has this been decided upon as the solution?

Sorry for the confusion! We already voted and selected an option (one of the proposals) that the Vocab WG recommends. I looked into the process diagram in the OHDSI/Themis GitHub repository (the repository for OMOP CDM conventions as defined by THEMIS) and inferred that the option we decided upon should be passed to Themis as a proposal, thus I called it a “proposal”. Will be happy to frame it as needed and follow any appropriate steps.

Got it!

The following will need clarification:

What date to use when there isn’t a date for the source record?

And this needs guidance:

Themis is in the process of finalizing the “Themis convention template”. Once we approve the final draft, I will post it to the Themis GitHub and ping you the location of the template. You’ll fill it out with the details of the “multiple race and other exceptions” proposal, submit it (location also TBD), and then Themis will “officially” ratify it and add additional metadata, if necessary. Stay tuned!

@MPhilofsky Do you know when a decision is expected?

We’ve also needed to support multiple races for healthcare disparities research. Ideally, I’d prefer that any and all races be supported in the PERSON table so that they are treated equally. My colleagues and I are concerned that, for patients with multiple races, the races other than the first may be missed if they are stored in another table.

I concur with @Agnes_Wojciechowski that further information is needed on how transformations will be made, given that race (and its coding) populates the PERSON table.

There are a few scenarios that may be seen with multiple races:

  1. One race is indicated for a patient populating PERSON Table.
    At a later point a second race is indicated. Is the expectation the second will populate Observation Table?

  2. One race is indicated for a patient populating PERSON Table. At a later point, the patient changes their race. Will the more current race replace the existing one in PERSON Table and not populate Observation Table as it’s an update or correction to information?

  3. Two or more races are indicated for a patient. Which race populates PERSON Table and which populates Observation Table (if that is the decision made)?

  4. A race of Unknown is recorded in the PERSON Table and later the actual race is provided. A variation on the update/correction scenario in #2, whereby the PERSON Table race needs to be updated with the race and not added as a second race in Observation Table.

I haven’t been privy to previous discussions, so feel free to point me to the decisions related to these.

Also interested in understanding when folks may start implementing the solution supporting multiple races.

Thank you,

Andrea

Dropped the ball here, picking up thanks to Andrea’s post.

What date to use when there isn’t a date for the source record?

Use date of birth (I really don’t see what else you can do if you do not have a date in your source).

We need the THEMIS rule how to select race which goes to PERSON table in case of multiple races.

I think we said the ETLers decide and document: the earliest, the latest or the one they trust the most given their knowledge of data. The community seem to have strong opinions to force one way or the other.

Themis is in the process of finalizing the “Themis convention template”.

Would be happy to fill it out. Thanks for the support and all the work!

Filling you in on what’s been decided by community voting:

You put one race in PERSON and the rest in OBSERVATION. Which one goes where is up to you. The only exception is case #4, where you replace the unknown race with the known race. They don’t like flavors of NULL here.
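
A minimal sketch of that rule in Python, to show how the four scenarios above fall out of it. Everything here is an assumption for illustration: the function name, the list of “unknown” spellings, and the default of picking the first known value (the thread leaves that choice to the ETLer).

```python
# Hypothetical spellings a source might use for a missing race.
UNKNOWN_VALUES = {"", "unknown", "unk", "not reported"}


def split_races(source_races, pick=lambda known: known[0]):
    """Return (race_for_person, races_for_observation).

    Unknown values are dropped first, so a known race always replaces an
    unknown one in PERSON (scenario #4). `pick` is the ETLer's documented
    choice, e.g. earliest, latest, or most trusted.
    """
    known = []
    for race in source_races:
        if race is None or race.strip().lower() in UNKNOWN_VALUES:
            continue
        if race not in known:  # de-duplicate while keeping source order
            known.append(race)
    if not known:
        return None, []
    primary = pick(known)
    rest = [race for race in known if race != primary]
    return primary, rest
```

For example, `split_races(["White", "Asian"], pick=lambda known: known[-1])` would implement the “latest” choice if the source list is ordered by recording time.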

Of course, researchers may miss the other races; hopefully good ETL specs and conventions that say “Also look in OBSERVATION” will help. The thread above provides more reasons and thoughts on why the community didn’t vote for modifying the PERSON table (such as that it would break all of the tools).

Thanks @aostropolets Wanted to make sure I didn’t miss any discussions in the thread as I saw support for both approaches.

It’d be good practice to treat each race the same way, which is why I favored the PERSON table. With the Observation approach, it appears one race would have an associated date and one may not.

@Christian_Reich asked for use cases.

  1. Similar to Andrew’s, there are a variety of research projects looking at patient outcomes related to race. It depends on the type of research. Some of these are related to Social Determinants of Health too. Others are related to disease prevalence in certain races. Others yet are related to genetic changes that are more prevalent in one race or another.

  2. Genetic testing for genes associated with one’s heritage are common.

  3. The NIH has acknowledged a lack of diversity in a number of studies and is encouraging studies to be performed on more diverse populations. Thus researchers may search OMOP data sets to determine whether they have sufficient numbers of patients with X race, Y race, Z race in their study population/institution, or whether they need to recruit patients from multiple academic centers/sites.

I also favor a post-coordinated approach that treats race and ethnicity as distinct items. They are collected separately (in the US) and usually mapped to different code systems depending on the country. The CDC code system Davera mentioned would be used in the US, but other countries could map to their designated code system too. Many examples were given of different country needs.

I’m glad to see the group has taken up the topic and is working on a solution.

With warm regards,
Andrea

@apitkus:

Thanks for the use cases. The reason race information is spread over 2 locations is pragmatic: multiple race data are rare, and we don’t want to make the relatively rare use case (effect of multiple or changing race designations) easier at the cost of the much more common use case (using race as a single covariate for things). It also slows down computationally if you have to scan the entire OBSERVATION table for each patient.

We did debate whether, instead of tossing more than one race record into OBSERVATION, we should pre-coordinate multiple races and keep them in PERSON. But that idea was deemed infeasible, for the simple reason that nobody has those combinations ready, particularly if you go into fractions (7/16th Asian). Plus, some folks felt there are use cases about the dynamic character of that information (people changing their race designation).

But your use cases should be well covered, even though you need to do the extra step and screen two tables (PERSON and OBSERVATION). Let’s see what evidence you guys can detect.
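
A minimal sketch of that extra step in Python, using plain dictionaries in place of table rows. Loud assumption: it supposes the additional races land in OBSERVATION with the race concept in value_as_concept_id, which this thread does not actually specify. The function name and the concept ids in the usage note are placeholders, not real OMOP concept ids.

```python
def all_race_concept_ids(person_row, observation_rows, race_concept_ids):
    """Collect every race recorded for one person across PERSON and OBSERVATION.

    person_row: dict with person_id and race_concept_id.
    observation_rows: iterable of dicts with person_id and value_as_concept_id
        (assumed location of the additional races; hypothetical layout).
    race_concept_ids: set of concept ids the caller treats as races.
    """
    person_id = person_row["person_id"]
    found = []
    primary = person_row.get("race_concept_id")
    if primary in race_concept_ids:
        found.append(primary)
    for row in observation_rows:
        value = row.get("value_as_concept_id")
        if (
            row["person_id"] == person_id
            and value in race_concept_ids
            and value not in found
        ):
            found.append(value)
    return found
```

For a person with placeholder race 1001 in PERSON and placeholder race 1002 in OBSERVATION, the function returns both ids, with the PERSON value first.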

We shouldn’t use date of birth because the current guidance to create Observation Period from EHR data says use the first event from the data. And unless you have pediatric data, the dob is before the EHRs were used. Using the date of the last visit record might be most accurate since race/ethnicity is usually recorded at every visit and updated accordingly. But we should discuss race & ethnicity conventions after April Olympians. There are many considerations to debate.

@MPhilofsky I referenced the recommendations that the community voted for in the Themis post here: Convention need for how/where to store > 1 race or ethnicity concept_id · Issue #71 · OHDSI/Themis · GitHub.

Would be great to see a Themis convention for this long standing issue soon! Thanks for all your work.
