Dealing with multiple races and other exceptions

At the risk of angering Christian (I probably will not be able to attend the workgroup): if we are going to have to track demographics, then make a demographics table. It isn’t that hard, and it would run faster than trying to pull the data out of observations.
For those of us using certain EHRs, all that moving it to observations does is make the ETL much harder with no gain in functionality. We have ZERO history of demographics of any sort. I am sure that the billing dept. does, but that data is not accessible to us. Kill and fill means that we always lose any demographics that have changed.

@Mark:

No anger! This is the good debate we are having.

We have a demographics table; it’s called PERSON. It has the necessary fields, but they lack timing. Sounds to me like you are a proponent of Jake’s proposal. Make sure you come to the WG session.

🙂 I know.

As a compromise, yes.

Hi Friends,

We will meet to ratify the proposals today, December 7th, 5pm EST in the Vocab WG subchannel in Teams. The invite went to the members of this thread as well as all members of the CDM WG. Looking forward to coming to a decision! Recording will be posted after the meeting as well for those who can’t make it.

I attempted to join the WG… but it being M$ Teams, of course it crashed my system.

I cannot join a meeting (off site) via Teams, and the notes are in otter.ai, which is not an approved app (security) that I can use. Can someone either port the meeting notes to a text file or at least give a TL;DR?

Thanks

Absolutely. You can also watch the recording here.

As per the majority voting, the proposal we would like put forward is as follows:

  • Keep one race and one ethnicity (if present) in race_concept_id and ethnicity_concept_id in the PERSON table
  • Keep all additional races and ethnicities and/or any longitudinal changes in the OBSERVATION table. Provenance of the record can be captured through observation_type_concept_id.
  • We do not deduplicate races to allow greater flexibility given lack of consensus in terms. You can add other races and ethnicities to the OHDSI Vocabularies as long as they are not full duplicates of existing races.
  • No flavors of NULL are permitted, as usual. If race is unknown or not reported it is 0.
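
For ETL authors, the split described in the bullets above might be sketched roughly as follows. This is only an illustration under stated assumptions: `split_races` is a hypothetical helper, the concept IDs in the usage lines are placeholder integers rather than real vocabulary IDs, and the choice of primary race is left to the caller because the thread has not settled that rule.

```python
def split_races(race_concept_ids, primary_index=0):
    """Split a person's race concepts into (PERSON value, OBSERVATION extras).

    race_concept_ids: concept IDs found in the source for one person.
    primary_index: position (among the known races) of the one to keep in
        PERSON. Which race is primary is an open question in the thread,
        so the caller decides.
    """
    # Unknown or unreported race is represented as 0, never as a "flavor of NULL".
    known = [c for c in race_concept_ids if c]
    if not known:
        return 0, []
    primary = known[primary_index]
    extras = known[:primary_index] + known[primary_index + 1:]
    return primary, extras


# Placeholder concept IDs, for illustration only.
person_race, extra_races = split_races([1001, 1002, 0])
print(person_race, extra_races)  # prints: 1001 [1002]
```

Each entry in `extra_races` would become its own OBSERVATION row, with provenance carried in observation_type_concept_id.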

There are two implications of this proposal for network studies I can think of:

  1. If race/ethnicity is an inclusion criterion, one will have to look in the OBSERVATION table as well
  2. FeatureExtraction relies on the PERSON table fields to compute corresponding proportions for Table 1. If some entries are in the OBSERVATION table, these proportions may be imprecise.
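
To make the first implication concrete, here is a minimal sketch of an inclusion check that looks in both tables. It assumes rows are plain dictionaries and that the additional race concept is stored in `value_as_concept_id` of the OBSERVATION row; the thread does not specify the field, so it is exposed as the `obs_field` parameter. The function name and the concept IDs are hypothetical.

```python
def persons_with_race(person_rows, observation_rows, race_concept_id,
                      obs_field="value_as_concept_id"):
    """Return the set of person_ids that have the race in PERSON or OBSERVATION.

    obs_field is an assumption: the thread does not say which OBSERVATION
    column holds the additional race concept.
    """
    in_person = {row["person_id"] for row in person_rows
                 if row["race_concept_id"] == race_concept_id}
    in_observation = {row["person_id"] for row in observation_rows
                      if row.get(obs_field) == race_concept_id}
    return in_person | in_observation


# Placeholder data, for illustration only.
person_rows = [
    {"person_id": 1, "race_concept_id": 1001},
    {"person_id": 2, "race_concept_id": 1002},
    {"person_id": 3, "race_concept_id": 0},
]
observation_rows = [{"person_id": 2, "value_as_concept_id": 1001}]

person_only = persons_with_race(person_rows, [], 1001)
both_tables = persons_with_race(person_rows, observation_rows, 1001)
```

With this data, `person_only` contains one person and `both_tables` contains two, which is the kind of gap that would make proportions computed from PERSON alone imprecise.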

Now we need further feedback from the community.
Please let the Vocab WG know if:

  • You have research that is impossible to carry out with this model,
  • You have/know of tools, queries, scripts, or anything else that will not work with this approach,
  • You have other concerns

We need a THEMIS rule for how to select the race that goes into the PERSON table when there are multiple races.
I heard two approaches:

  1. Melanie proposed to simply put ‘Multiracial’ there.
    I liked this approach, but it requires vocab team involvement.
    Is there consensus on it, and is it going to be added to the vocabularies as a new concept?

  2. Put the latest race - I guess the latest that maps to standard.
    With the most recent value, the race will change more quickly than if we took the mode, and accidental errors will be more pronounced. On the flip side, it is easier to implement and it will capture the latest trends. Somebody noted that this approach is consistent with other rules for PERSON fields (the latest value is assumed to be an intentional correction).
    Is there consensus on the ‘latest race’ approach, or is it still in the discussion/approval stage?
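
The trade-off between "latest" and "mode" in option 2 can be seen in a small sketch. Neither rule has been ratified in this thread; `latest_race` and `modal_race` are hypothetical helpers, the history is assumed to be a list of (date, concept_id) pairs that already map to standard concepts, and the concept IDs are placeholders.

```python
from collections import Counter
from datetime import date


def latest_race(history):
    """Return the concept of the most recently recorded race, or 0 if none."""
    known = [(recorded_on, concept) for recorded_on, concept in history if concept]
    if not known:
        return 0
    return max(known, key=lambda rec: rec[0])[1]


def modal_race(history):
    """Return the most frequently recorded race concept, or 0 if none."""
    known = [concept for _, concept in history if concept]
    if not known:
        return 0
    return Counter(known).most_common(1)[0][0]


# Placeholder history: two older records agree, one recent record differs.
history = [
    (date(2019, 3, 1), 1001),
    (date(2020, 6, 1), 1001),
    (date(2023, 1, 15), 1002),
]
```

For this history, `latest_race` picks 1002 while `modal_race` picks 1001: a single recent entry (whether a correction or an error) flips the "latest" result immediately, but not the mode.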

At least we do not have to worry about messing with the demographics table as all the non primaries will go into the junk drawer, as someone calls it.

We only have demographics info, so we will be doing neither, but we do have an orthogonal listing of what the patient declares is the race/eth of said patient, so we know what goes in the primary (PERSON) listing. My question is: what date do we put on the other listings?
What we have is demographics data; there is no date associated with it. As far as our EHR is concerned, demographics have no associated date, but internally they are used with the entry date or birth date; that is totally unrealistic for longitudinal data (which is one reason I fought so hard against what was done).

Isn’t that the rule for Observations? The date is the date of the observation record?

Sorry: by entry date I mean the time the demographics entry was created, which is usually the patient’s first encounter in our system. We have no history of any demographics field; we don’t know when a field was entered or modified, just when the initial record was created and the last-modification timestamp of the record, and again, we don’t know what was modified. Many times it is a background process that triggers the modify timestamp: useless information for studies but valid information for internal reporting.

Demographics is a stateless table for us. I will put a fake timestamp on it, just tell me what fake timestamp is wanted.

Edit: from the CDM
Depending on the structure of the source data, this may have to be determined based on dates. If an OBSERVATION_DATE occurs within the start and end date of a Visit it is a valid ETL choice to choose the VISIT_OCCURRENCE_ID from the visit that subsumes it, even if not explicitly stated in the data. While not required, an attempt should be made to locate the VISIT_OCCURRENCE_ID of the observation record. If an observation is related to a visit explicitly in the source data, it is possible that the result date of the Observation falls outside of the bounds of the Visit dates.
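
The date-based lookup that the CDM text describes could look something like the sketch below. It assumes visits are available as (visit_occurrence_id, start_date, end_date) tuples and returns the first visit whose dates contain the observation date; `find_visit_id` is a hypothetical name, and a real ETL would need its own rule for overlapping visits.

```python
from datetime import date


def find_visit_id(observation_date, visits):
    """Return the id of the first visit that subsumes the date, else None.

    visits: iterable of (visit_occurrence_id, start_date, end_date) tuples
    (an assumed shape, for illustration only).
    """
    for visit_id, start_date, end_date in visits:
        if start_date <= observation_date <= end_date:
            return visit_id
    return None


# Placeholder visits, for illustration only.
visits = [
    (10, date(2021, 1, 5), date(2021, 1, 7)),
    (11, date(2022, 4, 1), date(2022, 4, 1)),
]
matched = find_visit_id(date(2021, 1, 6), visits)
unmatched = find_visit_id(date(2021, 6, 1), visits)
```

Here `matched` is 10 and `unmatched` is None, so a record with an invented date would simply get no visit link unless the invented date happens to fall inside a visit.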

I can choose:
A. The very first visit date
B. The very last visit date
C. The birth date of the person
D. 1900-01-01

I like D, as it would let tools know that there is no match in the system and keep them from creating false-positive linkages.

This is a flavor of null, which I think we should avoid.

It is not a flavor of null; null tells one nothing.
I can just leave this out of my mapping; that is much better than putting false data into the system.

Mark:

If you don’t have a timestamp for this type of information, which is probably going to be very common, I would put in A or, even better, the observation_period_start_date of the observation period where this record belongs. Is that date useful? No, but that’s the nature of the problem. For many people, race and ethnicity are static pieces of information. I personally don’t expect to see any changes for the rest of my life. That is the flip side of supporting use cases where people find it dynamic.
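
As a rough sketch of that suggestion: use the source date when one exists, otherwise fall back to the observation period start, and only then to the first visit date (option A). The function name and argument shapes are hypothetical, and the ordering is just one reading of the advice above, not a ratified convention.

```python
from datetime import date


def observation_date_for(source_date, observation_period_start, first_visit_date=None):
    """Pick a date for a demographics record that may lack its own timestamp.

    Order of preference (an assumption based on the suggestion above):
    the real source date, then observation_period_start_date, then the
    first visit date. Returns None if nothing is available, in which case
    the ETL may prefer to leave the record out.
    """
    for candidate in (source_date, observation_period_start, first_visit_date):
        if candidate is not None:
            return candidate
    return None


# Placeholder dates, for illustration only.
no_source = observation_date_for(None, date(2018, 1, 1), date(2018, 2, 10))
with_source = observation_date_for(date(2020, 5, 5), date(2018, 1, 1))
```

In this example `no_source` falls back to 2018-01-01, while `with_source` keeps the real 2020-05-05 date.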

I hate to rehash old discussions, but this strikes me the same way as how ‘history of’ information is put into the CDM. Some people say to put the history-of observations at the start of the observation period. My personal opinion is that the CDM maintains a record of when things were observed, not when they actually happened. For some things the two dates are very close (like drug exposures and procedures), but for other things they are not. My question is: is the CDM supposed to make a best guess about when the thing actually happened, or just record the observations as they are known?

I would lean toward the latter, because medical decisions are made based on the information available at the time. You might have someone who had a latent disease for a very long time that was only discovered on a certain date; I would argue that the date it was observed is when different medical decisions may be made, so we should reflect that in the model. On the other hand, I can understand the argument for describing the actual date of existence to get more accurate results. But I’m not sure we can always determine the actual date, whereas we can definitely get the date when it was observed, so recording the observed date seems like the more consistent way to go.

So, apologies for derailing the convo on this, but it feels like the same recurring problem: what are we trying to represent, and can we represent all information with the same fidelity (actual vs. observed in this case)? In the case of other demographics, there seems to be a split between things that can be observed over time (ethnicity) and things that are fixed for all time (your birth date). I was a little disappointed that race and ethnicity were lumped into the same bucket, because I feel like one is considered fixed while the other is possibly a social context that can change over time, which means 2 different solutions for 2 different sorts of data problems. I respect that the time for that debate is over, but the theme of this type of challenge remains.

@Chris_Knoll:

I think you are spot-on with this. But the question is not whether to change the observation_date from when it was observed to some arbitrary date, but what date to put in when the source data has no date at all for some fact, which will very often be the case with race and ethnicity data. In that case you could use some heuristic for when that information was probably collected. In the case of claims, that is probably the beginning of enrollment, which is the beginning of the Observation Period.

I think (may be mistaken) that the next step with conventions is to pass it over to the Themis WG for discussion/ratification. @MPhilofsky could you please advise on the process?

Yes, @aostropolets, Themis can give input. I am a little confused. I thought this issue was already voted on during the special Vocab meeting in December 2023. However, you state above you have voted on putting forth a proposal.

So, is this a proposal or has this been decided upon as the solution?

Sorry for the confusion! We already voted and selected an option (one of the proposals) that the Vocab WG recommends. I looked at the process diagram in the OHDSI/Themis GitHub repository (the repository for OMOP CDM conventions as defined by THEMIS) and inferred that the option we decided upon should be passed to Themis as a proposal, so I called it a “proposal”. I will be happy to frame it as needed and follow any appropriate steps.

Got it!

The following will need clarification:

What date to use when there isn’t a date for the source record?

And this needs guidance:

Themis is in the process of finalizing the “Themis convention template”. Once we approve the final draft, I will post it to the Themis GitHub and ping you with the location of the template. You’ll fill it out with the details of the “multiple race and other exceptions” proposal, submit it (location also TBD), and then Themis will “officially” ratify it and add additional metadata, if necessary. Stay tuned!
