Multiple death records from different sources

That makes sense. For the All of Us program we might be getting death data from various sources beyond the EHR (e.g. coordinators reaching out to the participant's family, CMS, state death registry, etc.). The issue we have is that for some of this data we might not have much control to say which source is better, but I guess you are right @Christian_Reich, we need to create some type of rule to provide to users.

Not ‘All of Us’ know where the death data comes from, we either have it or we don’t and there is no way to verify if it is correct.

Pun intended. :slight_smile:

Understood, but you can’t kick the can down the road to the analyst. That poor wretch has even less context to work with. So, push back against your sources and make them pick a date. If they can’t, have your own heuristic.

I should have sent that as a private message to Cukartik; my comment was more directed at the AOU process and not important for general OMOP; I apologize for the confusion.

I agree with your statement; not knowing if the data is precise or not, I would like to exclude it altogether.

At Stanford, in addition to EHR death data, we are getting LADMF death data, and will soon be getting CDPH-VR data. Our analysis has shown that SSA data can often be unreliable. While I understand that there can only be one death record per person, we have folks here who would like to know the dates reported via other agencies. We are proposing that for our internal OMOP, we may add some extra columns for the different sources. We are curious to know if others in the community have done that or used some other strategy.

@Christian_Reich @cukarthik @MPhilofsky @hiro-mishima

#death-data #vocabulary-users #cdm-builders

In our EDW we store multiple sources of death data (EHR, Cancer Registry, NDI, SSA, etc.), but for our research data sets we have a “derived date of death” algorithm that ranks sources by quality and delivers the “best” death date. This is the death date we are storing in OMOP.
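A minimal sketch of what a "derived date of death" ranking could look like, assuming each candidate is a (source, date) pair. The `SOURCE_RANK` ordering and the function name are hypothetical and are not the ranking used by any site in this thread.

```python
from datetime import date

# Hypothetical trust ranking (lower number = more trusted).
# The real ordering is site-specific and not given in this thread.
SOURCE_RANK = {"NDI": 0, "CANCER_REGISTRY": 1, "EHR": 2, "SSA": 3}

def best_death_date(candidates):
    """Return the (source, death_date) pair from the most trusted source.

    candidates: iterable of (source_name, death_date or None).
    Unknown sources and missing dates are ignored.
    Returns None when no usable candidate exists.
    """
    usable = [
        (source, death_date)
        for source, death_date in candidates
        if source in SOURCE_RANK and death_date is not None
    ]
    if not usable:
        return None
    return min(usable, key=lambda pair: SOURCE_RANK[pair[0]])

# The EHR date wins here even though the SSA date is earlier.
picked = best_death_date([("SSA", date(2020, 1, 1)), ("EHR", date(2020, 1, 3))])
```

Note that the choice is driven by source trust, not by which date is earliest.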

Hello @PriyaDesai,

In Colorado we use a heuristic to select what we believe to be the most accurate death date (subject to change at any time :slight_smile: ). We have quite a few expansion columns in our OMOP CDM, but additional death date isn’t one of them. However, if we were to store multiple dates, we would add an additional column for each death date source so we can keep a single row per person and then name them <DataSourceName>_death_date_x and store the death date from that data source in the column.
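To illustrate the one-row-per-person layout described above, here is a rough sketch that pivots long-format records into one column per source. The input shape, the function name, and the exact column naming are assumptions modeled loosely on the pattern in the post.

```python
from datetime import date

def pivot_death_dates(records):
    """Collapse (person_id, source, death_date) rows into one dict per person.

    Each source becomes its own column, so a person never needs
    more than one row regardless of how many sources report a death.
    """
    wide = {}
    for person_id, source, death_date in records:
        row = wide.setdefault(person_id, {"person_id": person_id})
        # Assumed column naming: <source>_death_date, lowercased.
        row[f"{source.lower()}_death_date"] = death_date
    return wide

records = [
    (1, "EHR", date(2020, 3, 1)),
    (1, "SSA", date(2020, 3, 4)),
    (2, "EHR", date(2019, 7, 15)),
]
wide_rows = pivot_death_dates(records)
```

Person 1 ends up with both an `ehr_death_date` and an `ssa_death_date` column, while person 2 has only the EHR column.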

@jmethot: That is really interesting. Can you tell us a little more about your “algorithm”? Happy to set up a quick call, as we are hoping to do something similar.

Here are the field descriptions from our data guide (we also deliver death dates from each of the mentioned sources):

Indicator that the patient has died based on all data sources, including Epic, historical systems, and NDI data. Y indicates that the patient is dead, and N indicates that the patient is alive.

The date of death for the patient based on all available sources, including the NDI, the Epic date of death, and historical systems date of death. If an NDI death date exists for the patient, it is used to populate the field; else if an Epic date of death exists, it is used; else if the Historical date of death exists, it is used. If none of these are populated, the field is blank.
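The fallback order in the field description (NDI, then Epic, then historical systems) can be sketched as below. The function and parameter names are hypothetical, and deriving the Y/N indicator from the presence of a date is an assumption that the data guide does not state.

```python
from datetime import date
from typing import Optional, Tuple

def hybrid_death(
    ndi_date: Optional[date],
    epic_date: Optional[date],
    historical_date: Optional[date],
) -> Tuple[str, Optional[date]]:
    """Return (indicator, death_date) using the first populated source.

    Order follows the field description: NDI, then Epic, then historical.
    Assumption: the indicator is "Y" exactly when some date is populated.
    """
    for candidate in (ndi_date, epic_date, historical_date):
        if candidate is not None:
            return "Y", candidate
    # No source has a date, so the date field stays blank.
    return "N", None

indicator, death_date = hybrid_death(None, date(2021, 1, 5), date(2021, 1, 9))
```

Here the Epic date is used because no NDI date exists, even though a historical date is also present.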

Curious, for Epic death data, what are the requirements for finding a person has died?

@MPhilofsky if you’re asking me: that’s complicated. My understanding is there are several ways death dates get into Epic. Two and a half that I know of: entered by care team when they have direct knowledge; entered by patient registration when informed by family; populated by disease center registry teams who monitor/search obituaries to maintain their registries (less sure about that path). Perhaps also when patient dies while an inpatient?

I should have asked the question differently. From the Epic data, what are your requirements for finding a person has died? For our EHR data, we use a combination of the date of death field and a person’s vital status.

First, no guarantee the death data is accurate, which you know already. We miss about 50% of deaths. Since it isn’t really a formal requirement in EHRs, our OMOP extract leverages any information we can get, mostly focusing on date of death but including vital status. As I recall, we needed some kind of death date so we made some estimates. We also incorporated other death information from a second source and then used this to refine our algorithm. In this case, though, we made an additional table to compare our results and ended up refining our death assessments. The data quality around death is very frustrating and so we encourage people to think about the purpose when using it - if the person might be dead: don’t recruit them for trials; if you are calculating mortality, especially in relation to an exposure, please leverage the higher quality data.

@David_Dorr: What’s the problem with the death data? The fact itself, or the date? If the latter, by how much does it deviate in your estimation?

Reason I am asking is this: Mortality is usually measured in many months to years. So, a couple days earlier or later won’t make a difference. Even years. But if the data are completely made up then it would be useless indeed.

Thanks. So HYBRID_DEATH_DT is a coalesce?

@MPhilofsky what is a person’s vital status?

@PriyaDesai - I don’t know if our DBAs are using an actual COALESCE function but it sounds like it would have the same effect.


The vital status is whether a person is alive or deceased. I think the field is named status in our EHR, and it is always populated with either Alive or Deceased. There are no null values.
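How the date of death field and the vital status are combined is not spelled out in this thread. The sketch below shows one possible reading, with hypothetical label names, that also surfaces records where the two fields disagree.

```python
from datetime import date
from typing import Optional

def classify_vital_status(status: str, death_date: Optional[date]) -> str:
    """Combine the status field (Alive/Deceased) with the death date field.

    The returned labels are illustrative, not part of any EHR or OMOP schema.
    """
    if status == "Deceased" and death_date is not None:
        return "deceased_with_date"
    if status == "Deceased":
        # Known to be dead, but a date would have to be estimated.
        return "deceased_no_date"
    if death_date is not None:
        # Status says Alive, yet a death date is recorded: needs review.
        return "conflict"
    return "alive"

label = classify_vital_status("Alive", date(2022, 5, 1))
```

Separating the conflict case keeps inconsistent records visible instead of silently treating them as alive or dead.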

We also retain records of multiple sources of death (e.g. EHR, SSA, data aggregator), and apply logic to compute the best known death date from the various sources and have that be the single date of death in OMOP. It isn’t always the earliest date recorded, as there are different levels of trust across the various sources.

I want to caution about the idea of using the below approach:

Within our EHR, if a person becomes an organ donor or has an autopsy, those clinical events are attached to the deceased person’s record, so they can occur well after the date of death. One of the DQD logic options is to only flag a death date as suspicious if there are data attached to a person more than 60 days after the recorded date of death.
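The 60-day rule can be illustrated with the sketch below. It is not the DQD's actual implementation; the function name and the `grace_days` parameter are hypothetical.

```python
from datetime import date, timedelta

def death_date_is_suspicious(death_date, event_dates, grace_days=60):
    """Flag a death date when any event falls more than grace_days after it.

    Events inside the grace window (e.g. autopsy or organ donation
    records) are tolerated and do not trigger the flag.
    """
    cutoff = death_date + timedelta(days=grace_days)
    return any(event_date > cutoff for event_date in event_dates)

died = date(2020, 1, 1)
autopsy_only = death_date_is_suspicious(died, [date(2020, 1, 20)])
late_visit = death_date_is_suspicious(died, [date(2020, 1, 20), date(2020, 4, 1)])
```

With these inputs, the autopsy-only record is not flagged, while the record with a visit three months after death is.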

Please see the following thread for additional discussion:

THEMIS Conventions for Death Table to Allow Multiple Records - Version 5.4 - CDM Builders - OHDSI Forums