Multiple death records from different sources


Our database receives data from multiple sources. Sometimes, various sources send the death records of the same person to us. I understand the death table in CDM v5 (or the person table in CDM v6) is supposed to have only up to one death record per person. What should we do if we want to keep all the death records of a person from different sources?

Tagging: @clairblacketer


Hello @hiro-mishima,

Before asking “how do we keep all source data in the CDM?”, you should ask “why do we need it?” What is the use case for having two death records? The OMOP CDM is driven by real-world use cases.


Nothing is certain in this world, except death (and taxes). Obviously, the patients only die once. The analyst working on @MPhilofsky’s use case has no clue how to interpret the double death. It is the job of the ETL team to figure out which of the death dates to believe more. You could think of different heuristics:

  • The more reliable feed
  • The date after which no new data are coming in
  • The first date, because the second date is just the time stamp of the certificate coming in
  • etc.
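
To make the first and third heuristics above concrete, here is a minimal sketch, not anyone's production ETL. The source names, the ranking in `SOURCE_RANK`, and the `DeathReport` type are all hypothetical; the rule is "take the most reliable feed, and among equally ranked reports take the earliest date". The "no new data after this date" heuristic is not covered here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical reliability ranking (lower number = more trusted feed).
# The source names and their order are assumptions for illustration only.
SOURCE_RANK = {"state_registry": 0, "ehr": 1, "family_report": 2}


@dataclass(frozen=True)
class DeathReport:
    person_id: int
    source: str
    death_date: date


def pick_death_date(reports: List[DeathReport]) -> Optional[DeathReport]:
    """Return the single report to load into the CDM, or None if there are none."""
    if not reports:
        return None
    unknown = len(SOURCE_RANK)  # sources missing from the ranking sort last
    # Most reliable feed first; ties are broken by the earliest reported date.
    return min(reports, key=lambda r: (SOURCE_RANK.get(r.source, unknown), r.death_date))


# Made-up example: three feeds disagree about the same person.
reports = [
    DeathReport(1, "ehr", date(2020, 4, 12)),
    DeathReport(1, "state_registry", date(2020, 4, 10)),
    DeathReport(1, "family_report", date(2020, 4, 9)),
]
chosen = pick_death_date(reports)
print(chosen.source, chosen.death_date)  # state_registry 2020-04-10
```

Keeping the ranking in one small table means the ETL team can change its mind about which feed to trust without touching the selection logic.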

That makes sense. For the All of Us program we might be getting death data from various sources beyond the EHR (e.g. coordinators reaching out to the participant’s family, CMS, state death registry, etc.). The issue we have is that for some of this data we might not have much control over saying which source is better, but I guess you are right @Christian_Reich, we need to create some type of rule to provide to users.

Not ‘All of Us’ know where the death data comes from; we either have it or we don’t, and there is no way to verify if it is correct.

Pun intended. :slight_smile:

Understood, but you can’t kick the can down the road to the analyst. That poor wretch has even less context to work with. So, push back against your sources and make them pick a date. If they can’t, have your own heuristic.

I should have sent that as a private message to Cukartik; my comment was more directed at the AOU process and not important for general OMOP; I apologize for the confusion.

I agree with your statement; not knowing if the data is precise or not, I would like to exclude it altogether.

At Stanford, in addition to EHR death data, we are getting LADMF death data, and will soon be getting CDPH-VR data. Our analysis has shown that SSA data can often be unreliable. While I understand that a person can only have one death record, we have folks here who would like to know the dates reported via other agencies. We are proposing that for our internal OMOP, we may add some extra columns for the different sources. We are curious to know if others in the community have done that or used some other strategy.

@Christian_Reich @cukarthik @MPhilofsky @hiro-mishima

#death-data #vocabulary-users #cdm-builders

In our EDW we store multiple sources of death data (EHR, Cancer Registry, NDI, SSA, etc.), but for our research data sets we have a “derived date of death” algorithm that ranks sources by quality and delivers the “best” death date. This is the death date we are storing in OMOP.

Hello @PriyaDesai,

In Colorado we use a heuristic to select what we believe to be the most accurate death date (subject to change at any time :slight_smile: ). We have quite a few expansion columns in our OMOP CDM, but an additional death date isn’t one of them. However, if we were to store multiple dates, we would add one column per death date source so we can keep a single row per person. Each column would be named <DataSourceName>_death_date_x and would hold the death date from that data source.
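
As a rough illustration of that one-column-per-source layout, the sketch below pivots "long" death reports (one row per person and source) into a single row per person. The source names, the dates, and the simplified `<source>_death_date` column names are assumptions made for the example, not an actual Colorado or OMOP schema.

```python
from collections import defaultdict
from datetime import date

# (person_id, source, death_date); all values are made up for illustration.
long_rows = [
    (1, "ehr", date(2019, 3, 2)),
    (1, "ladmf", date(2019, 3, 5)),
    (2, "ehr", date(2020, 7, 14)),
]


def to_wide(rows):
    """Pivot long death reports into one dict per person, one key per source."""
    wide = defaultdict(dict)
    for person_id, source, death_date in rows:
        # If the same source reports twice for one person, the later row wins.
        wide[person_id][f"{source}_death_date"] = death_date
    return dict(wide)


wide = to_wide(long_rows)
print(wide[1])
```

A person with no report from a given source simply has no key for it, which corresponds to a NULL in that expansion column.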

@jmethot: That is really interesting. Can you tell us a little more about your “algorithm”? Happy to set up a quick call, as we are hoping to do something similar.

Here are the field descriptions from our data guide (we also deliver death dates from each of the mentioned sources):

Indicator that the patient has died based on all data sources, including Epic, historical systems, and NDI data. Y indicates that the patient is dead, and N indicates that the patient is alive.

The date of death for the patient based on all available sources, including the NDI, the Epic date of death, and historical systems date of death. If an NDI death date exists for the patient, it is used to populate the field; else if an Epic date of death exists, it is used; else if the Historical date of death exists, it is used. If none of these are populated, the field is blank.
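
The fallback order described above (NDI, then Epic, then historical systems, else blank) can be sketched with SQL `COALESCE`, run here through Python's built-in `sqlite3` module. The table name, the column names, and the sample dates are hypothetical; this only illustrates the rule and is not the actual warehouse code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE patient_death (
        person_id           INTEGER PRIMARY KEY,
        ndi_death_dt        TEXT,  -- hypothetical column names
        epic_death_dt       TEXT,
        historical_death_dt TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO patient_death VALUES (?, ?, ?, ?)",
    [
        (1, "2018-05-01", "2018-05-03", None),  # NDI present: NDI wins
        (2, None, "2020-01-15", "2020-01-20"),  # no NDI: Epic wins
        (3, None, None, "2015-11-30"),          # only historical
        (4, None, None, None),                  # nothing known: stays NULL
    ],
)

# COALESCE returns its first non-NULL argument, which matches the
# "if NDI exists, else if Epic exists, else historical" wording.
rows = conn.execute(
    """
    SELECT person_id,
           COALESCE(ndi_death_dt, epic_death_dt, historical_death_dt) AS derived_death_dt
    FROM patient_death
    ORDER BY person_id
    """
).fetchall()

for row in rows:
    print(row)
```

Person 4 comes back with `None`, which corresponds to the blank field in the description.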

Curious, for Epic death data, what are the requirements for finding a person has died?

@MPhilofsky if you’re asking me: that’s complicated. My understanding is there are several ways death dates get into Epic. Two and a half that I know of: entered by the care team when they have direct knowledge; entered by patient registration when informed by family; populated by disease center registry teams who monitor/search obituaries to maintain their registries (less sure about that path). Perhaps also when a patient dies while an inpatient?

I should have asked the question differently. From the Epic data, what are your requirements for finding a person has died? For our EHR data, we use a combination of the date of death field and a person’s vital status.

First, no guarantee the death data is accurate, which you know already. We miss about 50% of deaths. Since it isn’t really a formal requirement in EHRs, our OMOP extract leverages any information we can get, mostly focusing on date of death but including vital status. As I recall, we needed some kind of death date so we made some estimates. We also incorporated other death information from a second source and then used this to refine our algorithm. In this case, though, we made an additional table to compare our results and ended up refining our death assessments. The data quality around death is very frustrating and so we encourage people to think about the purpose when using it - if the person might be dead: don’t recruit them for trials; if you are calculating mortality, especially in relation to an exposure, please leverage the higher quality data.

@David_Dorr: What’s the problem with the death data? The fact itself, or the date? If the latter, by how much does it deviate in your estimation?

Reason I am asking is this: Mortality is usually measured in many months to years. So, a couple days earlier or later won’t make a difference. Even years. But if the data are completely made up then it would be useless indeed.

Thanks. So HYBRID_DEATH_DT is a coalesce?

@MPhilofsky what is a person’s vital status?

@PriyaDesai - I don’t know if our DBAs are using an actual COALESCE function but it sounds like it would have the same effect.