Multiple death records from different sources

hiro-mishima · March 27, 2023, 3:26pm

Hi,

Our database receives data from multiple sources. Sometimes, several sources send us death records for the same person. I understand the death table in CDM v5 (or the person table in CDM v6) is supposed to have at most one death record per person. What should we do if we want to keep all the death records of a person from different sources?

Tagging: @clairblacketer

Thanks!
Hiro

MPhilofsky · March 28, 2023, 3:07pm

Hello @hiro-mishima,

Before asking “how do we keep all source data in the CDM?”, you should ask “why do we need it?” What is the use case for having two death records? The OMOP CDM is driven by real-world use cases.

Christian_Reich · March 28, 2023, 8:40pm

@hiro-mishima:

Nothing is certain in this world, except death (and taxes). Obviously, patients die only once. The analyst working on @MPhilofsky’s use case has no clue how to interpret a double death. It is the job of the ETL team to figure out which of the death dates to believe more. You could think of different heuristics:

The more reliable feed
The date after which no new data are coming in
The first date, because the second date is just the time stamp of the certificate coming in
etc.
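
As a minimal sketch of how such heuristics might be combined in an ETL step: rank the feeds by how much you trust them, then fall back to the earliest date within the best feed. The feed names, their ranking, and the names `DeathRecord` and `pick_death_record` are made up for illustration; nothing here is prescribed by the OMOP CDM.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence


class DeathRecord(NamedTuple):
    person_id: int
    source: str        # hypothetical feed name, e.g. "registry" or "ehr"
    death_date: date


# Assumed ranking for illustration only: lower number = more trusted feed.
SOURCE_RANK = {"registry": 0, "ehr": 1, "family_report": 2}


def pick_death_record(records: Sequence[DeathRecord]) -> Optional[DeathRecord]:
    """Return the single record to load for one person, or None if there are none."""
    if not records:
        return None
    # Most trusted feed first; within a feed, prefer the earliest date.
    # Feeds missing from SOURCE_RANK are ranked after all known feeds.
    return min(
        records,
        key=lambda r: (SOURCE_RANK.get(r.source, len(SOURCE_RANK)), r.death_date),
    )


records = [
    DeathRecord(1, "ehr", date(2022, 5, 3)),
    DeathRecord(1, "registry", date(2022, 5, 1)),
    DeathRecord(1, "registry", date(2022, 5, 9)),
]
chosen = pick_death_record(records)
```

Keeping the ranking in one small table makes the rule easy to document for users and easy to change when a feed turns out to be less reliable than expected.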

cukarthik · April 3, 2023, 4:22pm

That makes sense. For the All of Us program we might be getting death data from various sources beyond the EHR (e.g. coordinators reaching out to the participant’s family, CMS, state death registry, etc.). The issue we have is that for some of this data we might not have much control over saying which source is better, but I guess you are right @Christian_Reich, we need to create some type of rule to provide to users.

Mark · April 3, 2023, 5:05pm

Not ‘All of Us’ know where the death data comes from; we either have it or we don’t, and there is no way to verify whether it is correct.

Pun intended.

Christian_Reich · April 3, 2023, 7:24pm

Understood, but you can’t kick the can down the road to the analyst. That poor wretch has even less context to work with. So, push back against your sources and make them pick a date. If they can’t, have your own heuristic.

Mark · April 3, 2023, 7:38pm

I should have sent that as a private message to cukarthik; my comment was directed more at the AOU process and is not important for general OMOP. I apologize for the confusion.

I agree with your statement; not knowing if the data is precise or not, I would like to exclude it altogether.

PriyaDesai · September 5, 2023, 10:19pm

At Stanford, in addition to EHR death data, we are getting LADMF death data, and will soon be getting CDPH-VR data. Our analysis has shown that SSA data can often be unreliable. While I understand that a patient can have only one death record, we have folks here who would like to know the dates reported by other agencies. We are proposing that, for our internal OMOP, we add some extra columns for the different sources. We are curious to know whether others in the community have done that or used some other strategy.

@Christian_Reich @cukarthik @MPhilofsky @hiro-mishima

#death-data #vocabulary-users #cdm-builders

jmethot · September 5, 2023, 10:31pm

In our EDW we store multiple sources of death data (EHR, Cancer Registry, NDI, SSA, etc.), but for our research data sets we have a “derived date of death” algorithm that ranks sources by quality and delivers the “best” death date. This is the death date we are storing in OMOP.

MPhilofsky · September 6, 2023, 5:36pm

Hello @PriyaDesai,

In Colorado we use a heuristic to select what we believe to be the most accurate death date (subject to change at any time). We have quite a few expansion columns in our OMOP CDM, but an additional death date isn’t one of them. However, if we were to store multiple dates, we would add one column per death date source, so that we keep a single row per person, name each column <DataSourceName>_death_date_x, and store the death date from that data source in it.
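
A rough sketch of that one-row-per-person layout, assuming the per-source dates arrive as a simple mapping. The function name `to_wide_row` and the example dates are hypothetical; the column naming follows the <DataSourceName>_death_date_x pattern described above.

```python
from datetime import date


def to_wide_row(person_id, source_dates):
    """Collapse per-source death dates into a single row for one person.

    source_dates maps a data source name to the death date it reported.
    """
    row = {"person_id": person_id}
    # Sort by source name so the column order is stable between runs.
    for source, death_date in sorted(source_dates.items()):
        row[f"{source}_death_date_x"] = death_date
    return row


row = to_wide_row(42, {"LADMF": date(2023, 1, 12), "EHR": date(2023, 1, 10)})
```

The standard death date column would still hold the single date chosen by the heuristic; the extra columns only preserve what each source reported.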

PriyaDesai · September 7, 2023, 3:51pm

@jmethot: That is really interesting. Can you tell us a little more about your “algorithm”? Happy to set up a quick call, as we are hoping to do something similar.
@Alvaro_A_Alvarez

jmethot · September 7, 2023, 9:43pm

Here are the field descriptions from our data guide (we also deliver death dates from each of the mentioned sources):

HYBRID_DEATH_IND
Indicator that the patient has died based on all data sources, including Epic, historical systems, and NDI data. Y indicates that the patient is dead, and N indicates that the patient is alive.

HYBRID_DEATH_DT
The date of death for the patient based on all available sources, including the NDI, the Epic date of death, and historical systems date of death. If an NDI death date exists for the patient, it is used to populate the field; else if an Epic date of death exists, it is used; else if the Historical date of death exists, it is used. If none of these are populated, the field is blank.
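
Read literally, the HYBRID_DEATH_DT rule is a first-non-missing lookup in the order NDI, Epic, historical. A minimal sketch follows; the function name `hybrid_death` is hypothetical, and deriving HYBRID_DEATH_IND purely from whether a date was found is an assumption, since the data guide does not say how the indicator is computed.

```python
from datetime import date
from typing import Optional, Tuple


def hybrid_death(
    ndi: Optional[date],
    epic: Optional[date],
    historical: Optional[date],
) -> Tuple[str, Optional[date]]:
    """Return (HYBRID_DEATH_IND, HYBRID_DEATH_DT) for one patient."""
    # First populated date in the documented order: NDI, then Epic, then historical.
    hybrid_dt = next((d for d in (ndi, epic, historical) if d is not None), None)
    # Assumption: the indicator is "Y" exactly when some death date exists.
    hybrid_ind = "Y" if hybrid_dt is not None else "N"
    return hybrid_ind, hybrid_dt


result = hybrid_death(None, date(2020, 1, 2), date(2020, 1, 5))
```

In SQL the date part would amount to a COALESCE over the three source columns in the same order.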

MPhilofsky · September 8, 2023, 3:27pm

Curious: for Epic death data, what are the requirements for finding that a person has died?

jmethot · September 8, 2023, 3:39pm

@MPhilofsky if you’re asking me: that’s complicated. My understanding is there are several ways death dates get into Epic. Two and a half that I know of: entered by the care team when they have direct knowledge; entered by patient registration when informed by family; populated by disease center registry teams who monitor/search obituaries to maintain their registries (less sure about that path). Perhaps also when a patient dies while an inpatient?

MPhilofsky · September 8, 2023, 3:45pm

I should have asked the question differently. From the Epic data, what are your requirements for finding that a person has died? For our EHR data, we use a combination of the date of death field and a person’s vital status.

David_Dorr · September 8, 2023, 3:52pm

First, there is no guarantee the death data is accurate, which you know already. We miss about 50% of deaths. Since it isn’t really a formal requirement in EHRs, our OMOP extract leverages any information we can get, mostly focusing on date of death but including vital status. As I recall, we needed some kind of death date, so we made some estimates. We also incorporated other death information from a second source and then used this to refine our algorithm. In this case, though, we made an additional table to compare our results and ended up refining our death assessments. The data quality around death is very frustrating, so we encourage people to think about the purpose when using it: if the person might be dead, don’t recruit them for trials; if you are calculating mortality, especially in relation to an exposure, please leverage the higher-quality data.

Christian_Reich · September 11, 2023, 4:02pm

@David_Dorr: What’s the problem with the death data? The fact itself, or the date? If the latter, by how much does it deviate in your estimation?

The reason I am asking is this: mortality is usually measured over many months to years. So, a couple of days earlier or later won’t make a difference. Even years. But if the data are completely made up, then they would indeed be useless.

PriyaDesai · September 12, 2023, 4:22am

Thanks. So HYBRID_DEATH_DT is a coalesce?

PriyaDesai · September 12, 2023, 4:24am

@MPhilofsky what is a person’s vital status?

jmethot · September 12, 2023, 2:38pm

@PriyaDesai - I don’t know if our DBAs are using an actual COALESCE function but it sounds like it would have the same effect.