Our database receives data from multiple sources. Sometimes, various sources send the death records of the same person to us. I understand the death table in CDM v5 (or the person table in CDM v6) is supposed to have only up to one death record per person. What should we do if we want to keep all the death records of a person from different sources?
Before asking "how to keep all source data in the CDM", you should ask "why do we need it?" What is the use case for having two death records? The OMOP CDM is driven by real-world use cases.
Nothing is certain in this world, except death (and taxes). Obviously, patients only die once. The analyst working on @MPhilofsky's use case has no clue how to interpret the double death. It is the job of the ETL team to figure out which of the death dates to believe more. You could think of different heuristics:
The more reliable feed
The date after which no new data are coming in
The first date, because the second date is just the time stamp of the certificate coming in
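For illustration, here is a minimal Python sketch of how an ETL team might encode those three heuristics. Everything in it is an assumption made for the example: the source names, the `(source, date)` report shape, and in particular the reading of the second heuristic as "the earliest reported date that is not followed by further clinical activity".

```python
from datetime import date
from typing import List, Optional, Tuple

# One report = (source_name, reported_death_date). Source names are hypothetical.
Report = Tuple[str, date]

def most_reliable(reports: List[Report], priority: List[str]) -> Optional[date]:
    """Heuristic 1: take the date from the highest-ranked feed that reported one."""
    ranked = [r for r in reports if r[0] in priority]
    if not ranked:
        return None
    return min(ranked, key=lambda r: priority.index(r[0]))[1]

def after_last_activity(reports: List[Report], last_event: date) -> Optional[date]:
    """Heuristic 2 (one possible reading): drop dates that precede the person's
    last recorded clinical event, then take the earliest remaining date."""
    plausible = [d for _, d in reports if d >= last_event]
    return min(plausible) if plausible else None

def earliest(reports: List[Report]) -> Optional[date]:
    """Heuristic 3: take the first date; later ones may just be paperwork timestamps."""
    return min((d for _, d in reports), default=None)

reports = [
    ("ssa", date(2020, 3, 1)),
    ("ehr", date(2020, 3, 10)),
    ("registry", date(2020, 3, 12)),
]
print(most_reliable(reports, ["registry", "ehr", "ssa"]))
print(after_last_activity(reports, last_event=date(2020, 3, 5)))
print(earliest(reports))
```

Whichever rule is chosen, the point is that it runs in the ETL, so the CDM ends up with a single death date per person.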
That makes sense. For the All of Us program we might be getting death data from various sources beyond the EHR (e.g. coordinators reaching out to the participant's family, CMS, state death registry, etc.). The issue we have is that for some of this data we might not have much control to say which one is better, but I guess you are right @Christian_Reich, we need to create some type of rule to provide to users.
Understood, but you can't kick the can down the road to the analyst. That poor wretch has even less context to work with. So, push back against your sources and make them pick a date. If they can't, have your own heuristic.
I should have sent that as a private message to Cukartik; my comment was more directed at the AOU process and not important for general OMOP; I apologize for the confusion.
I agree with your statement; not knowing if the data is precise or not, I would like to exclude it altogether.
At Stanford, in addition to EHR death data, we are getting LADMF death data, and will soon be getting CDPH-VR data. Our analysis has shown that SSA data can often be unreliable. While I understand that there can be only one death record per person, we have folks here who would like to know the dates reported via other agencies. We are proposing that for our internal OMOP, we may add some extra columns for the different sources. We are curious to know if others in the community have done that or used some other strategy.
In our EDW we store multiple sources of death data (EHR, Cancer Registry, NDI, SSA, etc.), but for our research data sets we have a "derived date of death" algorithm that ranks sources by quality and delivers the "best" death date. This is the death date we are storing in OMOP.
In Colorado we use a heuristic to select what we believe to be the most accurate death date (subject to change at any time). We have quite a few expansion columns in our OMOP CDM, but an additional death date isn't one of them. However, if we were to store multiple dates, we would add one column per death date source, named <DataSourceName>_death_date_x, so that we keep a single row per person and store the death date from each data source in its own column.
@jmethot: That is really interesting. Can you tell us a little more about your "algorithm"? Happy to set up a quick call, as we are hoping to do something similar. @Alvaro_A_Alvarez
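As a rough sketch of that one-row-per-person layout, the Python below pivots long-format death reports into one record per person with a column per source. The input shape, the source names, and the simplified `<source>_death_date` column naming are assumptions for the example, not anyone's actual schema.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

def widen(reports: Iterable[Tuple[int, str, date]]) -> Dict[int, Dict[str, date]]:
    """Pivot (person_id, source, death_date) reports into one row per person,
    with one '<source>_death_date' column per reporting source.
    If a source reports the same person twice, the later report wins."""
    rows: Dict[int, Dict[str, date]] = defaultdict(dict)
    for person_id, source, death_date in reports:
        rows[person_id][f"{source.lower()}_death_date"] = death_date
    return dict(rows)

wide = widen([
    (1, "EHR", date(2021, 5, 2)),
    (1, "SSA", date(2021, 5, 9)),
    (2, "EHR", date(2019, 11, 30)),
])
for person_id, columns in wide.items():
    print(person_id, columns)
```

These extra columns would sit alongside the single standard death date chosen by the heuristic, rather than replace it.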
Here are the field descriptions from our data guide (we also deliver death dates from each of the mentioned sources):
HYBRID_DEATH_IND
Indicator that the patient has died based on all data sources, including Epic, historical systems, and NDI data. Y indicates that the patient is dead, and N indicates that the patient is alive.
HYBRID_DEATH_DT
The date of death for the patient based on all available sources, including the NDI, the Epic date of death, and historical systems date of death. If an NDI death date exists for the patient, it is used to populate the field; else if an Epic date of death exists, it is used; else if the Historical date of death exists, it is used. If none of these are populated, the field is blank.
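The fallback order described for HYBRID_DEATH_DT is a simple coalesce. A minimal Python sketch, assuming each source date is either a date or missing (the class and field names are made up for the example):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DeathDates:
    """Death dates per source for one patient; None means not populated."""
    ndi: Optional[date] = None
    epic: Optional[date] = None
    historical: Optional[date] = None

def hybrid_death_dt(d: DeathDates) -> Optional[date]:
    """Return the NDI date if present, else Epic, else historical, else None."""
    for candidate in (d.ndi, d.epic, d.historical):
        if candidate is not None:
            return candidate
    return None

patient = DeathDates(epic=date(2020, 1, 2), historical=date(2020, 1, 5))
print(hybrid_death_dt(patient))  # no NDI date, so the Epic date is used
```

In SQL the same rule would typically be a `COALESCE` over the three source columns.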
@MPhilofsky if you're asking me: that's complicated. My understanding is there are several ways death dates get into Epic. Two and a half that I know of: entered by care team when they have direct knowledge; entered by patient registration when informed by family; populated by disease center registry teams who monitor/search obituaries to maintain their registries (less sure about that path). Perhaps also when patient dies while an inpatient?
I should have asked the question differently. From the Epic data, what are your requirements for finding a person has died? For our EHR data, we use a combination of the date of death field and a person's vital status.
First, no guarantee the death data is accurate, which you know already. We miss about 50% of deaths. Since it isn't really a formal requirement in EHRs, our OMOP extract leverages any information we can get, mostly focusing on date of death but including vital status. As I recall, we needed some kind of death date so we made some estimates. We also incorporated other death information from a second source and then used this to refine our algorithm. In this case, though, we made an additional table to compare our results and ended up refining our death assessments. The data quality around death is very frustrating and so we encourage people to think about the purpose when using it - if the person might be dead: don't recruit them for trials; if you are calculating mortality, especially in relation to an exposure, please leverage the higher quality data.
@David_Dorr: What's the problem with the death data? The fact itself, or the date? If the latter, by how much does it deviate in your estimation?
Reason I am asking is this: Mortality is usually measured in many months to years. So, a couple days earlier or later wonât make a difference. Even years. But if the data are completely made up then it would be useless indeed.