
What to do with NULL Death dates in OMOP?

(Matthew Joss) #1


I work at Partners Healthcare, and we are developing our ETL to transform our biobank data from i2b2 into OMOP 5.2 for eMERGE. We noticed that about half of the patients recorded as deceased are missing their death dates (currently NULL values). We are aware that the OMOP 5.2 documentation defines death_date in the DEATH table as a NOT NULL column. How should we handle this problem? If we import our data as-is, we will leave out about half of our patients with death data.

Thank you for any guidance that you can offer.

(Matthew Joss) #2

I should also mention some additional information that Vivian Gainer sent me about this problem. It is an important correction to my prior post:

“The problem is that we can’t provide date of death for many people because of the SSDI policy/regulations. (However, the date is ‘rolling’, so as soon as 3 years have passed, we can publish it.)
I don’t know exactly how this table will be used, but it would seem that it should represent all those who have died, not just those who died more than 3 years ago…
If any other sites are using DMF data, then it most likely will affect them, too, and may be a factor in any discussion about this.”

(Christian Reich) #3


Not sure I understand the point of knowing a patient died, but not when. We all will die. So anybody could have such a record with a NULL in the date. For research purposes (such as mortality studies), we need the date.

I’d drop it unless there is a use case.

(Matthew Joss) #4

Jeff Klann and I think you are right that if we did not know the death date at all, then the information that the patients have died is not useful. However, we do know that these patients must have died within the last 3 years… we just cannot disclose the exact date until 3 years have passed due to the aforementioned policy.

Is there a way of representing that the patient died within the last 3 years, but where we do not know the exact date?

(Tom Galia) #5

Christian, we have a use case for knowing that a patient is deceased. We are working on a conversion that will be used to identify patients for enrollment in a study. The source data has an IsDead=‘true’ indicator but no other information. Populating the death table with these patients would help the enrollment process. I would like to bring this in, but the rules seem very clear.

(Qi Yang) #6

I think this is a common problem for which OMOP should offer a long-term solution. Everyone in the Person table will eventually die, but the date of death may not be available for many of them, for privacy or other reasons. If there is no place in the OMOP tables to indicate that a person is dead, then I feel we lose significant information. As Tom stated above, death information is still useful for enrollment even without knowing the date or cause of death. I suggest we put an Is Death indicator in the Person table. If the date of death is not known, then the Death table is not loaded, but the death indicator in the Person table still gets populated. Maybe a subject for Themis.
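A minimal Python sketch of the ETL routing Qi describes, assuming a hypothetical `SourceDeath` source record and a hypothetical PERSON-level deceased flag (no such column exists in OMOP v5.x). A DEATH row is emitted only when a date is known, while the flag is set for every deceased person.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set, Tuple


@dataclass
class SourceDeath:
    # Hypothetical shape of a source record, e.g. from an IsDead indicator.
    person_id: int
    is_dead: bool
    death_date: Optional[date]


def route_deaths(records: List[SourceDeath]) -> Tuple[List[Tuple[int, date]], Set[int]]:
    """Split source death information as proposed above.

    Returns (death_rows, deceased_flags):
    - death_rows: (person_id, death_date) for the DEATH table, only when a
      date is known, because death_date is NOT NULL in OMOP v5.x.
    - deceased_flags: person_ids that would get the proposed PERSON-level
      indicator (hypothetical column), whether or not a date is known.
    """
    death_rows: List[Tuple[int, date]] = []
    deceased_flags: Set[int] = set()
    for rec in records:
        if not rec.is_dead:
            continue
        deceased_flags.add(rec.person_id)
        if rec.death_date is not None:
            death_rows.append((rec.person_id, rec.death_date))
    return death_rows, deceased_flags
```

With this split, analysts doing mortality work would only ever see dated DEATH rows, while recruitment queries could exclude anyone carrying the flag.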

(Christian Reich) #7


The problem with such static data is that, apart from the PERSON table, everything in OMOP has a date. For good reason: you use the date to establish temporal order, which you can then use to infer causal relationships. I understand your use case: you don’t want to recruit the dead guy into a study. But if you put in a DEATH record, you will need a date. If you make one up, folks will use it for mortality calculations and make Kaplan-Meier plots.

In your data: can you infer when the death occurred by going to older versions of the database and seeing when the flag became true, and then use that as the date?
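If older database versions are available, the flag-flip idea can be sketched in Python as below. It assumes each snapshot is represented as a hypothetical `(refresh_date, set_of_flagged_person_ids)` pair; the function name and input shape are illustrative, not part of any OHDSI tool. For each person it returns a window: the last refresh at which the flag was not yet set (`None` if the person was already flagged in the earliest snapshot) and the first refresh at which it was set.

```python
from datetime import date
from typing import Dict, List, Optional, Set, Tuple


def infer_death_windows(
    snapshots: List[Tuple[date, Set[int]]],
) -> Dict[int, Tuple[Optional[date], date]]:
    """Map person_id -> (last refresh without the flag, first refresh with it).

    The death was recorded after the first element (unknown if None) and on
    or before the second. Only the first flip is kept, so a flag that is
    later cleared and set again does not move the window.
    """
    windows: Dict[int, Tuple[Optional[date], date]] = {}
    previous_date: Optional[date] = None
    previous_dead: Set[int] = set()
    # Process snapshots in chronological order regardless of input order.
    for refresh_date, dead_ids in sorted(snapshots, key=lambda s: s[0]):
        for pid in dead_ids - previous_dead:
            if pid not in windows:
                windows[pid] = (previous_date, refresh_date)
        previous_date = refresh_date
        previous_dead = dead_ids
    return windows
```

Using the second element as the death date, as suggested above, gives a date no earlier than when the death was recorded; the true date can be earlier, so the window width is worth keeping alongside it.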

(Tom Galia) #8

Thanks for the follow-up and explanation. There is no way to access previous versions of the data in the context of this project, so we will make sure deceased people are not identified through other means. I do like Qi’s suggestion for a death indicator in person. I will add this to the issues list for the person table in Themis.

(Christian Reich) #9

No need. The discussion has concluded here and here. Bigger than THEMIS.

(Karthik) #10

Reviving this old discussion. I looked at both of the discussion threads above, but I didn’t see how they address the original question of this post: how to handle NULL death dates. Maybe I missed it. @Christian_Reich @QI_omop @clairblacketer, can one of you point me to what to do when I know a person is deceased but don’t have their death date? Thanks.

(Christian Reich) #11

What’s the use case, @karthik? Why do you want to know?

(Nick) #12

You can make a fake date of death from logic rules or prediction model outputs.

If mortality is well understood in a subset of the data and is missing at random from the rest of the data, and the estimate is wide (year of death) as opposed to narrow (second of death in day of year), you could consider ML or MI ((weighted) multiple imputation) approaches. It would mean that your conclusion is theoretical and not real-world, but it should be all right provided cases were enrolled at similar start points, for similar reasons, and from similar places (i.e., if etiology, geography, and enrollment are controlled).

If this is a must-do, I would recommend using national mortality data within demography to weight the risk of death in the cases where death is observed, and then using ML to find the relationships between variables that span observed and unobserved mortality cases, for a set where death is observed and a set where it is not. Finally, use the ML output to re-weight the likelihood of being able to observe death from your records, and impute with ‘probability of observation within demography’ from observed case weights, as well as ‘probability within clinical condition’ from death-observed weights (like the Charlson comorbidity score or something), to enhance the solution to the MI equation.

If you frame your prediction broadly (vaguely), you have a higher chance of being right: ‘dead within five years of study start’ is predicted more accurately from observed, weighted, imputed records than ‘dead within one year of study start’.

The best option is to get a death certificate; but you can ‘wing it’ provided you either know they died or know when some people who really look a lot like them died.
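Nick's full approach (demographic weighting plus ML re-weighting) is beyond a short example, but the core multiple-imputation step can be illustrated with a much simpler, plainly swapped-in technique: stratified hot-deck imputation. This Python sketch assumes each person with a known death contributes a `(stratum, years_from_study_start_to_death)` donor value, where the stratum is a hypothetical grouping (e.g. age band or comorbidity level); every dead-but-undated person then gets several draws from donors in the same stratum, using the coarse year-level framing recommended above.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def impute_death_years(
    observed: Sequence[Tuple[Hashable, int]],
    missing: Sequence[Tuple[int, Hashable]],
    n_imputations: int = 5,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Stratified hot-deck multiple imputation of years from study start to death.

    observed: (stratum, years_to_death) for persons with a known death date.
    missing:  (person_id, stratum) for persons known dead but without a date.
    Returns person_id -> n_imputations draws, sampled with replacement from
    the observed values in the same (hypothetical) stratum.
    """
    rng = random.Random(seed)  # seeded so the imputations are reproducible
    donors: Dict[Hashable, List[int]] = defaultdict(list)
    for stratum, years in observed:
        donors[stratum].append(years)
    imputed: Dict[int, List[int]] = {}
    for person_id, stratum in missing:
        pool = donors.get(stratum)
        if not pool:
            # No donors in this stratum: leave unimputed rather than guess.
            continue
        imputed[person_id] = [rng.choice(pool) for _ in range(n_imputations)]
    return imputed
```

As with any multiple imputation, the analysis would be run once per imputed dataset and the results pooled (e.g. with Rubin's rules); the imputed years should never be written back as if they were real DEATH dates.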

(Christian Reich) #13

Do you have something like that you want to consider publishing in the OHDSI GitHub? Would be really cool to have. Everybody has the problem with the death dates.

(Karthik) #14

Hi @Christian_Reich,
For the All of Us project, the program is following up to find out whether a participant is deceased. This scenario of identifying death goes beyond this project and applies to clinical care. For example, in renal transplant, nurse coordinators follow up with patients on a waitlist or post-transplantation. The death information is captured via a phone call and recorded in the EHR, usually as a death flag because the date is unknown. How would we capture this information in OMOP? We want to know that a person has expired. @MPhilofsky, what do you all do?

(Christian Reich) #15


I got it. So, there are two use cases:

  1. Select patient care situations, like selection of patients for transplantation. Generally, I would be cautious: the OMOP CDM really is not built for patient care and lacks many necessary components (e.g. identifiability). But your use case of a transplantation waiting list is different, as it is inherently outside a single institution and not transactional. Would Observation Period help here if we introduced the concept of “till now”?

  2. Clinical trial recruitment. This use case should be discussed with the Clinical Trial WG. Again, it appears to me that the special Observation Period convention could do the trick.

Thoughts? @sonia?

(Karthik) #16

I’m not sure if observation period would help. Would it make sense to put a death as a row in the observation table? Ideally, I would want a death indicator flag in the death table as @QI_omop mentioned, but it seems like that was not accepted.

(Christian Reich) #17


Look. The reason I am pushing back so stubbornly is that we cannot create an inflation of random conventions, one for each use case. The model and the conventions should be consistent and sparse: the smallest set of rules and provisions that the poor ETL schmocks can implement and that the analysts have a chance of exploiting.

Should we add a convention like death_datetime=“2099-12-31” as a code for “dead as of the time of database refresh, but we don’t know when exactly”?

(Melanie Philofsky) #18

No, don’t do it. Don’t make up a date in the future. A death flag is preferable. You know someone died, but you don’t know when. Everyone looking at the data will see that the date of death is unknown, because the death date will be NULL.

(Melanie Philofsky) #19

Colorado has decided to use the last Provider-Person Interaction/Visit as the date of death. Sounds simple, but with Epic data, separating the actual Provider-Person interactions from the “paperwork/run a hospital/allow an EHR to function” encounters isn’t easy. The last Provider-Person Interaction/Visit is the last time we know they were alive. Not perfect. And we are definitely open to other options: death flag, improved heuristic, etc. Fortunately, we have state death registry data, so only a small percentage of our death records use the above logic.
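The last-interaction heuristic can be sketched in Python as follows. The encounter-type labels in `INTERACTION_TYPES` are hypothetical placeholders; choosing which Epic encounter types count as real Provider-Person interactions is exactly the hard part described above, and is left to the site.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical labels for encounters treated as genuine provider-person
# interactions; administrative encounters ("billing", etc.) are excluded.
INTERACTION_TYPES = {"office_visit", "inpatient", "emergency"}


def proxy_death_dates(
    deceased_ids: Set[int],
    encounters: Iterable[Tuple[int, date, str]],
    interaction_types: Set[str] = INTERACTION_TYPES,
) -> Dict[int, Optional[date]]:
    """Return person_id -> date of last qualifying encounter (None if none).

    This is the last date the person is known to have been alive; the true
    death date is on or after it.
    """
    last_seen: Dict[int, Optional[date]] = {pid: None for pid in deceased_ids}
    for person_id, encounter_date, encounter_type in encounters:
        if person_id not in last_seen or encounter_type not in interaction_types:
            continue  # not a dateless death, or an administrative encounter
        current = last_seen[person_id]
        if current is None or encounter_date > current:
            last_seen[person_id] = encounter_date
    return last_seen
```

Persons who come back with `None` have no qualifying encounter and would still need another rule (or would stay out of DEATH, since death_date is NOT NULL). Because the result is a lower bound, the gap to the real death can be long, e.g. a year or more for transplant patients, as noted in the thread.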

(Karthik) #20

Hi @MPhilofsky, I was thinking the same thing, but wasn’t sure, because the patient could have passed away a year or more later in the case of Tx patients.

@Christian_Reich, the future date is something you mentioned before, but I agree with @MPhilofsky: I would prefer a NULL date over a future date (at least for v5.x). I don’t envy your task of keeping the model and conventions slim, @Christian_Reich; it’s not an easy one. I do see that the future date idea works for both CDM versions, so I could be convinced to use it until we can add a death indicator somehow.