What to do with NULL Death dates in OMOP?

Christian_Reich · October 27, 2020, 8:49am

I got it. So, there are two use cases:

Select patient care situations, like selection of patients for transplantion. Generally, I would be cautious, the OMOP CDM really is not built for patient care and lacks many necessary components (e.g. identifiability). But your use case of a transplantation weighting list is different as it is inherently outside a single institution and not transactional. Would Observation Period help here if we introduced the concept of “till now”?
Clinical trial recruitment. This use case should be discussed with the Clinical Trial WG. Again, it appears to me that the special Observation Period convention could do the trick.

Thoughts? @sonia?

cukarthik · October 27, 2020, 2:54pm

I’m not sure if observation period would help. Would it make sense to put a death as a row in the observation table? Ideally, I would want a death indicator flag in the death table as @QI_omop mentioned, but it seems like that was not accepted.

Christian_Reich · October 27, 2020, 6:41pm

@cukarthik:

Look. The reason I am so stubbornly pushing back is that we cannot create an inflation of random conventions, one for each use case. The model and the conventions should be consistent and sparse: The least amount of rules and provisions that the poor ETL schmocks can implement and the analyst have a chance of exploiting.

Should we add a convention like death_datetime=“2099-12-31” as a code for “dead as of the time of database refresh, but we don’t know when exactly”?

MPhilofsky · October 27, 2020, 8:21pm

No, don’t do it. Don’t make up a date in the future. A death flag is more preferable. You know someone died, but you don’t know when. Everyone looking at the data will see the date of death is unknown because death date will be NULL.

MPhilofsky · October 27, 2020, 8:40pm

Colorado has decided to use the last Provider-Person Interaction/Visit as the date of death. Sounds simple, but with Epic data, identifying the actual Provider-Person interactions from the “paperwork/run a hospital/allow an EHR to function” encounters isn’t easy. Using the last Provider-Person Interaction/Visit is the last time we know they were alive. Not perfect. And we are definitely open to other options: death flag, improved heuristic, etc. Fortunately, we have state death registry data, so we don’t have a large percentage of death records using the above logic.

cukarthik · October 28, 2020, 4:13pm

Hi @MPhilofsky, I was thinking the same thing, but wasn’t sure b/c the patient could be past a a year or more later in the case of Tx patients.

@Christian_Reich the future date is something you mentioned before, but I agree w/ @MPhilofsky, I would prefer a NULL date over a future date (at least for v5.x). I don’t envy your task to keep the model and conventions slim, @Christian_Reich; it’s not an easy task. I do see that the future date idea works for both cdm versions, so I could be convinced to use it until we can do a death indicator somehow.

MPhilofsky · October 29, 2020, 9:36pm

It doesn’t work because those without direct access to your data will create queries without the knowledge you put death into the future. This is especially troublesome if you have a high percentage of dead Persons lacking a death date in your database. In the following hypothetical study, you will add a lot of false data and skew your results heavily by using 2099.

A Person who was diagnosed with X in 2015 and is now dead (per your source data) will have a longevity of 84 years from date of diagnosis. Not good. They died between 2015 and the date of your data pull.

For your use case, which is specific to the All of Us project, then I suggest

And make it apparent in your ETL specifications and in your end user documentation. The date of death would be the date the data was recorded or date of data pull.

Or lobby hard for the death flag

Christian_Reich · October 29, 2020, 10:17pm

The former solution will get us closer to the US tax code (or the German one, which apparently is larger than that of all EU countries combined). So, maybe we do the latter. Want to put an issue in, @MPhilofsky?

cukarthik · October 30, 2020, 1:56am

@MPhilofsky, let’s lobby hard for a death flag

gregk · October 30, 2020, 12:40pm

if you do now someone is dead but do not know when exactly - sure, death flag could be work for queries that need to know if patient is just dead. But then you are risking that any of the existing tools and apps that might be looking at death date for other purposes - as well as conventions around events after death date etc… - would not work anymore.

But if you follow a convention that Christian is proposing and calculate the death date - let’s say as of the last ETL refresh (or data extract?), it would be also nice to store a reference to the source of this information e.g. “national registry” (NDI in US), “nurse”, “estimated” etc… Then you have a date but also a way to measure the trust of the information. Actually, I had multiple discussions on this exact topic elsewhere since folks wanted to know where the death info is coming from.

Mark · October 30, 2020, 1:50pm

From a pure data design stand point, a death flag is a horrible idea as long as there is a death date. This would introduce new nulls into the system. From a data design standpoint, anytime a field is created that one knows that nulls will be inserted, it is the wrong design. It creates a nightmare to get correct data out of the system. It is doubly bad when one has a common schema that spans different database engines. How one works around nulls in Postgres is different than SQL Server which is different than Oracle.

@Christian_Reich introduced the option of recording the death date at load time when the actual date is not known; I see no problem with this. My work is on the ETL side and this is an easy fix.

There is another fix that is commonly used in database designs is a magic number, or in this case, a magic date. Pick a date that is impossible to be real, normally this would be a date that creates a negative age. This is ugly but it will not cause the SQL statements to break on query side. @Christian_Reich’s idea is better, but here is another option to keep from having to introduce another problem.

Chris_Knoll · October 30, 2020, 2:21pm

But there isn’t. Currently the model only supports ‘when they died’ which we can then infer ‘are they dead’ information from. But now we need ‘are they dead’ separately from ‘when they died’. I don’t see a problem with 2 fields storing 2 different pieces of information, now that you can have a separate usecase of ‘we know they are dead, we don’t know when they died’.

We’ve had null discussions on other threads. Let’s not introduce new fear about using nulls. Nulls are the proper way to represent missing information.

Let’s not do this. Magic numbers are bad in that you have to change all your analysis code to be aware of magic numbers, and if the magic numbers change, you have to go back to everything. This touches on our ‘flavors of null’ discussion we had in other places, and we don’t want to go there again.

roger.carlson · October 30, 2020, 3:27pm

I’ve read through this thread twice, and I don’t see the issue.

You know a person is dead if they have a record in the Death table. There’s no need to have a flag in the death table to show it. They would not be in the Death table if they weren’t dead.

There should not be a Death_flag field in the Person table, because that sets up a referential integrity issue. A person could be flagged as dead in the Person table with no record in the Death table, and vice versa.

In the Death table, Death_date should be nullable to show that you don’t know exact date. Maybe have verbatim_death_date and estimated_death_date fields. (Verbatim death date could remain death_date.) I’m not real happy with that, because it could introduce functional integrity issues, but this can be handled by the ETL, and there can be data quality checks to check for reasonableness.

If you only know the year, estimated_death_date could be the last day of the year. Similarly, if you only know the month and year, it would be the last day of that month. If there is a known death_date, that would be populated into both fields.

There could also be a Death_Type field with values like “returned from EHR”, “external source”, inferred history", “estimated”, etc. (all with relevant type_concepts, of course).

Researchers who need death information should always use a Left Join to the Death table.

Am I missing something?

MPhilofsky · October 30, 2020, 3:51pm

There isn’t a Death table in v6. It’s now Person.death_datetime. No date, no death

roger.carlson · October 30, 2020, 3:59pm

Well, that explains why no one else was seeing what was blindly obvious to me.

In my opinion, this highlights the reason that there should be a Death table. There’s more information to store about a death than just its date. For example the death_type.

Are there any links to discussions about why they eliminated it?

MPhilofsky · October 30, 2020, 4:15pm

and

Looks like I’m reaching consensus on bringing back the death_type_concept_id for death records

Mark · October 30, 2020, 6:09pm

I think this is the best idea that has been proposed. It could be a death_date_source as surmised or something as simple as
death_date_trust BIT NOT NULL DEFAULT 0
where 1 would flag that the death date is verified.

There is no reason to make life too complicated.

EDIT: Belay the BIT idea… that would introduce the same problem as a death flag.

Chris_Knoll · October 31, 2020, 8:14am

Here you go: Condition_occurrence, Death diagnoses

Christian_Reich · November 1, 2020, 2:32pm

Friends:

Death (the precise one), cause of death (the Condition leading to it) and where we got that information from (Type Concept) we have solved. No need to reinvent. In the same realm we could reinvent the BIRTH table, the GENDER table and the ETHNICITY table. Nulls are indeed the same logical thing as “magic dates”, just the natural way databases handle missing information, as @Chris_Knoll pointed out. Bottom line: We are good!

The only addition is a new use case where folks have information about death, but not the precise date. This is not that useful for mortality calculations, but it is useful for trial recruitment purposes. Obviously, a death flag is the same as death before a certain date (the date we learned about the patient’s death). Whether we represent this using a flag or an operator (less than or equal to) is not much different. The operator is more obscure, but it actually gives us more information because we could keep the date over refreshes (we remember when we first learned the patient is dead, and hence died before that date). @Chris_Knoll may have a preference.

Please put in a GitHub issue, and we feed it into the CDM WG machinery.

MPhilofsky · December 1, 2020, 11:32pm

This isn’t solved in v6. There isn’t a type_concept_id for the person.death_date. If the Death doesn’t have an associated Condition, then the record doesn’t have provenance.