What to do with NULL Death dates in OMOP?

MPhilofsky · October 29, 2020, 9:36pm

It doesn’t work because those without direct access to your data will create queries without the knowledge you put death into the future. This is especially troublesome if you have a high percentage of dead Persons lacking a death date in your database. In the following hypothetical study, you will add a lot of false data and skew your results heavily by using 2099.

A Person who was diagnosed with X in 2015 and is now dead (per your source data) will have a longevity of 84 years from date of diagnosis. Not good. They died between 2015 and the date of your data pull.

For your use case, which is specific to the All of Us project, then I suggest

And make it apparent in your ETL specifications and in your end user documentation. The date of death would be the date the data was recorded or date of data pull.

Or lobby hard for the death flag

Christian_Reich · October 29, 2020, 10:17pm

The former solution will get us closer to the US tax code (or the German one, which apparently is larger than that of all EU countries combined). So, maybe we do the latter. Want to put an issue in, @MPhilofsky?

cukarthik · October 30, 2020, 1:56am

@MPhilofsky, let’s lobby hard for a death flag

gregk · October 30, 2020, 12:40pm

if you do now someone is dead but do not know when exactly - sure, death flag could be work for queries that need to know if patient is just dead. But then you are risking that any of the existing tools and apps that might be looking at death date for other purposes - as well as conventions around events after death date etc… - would not work anymore.

But if you follow a convention that Christian is proposing and calculate the death date - let’s say as of the last ETL refresh (or data extract?), it would be also nice to store a reference to the source of this information e.g. “national registry” (NDI in US), “nurse”, “estimated” etc… Then you have a date but also a way to measure the trust of the information. Actually, I had multiple discussions on this exact topic elsewhere since folks wanted to know where the death info is coming from.

Mark · October 30, 2020, 1:49pm

From a pure data design stand point, a death flag is a horrible idea as long as there is a death date. This would introduce new nulls into the system. From a data design standpoint, anytime a field is created that one knows that nulls will be inserted, it is the wrong design. It creates a nightmare to get correct data out of the system. It is doubly bad when one has a common schema that spans different database engines. How one works around nulls in Postgres is different than SQL Server which is different than Oracle.

@Christian_Reich introduced the option of recording the death date at load time when the actual date is not known; I see no problem with this. My work is on the ETL side and this is an easy fix.

There is another fix that is commonly used in database designs is a magic number, or in this case, a magic date. Pick a date that is impossible to be real, normally this would be a date that creates a negative age. This is ugly but it will not cause the SQL statements to break on query side. @Christian_Reich’s idea is better, but here is another option to keep from having to introduce another problem.

Chris_Knoll · October 30, 2020, 2:21pm

But there isn’t. Currently the model only supports ‘when they died’ which we can then infer ‘are they dead’ information from. But now we need ‘are they dead’ separately from ‘when they died’. I don’t see a problem with 2 fields storing 2 different pieces of information, now that you can have a separate usecase of ‘we know they are dead, we don’t know when they died’.

We’ve had null discussions on other threads. Let’s not introduce new fear about using nulls. Nulls are the proper way to represent missing information.

Let’s not do this. Magic numbers are bad in that you have to change all your analysis code to be aware of magic numbers, and if the magic numbers change, you have to go back to everything. This touches on our ‘flavors of null’ discussion we had in other places, and we don’t want to go there again.

roger.carlson · October 30, 2020, 3:16pm

I’ve read through this thread twice, and I don’t see the issue.

You know a person is dead if they have a record in the Death table. There’s no need to have a flag in the death table to show it. They would not be in the Death table if they weren’t dead.

There should not be a Death_flag field in the Person table, because that sets up a referential integrity issue. A person could be flagged as dead in the Person table with no record in the Death table, and vice versa.

In the Death table, Death_date should be nullable to show that you don’t know exact date. Maybe have verbatim_death_date and estimated_death_date fields. (Verbatim death date could remain death_date.) I’m not real happy with that, because it could introduce functional integrity issues, but this can be handled by the ETL, and there can be data quality checks to check for reasonableness.

If you only know the year, estimated_death_date could be the last day of the year. Similarly, if you only know the month and year, it would be the last day of that month. If there is a known death_date, that would be populated into both fields.

There could also be a Death_Type field with values like “returned from EHR”, “external source”, inferred history", “estimated”, etc. (all with relevant type_concepts, of course).

Researchers who need death information should always use a Left Join to the Death table.

Am I missing something?

MPhilofsky · October 30, 2020, 3:51pm

There isn’t a Death table in v6. It’s now Person.death_datetime. No date, no death

roger.carlson · October 30, 2020, 3:59pm

Well, that explains why no one else was seeing what was blindly obvious to me.

In my opinion, this highlights the reason that there should be a Death table. There’s more information to store about a death than just its date. For example the death_type.

Are there any links to discussions about why they eliminated it?

MPhilofsky · October 30, 2020, 4:15pm

and

Looks like I’m reaching consensus on bringing back the death_type_concept_id for death records

Mark · October 30, 2020, 5:09pm

I think this is the best idea that has been proposed. It could be a death_date_source as surmised or something as simple as
death_date_trust BIT NOT NULL DEFAULT 0
where 1 would flag that the death date is verified.

There is no reason to make life too complicated.

EDIT: Belay the BIT idea… that would introduce the same problem as a death flag.

Chris_Knoll · October 31, 2020, 8:14am

Here you go: Condition_occurrence, Death diagnoses - #53 by pavgra

Christian_Reich · November 1, 2020, 2:32pm

Friends:

Death (the precise one), cause of death (the Condition leading to it) and where we got that information from (Type Concept) we have solved. No need to reinvent. In the same realm we could reinvent the BIRTH table, the GENDER table and the ETHNICITY table. Nulls are indeed the same logical thing as “magic dates”, just the natural way databases handle missing information, as @Chris_Knoll pointed out. Bottom line: We are good!

The only addition is a new use case where folks have information about death, but not the precise date. This is not that useful for mortality calculations, but it is useful for trial recruitment purposes. Obviously, a death flag is the same as death before a certain date (the date we learned about the patient’s death). Whether we represent this using a flag or an operator (less than or equal to) is not much different. The operator is more obscure, but it actually gives us more information because we could keep the date over refreshes (we remember when we first learned the patient is dead, and hence died before that date). @Chris_Knoll may have a preference.

Please put in a GitHub issue, and we feed it into the CDM WG machinery.

MPhilofsky · December 1, 2020, 11:32pm

This isn’t solved in v6. There isn’t a type_concept_id for the person.death_date. If the Death doesn’t have an associated Condition, then the record doesn’t have provenance.

cukarthik · December 11, 2020, 4:16am

Does this mean for v5.3.1 we make death_date nullable? I’m not a fan of “magic dates” since there is no consistence of what that date might be and end users would have to know something specific for each OMOP instance about the date used in the death date. Should I create a GitHub issue on this topic as well?

dastumpf · December 11, 2020, 5:59am

You folks have got to fix this problem. Death is the ultimate outcome. Years ago when I was trying to assess end of life costs, I was amazed at how difficult it was to find out if someone was dead, and more so a date of death. With a maximum effort, looking in 4 sources (e.g., no standard practice), we only got there in, as I recall, data for ~70%. You need a flag and a date. The American College of Surgeon, as I recall, made this a mandatory field in their registries.

Chris_Knoll · December 11, 2020, 7:23am

It’s difficult to fix a paradox. You rightly state that death is an outcome, but how can you evaluate an outcome that doesn’t have a date? But, I think there’s a solution under the 5.X version of the cdm with the DEATH table: if the death table allows null death date, then you can insert a record into the death table to serve as the status of death, and include any other death-oriented information. This would allow you to exclude people from a study if you can’t determine the exact date of death (simply: people who have exactly 0 death records with death_date is null).

But I think they changed things in the 6.X line of the cdm to attach the death date to a person, and you can’t have a person without a record (to indicate the death occurred, but not when), so I don’t know where you go in the 6.0 version of the CDM. Maybe go back to a death table…it is a very specific context in the patient experience, right? Maybe it deserves a domain.

Christian_Reich · December 29, 2020, 5:17pm

Nobody will disagree, @dastumpf, we definitely need the death information. Problem is that observational data are a product of the healthcare system, and by definition patients stop being an object of that the moment they die. Capture of death might occur if it happens during a hospital stay, but even that is not sure. Patients dying at home - all bets are off. So, good luck to the ACS trying to enforce anything like that.

hannes · January 7, 2022, 9:29pm

@Christian_Reich @MPhilofsky
Coming in very late to this discussion. In CDM v 5.4 it looks like the death_date column still has a NOT NULL constraint. I sympathize with the arguments made for having a data and against having a flag.
I agree that one should be able to put a date in there, the actual date of death or a proxy somehow computed from the source. I would have expected though that there is a way to qualify what kind of death_date I am recording. If I interpret the available columns in the table correctly then there are only qualifiers for the source of the death information and the cause of death.
Would it not make sense to have a death_date_type_concept_id? Something that encodes whether this is an actual death date or an imputed ones (conceivably specifying whether it was a ‘last encounter’, a ‘source record date’, a ‘loading_date’, etc.).

Christian_Reich · January 9, 2022, 3:32am

What would be the vocabulary for that, @hannes? And what is the use case? How are you going to use it? Only believe they are dead if there is a certain type?