OHDSI Home | Forums | Wiki | Github

What to do with NULL Death dates in OMOP?


(Melanie Philofsky) #21

It doesn’t work because those without direct access to your data will create queries without the knowledge you put death into the future. This is especially troublesome if you have a high percentage of dead Persons lacking a death date in your database. In the following hypothetical study, you will add a lot of false data and skew your results heavily by using 2099.

  • A Person who was diagnosed with X in 2015 and is now dead (per your source data) will have a longevity of 84 years from date of diagnosis. Not good. They died between 2015 and the date of your data pull.

For your use case, which is specific to the All of Us project, then I suggest

And make it apparent in your ETL specifications and in your end user documentation. The date of death would be the date the data was recorded or date of data pull.

Or lobby hard for the death flag :slight_smile:


(Christian Reich) #22

The former solution will get us closer to the US tax code (or the German one, which apparently is larger than that of all EU countries combined). So, maybe we do the latter. Want to put an issue in, @MPhilofsky?


(Karthik) #23

@MPhilofsky, let’s lobby hard for a death flag :slight_smile:


(Gregory Klebanov) #24

if you do now someone is dead but do not know when exactly - sure, death flag could be work for queries that need to know if patient is just dead. But then you are risking that any of the existing tools and apps that might be looking at death date for other purposes - as well as conventions around events after death date etc… - would not work anymore.

But if you follow a convention that Christian is proposing and calculate the death date - let’s say as of the last ETL refresh (or data extract?), it would be also nice to store a reference to the source of this information e.g. “national registry” (NDI in US), “nurse”, “estimated” etc… Then you have a date but also a way to measure the trust of the information. Actually, I had multiple discussions on this exact topic elsewhere since folks wanted to know where the death info is coming from.


(Mark Seal) #25

From a pure data design stand point, a death flag is a horrible idea as long as there is a death date. This would introduce new nulls into the system. From a data design standpoint, anytime a field is created that one knows that nulls will be inserted, it is the wrong design. It creates a nightmare to get correct data out of the system. It is doubly bad when one has a common schema that spans different database engines. How one works around nulls in Postgres is different than SQL Server which is different than Oracle.

@Christian_Reich introduced the option of recording the death date at load time when the actual date is not known; I see no problem with this. My work is on the ETL side and this is an easy fix.

There is another fix that is commonly used in database designs is a magic number, or in this case, a magic date. Pick a date that is impossible to be real, normally this would be a date that creates a negative age. This is ugly but it will not cause the SQL statements to break on query side. @Christian_Reich’s idea is better, but here is another option to keep from having to introduce another problem.


(Chris Knoll) #26

But there isn’t. Currently the model only supports ‘when they died’ which we can then infer ‘are they dead’ information from. But now we need ‘are they dead’ separately from ‘when they died’. I don’t see a problem with 2 fields storing 2 different pieces of information, now that you can have a separate usecase of ‘we know they are dead, we don’t know when they died’.

We’ve had null discussions on other threads. Let’s not introduce new fear about using nulls. Nulls are the proper way to represent missing information.

Let’s not do this. Magic numbers are bad in that you have to change all your analysis code to be aware of magic numbers, and if the magic numbers change, you have to go back to everything. This touches on our ‘flavors of null’ discussion we had in other places, and we don’t want to go there again.


(Roger Carlson) #27

I’ve read through this thread twice, and I don’t see the issue.

You know a person is dead if they have a record in the Death table. There’s no need to have a flag in the death table to show it. They would not be in the Death table if they weren’t dead.

There should not be a Death_flag field in the Person table, because that sets up a referential integrity issue. A person could be flagged as dead in the Person table with no record in the Death table, and vice versa.

In the Death table, Death_date should be nullable to show that you don’t know exact date. Maybe have verbatim_death_date and estimated_death_date fields. (Verbatim death date could remain death_date.) I’m not real happy with that, because it could introduce functional integrity issues, but this can be handled by the ETL, and there can be data quality checks to check for reasonableness.

If you only know the year, estimated_death_date could be the last day of the year. Similarly, if you only know the month and year, it would be the last day of that month. If there is a known death_date, that would be populated into both fields.

There could also be a Death_Type field with values like “returned from EHR”, “external source”, inferred history", “estimated”, etc. (all with relevant type_concepts, of course).

Researchers who need death information should always use a Left Join to the Death table.

Am I missing something?


(Melanie Philofsky) #28

There isn’t a Death table in v6. It’s now Person.death_datetime. No date, no death


(Roger Carlson) #29

Well, that explains why no one else was seeing what was blindly obvious to me.

In my opinion, this highlights the reason that there should be a Death table. There’s more information to store about a death than just its date. For example the death_type.

Are there any links to discussions about why they eliminated it?


(Melanie Philofsky) #30

and

and

Looks like I’m reaching consensus on bringing back the death_type_concept_id for death records :slight_smile:


(Mark Seal) #31

I think this is the best idea that has been proposed. It could be a death_date_source as surmised or something as simple as
death_date_trust BIT NOT NULL DEFAULT 0
where 1 would flag that the death date is verified.

There is no reason to make life too complicated.

EDIT: Belay the BIT idea… that would introduce the same problem as a death flag.


(Chris Knoll) #32

Here you go: Condition_occurrence, Death diagnoses


(Christian Reich) #33

Friends:

Death (the precise one), cause of death (the Condition leading to it) and where we got that information from (Type Concept) we have solved. No need to reinvent. In the same realm we could reinvent the BIRTH table, the GENDER table and the ETHNICITY table. Nulls are indeed the same logical thing as “magic dates”, just the natural way databases handle missing information, as @Chris_Knoll pointed out. Bottom line: We are good!

The only addition is a new use case where folks have information about death, but not the precise date. This is not that useful for mortality calculations, but it is useful for trial recruitment purposes. Obviously, a death flag is the same as death before a certain date (the date we learned about the patient’s death). Whether we represent this using a flag or an operator (less than or equal to) is not much different. The operator is more obscure, but it actually gives us more information because we could keep the date over refreshes (we remember when we first learned the patient is dead, and hence died before that date). @Chris_Knoll may have a preference.

Please put in a GitHub issue, and we feed it into the CDM WG machinery.


t