This issue has been discussed multiple times over multiple years (see Related Posts below). There has even been a THEMIS thread proposing a solution. However, the issue remains, and we’d like to stick a pin in it for good.
Issue Name: Multiple Death Records Due to death_type_concept_id or cause_concept_id (CDM Version 5.4)
Issue Type: Duplicate records caused by multiple standard concept_ids for death cause or death type (provenance)
Description: OMOP CDM v5.4
The Death table description states: “A person can have up to one record if the source system contains evidence about the Death”. However, an ICD diagnosis code may map to multiple SNOMED codes:
ICD10: J96.21 (Acute on chronic respiratory failure with hypoxia (HCC))
maps to SNOMED: 389086002 (Hypoxia) and 67905004 (Acute-on-chronic respiratory failure)
When this code is recorded as the cause of death, the result is two Death records with the same death date.
Similarly, a person may have death records from multiple systems (provenance), e.g., an EHR and a death registry.
Suggested Solution: For OMOP CDM v5.4, allow multiple Death records per person when they differ in death_type_concept_id and/or cause_concept_id. However, the death_date values must match.
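As a rough illustration of the proposed rule, the sketch below flags persons whose Death rows do not all share one death_date, while accepting several rows that differ only in type or cause. It is a minimal Python sketch, not an actual DQD check; the `DeathRecord` class, the function name, and the placeholder type concept ids (1 and 2) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeathRecord:
    # Hypothetical in-memory view of a subset of Death table columns.
    person_id: int
    death_date: date
    death_type_concept_id: int
    cause_concept_id: int


def persons_with_conflicting_dates(records):
    """Return sorted person_ids whose Death rows do not all share one death_date.

    Under the proposal, several rows per person are acceptable as long as
    they differ only in death_type_concept_id and/or cause_concept_id.
    """
    dates_by_person = defaultdict(set)
    for rec in records:
        dates_by_person[rec.person_id].add(rec.death_date)
    return sorted(pid for pid, dates in dates_by_person.items() if len(dates) > 1)


# Placeholder type concept ids (hypothetical, not real vocabulary concepts).
EHR_TYPE, REGISTRY_TYPE = 1, 2
records = [
    # J96.21 mapped to two SNOMED causes: same date, so allowed.
    DeathRecord(1, date(2023, 5, 1), EHR_TYPE, 67905004),
    DeathRecord(1, date(2023, 5, 1), EHR_TYPE, 389086002),
    # Two provenances that disagree on the date: flagged.
    DeathRecord(2, date(2023, 5, 1), EHR_TYPE, 0),
    DeathRecord(2, date(2023, 5, 3), REGISTRY_TYPE, 0),
]
print(persons_with_conflicting_dates(records))
```

Person 1 passes because both rows carry the same date; only person 2 is reported.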
Possible Implications:
CDM documentation will need to be modified.
The rules of the DQD (Data Quality Dashboard) will have to be adjusted to match. @clairblacketer
Question for @Chris_Knoll: will this change affect Atlas? Does Atlas expect only one death record per person?
I have another potential solution that may not impact tooling as much.
Preferred Concepts Based on Context (Death)
Process
ETL creates multiple records, then drops duplicates keeping the single ‘preferred’ concept_id for the context.
All candidate records must have the same person_id, death_date, and death_type_concept_id for this drop to occur.
The codes of the dropped records (non-preferred concept_ids) would be retained, together with the preferred code, in cause_source_value as a list stored in a VARCHAR, e.g. [code1, code2, code3].
Example:
In the context of cause of death, ICD10 J96.21 maps to two SNOMED codes; 67905004 (Acute-on-chronic respiratory failure) would be the preferred term and used as the cause_concept_id.
Both codes would be included in cause_source_value as a list [67905004, 389086002], with the preferred term first.
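The process and example above can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical `PREFERRED_CAUSE_RANK` lookup (no such list exists yet; building one is the Vocabulary WG undertaking listed under Cons). Candidate causes are identified by the SNOMED codes from the example, and the fallback of keeping input order for unranked codes is also an assumption, since the proposal does not say what happens when no candidate is preferred.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby

# Hypothetical ranking for the cause-of-death context (lower = more preferred).
PREFERRED_CAUSE_RANK = {
    67905004: 0,   # Acute-on-chronic respiratory failure
    389086002: 1,  # Hypoxia
}


@dataclass(frozen=True)
class DeathRow:
    person_id: int
    death_date: date
    death_type_concept_id: int
    cause_concept_id: int
    cause_source_value: str = ""


def _rank(code):
    # Unranked codes sort after all ranked ones (assumed fallback).
    return PREFERRED_CAUSE_RANK.get(code, len(PREFERRED_CAUSE_RANK))


def collapse_to_preferred(rows):
    """Keep one row per (person_id, death_date, death_type_concept_id)."""
    key = lambda r: (r.person_id, r.death_date, r.death_type_concept_id)
    result = []
    # sorted() is stable, so rows within a group keep their input order.
    for (person_id, death_date, type_id), group in groupby(sorted(rows, key=key), key=key):
        causes = list(dict.fromkeys(r.cause_concept_id for r in group))  # de-duplicated
        ranked = sorted(causes, key=_rank)  # preferred code first
        listing = "[" + ", ".join(str(c) for c in ranked) + "]"
        result.append(DeathRow(person_id, death_date, type_id, ranked[0], listing))
    return result


rows = [
    DeathRow(1, date(2023, 5, 1), 1, 389086002),
    DeathRow(1, date(2023, 5, 1), 1, 67905004),
]
print(collapse_to_preferred(rows))
```

Rows with a different death_type_concept_id land in separate groups and are not collapsed, which matches the condition that only records sharing person_id, death_date, and death_type_concept_id are dropped.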
Rationale:
Hypoxia is a disposition (an intrinsic characteristic) of the object (person) in a state of reduced oxygen supply to tissue.
- This state can be either chronic (e.g., COPD) or emergent (e.g., asphyxiation).
- This state can result from the deterioration of one or more systems and is an inherent feature of the object's (person's) respiratory or circulatory system.
Acute-on-chronic respiratory failure is both a disposition and a role (context dependent) of the object (person) in a state of critically deteriorated respiratory function.
- This state is temporary (it will either be recovered from or be fatal).
- This state is not an inherent feature of the respiratory system; it is contingent on the current state of the object's (person's) health.
Acute-on-chronic respiratory failure would be the ‘preferred’ concept_id as a cause of death because it more accurately aligns with a direct cause of death.
Pros:
Increased accuracy for cause of death.
Tooling would not need to be adjusted or re-coded.
Cons:
Potentially significant undertaking by the Vocabulary WG to create ‘preferred’ concept lists for cause of death for each vocabulary.
We would need to find a feasible way to distribute the ‘preferred’ concept list. This could be done through one of the concept tables or by establishing a ‘Death’ domain.
VARCHAR is a poor way to store lists, and not all programming languages and database systems interpret lists the same way (this is more of a technical issue for reproducibility or for reversing an ETL).
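To illustrate the last point: the same list written out by two conventions produces text that another reader may not parse. The minimal Python sketch below assumes JSON as the agreed encoding for the list in cause_source_value; JSON is only one possible convention and is not part of the proposal, and `parse_json_list` is a hypothetical helper.

```python
import json

codes = ["67905004", "389086002"]

python_repr = str(codes)       # "['67905004', '389086002']" (Python-specific quoting)
json_text = json.dumps(codes)  # '["67905004", "389086002"]' (standard JSON)


def parse_json_list(text):
    """Parse a list stored as JSON text; raise ValueError if it is not a JSON array."""
    value = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(value, list):
        raise ValueError("not a JSON array")
    return value


print(parse_json_list(json_text))
```

The JSON text round-trips, while the Python `str()` form uses single quotes and is rejected by a standard JSON parser, which is the kind of cross-language mismatch that makes reversing an ETL harder.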
Now that the holidays are over, I’m getting back to this.
My original solution was ETL-centric, since I’m an ETL guy. I have no idea what ‘re-tooling’ will be needed as a result of it. Does anyone know specifically what re-tooling will be required, and for which products?
Hayden’s solution requires a heavier ETL modification, which I suppose is preferable overall. But I’m not sure who would be responsible for creating the preferred concept_ids or how to go about doing it.
Overall, I think the original proposal is less complicated.
This is not a new issue. Everyone who populates cause of death in the Death table has had to deal with it in some way.
This is another case where forcing document-type data into a relational format causes issues. Whatever solution is chosen, no data should be dropped, no matter how painful that is for us ETLers; I do not like the idea of dropping any concepts.
As an outpatient-only site, we rarely have the cause of death; almost all of ours will be ‘0’. Perhaps I am not the best one to comment on this topic, but I hate the idea of lost data.
While Hayden’s solution is interesting and accurate from a physiological point of view, it doesn’t address the case where a site has more than one source of death data.
@Chris_Knoll, as the OHDSI forums’ Atlas guru: will having more than one death record for a person break an Atlas study or otherwise cause problems? The death records for a person would all have the same date. Only the cause_concept_id and death_type_concept_id would be allowed to differ between a person’s death records.
Not that I can think of. The Death criteria just look at the Death table, so I don’t think there would be an issue if there were multiple records in there.
This issue has been out here for some time without any objections. Here is one last call for any objection before we put it on the Themis agenda for ratification.
@katy-sadowski Do we have a one death record per person check?