Duplicate Diagnosis [THEMIS WG3]

Looks like I really stepped in it with this THEMIS recommendation. :sweat_smile: Great points made.

While I haven't discussed this with THEMIS WG3, I may propose backing out this recommendation. @jenniferduryea has highlighted for me that this recommendation might do more harm than good, and @MPhilofsky's example is a shining example of where this recommendation is dangerous. I think @Gowtham_Rao says it best here . . .

I'm still interested in whether others have feedback, and I will take this thread to our next THEMIS WG3 meeting.

We need two visit_type_concept_ids that are descendants of 44818518 "Visit derived from EHR record": http://www.ohdsi.org/web/atlas/#/concept/44818518

Visit derived from EHR billing record
Visit derived from EHR encounter record

Note: hierarchy among _type_concept_id
I have updated the issue request here https://github.com/OHDSI/Vocabulary-v5.0/issues/156

@Chris_Knoll this makes https://github.com/OHDSI/Atlas/issues/521 important, because we are adding descendants/hierarchy to _type_concept_id
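To illustrate why a hierarchy among type concepts changes how queries have to be written, here is a minimal Python sketch. The parent ID 44818518 comes from this thread; the two child IDs are hypothetical placeholders (the proposed concepts have no IDs yet), and the pair set is a made-up stand-in for a concept_ancestor-style lookup that is assumed to include self-pairs.

```python
# Parent concept ID is from the thread; the child IDs are HYPOTHETICAL placeholders.
EHR_RECORD = 44818518
EHR_BILLING = -1     # hypothetical: "Visit derived from EHR billing record"
EHR_ENCOUNTER = -2   # hypothetical: "Visit derived from EHR encounter record"

# (ancestor, descendant) pairs, assumed to include each concept paired with itself.
ANCESTOR_PAIRS = {
    (EHR_RECORD, EHR_RECORD),
    (EHR_RECORD, EHR_BILLING),
    (EHR_RECORD, EHR_ENCOUNTER),
    (EHR_BILLING, EHR_BILLING),
    (EHR_ENCOUNTER, EHR_ENCOUNTER),
}

def descendants_of(concept_id):
    """Return the concept itself plus all of its descendants."""
    return {d for a, d in ANCESTOR_PAIRS if a == concept_id}

def visits_of_type(visits, concept_id):
    """Keep visits whose type concept is concept_id or one of its descendants."""
    wanted = descendants_of(concept_id)
    return [v for v in visits if v["visit_type_concept_id"] in wanted]

# Made-up sample rows for illustration only.
sample_visits = [
    {"visit_occurrence_id": 1, "visit_type_concept_id": EHR_BILLING},
    {"visit_occurrence_id": 2, "visit_type_concept_id": EHR_ENCOUNTER},
]

all_ehr = visits_of_type(sample_visits, EHR_RECORD)       # both rows
billing_only = visits_of_type(sample_visits, EHR_BILLING)  # only the billing row
exact_parent = [v for v in sample_visits
                if v["visit_type_concept_id"] == EHR_RECORD]  # empty: exact match misses children
```

The last line is the point: a filter that matches only the parent ID exactly would silently drop every visit typed with one of the new child concepts.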

@Gowtham_Rao,

We don't want to create two visit records from one encounter. We only have one visit. We want to leave the distinction as close to the source as possible. We want to distinguish the source of the condition: whether it came from a billing table or from an encounter table.

Yup - the construct of the visit is complex. Generally a visit is defined as the unique combination of person_id, visit_start_date, care_site_id.

What I have seen consistently re-deliberated is whether this definition of visit should be handled at ETL time or at analytic time. I believe it should be at analytic time, and the ETL should keep as much provenance to the source data as possible, preserving record-level referential integrity. Otherwise, the assumptions made during ETL will propagate to all downstream analyses and make it difficult to generalize the findings, since those assumptions are not overtly stated.
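As a rough illustration of deferring the visit definition to analytic time, here is a minimal Python sketch. It assumes the ETL kept one row per source record, and it groups those rows on the (person_id, visit_start_date, care_site_id) key mentioned above. The `source_table` field and the sample rows are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical record-level rows, one per source record (not real data).
records = [
    {"person_id": 1, "visit_start_date": "2018-03-01", "care_site_id": 10, "source_table": "billing"},
    {"person_id": 1, "visit_start_date": "2018-03-01", "care_site_id": 10, "source_table": "encounter"},
    {"person_id": 1, "visit_start_date": "2018-03-02", "care_site_id": 10, "source_table": "encounter"},
]

def derive_visits(rows):
    """Group source records into analytic visits keyed on
    (person_id, visit_start_date, care_site_id), keeping the source of each record."""
    visits = defaultdict(list)
    for row in rows:
        key = (row["person_id"], row["visit_start_date"], row["care_site_id"])
        visits[key].append(row["source_table"])
    return dict(visits)

visits = derive_visits(records)
```

Because the grouping happens in the analysis rather than in the ETL, a study that wants provider in the key, or a different date rule, can change `key` without reloading the data, and the assumption stays visible in the analysis code.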

In your particular case, I don't know the answer, because it depends on how the source system handles the data. If the data come from two different source systems for the same person (i.e. a billing system and an encounter system), then I think there should be two different records in visit_occurrence, because that will allow for lineage to the source.

I agree with this completely. However, I know that others have clearly stated that OMOP is an analysis data model, where these decisions should be baked into the ETL. It would be good to get a definitive opinion on this.

On a related note, there needs to be an unambiguous definition of a visit. You could argue that provider needs to be part of the definition. I believe THEMIS is working on this?

Friends:

Apart from the Visit problem: Did you end up with a compromise with the deduping? Or a recommendation? Did you collect USE CASES?

@Christian_Reich - I think we landed on not making any recommendation about deduping, as the potential for harm is greater than the small gains you could make by cleaning up the duplication.

Let me be clearer about the conclusion of this thread:

We will not be making a recommendation on deduplicating diagnoses. We believe the opportunity for error is higher than what would be gained.
