Looks like I really stepped in it with this THEMIS recommendation. Great points made.
While I haven't discussed this with THEMIS WG3, I may propose backing out this recommendation. I think @jenniferduryea has highlighted for me that this recommendation might lead to more harm than good, and @MPhilofsky's example is a shining example of where this recommendation is dangerous. I think @Gowtham_Rao says it best here . . .
I'm still interested in whether others have feedback, and I will take this thread to our next THEMIS WG3 meeting.
We don't want to create two visit records from one encounter. We only have one visit. We want to leave the distinction as close to the source as possible. We want to distinguish the source of the condition: either it came from a billing table or it came from an encounter table.
Yup - the construct of the visit is complex. Generally a visit is defined as the unique combination of person_id, visit_start_date, care_site_id.
What I have seen consistently re-deliberated is whether this definition of a visit should be handled at ETL time or at analytic time. I believe it should be at analytic time, and the ETL should keep as much provenance to the source data as possible, preserving record-level referential integrity. Otherwise, the assumptions made during ETL will propagate to all downstream analyses and make it difficult to generalize the findings, as the assumptions are not overtly stated.
In your particular case, I don't know the answer, because it depends on how the source system handles the data. If it is data from two different source systems for the same person (i.e. billing system and encounter system), then I think it should be two different records in visit_occurrence, because that will allow for lineage to the source.
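To make the ETL-time versus analytic-time distinction concrete, here is a rough sketch of what grouping at analytic time could look like. It assumes the ETL kept one visit_occurrence row per source record, and it uses a hypothetical `source_system` label (not a standard OMOP column) to stand in for whatever provenance the ETL preserves. The grouping key is the person_id, visit_start_date, care_site_id combination mentioned above; the function name and sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class VisitRecord(NamedTuple):
    visit_occurrence_id: int
    person_id: int
    visit_start_date: date
    care_site_id: int
    source_system: str  # hypothetical provenance label, not a standard OMOP column


def group_visits(records):
    """Group rows by (person_id, visit_start_date, care_site_id) without dropping any."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec.person_id, rec.visit_start_date, rec.care_site_id)
        groups[key].append(rec)
    return dict(groups)


# Illustrative rows: one encounter that also produced a billing record,
# plus a separate encounter the next day.
records = [
    VisitRecord(1, 100, date(2019, 3, 1), 7, "encounter"),
    VisitRecord(2, 100, date(2019, 3, 1), 7, "billing"),
    VisitRecord(3, 100, date(2019, 3, 2), 7, "encounter"),
]

visits = group_visits(records)
for key, rows in visits.items():
    print(key, [r.source_system for r in rows])
```

Because the grouping happens in the analysis, the assumption is stated in the study code and the original rows stay intact. Adding provider to the key, for example, would be a one-line change rather than a re-run of the ETL.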
I agree with this completely. However, I know that others have clearly stated that OMOP is an analysis data model, where these decisions should be baked into the ETL. It would be good to get a definitive opinion on this.
On a related note, there needs to be an unambiguous definition of a visit. You could argue that provider needs to be part of the definition. I believe THEMIS is working on this?
@Christian_Reich - I think we landed on not making any recommendation about deduping, as the potential for harm is greater than the small gains you could make by cleaning up the duplication.