Populating CONDITION_OCCURRENCE using EPIC Clarity and its duplications

Hi everyone,

I am at Columbia University Medical Center, and we use Epic Systems' Clarity as the source for our OMOP data.
We recently ran into an issue with duplicated condition_occurrence rows from different visits on the same day: two condition_occurrence rows for one patient with the same diagnosis code and the same condition start date, but different visit IDs.

For example:
(Fabricated scenario…)
Patient A had an office visit on January 1, 2020, and an imaging visit for an X-ray of her painful arm.
At her office visit, the doctor recorded one of her diagnoses as “Pain in arm, unspecified: M79.603 (ICD-10)”.
Right after her office visit, she had an X-ray at an imaging department, where her diagnosis was also documented as “Pain in arm, unspecified: M79.603 (ICD-10)”.
Both visits took place at Columbia University, in different departments.

The outcome in our data was as follows:

  • visit_occurrence: two rows, one for the office visit and one for the X-ray imaging visit. They have different care_site_ids and occurred on the same day, with slightly overlapping times.
  • condition_occurrence: two rows with condition_source_concept_id = 45587064 for ICD-10 M79.603, one for the office visit and one for the imaging visit.

Although this looks perfectly “reasonable”, since the patient was diagnosed at two different visits by different doctors, it is still a duplicate condition record because Patient A has two rows with the same diagnosis code on exactly the same day.
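For anyone who wants to check their own data for this pattern, here is a minimal Python sketch. It assumes the condition_occurrence rows have already been loaded as dictionaries keyed by the standard CDM column names; the row values (apart from the concept ID quoted above) and the helper name `same_day_repeats` are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Toy condition_occurrence rows; person, visit, and row ids are invented.
rows = [
    {"condition_occurrence_id": 1, "person_id": 100,
     "condition_source_concept_id": 45587064,
     "condition_start_date": date(2020, 1, 1), "visit_occurrence_id": 9001},
    {"condition_occurrence_id": 2, "person_id": 100,
     "condition_source_concept_id": 45587064,
     "condition_start_date": date(2020, 1, 1), "visit_occurrence_id": 9002},
    {"condition_occurrence_id": 3, "person_id": 100,
     "condition_source_concept_id": 45587064,
     "condition_start_date": date(2020, 2, 1), "visit_occurrence_id": 9003},
]


def same_day_repeats(rows):
    """Return groups of rows that share person, source concept, and start
    date but were recorded at more than one visit."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["person_id"],
               row["condition_source_concept_id"],
               row["condition_start_date"])
        groups[key].append(row)
    return {
        key: group
        for key, group in groups.items()
        if len({r["visit_occurrence_id"] for r in group}) > 1
    }


repeats = same_day_repeats(rows)
for key, group in repeats.items():
    print(key, [r["condition_occurrence_id"] for r in group])
```

This only flags the rows; whether to collapse them is the open question of this thread.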

Has anyone using Clarity found this to be an issue and handled it differently?
Any ideas or thoughts would be greatly appreciated!
Thank you!

I would leave the ETL as is. As you say, it is perfectly “reasonable”, as the patient was diagnosed at two different visits. What problems are created by having two near-duplicate Condition Occurrence records?

I agree with @DTorok; I do not consider the appearance of these diagnosis codes “duplicates” because they were coded by different providers in completely different contexts. To me, the fact that they occurred on the same day is irrelevant.

Consider this example: a primary care physician codes Type 2 diabetes mellitus as a chronic condition repeatedly over time to justify billing a certain CPT code level of service. Are these “duplicates”? The patient’s chronic condition is expected to persist, and there is a valid reason why the same diagnosis code will recur.

I don’t see any problem with this.

Actually, this is a good discussion: what kinds of records are allowed to be duplicated, and which ones are not? Obviously, you cannot have more than one case of diabetes on the same day; it is the same condition. But stating the diagnosis many times does not hurt (as you pointed out above). On the other hand, acute conditions like asthma attacks or myocardial infarctions can happen more than once a day (reinfarction), and that is very relevant to the interpretation of the disease.

The same is true for drugs and procedures. Drug abusers will try to get the same prescription several times in a day, but in most normal cases duplications are administrative artefacts and should be removed. You cannot have more than one appendectomy per day, but you can have more than one X-ray.

I am not sure how to formalize this and provide some kind of attribute telling ETLers to dedup or not. Any good ideas?
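One possible way to picture such an attribute, purely as a sketch: a per-concept flag saying whether a same-day repeat is clinically meaningful, which the ETL consults before collapsing records. Everything here is hypothetical, including the `REPEATABLE_SAME_DAY` table, the placeholder concept IDs, and the `dedup` helper; nothing like this exists in the OMOP vocabulary today.

```python
from datetime import date

# Hypothetical policy: concept_id -> can the event truly recur within a day?
# The ids are placeholders, not real OMOP concept ids.
REPEATABLE_SAME_DAY = {
    1001: False,  # e.g. a chronic condition such as diabetes
    1002: True,   # e.g. an acute event such as myocardial infarction
}


def dedup(records, policy, default_repeatable=True):
    """Collapse same-person, same-concept, same-day records unless the
    policy marks the concept as repeatable within a day.

    Each record is a (person_id, concept_id, day) tuple. Concepts missing
    from the policy fall back to default_repeatable, so by default nothing
    is dropped without an explicit rule.
    """
    seen = set()
    kept = []
    for record in records:
        person_id, concept_id, day = record
        if policy.get(concept_id, default_repeatable):
            kept.append(record)  # repeats are meaningful; keep every row
            continue
        if record not in seen:
            seen.add(record)
            kept.append(record)  # keep only the first same-day occurrence
    return kept


day = date(2020, 1, 1)
records = [
    (1, 1001, day), (1, 1001, day),  # chronic condition stated twice
    (1, 1002, day), (1, 1002, day),  # acute event recorded twice
]
kept = dedup(records, REPEATABLE_SAME_DAY)
print(len(kept))
```

Defaulting to "keep" is a deliberate choice in this sketch: an unclassified concept is never silently removed.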