In our claims dataset, multiple records share the same diagnosis codes. Because of the way our claims data arrives, each record is one very wide line per line item on the claim. As a result, the same diagnosis is repeated many times, both across diagnosis positions and across records. We do have the positions in our dataset, but we've been told that when the same diagnosis appears in multiple positions, we should keep the first position. The reason is that for many of our analyses, we then don't have to de-duplicate the same condition codes during the analysis phase.
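A minimal sketch of the "keep the first position" rule, assuming each row carries a person ID, a claim ID, the claim line number, the diagnosis position, and the code. The field names and `dedupe_first_position` are hypothetical. Treating (person, claim, code) as the duplicate key, and breaking position ties by line number, are also assumptions; adjust both to whatever your ETL actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimDiagnosis:
    person_id: int
    claim_id: str        # hypothetical claim identifier
    line_number: int     # claim line item the diagnosis was read from
    position: int        # diagnosis position on the claim (1 = first)
    diagnosis_code: str


def dedupe_first_position(rows):
    """Keep one row per (person_id, claim_id, diagnosis_code).

    The row with the lowest position wins; ties on position are broken
    by the lowest line number (an assumed tie-break rule).
    """
    best = {}
    for row in rows:
        key = (row.person_id, row.claim_id, row.diagnosis_code)
        current = best.get(key)
        if current is None or (row.position, row.line_number) < (
            current.position,
            current.line_number,
        ):
            best[key] = row
    return list(best.values())


# Example: the same two codes repeated on two line items of one claim.
rows = [
    ClaimDiagnosis(1, "C100", 1, 2, "E11.9"),
    ClaimDiagnosis(1, "C100", 1, 1, "I10"),
    ClaimDiagnosis(1, "C100", 2, 1, "E11.9"),
    ClaimDiagnosis(1, "C100", 2, 2, "I10"),
]
result = dedupe_first_position(rows)
for r in sorted(result, key=lambda r: r.diagnosis_code):
    print(r.diagnosis_code, "position", r.position, "line", r.line_number)
```

Four raw rows collapse to two, each kept at position 1. In SQL the same idea is usually a `ROW_NUMBER()` window partitioned by the duplicate key and ordered by position.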
We also have other EMR datasets with duplicated diagnosis codes, and in those datasets there is no way to tell whether a diagnosis is primary or secondary.
@MPhilofsky - we use the concept type to help us distinguish between different diagnosis types, like admitting, primary, and secondary.
@ericaVoss For procedures, the rule is based on a combination of fields: person_id, date, procedure code, provider, visit_occurrence_id, and visit_detail_id. If two records match on all of these fields, we treat them as duplicates and increment the quantity by 1, so we don't lose any information. If the records differ on any of these fields, we leave them as separate records.
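A minimal sketch of the procedure rule described above: collapse records that match on all six fields and add 1 to the kept record's quantity for each extra duplicate. The class, `collapse_procedures`, and the choice to start a missing quantity at 1 are assumptions for illustration. Note that, like SQL `GROUP BY`, this treats two missing (`None`) IDs as equal.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProcedureRecord:
    person_id: int
    procedure_date: date
    procedure_code: str              # hypothetical name for the procedure code field
    provider_id: Optional[int]
    visit_occurrence_id: Optional[int]
    visit_detail_id: Optional[int]
    quantity: Optional[int] = None


def collapse_procedures(records):
    """Merge records that match on all six key fields, counting duplicates in quantity."""
    merged = {}
    for rec in records:
        key = (
            rec.person_id,
            rec.procedure_date,
            rec.procedure_code,
            rec.provider_id,
            rec.visit_occurrence_id,
            rec.visit_detail_id,
        )
        if key not in merged:
            # First occurrence: assume a missing quantity means 1.
            start = rec.quantity if rec.quantity is not None else 1
            merged[key] = replace(rec, quantity=start)
        else:
            # Duplicate: increment the kept record's quantity by 1.
            kept = merged[key]
            merged[key] = replace(kept, quantity=kept.quantity + 1)
    return list(merged.values())


d = date(2020, 1, 15)
procs = [
    ProcedureRecord(1, d, "99213", 10, 100, None),
    ProcedureRecord(1, d, "99213", 10, 100, None),  # exact duplicate
    ProcedureRecord(1, d, "99213", 11, 100, None),  # different provider: kept separate
]
collapsed = collapse_procedures(procs)
for rec in collapsed:
    print(rec.provider_id, rec.quantity)
```

The two provider-10 rows merge into one record with quantity 2, while the provider-11 row stays separate with quantity 1.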