A recent use case we got from a group seeking to define thrombotic events within a given patient cohort relies on a tiered confidence-level approach. In this approach, the lowest confidence level is the existence of a diagnosis corresponding to the corresponding thrombotic event. The second highest is the existence of a diagnosis coupled with an order for an anticoagulant within a given time frame (or an imaging procedure, etc etc). The highest then builds on the previous, with the diagnosis plus an order plus a lab value within a given range (or some other qualifying observation, based on the specific type of clot).
We’re trying to build this out as a custom mart based on CDM tables, and in doing so, we ran into a weird issue. One of the thrombotic events the group was seeking to define was placental thrombosis. For this, they wanted to use as the first criteria a diagnosis corresponding to an ICD-9-CM code of 663.6* or an ICD-10-CM code of O43.818. When we ran our initial code, we found tens of thousands of placental thromboses - which seemed a little odd, given the overall low incidence of this condition.
Digging into this a bit further, I found that the issue is attributable to a mapping in CONCEPT_RELATIONSHIP:
select top 1000 c.concept_name, c.concept_code, c.vocabulary_id, c1.concept_id, c1.concept_name, c1.concept_code, c1.vocabulary_id from concept c
join concept_relationship cr on c.concept_id = cr.concept_id_1
join concept c1 on c1.concept_id = cr.concept_id_2
where c.concept_code like '663.6%' or c.concept_code = 'O43.813'
and relationship_id = 'Maps to'
between the standard Condition concepts “Finding related to pregnancy” and “Postpartum finding” and the given ICD-9 codes. In other words, since almost every gravida had at least one observation that mapped to these terms, and since a valid mapping exists between the ICD-9 code for placental thrombosis, we were flagging anything related to pregnancy as a placental thrombosis.
This ties into a larger issue - how are we supposed to handle one-to-many mappings between source and standard concepts? When we have a record with an ICD-9 code in the source data, and there are two valid mappings in concept_relationship to standard concepts, are we supposed to create two records in condition_occurrence with different condition_occurrence_ids (even though they are actually the same diagnosis in the source data)? If not, how are we supposed to pick which one to use? Some instances, like this one:
select c.concept_name, c.concept_code, c.vocabulary_id, c.standard_concept from concept c join CONCEPT_RELATIONSHIP cr on cr.concept_id_2 = c.concept_id join concept c1 on c1.concept_id = cr.concept_id_1 where c1.concept_code = '764.10' and c.vocabulary_id = 'SNOMED'
seem to have one clearly superior mapping - but there doesn’t seem to be any clear way to determine which one is “better.” We are currently taking the max() of the mapping to pick one standard concept, but, as this problem indicates, that isn’t always correct - because sometimes we get mappings to non-specific SNOMED codes that also map to other, more specific ICD-9s or 10s and yield false positives like this. However, even if we accepted the cross-join and created a condition_occurrence record for each valid mapping, we’d still run into the same issue with these SNOMED codes.
So out of this epistle emerge two salient questions:
-
How can we allow users to pass ICD-9 codes as definitions and identify patients without running into issues like this?
-
How do we handle one-to-many source-to-concept maps from ICD-9 or ICD-10 to SNOMED?
@aostropolets, @Dymshyts, @Christian_Reich, thoughts?