In our recent work finalizing the ETL for the ETL-CMS project, we found numerous problems with the CDM v5 vocabulary files that have us scratching our heads. I started to create a list, but realized it might be best to tackle these issues one at a time in a series of posts. I’m hoping to get clarity on why we see the following issues and how people have been accommodating them in their existing ETLs.
The vocabulary specification allows a source concept to be mapped to more than one target concept. The use case where this makes the most sense is, for example, a drug that also requires a procedure to administer, so the source concept might have a “Maps to” relationship to an RxNorm term as well as to a CPT4 term. Unfortunately, most of the multiple mappings we see are not of this type. A drug term will map to two different drug terms, or an ICD9 code will map to two different places in the SNOMED vocabulary. This leads to at least two problems. The first is that if we create records for both target terms (sometimes there are three), it will look like multiple prescriptions were given, or multiple Conditions were diagnosed at different levels of specificity, when there was only one. The second, more troubling, problem is that some of these target terms appear to be wrong. The source dosage matches the dosage of one target but not the other, or the source drug name matches one target but not the second. For example:
The OHDSI concept_id 44844261, “Metoprolol Tartrate 25MG Oral Tablet”, from the NDC vocabulary, maps to the following two RxNorm terms:
- 40167213 Metoprolol Tartrate 25 MG Oral Tablet Drug
- 40166828 24 HR metoprolol succinate 25 MG Extended Release Oral Tablet Drug
The second one does not appear to be correct: succinate != Tartrate.
Another example:
The NDC concept 44843559, “Oxygen 99 % Gas for Inhalation” (Drug domain, 11-digit NDC 10109123403), maps to:
- 19025280 Oxygen 99 % Gas for Inhalation Drug RxNorm
- 19025301 Oxygen 99.5 % Gas for Inhalation Drug RxNorm
Why not only map to the first one, since the source concept is 99% oxygen, not 99.5%?
We see 20372 NDC codes with 2 or more mappings, and every one of those mappings goes to a Drug vocabulary term. Thus I think we may be mapping to problematic synonyms in some or all of these cases.
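For anyone who wants to reproduce this kind of count, here is a minimal sketch of how it could be done. It assumes the standard CDM v5 CONCEPT and CONCEPT_RELATIONSHIP tables and uses an in-memory SQLite database loaded with the two examples above plus one made-up single-mapped row; the function names `find_multi_mapped` and `build_toy_db` are hypothetical, and this is not necessarily the exact query we ran.

```python
import sqlite3


def find_multi_mapped(conn, source_vocabulary_id):
    """Return (source_concept_id, n_targets) for every source concept in the
    given vocabulary that has two or more valid 'Maps to' targets."""
    sql = """
        SELECT c1.concept_id, COUNT(DISTINCT cr.concept_id_2) AS n_targets
        FROM concept c1
        JOIN concept_relationship cr ON cr.concept_id_1 = c1.concept_id
        WHERE c1.vocabulary_id = ?
          AND cr.relationship_id = 'Maps to'
          AND cr.invalid_reason IS NULL
        GROUP BY c1.concept_id
        HAVING COUNT(DISTINCT cr.concept_id_2) > 1
        ORDER BY c1.concept_id
    """
    return conn.execute(sql, (source_vocabulary_id,)).fetchall()


def build_toy_db():
    """Build a tiny stand-in for the vocabulary tables (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            concept_name TEXT,
            vocabulary_id TEXT
        );
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER,
            concept_id_2 INTEGER,
            relationship_id TEXT,
            invalid_reason TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [
            (44844261, "Metoprolol Tartrate 25MG Oral Tablet", "NDC"),
            (44843559, "Oxygen 99 % Gas for Inhalation", "NDC"),
            (1, "Made-up single-mapped NDC (illustration only)", "NDC"),
        ],
    )
    conn.executemany(
        "INSERT INTO concept_relationship VALUES (?, ?, 'Maps to', NULL)",
        [
            (44844261, 40167213),
            (44844261, 40166828),
            (44843559, 19025280),
            (44843559, 19025301),
            (1, 2),  # made-up target, maps exactly once
        ],
    )
    return conn


print(find_multi_mapped(build_toy_db(), "NDC"))
# [(44843559, 2), (44844261, 2)]
```

Against a real vocabulary load, the same SELECT should run largely unchanged on most databases; only the connection setup would differ.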
In the case of ICD9CM codes there are instances of conditions mapping to both a SNOMED Condition and a SNOMED observation, which seems reasonable. However, consider the ICD9CM example of 44820348 (E954), Suicide and self-inflicted injury by submersion [drowning]. It maps to 3 SNOMED codes:
- 435174 Suicide - drowning Observation (SNOMED:287192004)
- 440925 Suicide Observation (SNOMED:44301001)
- 40421994 Suicide and selfinflicted injury by drowning Condition (SNOMED:219141008)
The first code (435174) is more specific and seems to be the appropriate mapping for the observation, whereas the second code (440925) is its parent in the SNOMED hierarchy. It does not seem appropriate to map to both the child and the parent, since that is what the CONCEPT_ANCESTOR hierarchy is for.
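One way an ETL could flag this pattern is to check each source concept's “Maps to” targets against CONCEPT_ANCESTOR and report any target that is a strict ancestor of another target. The sketch below is only an illustration under stated assumptions: it uses a toy in-memory SQLite database, the function names are hypothetical, and the single ancestor row (440925 above 435174) is taken from the description above rather than from a vocabulary download.

```python
import sqlite3


def redundant_ancestor_targets(conn, source_concept_id):
    """Return 'Maps to' targets of the source concept that are strict
    ancestors of another 'Maps to' target of the same source concept."""
    sql = """
        SELECT DISTINCT anc.concept_id_2
        FROM concept_relationship anc
        JOIN concept_relationship des
          ON des.concept_id_1 = anc.concept_id_1
         AND des.relationship_id = 'Maps to'
        JOIN concept_ancestor ca
          ON ca.ancestor_concept_id = anc.concept_id_2
         AND ca.descendant_concept_id = des.concept_id_2
        WHERE anc.concept_id_1 = ?
          AND anc.relationship_id = 'Maps to'
          -- CONCEPT_ANCESTOR also lists each concept as its own ancestor; skip those rows
          AND ca.ancestor_concept_id <> ca.descendant_concept_id
        ORDER BY anc.concept_id_2
    """
    return [row[0] for row in conn.execute(sql, (source_concept_id,))]


def build_toy_db():
    """Toy tables holding only the E954 example (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER,
            concept_id_2 INTEGER,
            relationship_id TEXT
        );
        CREATE TABLE concept_ancestor (
            ancestor_concept_id INTEGER,
            descendant_concept_id INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO concept_relationship VALUES (?, ?, 'Maps to')",
        [(44820348, 435174), (44820348, 440925), (44820348, 40421994)],
    )
    conn.executemany(
        "INSERT INTO concept_ancestor VALUES (?, ?)",
        [
            (440925, 435174),      # 'Suicide' is the parent of 'Suicide - drowning'
            (435174, 435174),      # self-rows, as in the real table
            (440925, 440925),
            (40421994, 40421994),
        ],
    )
    return conn


print(redundant_ancestor_targets(build_toy_db(), 44820348))
# [440925]
```

Whether to drop the flagged parent in the ETL or report it back as a vocabulary fix is a separate decision; the check only identifies candidates.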
More issues to follow, but I’ll stop here for tonight.