I want to raise an issue regarding how SPL concepts are functioning within the drug hierarchy, specifically in contrast to true classification vocabularies like ATC.
Currently, the ATC vocabulary serves as a reliable hierarchy for the higher-order anatomic and therapeutic classification of RxNorm drugs. Because it provides true higher-order groupings, ATC is highly effective for broad analytical tasks, such as propensity score matching.
However, when pulling the ancestors of RxNorm ingredients, the hierarchy also returns SPL terms that have been designated as Classification (standard_concept = 'C') concepts. The problem is that these SPL terms do not act as true classifications. Instead, they are a chaotic mix of higher-order groupings and lower-order, highly specific dosages and routes of administration.
The fundamental purpose of the 'C' designation is to provide a standard vocabulary entry for grouping and classification. The SPL vocabulary breaks this paradigm when it places a concept containing specific dosages and delivery forms above a broad base RxNorm ingredient.
While we currently rely on ATC for grouping, if an analyst were to mistakenly trust the 'C' designation on these SPL concepts for hierarchical roll-ups or matching, it would create a mess, pulling in highly specific formulation strings rather than clean therapeutic classes.
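For anyone who wants to avoid the problem in the meantime, here is a rough sketch of an ancestor lookup restricted to ATC. It assumes the standard OMOP CDM `concept` and `concept_ancestor` tables (including the `min_levels_of_separation` column) and uses the sertraline ingredient (739138) from the example further down; treat it as an illustration rather than a vetted query.

```sql
-- Ancestors of the sertraline ingredient (739138), keeping only
-- ATC classification concepts and ignoring SPL 'C' concepts.
select c.concept_id,
       c.concept_name,
       c.concept_class_id,
       ca.min_levels_of_separation
from concept_ancestor ca
join concept c
  on c.concept_id = ca.ancestor_concept_id
where ca.descendant_concept_id = 739138
  and c.vocabulary_id = 'ATC'
  and c.standard_concept = 'C'
order by ca.min_levels_of_separation;
```

Filtering on `vocabulary_id` rather than on `standard_concept = 'C'` alone is the point here: the 'C' flag by itself does not separate ATC classes from SPL formulations.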
Here is the data demonstrating the issue with sertraline, though this systemic issue affects RxNorm ingredients universally:
1. The concepts in question:
Notice that the SPL concept is designated as a 'C' (Classification) concept, despite containing multiple specific strengths and a route.
select * from concept where concept_id in (19079497, 739209, 45633072, 739138) order by concept_id;
| concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason |
|---|---|---|---|---|---|---|---|---|---|
| 739138 | sertraline | Drug | RxNorm | Ingredient | S | 36437 | 1970-01-01 | 2099-12-31 | |
| 739209 | sertraline 50 MG Oral Tablet | Drug | RxNorm | Clinical Drug | S | 312941 | 1970-01-01 | 2099-12-31 | |
| 19079497 | sertraline 25 MG Oral Tablet | Drug | RxNorm | Clinical Drug | S | 312940 | 1970-01-01 | 2099-12-31 | |
| 45633072 | sertraline hydrochloride 25mg/1 / 50mg/1 / 100mg/1 ORAL TABLET | Drug | SPL | Prescription Drug | C | 6c120728-527e-4e79-9951-5af61f9e3480 | 2013-05-15 | 2099-12-31 | |
2. The SPL concept showing up as an ancestor:
select * from concept where concept_id in (select ancestor_concept_id from concept_ancestor where ancestor_concept_id=45633072 and descendant_concept_id=739138);
| concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason |
|---|---|---|---|---|---|---|---|---|---|
| 45633072 | sertraline hydrochloride 25mg/1 / 50mg/1 / 100mg/1 ORAL TABLET | Drug | SPL | Prescription Drug | C | 6c120728-527e-4e79-9951-5af61f9e3480 | 2013-05-15 | 2099-12-31 | |
Because the concept_ancestor table routes from the highly specific SPL formulation (45633072) down to the RxNorm Ingredient (739138), the hierarchy is logically inverted.
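To get a sense of how widespread this is beyond sertraline, something like the following could be run. It is only a sketch against the same `concept` and `concept_ancestor` tables; the column aliases are my own, and I am not quoting any counts here since they will depend on the vocabulary release.

```sql
-- Count SPL 'C' concepts that appear as ancestors of RxNorm Ingredients,
-- broken down by the SPL concept class.
select anc.concept_class_id,
       count(distinct anc.concept_id)           as spl_ancestor_concepts,
       count(distinct ca.descendant_concept_id) as rxnorm_ingredients
from concept_ancestor ca
join concept anc
  on anc.concept_id = ca.ancestor_concept_id
join concept des
  on des.concept_id = ca.descendant_concept_id
where anc.vocabulary_id = 'SPL'
  and anc.standard_concept = 'C'
  and des.vocabulary_id = 'RxNorm'
  and des.concept_class_id = 'Ingredient'
group by anc.concept_class_id
order by spl_ancestor_concepts desc;
```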
Could the Vocabulary team review the relationship logic between SPL Prescription Drug concepts and RxNorm Ingredient concepts? These mixed-dosage SPL terms do not seem appropriate for the 'C' designation or for acting as higher-order ancestors to base ingredients.
Thanks!