Issue with SPL dosage/route concepts functioning as 'Classification' ancestors to RxNorm Ingredients

I want to raise an issue regarding how SPL concepts are functioning within the drug hierarchy, specifically in contrast to true classification vocabularies like ATC.

Currently, the ATC vocabulary serves as a reliable hierarchy for the higher-order anatomic and therapeutic classification of RxNorm drugs. Because it provides true higher-order groupings, ATC is highly effective for broad analytical tasks, such as propensity score matching.

However, when pulling the ancestors of RxNorm ingredients, the hierarchy also returns SPL terms that have been designated as Classification (standard_concept = 'C') concepts. The problem is that these SPL terms do not act as true classifications. Instead, they are a chaotic mix of higher-order groupings and lower-order, highly specific dosages and routes of administration.

The fundamental purpose of the ‘C’ designation is to provide a standard vocabulary entry for grouping and classification. The SPL vocabulary breaks this paradigm when it places a concept containing specific dosages and delivery forms above a broad base RxNorm ingredient.

While we currently rely on ATC for grouping, if an analyst were to mistakenly trust the ‘C’ designation on these SPL concepts for hierarchical roll-ups or matching, it would create a mess, pulling in highly specific formulation strings rather than clean therapeutic classes.

Here is the data demonstrating the issue with sertraline, though this systemic issue affects RxNorm ingredients universally:

1. The concepts in question:
Notice that the SPL concept is designated as a ‘C’ (Classification) concept, despite containing multiple specific strengths and a route.

select * from concept where concept_id in (19079497, 739209, 45633072, 739138) order by concept_id;
concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason
739138 | sertraline | Drug | RxNorm | Ingredient | S | 36437 | 1970-01-01 | 2099-12-31 |
739209 | sertraline 50 MG Oral Tablet | Drug | RxNorm | Clinical Drug | S | 312941 | 1970-01-01 | 2099-12-31 |
19079497 | sertraline 25 MG Oral Tablet | Drug | RxNorm | Clinical Drug | S | 312940 | 1970-01-01 | 2099-12-31 |
45633072 | sertraline hydrochloride 25mg/1 / 50mg/1 / 100mg/1 ORAL TABLET | Drug | SPL | Prescription Drug | C | 6c120728-527e-4e79-9951-5af61f9e3480 | 2013-05-15 | 2099-12-31 |

2. The SPL concept showing up as an ancestor:

select * from concept where concept_id in (select ancestor_concept_id from concept_ancestor where ancestor_concept_id = 45633072 and descendant_concept_id = 739138);
concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason
45633072 | sertraline hydrochloride 25mg/1 / 50mg/1 / 100mg/1 ORAL TABLET | Drug | SPL | Prescription Drug | C | 6c120728-527e-4e79-9951-5af61f9e3480 | 2013-05-15 | 2099-12-31 |

Because the concept_ancestor table routes from the highly specific SPL formulation (45633072) down to the RxNorm Ingredient (739138), the hierarchy is logically inverted.

Could the Vocabulary team review the relationship logic between SPL Prescription Drug concepts and RxNorm Ingredient concepts? These mixed-dosage SPL terms do not seem appropriate for the ‘C’ designation or for acting as higher-order ancestors to base ingredients.

Thanks!

Quick follow-up: looking at the concept_relationship table shows this inverted hierarchy actually creates a cycle:

select * from concept_relationship where concept_id_1 = 45633072;
concept_id_1 | concept_id_2 | relationship_id | valid_start_date | valid_end_date | invalid_reason
45633072 | 19079497 | SPL - RxNorm | 2015-01-29 | 2099-12-31 |
45633072 | 739209 | Maps to | 1970-01-01 | 2099-12-31 |
45633072 | 739209 | SPL - RxNorm | 2015-01-29 | 2099-12-31 |
45633072 | 739207 | SPL - RxNorm | 2015-01-29 | 2099-12-31 |

The Contradiction:

  1. The specific “sertraline hydrochloride 25mg/1 / 50mg/1 / 100mg/1 ORAL TABLET” SPL concept (45633072) acts as a Classification (C) ancestor to the broad base RxNorm Ingredient (739138).
  2. Simultaneously, it “Maps to” RxNorm “sertraline 50 MG Oral Tablet” (739209), which is a descendant of that exact same ingredient.

I’m thinking the 45633072 concept cannot logically be an ancestor of the RxNorm ingredient while also mapping to a specific dosage form that falls under that RxNorm ingredient.
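To check how widespread this is, one option is to look for every concept that is an ancestor of an RxNorm Ingredient while also having a "Maps to" relationship to a descendant of that same ingredient. Below is a minimal sketch in Python with SQLite, assuming simplified versions of the concept, concept_ancestor and concept_relationship tables that hold only the rows quoted in this thread (plus the transitive concept_ancestor rows that chain implies). The helper names build_toy_vocab and find_ancestor_mapping_conflicts are hypothetical; the SQL inside the second one should carry over to a real CDM, but it would need to be checked against a full vocabulary load.

```python
import sqlite3


def build_toy_vocab() -> sqlite3.Connection:
    """In-memory toy copy of the OMOP tables, holding only the rows quoted in this thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (concept_id INTEGER, vocabulary_id TEXT,
                              concept_class_id TEXT, standard_concept TEXT);
        CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER,
                                       descendant_concept_id INTEGER);
        CREATE TABLE concept_relationship (concept_id_1 INTEGER, concept_id_2 INTEGER,
                                           relationship_id TEXT);
        INSERT INTO concept VALUES
            (739138,   'RxNorm', 'Ingredient',        'S'),
            (739209,   'RxNorm', 'Clinical Drug',     'S'),
            (19079497, 'RxNorm', 'Clinical Drug',     'S'),
            (45633072, 'SPL',    'Prescription Drug', 'C');
        -- SPL above the ingredient, ingredient above its clinical drugs,
        -- plus the transitive rows concept_ancestor stores for that chain
        INSERT INTO concept_ancestor VALUES
            (45633072, 739138), (739138, 739209), (739138, 19079497),
            (45633072, 739209), (45633072, 19079497);
        INSERT INTO concept_relationship VALUES
            (45633072, 19079497, 'SPL - RxNorm'),
            (45633072, 739209,   'Maps to'),
            (45633072, 739209,   'SPL - RxNorm'),
            (45633072, 739207,   'SPL - RxNorm');
    """)
    return conn


def find_ancestor_mapping_conflicts(conn: sqlite3.Connection) -> list:
    """Return (ancestor_id, ingredient_id, mapped_target_id) triples where a concept
    sits above an Ingredient but also 'Maps to' a descendant of that same Ingredient."""
    sql = """
        SELECT anc.ancestor_concept_id, anc.descendant_concept_id, cr.concept_id_2
        FROM concept_ancestor AS anc
        JOIN concept AS ing                  -- the descendant must be an ingredient
          ON ing.concept_id = anc.descendant_concept_id
         AND ing.concept_class_id = 'Ingredient'
        JOIN concept_relationship AS cr      -- the ancestor's own 'Maps to' target
          ON cr.concept_id_1 = anc.ancestor_concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept_ancestor AS down        -- that target lies under the same ingredient
          ON down.ancestor_concept_id = anc.descendant_concept_id
         AND down.descendant_concept_id = cr.concept_id_2
        WHERE anc.ancestor_concept_id <> anc.descendant_concept_id
        ORDER BY 1, 2, 3
    """
    return conn.execute(sql).fetchall()


conn = build_toy_vocab()
print(find_ancestor_mapping_conflicts(conn))  # [(45633072, 739138, 739209)]
```

On the toy data this flags exactly the sertraline case described above: 45633072 sits above 739138 and also maps to 739209, which sits below 739138.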

Hi @Christophe_Lambert:

You realize that the likelihood of a forum response is inversely proportional to the length of your post? 🙂 But that’s all right. I will make an exception to that rule.

The problem with the SPL is that it really doesn’t fit any of our concept classes. It isn’t a branded drug, because it is often provided generically. It isn’t a clinical drug, because it often covers several strengths or dose forms of an ingredient, but not all of them. To make it work with our system, we use hierarchical relationships, because we cannot map a single umbrella SPL to many different drug concepts.

What’s your use case? How do you want to utilize the SPLs?

Hi Christian,

I appreciate you always asking for the use case! To answer your question directly: we actually don’t want to use SPL. We just want to ensure it isn’t breaking our propensity models.

Our use case is an OHDSI network study involving many pairwise comparisons with propensity score matching (PSM). Here is the workflow issue we are running into:

  • We need to automate the removal of comparator drugs and their ancestors from PSM covariate construction.
  • Atlas uses classification concepts (standard_concept = 'C'), like ATC classes, to build these covariates.
  • Because SPL codes also share this ‘C’ status and sit above ingredients in the concept_ancestor table, we are concerned Atlas will walk the SPL hierarchy and pull these messy, mixed-dosage concepts in as covariates at various network study sites.
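The first bullet could be automated roughly as follows: take the comparator concept IDs and collect every ancestor recorded in concept_ancestor. This is a sketch in Python with SQLite, using only the sertraline rows from this thread plus the transitive rows they imply; covariate_exclusion_ids is a hypothetical helper, not an Atlas or FeatureExtraction function, and how the resulting IDs are passed into a study package is left open.

```python
import sqlite3


def covariate_exclusion_ids(conn, drug_concept_ids):
    """Hypothetical helper: the given drug concepts plus every ancestor in concept_ancestor."""
    ids = list(drug_concept_ids)
    if not ids:
        return set()
    placeholders = ",".join("?" * len(ids))  # one '?' per concept_id
    rows = conn.execute(
        "SELECT DISTINCT ancestor_concept_id FROM concept_ancestor "
        f"WHERE descendant_concept_id IN ({placeholders})",
        ids,
    )
    return set(ids) | {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER)"
)
# Rows from this thread: SPL -> ingredient -> clinical drugs, plus transitive rows
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [(45633072, 739138), (739138, 739209), (739138, 19079497),
     (45633072, 739209), (45633072, 19079497)],
)
print(sorted(covariate_exclusion_ids(conn, [739138])))  # [739138, 45633072]
```

A blanket walk like this sweeps the SPL concept into the exclusion list, which is the conservative outcome for exclusion. The concern in the last bullet is the opposite case, where the same SPL concepts are built as covariates.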

This leaves me with a few main questions:

  1. Does Atlas automatically filter out certain vocabularies (like SPL) when generating covariates for network studies? If it strictly walks ATC, we are fine.
  2. What is the guaranteed behavior for walking ancestors? It seems strange that the system wouldn’t just walk all ingredient ancestors in concept_ancestor. If SPL is included, it introduces a lot of noise—which makes me worry about what other unvetted vocabularies might do the same if they happen to be loaded from Athena.
  3. What is the actual use case for structuring SPL this way? Outside of our immediate Atlas concerns, the mappings just seem counterintuitive. Was there a specific analytical reason they were built to sit above ingredients like this?
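Until questions 1 and 2 are answered, one way to see what a given site would pull in is to list the classification ancestors of an ingredient together with their vocabulary, and optionally restrict them to an allowlist such as ATC. The sketch below does that in Python with SQLite. The ATC row uses concept_id -1 as a hypothetical placeholder because no ATC concept_id appears in this thread, classification_ancestors is a hypothetical helper, and the allowlist is a choice a study team would make, not documented Atlas behavior.

```python
import sqlite3


def classification_ancestors(conn, ingredient_id, allowed_vocabularies=None):
    """List (concept_id, vocabulary_id) for standard_concept = 'C' ancestors of an ingredient,
    optionally keeping only vocabularies in an allowlist."""
    rows = conn.execute(
        """
        SELECT c.concept_id, c.vocabulary_id
        FROM concept_ancestor AS ca
        JOIN concept AS c ON c.concept_id = ca.ancestor_concept_id
        WHERE ca.descendant_concept_id = ?
          AND c.standard_concept = 'C'
        ORDER BY c.concept_id
        """,
        (ingredient_id,),
    ).fetchall()
    if allowed_vocabularies is None:
        return rows
    return [row for row in rows if row[1] in allowed_vocabularies]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER, vocabulary_id TEXT, standard_concept TEXT);
    CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
    -- -1 is a HYPOTHETICAL placeholder for an ATC class; no real ATC concept_id is used here
    INSERT INTO concept VALUES (739138, 'RxNorm', 'S'), (45633072, 'SPL', 'C'), (-1, 'ATC', 'C');
    INSERT INTO concept_ancestor VALUES (45633072, 739138), (-1, 739138);
""")
print(classification_ancestors(conn, 739138))           # [(-1, 'ATC'), (45633072, 'SPL')]
print(classification_ancestors(conn, 739138, {"ATC"}))  # [(-1, 'ATC')]
```

Running the unfiltered version against a site's own CDM would show which vocabularies actually contribute 'C' ancestors there, which speaks directly to the worry about other vocabularies loaded from Athena.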

Thanks again for looking into this!

Ha! I love it. Turning the killer argument “what’s your use case” around!

I don’t know what the covariate selection module does; @schuemie, or somebody else who knows how it is built, should answer that. But even if it took the SPLs as-is as aggregating classifiers, I don’t think the noise they introduce matters for propensity score generation. The regularization would kick them out anyway.

Our use case: the SPLs contain the indications and side-effect warnings of drugs. We wanted to allow folks to use this information in their research. @jon_duke used to do a lot of that work.


If SPLs are used as aggregating classifiers, it seems unlikely they would all be regularized out, because they include many high-order umbrella terms that could be predictive of treatment. We found that the ATC ancestors of the treatment and comparator cohort concept sets had to be excluded, because they were being incorporated into the propensity score model by default.

For now we are proceeding by assuming SPL gets ignored in any propensity score adjustment produced by any version of Atlas.

I still advocate that RxNorm codes, including ingredients, should not be mapped to SPL ancestor concepts in concept_ancestor. Thanks much for your expertise, Christian.