
Missing NDCs remote studies

There are 150K+ unmapped NDC concepts.
We are going to create these mappings.
But before doing this, we need to know whether it is worth it and which concepts are actually used,
so it would be helpful to run the following remote study query on the datasets you have:

SELECT DISTINCT c.concept_code, c.concept_name
FROM drug_exposure de
JOIN concept c
  ON c.concept_id = de.drug_source_concept_id
LEFT JOIN concept_relationship r
  ON r.concept_id_1 = c.concept_id
 AND r.invalid_reason IS NULL
 AND r.relationship_id = 'Maps to'
WHERE c.vocabulary_id = 'NDC'
  AND r.concept_id_1 IS NULL;

@Dymshyts:

Wonderful initiative. Can you put it here and here, with a short abstract of what you are trying to achieve, and the code? Many folks will not have created Source Concepts; we will need to get them to send us spreadsheets.

You can get started with these:

I know that @ericaVoss and @Chris_Knoll created lists like that.

Let me know where this lands. We will contribute to the results.

Great project and approach!

We have some limited NDC data from 1991 to 2017 and would like to help out by reporting which NDCs are needed more than others (i.e., prioritizing among the 150K unmapped).

btw - the last 2 links (redmine.xxxx.com/xxxxx) require a login. Can you post a screenshot of what we would see if logged in?

They are in an obsolete backlog system for the vocab team, Vojtech. @Dymshyts can open them, get those NDC lists, and incorporate them. Don’t worry.

Here is a first bit.

Using limited data (from one year only), we see roughly 6K missing mappings.

R code that uses this helper tool:

library(dplyr)

# object ba holds one year of source data with an NDC column;
# goal: map that year's NDCs to OHDSI concepts

# load the NDC map; the folder is expected to contain parsed Athena CSV
# files saved as an .rda (providing concept and concept_relationship)
athena_folder <- 'n:/athena'
load(file.path(athena_folder, 'athena.rda'))

# get the NDC concepts and their relationships
ndc <- concept %>% filter(vocabulary_id == 'NDC')
map <- concept_relationship %>%
  inner_join(ndc, by = c('concept_id_1' = 'concept_id'))

# join the source NDCs to the map; a missing concept_id_2 means unmapped
# (strictly, one could also restrict map to relationship_id == 'Maps to',
# as in the SQL above)
bb <- ba %>%
  select(NDC) %>%
  distinct() %>%
  left_join(rename(map, NDC = concept_code), by = 'NDC')
table(is.na(bb$concept_id_2))


> table(is.na(bb$concept_id_2))

FALSE  TRUE 
35618  6366

It would also be helpful to ask people to post the number of drug exposure records affected by the missing relationships (grouped by NDC). This way, we can prioritize the work based on the most commonly used NDCs.
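
A minimal sketch of such a count query, reusing the join logic from the query at the top of the thread (standard CDM table and column names assumed):

-- Drug exposure record counts per unmapped NDC, most used first
SELECT c.concept_code,
       c.concept_name,
       COUNT(*) AS drug_exposure_records
FROM drug_exposure de
JOIN concept c
  ON c.concept_id = de.drug_source_concept_id
LEFT JOIN concept_relationship r
  ON r.concept_id_1 = c.concept_id
 AND r.invalid_reason IS NULL
 AND r.relationship_id = 'Maps to'
WHERE c.vocabulary_id = 'NDC'
  AND r.concept_id_1 IS NULL
GROUP BY c.concept_code, c.concept_name
ORDER BY drug_exposure_records DESC;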

I think we should treat this as a proper OHDSI Network Study, as @Christian_Reich suggested, and I would be happy to start the protocol and GitHub repo. Once we have a repo, we can collaborate on code better than sharing back and forth over the forum/email. Is everyone okay with that?

I agree with @abedtash_hamed that we should add counts to help us prioritize.

The code found in my NDC issue in the Vocabulary Repo might be useful for our work here. I basically try to patch the Vocabulary by pushing out the VALID_START_DATE and VALID_END_DATE prior to processing a CDM. We treat NDC as a date-centric vocabulary and have noticed the dates are too tight.
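
For illustration only, a hedged sketch of what such a date-widening patch might look like; the one-year padding, the direct UPDATE on the concept table, and the PostgreSQL-style date arithmetic are all assumptions, not the actual code from the Vocabulary Repo issue:

-- Hypothetical sketch: widen NDC validity windows before processing a CDM
-- so date-based NDC lookups don't drop records near the window edges.
-- The 365-day padding is an assumed value.
UPDATE concept
SET valid_start_date = valid_start_date - 365,
    valid_end_date   = valid_end_date + 365
WHERE vocabulary_id = 'NDC';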

Some things I’d like to learn from this study:

  • What NDC codes do people have that don’t map to a CONCEPT_ID?
  • For NDCs that do map, on what dates are you finding them in the raw data?
  • The percent of drug exposures an individual NDC makes up in a DB; this helps with prioritization but doesn’t provide exact patient counts by DB (see the sketch after this list)
  • Could we characterize any trends in the NDCs?
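
For the percent-of-drug-exposures bullet, a sketch of one way to compute each source NDC’s share (a window function over the grouped counts; no patient-level counts involved):

-- Share of all drug_exposure records per source NDC
SELECT c.concept_code,
       COUNT(*) AS records,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_drug_exposures
FROM drug_exposure de
JOIN concept c
  ON c.concept_id = de.drug_source_concept_id
WHERE c.vocabulary_id = 'NDC'
GROUP BY c.concept_code
ORDER BY pct_of_drug_exposures DESC;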

Right now I’m tracking the following participants, but we can obviously ask for more once we have a proper request put together.
@ericaVoss
@Vojtech_Huser
@MPhilofsky


I created a repo for us and generated a high-level protocol. Until our protocol is ready, we will work in the sandbox.

NDC Review Study Protocol Sandbox:

If you have code snippets, let’s start to share them here. I imagine at the end we’ll have an R script or SQL script that users can grab and run on their side to produce results.

Last week I provided unmapped NDCs over email to help with the study.

Having a “source of truth” resource for a given NDC would be nice.

I have the following questions:

When was the first NDC code assigned? In other words, since when can we expect NDC data (e.g., since 1991)?

Assuming the FDA is the key agency, is their .txt download the best source to decide what a given NDC is (providing a month-year and an NDC code)? For example, the following NDCs are interesting:

The NDC with code 57866705401 (retrievable here: http://www.ohdsi.org/web/atlas/#/search/57866705401) is metoprolol tartrate.
Yet according to some of our data, the generic name for this NDC is MORPHINE SULFATE.
The NDC is not findable via other services such as RxNav or the FDA resource.

Another is
https://rxnav.nlm.nih.gov/REST/ndcstatus.json?ndc=58016098300

By the FDA source, I mean https://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm

Another important clarification is to clearly specify our problem.

Let’s consider two situations:
Situation A) The NDC is NOT in Athena and has NO mapping to RxNorm.
Situation B) The NDC IS in Athena and has NO mapping to RxNorm.

Consider these two NDCs from 1992:

1992,00998000000
1992,11701000201

(one is A and one is B)

Obviously, there is also situation C) the NDC IS in Athena and HAS a mapping.
Situation C has multiple sub-scenarios, C1, C2, and C3 (how many mappings over time, etc.).
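
To make the distinction concrete, a sketch that classifies a list of source codes into situations A, B, and C; the staging table source_ndc(ndc) is hypothetical:

-- Classify raw NDC codes against Athena and the 'Maps to' relationships
SELECT DISTINCT s.ndc,
       CASE
         WHEN c.concept_id IS NULL THEN 'A: not in Athena'
         WHEN r.concept_id_2 IS NULL THEN 'B: in Athena, no mapping to RxNorm'
         ELSE 'C: in Athena, mapped'
       END AS situation
FROM source_ndc s
LEFT JOIN concept c
  ON c.concept_code = s.ndc
 AND c.vocabulary_id = 'NDC'
LEFT JOIN concept_relationship r
  ON r.concept_id_1 = c.concept_id
 AND r.relationship_id = 'Maps to'
 AND r.invalid_reason IS NULL;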

See more examples here: (view B has names) (sorry for too many columns and R dplyr suffixes)

Thanks, @Vojtech_Huser. However, you need to add:

  • Names/descriptions
  • Timing of use

Can you add that?

NDCs have existed since the early ’80s.

No. They dropped the ball in the late ’80s. The key agency is the NLM, and then there are private repositories (Red Book, FDB, Medi-Span, etc.).

Because we got it from another source, most likely from Dailymed and the SPL.

Because we got it from another source and cobbled the mapping together. But this really should go to the NLM.

That’s what we get from RxNorm, and that’s what we want to use as a standard routine way of maintaining.
