OHDSI Home | Forums | Wiki | Github

Checking for INVALID_REASON for concepts found using 'Maps To' relationship

@Christian_Reich,
Working on a study, we’re using NDC codes that are mapped to standard concepts. We have the list of standard concepts in a sub query. I have a question about the concepts that appear in CONCEPT_ID_1 where the concept_relationship INVALID REASON is null, but there’s a CONCEPT_ID_1 value that references a concept that has a non-null INVALID_REASON.

I’m working with 2 queries now, one where we find all standard concepts in concept_relationship’s CONCEPT_ID_2, and the concept_relationsihp’s INVALID REASON is not null, and we’ve also enforced that the set of standard concepts have INVALID_REASON is not null.

The difference becomes when we check that the NDC concept (found in concept_relationships CONCEPT_ID_1) has a null INVALID REASON. If we make sure that the concept in CONCEPT_ID_1 is INVALID_REASON is NULL, we drop records compared to when we don’t check for INVALID_REASON for the concept found in CONCEPT_ID_1.

The question is: why do we have a valid concept relationship (ie: the concept_relationship INVALID_REASON is null) for a relationship that is mapping an invalid source concept using the ‘Maps To’ relationship? Should I be checking that both the concept in concept_id_1 has INVALID_REASON is null, and concept_id_2 has INVALID_REASON is null, and the concept_relationship also has INVALID_REASON is null?

Thanks!

-Chris

I could try to help here. But it might help Christian if you show him an example query and an example record of interest - but I think I know what query you are talking about.

An NDC code can be valid at a certain point of time. Even if it is currently invalid, meaning no one should be able to write a script with that NDC, it doesn’t mean it was always invalid. If NDC 123 was good in 2003, we want to map it to a CONCEPT_ID if we have a prescription record from 2003 with that NDC 123 on it. The relationship is valid, just the source code is no longer being used today. If our CDMs are done right, the proper concept mapping would take this into account. At Janssen the next round of CDMs will have this applied. Here is the thread where @Christian_Reich talks about that here.

@Chris_Knoll if I’m misunderstood your question let me know and @Christian_Reich please correct me if I got something wrong here!

Thanks for your help, Erica.

The crux of my question relates to a record in concept relationship that describes a valid relationship between source concept A and standard concept B, but at the time the source concept A is invalid. I’m not sure if that’s the case you were describing in your message. I don’t see in your example where a concept relationship was invalid in 2003 when the NDC 123 is good. But that’s the center of my question: why would I find a valid concept relationship betwen NDC 123 to standard concept ABC in 2003 when NDC is not valid in 2003? or, to put another way where NDC 123 is valid only in 2003, why would I find a valid CR between NDC 123 in 2005 when NDC 123 is not valid in 2005 (it is only valid in 2003)?

-Chris

I have an example of what I’m refering to:

codeset_name	vocabulary_id	SOURCE_CODE	concept_name	source_start	source_end	mapping_start	mapping_end
All Biologics	NDC	00002053905	Cyclosporine 100 MG/ML Oral Solution [Sangcya]	2009-07-01	2011-02-01	2009-07-06	2099-12-31

So, this NDC (00002053905) has a valid range of 2009-07-01 to 2011-02-01. However, the concept relationship I found that maps this NDC over to a Standard Concept is good from 2009-07-06 to 2099-12-31 (in other words, it is currently valid). Is the lesson here to look both at the concept relationship valid_start/valid end along with the concept’s valid_start/valid_end when using anything out of concept_relationship?

@Chris_Knoll, the way I think of it is the old NDC 00002053905-Cyclosporine 100 MG/ML Oral Solution [Sangcya] has a valid realtionship to this RxNorm CONCEPT_ID 19010521-Cyclosporine 100 MG/ML Oral Solution [Sangcya]. Just because the source code isn’t being used any more doesn’t mean that that NDC source code doesn’t have a direct map to something in RxNorm. The relationship is valid just the source code is time bound.

I think I kind of see where your heads at (why isn’t the CONCEPT_RELATIONSHIP mapping time bound with the source code). Don’t disagree with it but it would add another date check to the query. Maybe @Christian_Reich could add his 2 cents on what triggers a CONCEPT_RELATIONSHIP to become invalid.

I used this query in this case:

SELECT c.concept_code AS SOURCE_CODE, c.concept_id AS SOURCE_CONCEPT_ID, c.concept_name AS SOURCE_CODE_DESCRIPTION, c.vocabulary_id AS SOURCE_VOCABULARY_ID, 
    c.domain_id AS SOURCE_DOMAIN_ID, c.CONCEPT_CLASS_ID AS SOURCE_CONCEPT_CLASS_ID, 
    c.VALID_START_DATE AS SOURCE_VALID_START_DATE, c.VALID_END_DATE AS SOURCE_VALID_END_DATE, c.INVALID_REASON AS SOURCE_INVALID_REASON, 
    cr.*,
	c1.concept_id AS TARGET_CONCEPT_ID, c1.concept_name AS TARGET_CONCEPT_NAME, c1.VOCABULARY_ID AS TARGET_VOCABUALRY_ID, c1.domain_id AS TARGET_DOMAIN_ID, c1.concept_class_id AS TARGET_CONCEPT_CLASS_ID, 
    c1.INVALID_REASON AS TARGET_INVALID_REASON, c1.standard_concept AS TARGET_STANDARD_CONCEPT
FROM CONCEPT C
        JOIN CONCEPT_RELATIONSHIP CR
                ON C.CONCEPT_ID = CR.CONCEPT_ID_1
                AND CR.invalid_reason IS NULL
                AND cr.relationship_id = 'Maps To'
        JOIN CONCEPT C1
                ON CR.CONCEPT_ID_2 = C1.CONCEPT_ID
                AND C1.INVALID_REASON IS NULL
WHERE c.concept_code = '00002053905'

Hi, @ericaVoss,
Yes, that’s the exact query that led me into my question. I think it’s very dangerous (as in could lead you to improper results) relying solely on INVALID_REASON for purposes of determining if a relationship exists. Every question about ‘does X Map To Y’ has to be applied to a corresponding time index. It’s much like asking ‘how many 50year olds are in your db?’ you can’t answer that unless you give it an ‘as of when?’ answer.

So, to your query: I don’t think you’d wan to add the CR.invalid_reason is null or the C1.INVALID_REASON is null. because even tho there might be an invalid reason, you may be asking about a concept or a Maps To relationship at a time index where the concept was actually valid. The joins you have set up above are actually restricting relationships that are ‘currently valid’ but most/all of our studies with observational data are retrospective, so nothing in the data is ‘current’. So, only feedback on the above query I’d give is to remove the IS null chekcs, and ensure that you check the source_valid_start/end_date along with the cr.valid_start/end_dates when identifying concepts that are Mapped.

@Christian_Reich, we need some expert input here! :smile:

-Chris

Agree we need @Christian_Reich because my understanding is this query is correct for how the Vocabulary is built today. My understanding is you can have a valid relationship even if the source code is no longer valid.

My vote, if I get a vote, would be to limit the places you need to do a date check plus if you are going the opposite way the date aspect makes less sense to me (if I have an RxNorm and want its NDCs - just because the NDC is old doesn’t mean you shouldn’t be thinking about it).

@Chris_Knoll:

Two situations here. First let me explain how it should be:

  • The invalid_reason of concept_id_1 can be whatever it might be. Don’t check.
    • In your case of NDC, the invalid_reason and valid_start_ and valid_end_date indicate when a certain NDC was valid, ie. used to denote a product in the market. But your data do not necessarily care and neither should you.
  • The invalid_reason of concept_id_2 should be null. Otherwise the mapping record has a bug. So, you don’t have to check.
  • The invalid_reason of the CONCEPT_RELATIONSHIP record absolutely needs to be null. Otherwise, the mapping might be wrong. We are thinking to exclude those records from the standard download process altogether, because you as a user will never need those where invalid_reason = ‘D’

Bottom line: Only check the CONCEPT_RELATIONSHIP record.

Now, in realilty you will find a few mappings to a foul concept_id_2. The reason is that the vocabularies get updated all the time and pull the rug out of some of the concept that have mapping relationships. We need to figure out a way to deal with that efficiently. It’s in current development. But till then it might be safe to check concept_id_2 as well.

Thanks @Christian_Reich, but I still have a gap: when to evaluate valid_start_date and valid_end_date.

Sounds like you’re saying that concept relationships must have a null invalid reason if they are to be applied at all. Easy enough to implement. But what about an NDC exposure that was recorded on 1/1/2007 and a corresponding concept relationship that has a valid start of 1/1/2008 - 1/1/2012? I’m assuming that means that the mapping did not exist at 1/1/2007 so that NDC won’t map to anything.

-Chris

@Chris_Knoll, your example is correct - I would map that old drug to 0 in a CDM ETL. For NDC/DRG you need to check the source record date to select the proper SOURCE_CODE. For all other source codes you can just set INVALID_REASON IS NULL to select the most up-to-date version of the code (e.g. ICD9, CPT4, etc).

Ok, so I think I have my strategy. As for some background:

I’m doing a study that involves TNFi and non-TNFi biologics. We’re identifying the drugs by using the concept "Disease Modifying Anti-Rheumatoid Drugs (DMARD)’ plus adding in the ingredient rituximab. I learned recently that the mappings between the NDCs for drugs under this classification were incomplete based on the vocabulary version that our CDMs were built under, but a later vocab version had the NDCs as source codes with the proper mappings to standard concepts. So, since rebuilding wasn’t an option, I’m using the drug_source_value (which we know in our sources are NDCs) and by joining against this new vocabulary we were able to find more drug exposures that mapped to descendents of "Disease Modifying Anti-Rheumatoid Drugs (DMARD)’. A little spot checking led to the above thread question about when concept relationships should have null invalid reasons (always), but since I’m mapping NDCs all the way back to 2005, I needed to understand if the source concept that matched the NDC was valid for the source concept at the time of the exposure. So I believe the answer is ‘yes’ because i’m trying to map directly from source code values. Normally, after an ETL has done all this work, I woudln’t need to do this, but this is a special case.

I think i’m all squared away now.

-Chris

@Chris_Knoll:

Erica is right, and you would have to map the NDC validity with when it was actually put in the data.

But. NDC codes are reused so rarely, that we just switched off that feature. DRGs are the only ones where that works that way. So, all NDC codes are uniquely mapped, Which means you really need not check the dates. In addition, the fact that an NDC code is no longer available and has a valid_end_date doesn’t mean the pharmacies won’t keep selling the old inventory till the expiration date is hit.

Bottom line: Just take NDC codes in your list and use them. Forget the dates or invalid reasons.

@Christian_Reich: one question about those unique NDCs: one of the issues with this specific study was that (if I understand Patrick’s explanation correctly) some of the drugs in question got a ‘temporary’ NDC that was used to record the administration of the drug before the drug got a ‘finalized NDC’ code (I forget the circumstances on how this happens, but there’s some disconnect in the process where 2 different organizations deal with the drug and an NDC will be referenced by each of them, the first NDC being temporary). So: if this circumstance does happen, and these temporary NDCs are reused (once the drug in question receives a finalized NDC the temporary one could supposedly be used again at a later time).

In this case (if it exists) the NDC at time index one would be associated to one concept ID, and the same NDC woudl be associated to another Concept ID at a later time. I feel like this is where valid_start, valid_end is important.

I should also add that maybe why I saw this was that we were looking at data spanning from 2003 to 2014.

@Patrick_Ryan, am I remembering this correctly? I’m absolutely fine to take Christian’s guidance on NDC sources, but I feel like this specific issue hit me during this study (and will hit again on these type of retrospective drug studies)

I am doing some mapping as of now and just want to be sure I am doing the right ETL approach.

Per final answers, the correct code for mapping should not be checking for is_valid status.

Is this query correct as of May 2017?

    SELECT c.concept_code AS SOURCE_CODE, c.concept_id AS SOURCE_CONCEPT_ID, c.concept_name AS SOURCE_CODE_DESCRIPTION, c.vocabulary_id AS SOURCE_VOCABULARY_ID, 
        c.domain_id AS SOURCE_DOMAIN_ID, c.CONCEPT_CLASS_ID AS SOURCE_CONCEPT_CLASS_ID, 
        c.VALID_START_DATE AS SOURCE_VALID_START_DATE, c.VALID_END_DATE AS SOURCE_VALID_END_DATE,
         c.INVALID_REASON AS SOURCE_INVALID_REASON, 
        cr.*,
    	c1.concept_id AS TARGET_CONCEPT_ID, c1.concept_name AS TARGET_CONCEPT_NAME, c1.VOCABULARY_ID AS TARGET_VOCABUALRY_ID, c1.domain_id AS TARGET_DOMAIN_ID, c1.concept_class_id AS TARGET_CONCEPT_CLASS_ID, 
        c1.INVALID_REASON AS TARGET_INVALID_REASON, c1.standard_concept AS TARGET_STANDARD_CONCEPT
    FROM CONCEPT C
            JOIN CONCEPT_RELATIONSHIP CR
                    ON C.CONCEPT_ID = CR.CONCEPT_ID_1
                    --AND CR.invalid_reason IS NULL
                    AND cr.relationship_id = 'Maps To'
            JOIN CONCEPT C1
                    ON CR.CONCEPT_ID_2 = C1.CONCEPT_ID
                    --AND C1.INVALID_REASON IS NULL
    WHERE c.concept_code in ('250.00','00002053905','17110') --dm,drug ndc cylcosporine, cpt wart removal

Did you ever get an answer?

Not quite.

  1. The relationship_id is ‘Maps to’, not ‘Maps To’.
  2. You need to say WHERE invalid_reason IS NULL. Though we will soon stop distributing non-valid CONCEPT_RELATIONSHIP records, so it will no longer matter.

updated code is thus: @ericaVoss thank you for the forum formatting trick

SELECT c.concept_code AS SOURCE_CODE, c.concept_id AS SOURCE_CONCEPT_ID, c.concept_name AS SOURCE_CODE_DESCRIPTION, c.vocabulary_id AS SOURCE_VOCABULARY_ID, 
    c.domain_id AS SOURCE_DOMAIN_ID, c.CONCEPT_CLASS_ID AS SOURCE_CONCEPT_CLASS_ID, 
    c.VALID_START_DATE AS SOURCE_VALID_START_DATE, c.VALID_END_DATE AS SOURCE_VALID_END_DATE,
     c.INVALID_REASON AS SOURCE_INVALID_REASON, 
    cr.*,
    --changed erica code below a bit (to get 0 for not mapped concepts)
	isnull(cr.concept_id_2,0)  AS TARGET_CONCEPT_ID, c1.concept_name AS TARGET_CONCEPT_NAME, c1.VOCABULARY_ID AS TARGET_VOCABUALRY_ID, c1.domain_id AS TARGET_DOMAIN_ID, c1.concept_class_id AS TARGET_CONCEPT_CLASS_ID, 
    c1.INVALID_REASON AS TARGET_INVALID_REASON, c1.standard_concept AS TARGET_STANDARD_CONCEPT
FROM CONCEPT C
        LEFT JOIN CONCEPT_RELATIONSHIP CR
                ON C.CONCEPT_ID = CR.CONCEPT_ID_1
                AND CR.invalid_reason IS NULL
                AND cr.relationship_id = 'Maps to'
--changed erica's code to left join to keep unmapped concepts (like D0120)               
        LEFT JOIN CONCEPT C1
                ON CR.CONCEPT_ID_2 = C1.CONCEPT_ID                
WHERE c.concept_code in ('250.00','00002053905','17110','67028','V71.89','D0120','J9380') 
--dm,drug ndc cylcosporine, cpt wart removal ,cpt ,  ICD Proc ,dental code (has no mapping) , CPT proc drug
;

Well, looks good:)

1 Like
t