LAERTES: pre-load queries - hoi from X to snomed, rxnorm BD to CD

rkboyce · December 18, 2014, 4:48pm

@Christian_Reich and @ericaVoss - would you please check the following v5 queries that I am running prior to loading data into the LAERTES evidence base tables so that drugs and hois are coded as rxnorm ingredients/clinical drugs and snomed respectively? ‘%s’ is substituted with the concept id from the vocab on the left.

hoi from mesh to snomed:
SELECT B.CONCEPT_ID, B.CONCEPT_NAME, B.CONCEPT_CODE, A.CONCEPT_NAME, A.CONCEPT_CODE FROM CONCEPT A, CONCEPT B, CONCEPT_RELATIONSHIP CR WHERE A.CONCEPT_ID = %s AND CR.CONCEPT_ID_1 = A.CONCEPT_ID AND CR.CONCEPT_ID_2 = B.CONCEPT_ID AND CR.RELATIONSHIP_ID = 'Maps to' AND B.CONCEPT_ID = CR.CONCEPT_ID_2 AND B.VOCABULARY_ID = 'SNOMED';

hoi from meddra to snomed:
SELECT B.CONCEPT_ID, B.CONCEPT_NAME, B.CONCEPT_CODE, A.CONCEPT_NAME, A.CONCEPT_CODE FROM CONCEPT A, CONCEPT B, CONCEPT_RELATIONSHIP CR WHERE A.CONCEPT_ID = %s AND CR.CONCEPT_ID_1 = A.CONCEPT_ID AND CR.CONCEPT_ID_2 = B.CONCEPT_ID AND CR.RELATIONSHIP_ID = 'MedDRA - SNOMED eq' AND B.CONCEPT_ID = CR.CONCEPT_ID_2 AND B.VOCABULARY_ID = 'SNOMED';

rxnorm braded drug to clinical drug:

please recommend

thanks,
-R

ericaVoss · December 19, 2014, 3:29am

I’ll take the first shot at answering this. @Christian_Reich, please chime in if I’m wrong or if you would take a slightly different approach.

Two things I would consider:

Only using valid concepts (concepts can deprecate so I usually only use the non-deprecated ones but not everyone takes the approach)
If we know the MeSH term should be representing a condition, we may want to filter our SNOMED to ‘Clinical Finding’

Here is slightly updated code, which will pull back the complete MeSH to SNOMED map.

SELECT B.CONCEPT_ID, B.CONCEPT_NAME, b.vocabulary_id, a.CONCEPT_ID, A.CONCEPT_NAME, A.CONCEPT_CODE, a.vocabulary_id
FROM CONCEPT A, CONCEPT B, CONCEPT_RELATIONSHIP CR 
WHERE A.VOCABULARY_ID = 'Mesh' AND
CR.CONCEPT_ID_1 = A.CONCEPT_ID AND
CR.CONCEPT_ID_2 = B.CONCEPT_ID AND
CR.RELATIONSHIP_ID = 'Maps to' AND
CR.INVALID_REASON IS NULL AND
B.CONCEPT_ID = CR.CONCEPT_ID_2 AND
B.VOCABULARY_ID = 'SNOMED' AND 
a.INVALID_REASON IS NULL AND 
b.INVALID_REASON IS NULL AND 
b.CONCEPT_CLASS_ID = 'Clinical Finding'
ORDER BY B.CONCEPT_ID, a.CONCEPT_ID

Like I said, I rarely use the CONCEPT_RELATIONSHIP table so I learned something tonight with the ‘MedDRA - SNOMED eq’.

However I would make similar recommendations as I stated above.

SELECT B.CONCEPT_ID, B.CONCEPT_NAME, B.CONCEPT_CODE, b.VOCABULARY_ID, a.CONCEPT_ID, A.CONCEPT_NAME, A.CONCEPT_CODE, a.VOCABULARY_ID
FROM CONCEPT A, CONCEPT B, CONCEPT_RELATIONSHIP CR 
WHERE A.VOCABULARY_ID = 'MedDRA' AND
CR.CONCEPT_ID_1 = A.CONCEPT_ID AND 
CR.CONCEPT_ID_2 = B.CONCEPT_ID AND
CR.RELATIONSHIP_ID = 'MedDRA - SNOMED eq' AND
CR.INVALID_REASON IS NULL AND
B.CONCEPT_ID = CR.CONCEPT_ID_2 AND
B.VOCABULARY_ID = 'SNOMED' AND
B.INVALID_REASON IS NULL AND
b.CONCEPT_CLASS_ID = 'Clinical Finding' AND
A.INVALID_REASON IS NULL
ORDER BY B.CONCEPT_ID, a.CONCEPT_ID

Some branded drugs terminate in a Clinical Drug, some only terminate in a ingredient (e.g. 19000951-Diazepam 5 MG Rectal Suppository [Valium]), and some terminate no where (e.g. 19034360-Nandrolone 50 MG/ML Injectable Solution [Hybolin-Improved]). This query will attempt to get a Clinical Drug and if that fails try an ingredient.

Could we just skip the Clinical Drug step all together and go straight to ingredient?

WITH CTE_BRANDED_TO_CLINICAL_DRUG AS (
	SELECT c2.CONCEPT_ID AS CONCEPT_ID, c2.CONCEPT_NAME AS CONCEPT_NAME, c2.CONCEPT_CODE AS CONCEPT_CODE, c2.VOCABULARY_ID AS VOCAB_ID, 
		C.CONCEPT_ID AS BD_CONCEPT_ID, C.CONCEPT_NAME AS BD_CONCEPT_NAME, C.CONCEPT_CODE AS BD_CONCEPT_CODE, C.VOCABULARY_ID AS BD_VOCAB_ID
	FROM CONCEPT c
		JOIN CONCEPT_ANCESTOR ca
			ON ca.descendant_concept_id = c.CONCEPT_ID
		JOIN CONCEPT c2
			ON c2.CONCEPT_ID = ca.ANCESTOR_CONCEPT_ID
			AND c2.vocabulary_id = 'RxNorm'
			AND c2.concept_class_id = 'Clinical Drug'
			AND c2.invalid_reason IS NULL
	WHERE c.VOCABULARY_ID = 'RxNorm'
	AND c.concept_class_id IN ( 'Branded Drug', 'Branded Pack')
	AND c.invalid_reason IS NULL
),
CTE_BRANDED_TO_INGREDIENT AS (
	SELECT c2.CONCEPT_ID AS CONCEPT_ID, c2.CONCEPT_NAME AS CONCEPT_NAME, c2.CONCEPT_CODE AS CONCEPT_CODE, c2.VOCABULARY_ID AS VOCAB_ID, 
		C.CONCEPT_ID AS BD_CONCEPT_ID, C.CONCEPT_NAME AS BD_CONCEPT_NAME, C.CONCEPT_CODE AS BD_CONCEPT_CODE, C.VOCABULARY_ID AS BD_VOCAB_ID
	FROM CONCEPT c
		JOIN CONCEPT_ANCESTOR ca
			ON ca.descendant_concept_id = c.CONCEPT_ID
		JOIN CONCEPT c2
			ON c2.CONCEPT_ID = ca.ANCESTOR_CONCEPT_ID
			AND c2.vocabulary_id = 'RxNorm'
			AND c2.concept_class_id = 'Ingredient'
			AND c2.invalid_reason IS NULL
	WHERE c.VOCABULARY_ID = 'RxNorm'
	AND c.concept_class_id IN ( 'Branded Drug', 'Branded Pack')
	AND c.invalid_reason IS NULL
	AND c.CONCEPT_ID NOT IN (
		SELECT DISTINCT BD_CONCEPT_ID FROM CTE_BRANDED_TO_CLINICAL_DRUG
	)
)
SELECT *
FROM CTE_BRANDED_TO_CLINICAL_DRUG
UNION
SELECT *
FROM CTE_BRANDED_TO_INGREDIENT

Christian_Reich · January 3, 2015, 2:49am

Rich, Erica et al.:

Here is the query that will do the following:

Stratify by ingredient only
Stratify by ingredient and HOI
Stratify by ingredient and clinical drug
Give the full detail (all ingredients, drugs, HOI and the numbers)

and create one summary report for each of them. The numbers are:

ct_count - clinical trial count. The details are listed for ingredients and HOIs, not for clinical drugs. The summary is provided as a total.
case_count - care report count. Same as above.
other_count - other mesh based count. Same as above.
splicer_count - count in splicer. The details are listed for clinical drugs and HOIs. The summary is provided as a total.
eb05 - ebgm confidence interval. The details are listed for ingredients and HOIs. The summary provides an average.
ebgm - empirical Bayes geometric mean of the EARS report. Same as above.
aers_report_count - absolute number of AERS reports. The summary is provided as a total.

Here is the query (takes a minute to run):

select
  report_name,
  case 
    when lag(ingredient_id) over (partition by ingredient_id order by report_order, clinical_drug_id, hoi_id) = ingredient_id then null
    else ingredient_id
  end as ingredient_id, 
  case 
    when lag(ingredient_id) over (partition by ingredient_id order by report_order, clinical_drug_id, hoi_id) = ingredient_id then null
    else ingredient
  end as ingredient,
  case 
    when lag(clinical_drug_id) over (partition by ingredient_id, clinical_drug_id order by report_order, hoi_id) = clinical_drug_id then null
    else clinical_drug_id
  end as clinical_drug_id, 
  case 
    when lag(clinical_drug_id) over (partition by ingredient_id, clinical_drug_id order by report_order, hoi_id) = clinical_drug_id then null
    else clinical_drug
  end as clinical_drug,
  hoi_id, hoi, ct_count, case_count, other_count, splicer_count, eb05, ebgm, aers_report_count
from (
  select distinct
    1 as report_order,
    'Stratified by ingredient' as report_name,
    dv1.drug as ingredient_id, 
    dv1.rxnorm_drug as ingredient, 
    cast(null as integer) as clinical_drug_id, 
    cast(null as varchar(1)) as clinical_drug, 
    cast(null as integer) as hoi_id, 
    cast(null as varchar(1)) as hoi, 
    sum(dv1.medline_mesh_clin_trial_count) over (partition by dv1.drug) as ct_count, -- Ingredient-based
    sum(dv1.medline_mesh_case_report_count) over (partition by dv1.drug) as case_count, -- Ingredient-based
    sum(dv1.medline_mesh_other_count) over (partition by dv1.drug) as other_count, -- Ingredient-based
    sum(dv2.spl_splicer_count) over (partition by dv1.drug) as splicer_count, -- Drug-based
    avg(dv1.aers_eb05) over (partition by dv1.drug) as eb05, -- Ingredient-based
    avg(dv1.aers_ebgm) over (partition by dv1.drug) as ebgm, -- Ingredient-based
    sum(dv1.aers_report_count) over (partition by dv1.drug) as aers_report_count-- Ingredient-based
  from drug_hoi_evidence_view dv1, drug_hoi_evidence_view dv2, concept_ancestor ca, concept ing, concept drg
  where dv1.hoi = dv2.hoi 
  and dv1.drug = ca.ancestor_concept_id and dv1.drug=ing.concept_id and ing.concept_class_id='Ingredient'
  and dv2.drug = ca.descendant_concept_id and dv2.drug=drg.concept_id and drg.concept_class_id like '%Drug'
union
  select distinct
    2 as report_order,
    'Stratified by ingredient and HOI' as report_name,
    dv1.drug as ingredient_id, 
    dv1.rxnorm_drug as ingredient, 
    cast(null as integer) as clinical_drug_id, 
    cast(null as varchar(1)) as clinical_drug, 
    dv1.hoi as hoi_id, 
    dv1.snomed_hoi as hoi,
    sum(dv1.medline_mesh_clin_trial_count) over (partition by dv1.drug, dv1.hoi) as ct_count, -- Ingredient-based
    sum(dv1.medline_mesh_case_report_count) over (partition by dv1.drug, dv1.hoi) as case_count, -- Ingredient-based
    sum(dv1.medline_mesh_other_count) over (partition by dv1.drug, dv1.hoi) as other_count, -- Ingredient-based
    sum(dv2.spl_splicer_count) over (partition by dv1.drug, dv1.hoi) as splicer_count, -- Drug-based
    avg(dv1.aers_eb05) over (partition by dv1.drug, dv1.hoi) as eb05, -- Ingredient-based
    avg(dv1.aers_ebgm) over (partition by dv1.drug, dv1.hoi) as ebgm, -- Ingredient-based
    sum(dv1.aers_report_count) over (partition by dv1.drug, dv1.hoi) as aers_report_count-- Ingredient-based
  from drug_hoi_evidence_view dv1, drug_hoi_evidence_view dv2, concept_ancestor ca, concept ing, concept drg
  where dv1.hoi = dv2.hoi 
  and dv1.drug = ca.ancestor_concept_id and dv1.drug=ing.concept_id and ing.concept_class_id='Ingredient'
  and dv2.drug = ca.descendant_concept_id and dv2.drug=drg.concept_id and drg.concept_class_id like '%Drug'
union
  select distinct
    3 as report_order,
    'Stratified by ingredient and clinical drug' as report_name,
    dv1.drug as ingredient_id, 
    dv1.rxnorm_drug as ingredient, 
    dv2.drug as clinical_drug_id, 
    dv2.rxnorm_drug as clinical_drug, 
    cast(null as integer) as hoi_id, 
    cast(null as varchar(1)) as hoi, 
    sum(dv1.medline_mesh_clin_trial_count) over (partition by dv1.drug, dv2.drug) as ct_count, -- Ingredient-based
    sum(dv1.medline_mesh_case_report_count) over (partition by dv1.drug, dv2.drug) as case_count, -- Ingredient-based
    sum(dv1.medline_mesh_other_count) over (partition by dv1.drug, dv2.drug) as other_count, -- Ingredient-based
    sum(dv2.spl_splicer_count) over (partition by dv1.drug, dv2.drug) as splicer_count, -- Drug-based
    avg(dv1.aers_eb05) over (partition by dv1.drug, dv2.drug) as eb05, -- Ingredient-based
    avg(dv1.aers_ebgm) over (partition by dv1.drug, dv2.drug) as ebgm, -- Ingredient-based
    sum(dv1.aers_report_count) over (partition by dv1.drug, dv2.drug) as aers_report_count-- Ingredient-based
  from drug_hoi_evidence_view dv1, drug_hoi_evidence_view dv2, concept_ancestor ca, concept ing, concept drg
  where dv1.hoi = dv2.hoi 
  and dv1.drug = ca.ancestor_concept_id and dv1.drug=ing.concept_id and ing.concept_class_id='Ingredient'
  and dv2.drug = ca.descendant_concept_id and dv2.drug=drg.concept_id and drg.concept_class_id like '%Drug'
union
  select
    4 as report_order,
    'Full detail: ingredient, clinical drug, HOI' as report_name,
    dv1.drug as ingredient_id, 
    dv1.rxnorm_drug as ingredient, 
    dv2.drug as clinical_drug_id, 
    dv2.rxnorm_drug as clinical_drug, 
    dv1.hoi as hoi_id, 
    dv1.snomed_hoi as hoi, 
    dv1.medline_mesh_clin_trial_count as ct_count, -- Ingredient-based
    dv1.medline_mesh_case_report_count as case_count, -- Ingredient-based
    dv1.medline_mesh_other_count as other_count, -- Ingredient-based
    dv2.spl_splicer_count as splicer_count, -- Drug-based
    dv1.aers_eb05 as eb05, -- Ingredient-based
    dv1.aers_ebgm as ebgm, -- Ingredient-based
    dv1.aers_report_count as aers_report_count-- Ingredient-based
  from drug_hoi_evidence_view dv1, drug_hoi_evidence_view dv2, concept_ancestor ca, concept ing, concept drg
  where dv1.hoi = dv2.hoi 
  and dv1.drug = ca.ancestor_concept_id and dv1.drug=ing.concept_id and ing.concept_class_id='Ingredient'
  and dv2.drug = ca.descendant_concept_id and dv2.drug=drg.concept_id and drg.concept_class_id like '%Drug'
) d
order by d.ingredient_id, report_order, d.clinical_drug_id, d.hoi_id
limit 1000;

rkboyce · January 15, 2015, 2:03pm

@Christian_Reich, @ericaVoss, @Patrick_Ryan, @nick, @Vojtech_Huser and anyone else interested,

Would you please review the results of Christian’s query (modified to include EU SPC data) and post your thoughts and questions to this thread?

The results are in this google doc: https://docs.google.com/spreadsheets/d/11LUTrteCVPyoJT9-4nCxy7fD9LY9WVaKolcz1T-rEm4/edit?usp=sharing

The doc shows a report of drug-HOI counts over a version of the evidence base that includes the following:

all EU SCP drug - HOI pairs that could be mapped to RxNorm and SNOMED
all SPLICER data from FDA product labels up to date as of December (note this thread through LAERTES: How to Present SPLICER's Adverse Reaction Labeling Statistics)
all drug - HOI pairs pulled from PubMed using my implementation (which still needs validation) of http://www.ncbi.nlm.nih.gov/pubmed/?term=23195749

ericaVoss · January 23, 2015, 3:51pm

@Christian_Reich taking a holistic approach to answering Rich’s question was the right approach (wish I thought of it )

@rkboyce I have been struggling to understand how to get distinct counts. For example, I would think that one AE source like Splicer could have many AEs for one drug. So if you were to just roll up to an ingredient using DRUG_HOI_EVIDENCE_VIEW you would count the drug multiple times because it has an entry for multiple AEs. But I may not be understanding this properly?

But let’s ignore that for now and look at @Christian_Reich’s query and its first unit of work.

Christian_Reich:

select distinct
1 as report_order,
‘Stratified by ingredient’ as report_name,
dv1.drug as ingredient_id,
dv1.rxnorm_drug as ingredient,
cast(null as integer) as clinical_drug_id,
cast(null as varchar(1)) as clinical_drug,
cast(null as integer) as hoi_id,
cast(null as varchar(1)) as hoi,
sum(dv1.medline_mesh_clin_trial_count) over (partition by dv1.drug) as ct_count, – Ingredient-based
sum(dv1.medline_mesh_case_report_count) over (partition by dv1.drug) as case_count, – Ingredient-based
sum(dv1.medline_mesh_other_count) over (partition by dv1.drug) as other_count, – Ingredient-based
sum(dv2.spl_splicer_count) over (partition by dv1.drug) as splicer_count, – Drug-based
avg(dv1.aers_eb05) over (partition by dv1.drug) as eb05, – Ingredient-based
avg(dv1.aers_ebgm) over (partition by dv1.drug) as ebgm, – Ingredient-based
sum(dv1.aers_report_count) over (partition by dv1.drug) as aers_report_count-- Ingredient-based
from drug_hoi_evidence_view dv1, drug_hoi_evidence_view dv2, concept_ancestor ca, concept ing, concept drg
where dv1.hoi = dv2.hoi
and dv1.drug = ca.ancestor_concept_id and dv1.drug=ing.concept_id and ing.concept_class_id=‘Ingredient’
and dv2.drug = ca.descendant_concept_id and dv2.drug=drg.concept_id and drg.concept_class_id like ‘%Drug’

Here it looks like he looks for ingredients in DV1 and rolls up lower level drugs in DV2 to that one ingredient record. But I don’t understand why the counts, e.g. CT_COUNT, make sense.

Taking the guts of his query I narrowed down to one DV1 example:

SELECT dv1.ID, dv1.DRUG, dv1.RXNORM_DRUG, dv1.HOI, dv1.SNOMED_HOI, dv1.medline_mesh_clin_trial_count, dv2.ID, dv2.DRUG, dv2.RXNORM_DRUG, dv2.HOI, dv2.SNOMED_HOI, dv2.medline_mesh_clin_trial_count
from drug_hoi_evidence_view dv1, drug_hoi_evidence_view dv2, concept_ancestor ca, concept ing, concept drg
where dv1.hoi = dv2.hoi
and dv1.drug = ca.ancestor_concept_id and dv1.drug=ing.concept_id and ing.concept_class_id='Ingredient'
and dv2.drug = ca.descendant_concept_id and dv2.drug=drg.concept_id and drg.concept_class_id like '%Drug'
AND dv1.DRUG = 745466 /*"Valproate"*/
AND dv1.HOI = 31967 /*"Nausea"*/
ORDER BY dv2.RXNORM_DRUG

You can see that there are 12 lower level drugs that connected into the DV1 ingredient/hoi. Also see that the MEDLINE_MESH_CLIN_TRIAL_COUNT is being duplicated. In @Christian_Reich’s query above the CT_COUNT would be 2*12 or 24. But I think in this case the answer is actually 2, the DV1 row has a MEDLINE_MESH_CLIN_TRIAL_COUNT has a count of 2 for “Valproate”-“Nausea” and none of the lower level rows have any MEDLINE_MESH_CLIN_TRIAL_COUNTs, therefore we do not add additional counts to the 2.

@rkboyce if I’m way off base let’s chat again.

ericaVoss · January 28, 2015, 4:40pm

If my thinking above is correct, we can flip Christians queries around like this and get the counts we need.

One thing I’d like to understand again is why are we doing rolling up prior to getting to this table. I think it will work but there sometimes may be some dis-entangling we may need to do. I’d like to discuss this again.

WITH CTE_INGREDIENTS AS (
	/*FIND INGREDIENT COUNTS FIRST*/
	SELECT dv1.ID, dv1.DRUG, dv1.RXNORM_DRUG, dv1.HOI, dv1.SNOMED_HOI, dv1.medline_mesh_clin_trial_count
	from drug_hoi_evidence_view dv1
		JOIN concept ing
			ON dv1.drug=ing.concept_id 
			AND ing.concept_class_id='Ingredient'
	AND dv1.DRUG = 745466 /*"Valproate"*/
	AND dv1.HOI = 31967 /*"Nausea"*/
),
CTE_LOWER_LEVEL AS (
	/*FINDER LOWER LEVEL COUNTS SECOND*/
	/*ROLL UP TO INGREDIENT LEVEL*/
	SELECT i.ID, i.DRUG, i.RXNORM_DRUG, i.HOI, i.SNOMED_HOI, dv2.medline_mesh_clin_trial_count
	FROM drug_hoi_evidence_view dv2
		JOIN concept_ancestor ca
			ON ca.DESCENDANT_CONCEPT_ID = dv2.DRUG
		JOIN CTE_INGREDIENTS i
			ON i.DRUG = ca.ANCESTOR_CONCEPT_ID
			AND i.HOI = dv2.HOI
		JOIN concept drg
			ON drg.CONCEPT_ID = dv2.DRUG
			AND drg.concept_class_id like '%Drug'
)
/*SUM TOGETHER*/
SELECT ID, DRUG, RXNORM_DRUG, HOI, SNOMED_HOI, SUM(MEDLINE_MESH_CLIN_TRIAL_COUNT) AS MEDLINE_MESH_CLINICAL_TRIAL_COUNT
FROM (
	SELECT *
	FROM CTE_INGREDIENTS
	UNION
	SELECT *
	FROM CTE_LOWER_LEVEL
) z
GROUP BY ID, DRUG, RXNORM_DRUG, HOI, SNOMED_HOI
ORDER BY ID, DRUG, RXNORM_DRUG, HOI, SNOMED_HOI

ericaVoss · January 29, 2015, 3:15am

Couple concerns with this query:

There are some issues with CONCEPT_ANCESTOR in the VOCAB, however @Christian_Reich’s team needs to address them and we should not try to navigate around them here.
Not every CONCEPT maps to an INGREDIENT.
One gotcha to investigate is some DRUGs map up to multiple CLINICAL DRUGS, they are usually drugs in a pack. @Christian_Reich may have a preferred method of dealing with this.

@Christian_Reich I noticed when I did this none of the DRUGs mapped to an INGREDIENT and CLINICAL DRUG, it was either or none.

I need to do more testing on this . . .

WITH CTE_TRANSLATE AS (
	SELECT v.*,
		c.CONCEPT_ID AS INGREDIENT_ID, 
		c.CONCEPT_NAME AS INGREDIENT_NAME,
		CASE WHEN c.CONCEPT_ID IS NULL THEN 0 ELSE 1 END INGREDIENT_EXISTS,
		c2.CONCEPT_ID AS CLINICAL_DRUG_ID, 
		c2.CONCEPT_NAME AS CLINICAL_DRUG_NAME,
		CASE WHEN c2.CONCEPT_ID IS NULL THEN 0 ELSE 1 END CLINICAL_DRUG_EXISTS 
	FROM drug_hoi_evidence_view V
		/*ROLL DOWN TO INGREDIENTS*/
		LEFT OUTER JOIN CONCEPT_ANCESTOR ca
			ON ca.ANCESTOR_CONCEPT_ID = v.DRUG
			AND ca.DESCENDANT_CONCEPT_ID IN (
				SELECT DISTINCT CONCEPT_ID
				FROM CONCEPT
				WHERE VOCABULARY_ID = 'RxNorm'
				AND CONCEPT_CLASS_ID = 'Ingredient'
				AND INVALID_REASON IS NULL
			)
		LEFT OUTER JOIN CONCEPT c
			ON c.CONCEPT_ID = ca.DESCENDANT_CONCEPT_ID
		/*ROLL UP TO CLINICAL DRUG*/
		LEFT OUTER JOIN CONCEPT_ANCESTOR ca2
			ON ca2.DESCENDANT_CONCEPT_ID = v.DRUG
			AND ca2.ANCESTOR_CONCEPT_ID IN (
				SELECT DISTINCT CONCEPT_ID
				FROM CONCEPT
				WHERE VOCABULARY_ID = 'RxNorm'
				AND CONCEPT_CLASS_ID = 'Clinical Drug'
				AND INVALID_REASON IS NULL
			)
		LEFT OUTER JOIN CONCEPT c2
			ON c2.CONCEPT_ID = ca2.ANCESTOR_CONCEPT_ID
), 
CTE_REPORT_1 AS (
	SELECT 1 AS REPORT_ORDER, 
		CAST('Stratified by ingredient' AS TEXT) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		cast(null as integer) as clinical_drug_id, 
		cast(null as varchar(1)) as clinical_drug, 
		cast(null as integer) as hoi_id, 
		cast(null as varchar(1)) as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	WHERE INGREDIENT_EXISTS = 1
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME
), 
CTE_REPORT_2 AS (
	SELECT 2 AS REPORT_ORDER, 
		CAST('Stratified by ingredient and HOI' AS text) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		cast(null as integer) as clinical_drug_id, 
		cast(null as varchar(1)) as clinical_drug, 
		HOI as hoi_id, 
		SNOMED_HOI as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	WHERE INGREDIENT_EXISTS = 1
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME, HOI, SNOMED_HOI 
), 
CTE_REPORT_3 AS (
	SELECT 3 AS REPORT_ORDER, 
		CAST('Stratified by ingredient and clinical drug' AS text) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		CLINICAL_DRUG_ID as clinical_drug_id, 
		CLINICAL_DRUG_NAME as clinical_drug, 
		cast(null as integer) as hoi_id, 
		cast(null as varchar(1)) as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	WHERE CLINICAL_DRUG_EXISTS = 1
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME, CLINICAL_DRUG_ID, CLINICAL_DRUG_NAME
),
CTE_REPORT_4 AS (
	SELECT 4 AS REPORT_ORDER, 
		CAST('Full detail: ingredient, clinical drug, HOI' AS text) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		CLINICAL_DRUG_ID as clinical_drug_id, 
		CLINICAL_DRUG_NAME as clinical_drug, 
		HOI as hoi_id, 
		SNOMED_HOI as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME, CLINICAL_DRUG_ID, CLINICAL_DRUG_NAME, HOI, SNOMED_HOI 
)
SELECT *
FROM CTE_REPORT_1
UNION
SELECT *
FROM CTE_REPORT_2
UNION
SELECT *
FROM CTE_REPORT_3
UNION
SELECT *
FROM CTE_REPORT_4
ORDER BY INGREDIENT_ID, REPORT_ORDER, CLINICAL_DRUG_ID, HOI_ID

rkboyce · January 29, 2015, 2:55pm

Great. Let me know when you think its ok. Like some of the other issues we discussed yesterday, some known issues might be pre-empted by adding certain checks during the process of loading the sources into the evidence base. If you see any possible candidates for that approach, I can implement the rules for the next load that will include SemMed which I expect to complete over the weekend or on next Monday.

Christian_Reich · January 30, 2015, 12:30pm

Erica et al:

We do need to talk about the packs and the quantified version of some drugs, because they are equivalent from an outcomes perspective. For example:

44785790 Treprostinil 0.125 MG Extended Release Oral Tablet [Orenitram] is essentiallz the same as 44785532 12 HR Treprostinil 0.125 MG Extended Release Oral Tablet [Orenitram], except one of them indicates the release duration of 12 hours. Currently, we don’t treat them similarly. The relationship_id connecting these is “Has quantified form”.

Likewise, we have the problem with the packs: 43013205 {84 (cabozantinib 20 MG Oral Capsule) } Pack really is the same as 43012020 “cabozantinib 20 MG Oral Capsule”, connected by “Contains”.

My idea was to promote only one of them to standard, and kick the other ones out, so we get their identity established.

Thoughts?

Patrick_Ryan · January 31, 2015, 8:09pm

My perspective:

It’s VERY important to have a complete mapping between RxNorm ingredients and the descendent RxNorm clinical drug/brand drug/clinical drug component/clinical pack concepts. In this way, the analytical use case of ‘select all brands/dose/forms that contain drug X’ is straightforwardly handled through the CONCEPT_ANCESTOR table. I know we often talk about the relationship between RxNorm ingredient and RxNorm clinical drugs, but it seems many source codes map to branded drug, clinical drug component, or clinical pack concepts, so ensuring their hierarchical relationships are properly maintained is our top priority.

It is possible that source codes will allow differentiation between the normal and extended release formulations, and it could be important to separate those out for specific use cases. For example, we may be interested in a comparative effectiveness study to determine if extended release has different effects from the normal form. I know this is important in research with opioids, for example, and probably would be relevant elsewhere. Therefore, I would recommend we DO NOT restrict the use of the clinical drug concepts as standard concepts, but permit them in the DRUG domain and allow source code mapping to these concepts unless they are known to be deprecated.

To the question of normalizing RxNorm clinical drug concepts which have a relationship with ‘has quantified form’: if we can ensure that these are true equivalences (that the concept without the release duration only have one release duration, as indicated by the other related concept), then I would be comfortable assigning one as the standard. But, I’m not convinced this standardization necessarily improves any use case that I can think of, and I suspect that trying to assign one as standard and marking the other as nonstandard could introduce opportunities for bugs in CONCEPT_RELATIONSHIP, CONCEPT_ANCESTOR and SOURCE_TO_CONCEPT_MAP (for v4). So, I’d consider this low priority and recommend we only consider this to be implemented after a comprehensive test apparatus was put in place to verify that we achieve the desired behavior.

The example of the cabozanthinib clinical pack concept, and its ‘contains’ relationship to the clinical drug concept: when looking at that example in HERMES (http://www.ohdsi.org/web/hermes/index.html#/concept/43012020) , it seems like many RxNorm clinical pack concepts contain more information than the RxNorm clinical drug concept that it related to via a ‘contains’ relation. In particular, for combination products or packs with multiple doses, there may be multiple clinical drug concepts related to one pack. Given that, and because I don’t immediately see an analytical use case where consolidation would solve a problem I’m having, I’d be reluctant to designate only one concept a standard.

Christian_Reich · February 1, 2015, 2:06am

Patrick:

Yes, that’s the debate we need to have.

One clarification before you dive in again: The “Has quantified form” relationship does not connect normal release to extended release drug. All it does it provides quantitative information of two kinds (funny that they treat them both the same way, they are very different) for the same product:

The amount of delay of the release. So, Clinical Drug “Ingredient X mg Extended Release Tablet” would be linked to “12 Hr Ingredient X mg Extended Release Tablet”, which are both slow release formulations. However, there could be two or more such formulations, like another “24 Hr Ingredient X mg Extended Release Tablet”. Most of them are one-to-one relationship, but an example of a one-to-many would be concept_id=40222066 and concept_id in (40221860, 40221872)
The amount of a solution in a product, like “Ingredient X mg/ml solution” to “20 ml Ingredient X mg/ml solution”. An example of one-to-many again would be 965328 and (45756300, 45756299)

For the former situation you could claim that there are use cases that would want to distinguish the duration of release. For the latter the use case is unclear to me, because the size of the bottle or syringe has no effect on the actual dose provided.

How about this solution: For the former cases, we extend the CONCEPT_ANCESTOR table to these quantified forms, creating a hierarchy Ingredient - Drug Form - Clinical Drug - Quantified Clinical Drug (and the branded equivalent). We would introduce a new concept_class_id for that. For the latter - I don’t know. I would just kick out the quantified forms altogether and map them over to the simple form.

Now the packs. The packs really only combine existing products together. There is no use case, because we don’t know how the packs are used and how the individual components are exposed to the patient over time. We can only assume that all the dope in a pack is in the patient for the entire drug_supply time. So, my suggestion would be to kick the packs out and resolve them to the individual products. For example, “{21 (Desogestrel 0.15 MG / Ethinyl Estradiol 0.03 MG Oral Tablet) } Pack” would be de-standardized and mapped to both “Desogestrel 0.15 MG” and “Ethinyl Estradiol 0.03 MG Oral Tablet”.

If we don’t do that, we essentially have multiple different concepts for what is the same thing in most every use case.

Patrick_Ryan · February 1, 2015, 2:20am

I agree with that recommendation. Is there a way we can test that this
approach is consistently applied on the new vocab?

Christian_Reich · February 1, 2015, 6:10pm

Erica has the access to the RxNorm dev schema. You can have it too, or anybody who wants it. Let me re-run the assembly and the ancestor constructor. Probably tomorrow. I write a backlog for Timur.

rkboyce · February 4, 2015, 2:05pm

Here is a sample of the results of Erica’s new query for some drugs I a bit familiar with (bupropion, selegiline, ketoconazole) - however, pending changes to the standard vocab might yield slightly difference results…we will reload the vocab onto the LAERTES dev server after all changes have been finalized…

ericaVoss · February 5, 2015, 2:31am

Based on my 1:1 meeting today with @rkboyce I’m going to look into the following:

Review the query should work in updated VOCAB
Review why Selegiline & Hyoglycemia example seems to have 1 splicer for the report with the clinical drugs but doesn’t seem to have it elsewhere (in the higher roll ups)
Review why Kenakonozol seems to bee missing report #2/#1

rkboyce · February 9, 2015, 10:47pm

Erica,

Please note that I have now loaded SemMedDB alongside of all of PubMed (MeSH tags), SPLICER, EU SPC, and ye olde PV Signals (including AERS counts). This means that that pivot table structure is now different because SemMed has negative modality evidence items:

select * from drug_hoi_evidence_view 
where 
neg_medline_semmed_clin_trial_count > 0 or
neg_medline_semmed_case_report_count > 0;

The new pivot structure is as follows:
id,
drugs.drug,
drugs.rxnorm_drug,
drugs.hoi,
drugs.snomed_hoi,
pos_modality_stats.aers_eb05,
pos_modality_stats.aers_ebgm,
pos_modality_stats.aers_report_count,
pos_modality_stats.medline_mesh_clin_trial_count,
pos_modality_stats.medline_mesh_case_report_count,
pos_modality_stats.medline_mesh_other_count,
pos_modality_stats.medline_semmed_clin_trial_count,
pos_modality_stats.medline_semmed_case_report_count,
pos_modality_stats.spl_eu_spc_count,
pos_modality_stats.spl_splicer_count,
neg_modality_stats.neg_medline_semmed_clin_trial_count,
neg_modality_stats.neg_medline_semmed_case_report_count,
drill_down_links.aers_eb05_link,
drill_down_links.aers_ebgm_link,
drill_down_links.aers_report_link,
drill_down_links.medline_mesh_clin_trial_link,
drill_down_links.medline_mesh_case_report_link,
drill_down_links.medline_mesh_other_link,
drill_down_links.medline_semmed_clin_trial_link,
drill_down_links.medline_semmed_case_report_link,
drill_down_links.spl_eu_spc_link,
drill_down_links.spl_splicer_link,
neg_modality_links.neg_medline_semmed_clin_trial_link,
neg_modality_links.neg_medline_semmed_case_report_link

Plenty to discuss on Wed.

best,
-R

ericaVoss · February 10, 2015, 4:05pm

@rkboyce,

I see the updates, thank you for letting me know.

[Check Query] I found a typo in my first query that should be fixed (see below)
[Specific Examples] You asked me to review a few specific drugs:
- When we reviewed the other day, I was going to dig into the 40165265 - selegiline hydrochloride 1.25 MG Disintegrating Oral Tablet example. Well, I no longer can find this in the drug_hoi_evidence_view table. Any clue? In fact I cannot find RXNORM_DRUG LIKE ‘%selegiline%’.
- For the 985711 - Ketoconazole200 MG Oral Tablet this will fix with an updated Vocabulary. It now will have 1,2,3,4 Reports.

I’m going to keep reviewing . . . but let me get this out to you now . . . Also, maybe I should be doing this in GitHub. Let me know.

I’d like to write this table out, it will make testing easier. I’ve asked @lee_evans what my permissions are and where I can write to. Right now I’m running it into a TEMP table.

UPDATED QUERY:

WITH CTE_TRANSLATE AS (
	SELECT v.*,
		c.CONCEPT_ID AS INGREDIENT_ID, 
		c.CONCEPT_NAME AS INGREDIENT_NAME,
		CASE WHEN c.CONCEPT_ID IS NULL THEN 0 ELSE 1 END INGREDIENT_EXISTS,
		c2.CONCEPT_ID AS CLINICAL_DRUG_ID, 
		c2.CONCEPT_NAME AS CLINICAL_DRUG_NAME,
		CASE WHEN c2.CONCEPT_ID IS NULL THEN 0 ELSE 1 END CLINICAL_DRUG_EXISTS 
	FROM drug_hoi_evidence_view V
		/*ROLL DOWN TO INGREDIENTS*/
		LEFT OUTER JOIN CONCEPT_ANCESTOR ca
			ON ca.DESCENDANT_CONCEPT_ID = v.DRUG
			AND ca.ANCESTOR_CONCEPT_ID IN (
				SELECT DISTINCT CONCEPT_ID
				FROM CONCEPT
				WHERE VOCABULARY_ID = 'RxNorm'
				AND CONCEPT_CLASS_ID = 'Ingredient'
				AND INVALID_REASON IS NULL
			)
		LEFT OUTER JOIN CONCEPT c
			ON c.CONCEPT_ID = ca.ANCESTOR_CONCEPT_ID
		/*ROLL UP TO CLINICAL DRUG*/
		LEFT OUTER JOIN CONCEPT_ANCESTOR ca2
			ON ca2.DESCENDANT_CONCEPT_ID = v.DRUG
			AND ca2.ANCESTOR_CONCEPT_ID IN (
				SELECT DISTINCT CONCEPT_ID
				FROM CONCEPT
				WHERE VOCABULARY_ID = 'RxNorm'
				AND CONCEPT_CLASS_ID = 'Clinical Drug'
				AND INVALID_REASON IS NULL
			)
		LEFT OUTER JOIN CONCEPT c2
			ON c2.CONCEPT_ID = ca2.ANCESTOR_CONCEPT_ID
), 
CTE_REPORT_1 AS (
	SELECT 1 AS REPORT_ORDER, 
		CAST('Stratified by ingredient' AS TEXT) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		cast(null as integer) as clinical_drug_id, 
		cast(null as varchar(1)) as clinical_drug, 
		cast(null as integer) as hoi_id, 
		cast(null as varchar(1)) as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		sum(spl_eu_spc_count) AS EU_splicer_count,
		sum(medline_semmed_clin_trial_count) AS SemMedDB_CT_COUNT,
		sum(medline_semmed_case_report_count) AS SemMedDB_CASE_COUNT,
		sum(neg_medline_semmed_clin_trial_count) AS SemMedDB_Neg_CT_COUNT,
		sum(neg_medline_semmed_case_report_count) AS SemMedDB_Neg_CASE_COUNT,
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	WHERE INGREDIENT_EXISTS = 1
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME
), 
CTE_REPORT_2 AS (
	SELECT 2 AS REPORT_ORDER, 
		CAST('Stratified by ingredient and HOI' AS text) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		cast(null as integer) as clinical_drug_id, 
		cast(null as varchar(1)) as clinical_drug, 
		HOI as hoi_id, 
		SNOMED_HOI as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		sum(spl_eu_spc_count) AS EU_splicer_count,
		sum(medline_semmed_clin_trial_count) AS SemMedDB_CT_COUNT,
		sum(medline_semmed_case_report_count) AS SemMedDB_CASE_COUNT,
		sum(neg_medline_semmed_clin_trial_count) AS SemMedDB_Neg_CT_COUNT,
		sum(neg_medline_semmed_case_report_count) AS SemMedDB_Neg_CASE_COUNT,
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	WHERE INGREDIENT_EXISTS = 1
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME, HOI, SNOMED_HOI 
), 
CTE_REPORT_3 AS (
	SELECT 3 AS REPORT_ORDER, 
		CAST('Stratified by ingredient and clinical drug' AS text) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		CLINICAL_DRUG_ID as clinical_drug_id, 
		CLINICAL_DRUG_NAME as clinical_drug, 
		cast(null as integer) as hoi_id, 
		cast(null as varchar(1)) as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		sum(spl_eu_spc_count) AS EU_splicer_count,
		sum(medline_semmed_clin_trial_count) AS SemMedDB_CT_COUNT,
		sum(medline_semmed_case_report_count) AS SemMedDB_CASE_COUNT,
		sum(neg_medline_semmed_clin_trial_count) AS SemMedDB_Neg_CT_COUNT,
		sum(neg_medline_semmed_case_report_count) AS SemMedDB_Neg_CASE_COUNT,
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	WHERE CLINICAL_DRUG_EXISTS = 1
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME, CLINICAL_DRUG_ID, CLINICAL_DRUG_NAME
),
CTE_REPORT_4 AS (
	SELECT 4 AS REPORT_ORDER, 
		CAST('Full detail: ingredient, clinical drug, HOI' AS text) as report_name,
		INGREDIENT_ID AS INGREDIENT_ID, 
		INGREDIENT_NAME AS INGREDIENT, 
		CLINICAL_DRUG_ID as clinical_drug_id, 
		CLINICAL_DRUG_NAME as clinical_drug, 
		HOI as hoi_id, 
		SNOMED_HOI as hoi, 
		sum(medline_mesh_clin_trial_count) as ct_count, -- Ingredient-based
		sum(medline_mesh_case_report_count) as case_count, -- Ingredient-based
		sum(medline_mesh_other_count) as other_count, -- Ingredient-based
		sum(spl_splicer_count) as splicer_count, -- Drug-based
		sum(spl_eu_spc_count) AS EU_splicer_count,
		sum(medline_semmed_clin_trial_count) AS SemMedDB_CT_COUNT,
		sum(medline_semmed_case_report_count) AS SemMedDB_CASE_COUNT,
		sum(neg_medline_semmed_clin_trial_count) AS SemMedDB_Neg_CT_COUNT,
		sum(neg_medline_semmed_case_report_count) AS SemMedDB_Neg_CASE_COUNT,
		avg(aers_eb05) as eb05, -- Ingredient-based
		avg(aers_ebgm) as ebgm, -- Ingredient-based
		sum(aers_report_count) as aers_report_count-- Ingredient-based
	FROM CTE_TRANSLATE
	GROUP BY INGREDIENT_ID, INGREDIENT_NAME, CLINICAL_DRUG_ID, CLINICAL_DRUG_NAME, HOI, SNOMED_HOI 
)
SELECT *
FROM CTE_REPORT_1
UNION
SELECT *
FROM CTE_REPORT_2
UNION
SELECT *
FROM CTE_REPORT_3
UNION
SELECT *
FROM CTE_REPORT_4
ORDER BY INGREDIENT_ID, REPORT_ORDER, CLINICAL_DRUG_ID, HOI_ID