Do all drug_exposure records participate in drug_era formation?

Eldar · December 18, 2018, 5:36pm

I wonder how the drug_exposure and drug_era tables are related.
As far as I understand, ALL records from the drug_exposure table should be considered while building drug_era, regardless of empty drug_exposure_end_date, days_supply, etc.

Thus, if I compare the number of persons who ever used a given ingredient according to drug_era with the number of persons who ever used drugs (containing the same ingredient) according to drug_exposure, the counts should be equal.

@Christian_Reich, @mvanzandt is my logic correct?

DTorok · December 18, 2018, 6:05pm

Your logic is correct, the number of people in Drug Era should match those in Drug Exposure for a drug that maps to the ingredient.
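A minimal sketch of the check Eldar describes, run with Python's built-in sqlite3 module on a toy, hypothetical dataset. It assumes drug_exposure rows are rolled up to an ingredient through concept_ancestor. The concept IDs and rows are made up, and drug_era is pre-populated by hand rather than built by the real era script.

```python
import sqlite3

def person_counts(conn, ingredient_id):
    """Return (persons via drug_exposure, persons via drug_era) for one ingredient."""
    exposure_sql = """
        SELECT COUNT(DISTINCT de.person_id)
        FROM drug_exposure de
        JOIN concept_ancestor ca ON ca.descendant_concept_id = de.drug_concept_id
        WHERE ca.ancestor_concept_id = ?
    """
    era_sql = "SELECT COUNT(DISTINCT person_id) FROM drug_era WHERE drug_concept_id = ?"
    n_exposure = conn.execute(exposure_sql, (ingredient_id,)).fetchone()[0]
    n_era = conn.execute(era_sql, (ingredient_id,)).fetchone()[0]
    return n_exposure, n_era

# Toy data: hypothetical ingredient 100 and two hypothetical drugs 101, 102 containing it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept_ancestor (ancestor_concept_id INT, descendant_concept_id INT);
    CREATE TABLE drug_exposure (person_id INT, drug_concept_id INT);
    CREATE TABLE drug_era (person_id INT, drug_concept_id INT);
    INSERT INTO concept_ancestor VALUES (100, 100), (100, 101), (100, 102);
    INSERT INTO drug_exposure VALUES (1, 101), (1, 102), (2, 102), (3, 0);
    INSERT INTO drug_era VALUES (1, 100), (2, 100);
""")

n_exposure, n_era = person_counts(conn, 100)
print(n_exposure, n_era)  # 2 2
```

A mismatch between the two numbers for a real ingredient would point at records the era build skipped (or an era table built from a different snapshot).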

aostropolets · December 18, 2018, 7:21pm

Bring this CVX thing on. I think that we should just fix the drug era scripts to include it, as CVX is a totally legitimate vocabulary (and we even saw it in the source data).

cukarthik · December 18, 2018, 7:35pm

If the CVX vocabulary is updated, it would also be good to include the vaccine groupings.

Dymshyts · December 18, 2018, 10:48pm

@cukarthik, interesting.
The problem is that we don't have descriptions of these vaccine groups, so we would need to do some research.
Are there any use cases where we need this?

Alexdavv · December 19, 2018, 6:41pm

Hi @Eldar! As far as I remember, the published DRUG_ERA script does NOT capture records from DRUG_EXPOSURE if days_supply is null. So it looks like the script should be amended in the ETL, since days_supply and even drug_exposure_end_date/datetime are not mandatory fields now.

@Christian_Reich, @aostropolets, @Dymshyts, is it actually correct that drug_exposure_end_datetime is an optional field? I cannot find this in the v6.0 specs release notes.

Chris_Knoll · December 19, 2018, 6:58pm

Just a correction to @Alexdavv's statement: the published drug era script does use all records, even if days_supply is null. Here is the relevant portion of the code:

		, COALESCE(
			NULLIF(drug_exposure_end_date, NULL) ---NULLIF(x, NULL) always returns x: use drug_exposure_end_date when it IS NOT NULL, otherwise go to next case
			, NULLIF(drug_exposure_start_date + (INTERVAL '1 day' * days_supply), drug_exposure_start_date) ---If days_supply is neither NULL nor 0, return drug_exposure_start_date + days_supply, otherwise go to next case
			, drug_exposure_start_date + INTERVAL '1 day' ---Add 1 day to the drug_exposure_start_date since there is neither an end date nor a usable days_supply
		) AS drug_exposure_end_date

This code derives a drug_exposure end date from the recorded end date or the days supply, and otherwise defaults to 1 day after the exposure start date.
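To make the fallback order concrete, here is a minimal Python sketch that mirrors the COALESCE chain above, with `None` standing in for SQL NULL. The function name is hypothetical; it is an illustration of the logic, not part of the published script.

```python
from datetime import date, timedelta
from typing import Optional

def derived_end_date(start: date, end: Optional[date], days_supply: Optional[int]) -> date:
    """Mirror the quoted COALESCE chain; None plays the role of SQL NULL."""
    # 1. Use the recorded end date when it is present.
    if end is not None:
        return end
    # 2. Otherwise use start + days_supply. NULLIF(..., drug_exposure_start_date)
    #    turns start + 0 into NULL, so a zero supply falls through as well.
    if days_supply is not None:
        candidate = start + timedelta(days=days_supply)
        if candidate != start:
            return candidate
    # 3. Fall back to one day after the start date.
    return start + timedelta(days=1)

print(derived_end_date(date(2018, 1, 1), None, 30))  # 2018-01-31
```

Note that, like the SQL, a negative days_supply is not caught here and yields an end date before the start date.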

Alexdavv · December 19, 2018, 7:26pm

@Chris_Knoll, thanks a lot for your clarification. Indeed, everything is accounted for there.
@Maria_ya, do you remember why we faced this issue and which version of the “standard” script was used?

DTorok · December 19, 2018, 8:29pm

I thought drug_exposure_end_date was going to be made NOT NULL in V6. The documentation says it is nullable, but the SQL Server DDL has drug_exposure_end_datetime as NOT NULL. I did not check the other DDLs. Anyway, my point is that you can probably make some improvements to the current algorithm by looking at the drug type. For example, if the drug_type is ‘mail order’, I suggest the end date should be start + 90, and if ‘prescription written’, start + 30.
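DTorok's suggestion could be sketched as a type-aware variant of the final fallback. This is a hypothetical illustration: the plain-string type labels and the `DEFAULT_DAYS_BY_TYPE` mapping are assumptions (a real ETL would key on drug_type_concept_id), and the 90/30 values come from the post above. Unknown types keep the existing 1-day default.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical mapping from drug type label to a default supply in days.
DEFAULT_DAYS_BY_TYPE = {
    "mail order": 90,
    "prescription written": 30,
}

def end_date_with_type_default(start: date, end: Optional[date],
                               days_supply: Optional[int], drug_type: str) -> date:
    """Same fallback order as the era script, but with a type-specific last resort."""
    if end is not None:
        return end
    if days_supply:  # skips both None and 0, as the COALESCE chain does
        return start + timedelta(days=days_supply)
    # No usable end date or days_supply: use the type default if known, else 1 day.
    return start + timedelta(days=DEFAULT_DAYS_BY_TYPE.get(drug_type, 1))
```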

Christian_Reich · December 20, 2018, 10:58am

Correct. Need to fix. And need to change the version control we have. Will roll that out next year.

We need to look at this for the eras. If CVXs are not ingredients, they shouldn't be in the eras. If they are, they may duplicate existing RxNorm / RxNorm Extension ingredients.

Christian_Reich · December 20, 2018, 11:15am

That’s what I mean, @cukarthik. Are these CVX groupings ingredients or classifications? Right now it’s wishy-washy. Will figure it out.

Eldar · December 20, 2018, 3:30pm

Well, that is exactly why I raised the question: the same era constructor has optional filters:

--AND d.drug_concept_id != 0
---AND d.days_supply >= 0

The second one is meant to filter out records with negative days_supply, but since it also filters out records with null days_supply, it reduces the number of records used for drug_era.

I want to clarify what records SHOULD NOT be used for drug_era:

-records with unmapped concepts (drug_concept_id = 0)
-records with a standard drug_concept_id that does not follow the classification system (e.g. CVX, which is now standard but doesn’t have the same structure as RxNorm, and thus has no assigned ingredient)

Also, I didn’t find any mention of OBSERVATION_PERIOD in the era constructor; thus, if there are DRUG_EXPOSURE records with start/end dates outside the observation_period (correct me if I’m wrong, but this is now allowed), they are still used for drug_era.

Did I miss anything?

Chris_Knoll · December 20, 2018, 4:57pm

Hi Eldar,
Those ‘optional’ filters weren’t part of the original algorithm that I wrote; someone added them to account for bad data in the ETL (I am an advocate of people fixing their data rather than having all other analyses work around it).

You are correct that applying those filters will remove records with null values. The days_supply filter should have been AND (d.days_supply IS NULL OR d.days_supply >= 0). But, as I said, those were just suggested filters, not part of the ‘standard’ era-building logic.
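The NULL behaviour both posts describe can be checked directly. This minimal sketch runs the two filters against a toy drug_exposure table in an in-memory SQLite database (the rows are hypothetical); SQLite follows the standard three-valued logic here, where `NULL >= 0` is not true and the row is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drug_exposure (drug_exposure_id INT, days_supply INT);
    -- Hypothetical rows: a normal supply, a NULL supply, a zero and a negative one.
    INSERT INTO drug_exposure VALUES (1, 30), (2, NULL), (3, 0), (4, -5);
""")

def kept_ids(where_clause):
    """Return the ids that survive a given WHERE clause (fixed literals only)."""
    sql = ("SELECT drug_exposure_id FROM drug_exposure d "
           f"WHERE {where_clause} ORDER BY drug_exposure_id")
    return [row[0] for row in conn.execute(sql)]

# Original optional filter: NULL >= 0 is not true, so row 2 is dropped as well.
print(kept_ids("d.days_supply >= 0"))                           # [1, 3]
# Null-tolerant filter: keeps the NULL row and drops only the negative one.
print(kept_ids("d.days_supply IS NULL OR d.days_supply >= 0"))  # [1, 2, 3]
```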

The d.drug_concept_id != 0 filter is redundant, because the drug era logic finds the ‘Ingredient class’ of the drug exposure to roll up to, and conceptID = 0 does not roll up to an ingredient, so records with concept_id = 0 won’t be selected.
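A minimal sketch of why the join makes the `!= 0` filter redundant, again using SQLite on hypothetical concepts. It assumes the rollup joins concept_ancestor to concepts whose concept_class_id is 'Ingredient'; the published script may add further conditions. A record with drug_concept_id = 0 has no ancestor rows, and a standard concept with no Ingredient ancestor (a CVX-style code here) is filtered out, so only the first exposure is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INT, concept_class_id TEXT);
    CREATE TABLE concept_ancestor (ancestor_concept_id INT, descendant_concept_id INT);
    CREATE TABLE drug_exposure (drug_exposure_id INT, drug_concept_id INT);
    -- Hypothetical concepts: 100 is an Ingredient, 101 a drug containing it,
    -- 500 a standard concept with no Ingredient ancestor (CVX-style).
    INSERT INTO concept VALUES (100, 'Ingredient'), (101, 'Clinical Drug'), (500, 'CVX');
    INSERT INTO concept_ancestor VALUES (100, 100), (100, 101), (500, 500);
    INSERT INTO drug_exposure VALUES (1, 101), (2, 0), (3, 500);
""")

rows = conn.execute("""
    SELECT de.drug_exposure_id, c.concept_id AS ingredient_concept_id
    FROM drug_exposure de
    JOIN concept_ancestor ca ON ca.descendant_concept_id = de.drug_concept_id
    JOIN concept c ON c.concept_id = ca.ancestor_concept_id
    WHERE c.concept_class_id = 'Ingredient'
    ORDER BY de.drug_exposure_id
""").fetchall()
print(rows)  # [(1, 100)]
```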

I’m hesitant to say which records ‘SHOULD NOT’ be used; rather, the algorithm just specifies which records will be used:

any record whose drug_concept_id rolls up to an Ingredient

That’s it. If they want to introduce a new rollup (they mentioned CVX, which is a sort-of ingredient-level grouping?), then they just add that to the query to associate drug_exposure records with the correct concept to group by. I think we should be careful with this, because the drug_era currently has a nice-and-comfortable understanding of rolling up to a specific level of the vocabulary hierarchy (ingredient), and I’m not sure what unintended consequences we will get if we throw arbitrary rollups into the era table.

Chris_Knoll · December 20, 2018, 4:59pm

That’s correct: the algorithm assumes the V5 observation period rules are in place, i.e., that any observed data will be found in exactly one observation period. So, if that rule holds (i.e., the ETL complies with it), you won’t have any eras that fall outside of observation periods. In V6, if that is allowed, you may have eras that span observation periods; and if so, that behavior is permitted, so you won’t have to concern yourself with observation periods anyway.
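If you want to verify that your ETL actually honours the V5 rule, a simple per-person check is to flag eras not contained in any single observation period. This is a hypothetical sketch that assumes eras and observation periods are already loaded as inclusive (start, end) date pairs; under the V5 rule, periods do not overlap, so "any single period" and "exactly one period" coincide.

```python
from datetime import date
from typing import List, Tuple

# An interval as (start_date, end_date), both inclusive.
Interval = Tuple[date, date]

def eras_outside_observation(eras: List[Interval], periods: List[Interval]) -> List[Interval]:
    """Return the eras that are not fully contained in any single observation period."""
    return [
        (era_start, era_end)
        for era_start, era_end in eras
        if not any(p_start <= era_start and era_end <= p_end for p_start, p_end in periods)
    ]

# Hypothetical person with a gap in observation during July and August 2018.
periods = [(date(2018, 1, 1), date(2018, 6, 30)), (date(2018, 9, 1), date(2018, 12, 31))]
eras = [(date(2018, 2, 1), date(2018, 3, 1)), (date(2018, 6, 1), date(2018, 9, 15))]
print(eras_outside_observation(eras, periods))
```

Only the second era is flagged, because it spans the gap between the two observation periods.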

Eldar · December 21, 2018, 12:23pm

Thank you, @Chris_Knoll.
You confirmed all my thoughts about drug_era population.