
Proposal for conventions regarding quantity and strength


I have a few proposals with respect to the consistency of OMOP. I would like to start with quantity and strength:
Currently, we do not have any convention for how the quantity attribute in the DRUG_EXPOSURE table is linked to the DRUG_STRENGTH table.
This prevents us from calculating dosages and applying OMOP to European data sources.
Imagine the quantity refers to mg, but the corresponding entry in the DRUG_STRENGTH table refers to something else, such as mL or a percentage.

I suggest the following two conventions:

  • Entries in the DRUG_STRENGTH table which belong to the same
    drug_concept_id must use the same denominator_unit_concept_id or use
    only amounts.
  • The quantity in the DRUG_EXPOSURE table must refer to
    the denominator_unit_concept_id in the DRUG_STRENGTH table in case of concentrations or to “pieces” in case of amounts.
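To make the first convention concrete, here is a minimal Python sketch of a consistency check over DRUG_STRENGTH rows. It assumes the rows are already loaded into memory and uses unit strings instead of unit concept IDs. `StrengthRow` and `check_denominator_convention` are hypothetical names, and the ingredient IDs 101 and 102 are placeholders, not real concepts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class StrengthRow:
    """Hypothetical, simplified view of one DRUG_STRENGTH row (units as strings)."""
    drug_concept_id: int
    ingredient_concept_id: int
    numerator_unit: Optional[str] = None    # None for amount-only rows
    denominator_unit: Optional[str] = None  # None for amount-only rows


def check_denominator_convention(rows: Iterable[StrengthRow]) -> List[int]:
    """Return the drug_concept_ids that break the proposed convention.

    A drug passes if all of its rows are amounts (no denominator unit)
    or if all of its rows share a single denominator unit.
    """
    units_per_drug = defaultdict(set)
    for row in rows:
        units_per_drug[row.drug_concept_id].add(row.denominator_unit)
    return sorted(d for d, units in units_per_drug.items() if len(units) > 1)


# Drug concept IDs as quoted in the thread; ingredient IDs 101/102 are placeholders.
rows = [
    StrengthRow(42799258, 101, "mL", "mL"),  # gel: benzyl alcohol 0.1 ML/ML
    StrengthRow(42799258, 102, "mg", "mg"),  # gel: pramoxine 0.01 MG/MG
    StrengthRow(42799262, 101, "mL", "mL"),  # spray: benzyl alcohol 0.1 ML/ML
    StrengthRow(42799262, 102, "mg", "mL"),  # spray: pramoxine 10 MG/ML
]
violations = check_denominator_convention(rows)
```

Under this check the gel 42799258 is flagged because its two ingredients use different denominators, while the spray 42799262 passes because both ingredients are expressed per mL.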


(Christian Reich) #2


Thanks for bringing these up, these are really good points. It pays to actually be working on a use case, instead of just collecting data and enjoying how pretty they look. :smile:

There are two issues you are mentioning:

  1. Inconsistent denominator units in DRUG_STRENGTH. We currently have 16 such cases, where the different ingredients of a fixed-combination drug product have non-matching units, for example 42799258 “Benzyl Alcohol 0.1 ML/ML / Pramoxine hydrochloride 0.01 MG/MG Topical Gel”. I can imagine how this happened: the original product information probably gave the content in %, and RxNorm then applied its conventions, by which % of a liquid (benzyl alcohol) gets converted to ml/ml and % of a solid ingredient (pramoxine) gets converted to mg/mg. To be honest, I am a little at a loss how to fix that. In those cases, we could just replace ml/ml with mg/mg and be done with it (we can’t do it the other way around, because ml/ml doesn’t make sense for a solid substance, unless you melt it). Let me know what you think.

  2. Unit of the quantity field in DRUG_EXPOSURE. This is actually not properly defined in the OMOP CDM at present. In the US claims data, where this field originated, the convention seems to be that for solid products (tablets etc.) the quantity refers to the unit of drug (tablet in this case, or “pieces” as you call it), while for liquid products it seems to refer to the denominator unit (mL in most cases). At least, that seems to be the case in the claims data I have access to: PharMetrics Plus. @dtorok, @ericaVoss, @Patrick_Ryan, could you be so kind as to look at the various MarketScans and Optums and check it out?

Let us know what you think. I will enter this into the list of CDM improvements. Please fill out the detailed page, so we can toss it around in the community and get it done.


@Christian_Reich: Thanks for discussing this.

Ad 1:
We do not need a common concentration unit; we just need a common denominator unit. Let’s take a look at a very similar product, 42799262 “Benzyl Alcohol 0.1 ML/ML / Pramoxine hydrochloride 10 MG/ML Topical Spray [Itch-X]”. It’s a spray instead of a gel but has the same concentration. Here we have the common denominator mL, and we can easily deduce that per mL there are 0.1 mL alcohol and 10 mg pramoxine hydrochloride.
We could do the same for the topical gel. Alternatively, we could provide the same in mg instead of mL: It’s 0.1mg/mg Alcohol or (applying roughly 1g=1mL) 0.0001mL/mg Alcohol and 0.01mg/mg Pramoxine hydrochloride. The percentages in RxNorm make these conversions very easy.

We should choose the denominator unit that fits the “sales unit”. Eventually, we need the total amount of the active ingredients for dosage calculations. A Google search (for example http://www.americanotc.com/mobile.php?seller=AmericanOTC&navt0=a&navc0=&Tsearch=&searchc=&navt1=92837&navt2=96612) shows that both products are sold with the weight unit “oz”: the topical gel is prescribed in 1.25 oz, which is approx. 35.4 g; rounding up, about 36,000 mg. Thus, we will prefer mg for the denominator units in the DRUG_STRENGTH table.

If we provide 36,000 in the quantity field of the DRUG_EXPOSURE table, we can calculate the total amount of active ingredients:
36,000 x 0.0001 mL = 3.6 mL Benzyl Alcohol and 36,000 x 0.01 mg = 360 mg Pramoxine hydrochloride.
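The arithmetic for the gel can be reproduced in a few lines. This is a sketch under the post's own assumptions: 1 g of gel is treated as 1 mL, the product amount is expressed in mg (the preferred denominator unit), and 1.25 oz (about 35,437 mg) is rounded to a working figure of 36,000 mg. The helper names are illustrative only.

```python
OUNCE_IN_MG = 28_349.523125  # one avoirdupois ounce in mg
ML_PER_MG = 0.001            # assumption from the post: 1 g of gel ~ 1 mL


def mass_fraction_to_ml_per_mg(mg_per_mg: float) -> float:
    """Re-express a liquid ingredient given in mg/mg as mL per mg of product."""
    return mg_per_mg * ML_PER_MG


def ingredient_totals(quantity_mg: float, strength_per_mg: dict) -> dict:
    """Multiply the product amount (in mg) by each ingredient's strength per mg."""
    return {name: quantity_mg * s for name, s in strength_per_mg.items()}


tube_mg = 1.25 * OUNCE_IN_MG  # about 35,437 mg
quantity_mg = 36_000          # the rounded working figure used in the post
strength_per_mg = {
    "benzyl alcohol [mL]": mass_fraction_to_ml_per_mg(0.1),  # 0.0001 mL/mg
    "pramoxine hydrochloride [mg]": 0.01,                    # 0.01 mg/mg
}
totals = ingredient_totals(quantity_mg, strength_per_mg)
```

The result is roughly 3.6 mL benzyl alcohol and 360 mg pramoxine hydrochloride, i.e. 10% and 1% of the tube content.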

Ad 2:
The convention that the unit refers to tablets for solid forms and mL for liquids is fine. But this convention must be applied consistently in the DRUG_STRENGTH table. For solid forms we use the amount_value; the denominator unit is not applicable (conceptually “tablets” or “pieces”).
The problem of divergent units typically applies to gels and sprays. Examples in PharMetrics Plus are prescriptions for “Adapalene Topical Cream” (quantity in g, strength in ml), “Dapsone Topical Gel” (quantity in g, strength in mg) and “Ipratropium Nasal Spray” (quantity in mL, strength in {actuat}).
Many of them should be easy to fix, for example g to mg or g to mL. Some may need manual research.

There is one additional use case to consider: patches release a dose over time. However, we can map this to the current OMOP attributes with the following convention: put the number of patches in quantity, and in the DRUG_STRENGTH table put the released dose in the numerator and 1 hour in the denominator.

Apart from that there are some more particularities (for example release dose per cm²), but they are rarely used and we can neglect them (at least for the time being).

Do you need more information?

(Don Torok) #4

From MarketScan
Column: METQTY

The number of units dispensed without regard to packaging format. The first nine digits of the NDC number describe how the drug is packaged.

Data Type:
Numeric, three decimal places of precision.

As coded on claim. Should correspond to packaging; e.g. if the drug package is tabs, the metric quantity should also be in tabs.

My comments:
No column in the MarketScan drug table (other than the NDC code) exists to describe the packaging.

(Christian Reich) #5

Wonderful. Thanks @DTorok.

Can you check what quantities there are for the following NDCs:

select m.source_code 
from prodv4.drug_strength s 
join prodv4.source_to_concept_map m on target_concept_id=s.drug_concept_id and m.source_vocabulary_id = 9
where s.concentration_denom_unit='mL'

These are liquids, ointments, creams, etc. Can you check what MarketScan has for them? In particular, how does the quantity compare to days_supply?

(colin e.) #6

RE: Inconsistent denominator units in DRUG_STRENGTH
We noticed this case of Itch-X before in some recent work with RxNorm to create a customized naming format. I honestly have no idea where they are pulling the ML strength basis from. Neither the NDC nor the FDA SPL data say anything about any actives being in units of “ML”; it’s all in “G” (/100g specifically).

RE: Unit of quantity field in DRUG_EXPOSURE - for the sake of discussion:
Is it important to fundamentally distinguish between the unit of quantity in terms of cumulative dose and in terms of billing code quantities? It seems that, depending on the source of the data, we may have no choice but to deal with inconsistencies.

It might be helpful to have someone more familiar with claims data (or real world data that would actually populate the DRUG_EXPOSURE table) to chime in - but it seems like there are (at least) two cases:

  1. For sources from claims-like data. It sounds like most are given with respect to a “Billing unit” code of “G”, “EA” (“pieces”) or “ML”. This seems to agree with @Christian_Reich’s initial “unofficial” description of how Qty works in DRUG_EXPOSURE. I am fairly certain that “Billing unit” is required by FDA at the NDC or SPL level for claims related billing consistency and many of the data sources (CMS NADACS for instance) present drug price data this way:
  • e.g. we tend to see: (Drug, quantity, units)
    “10mg Oxycodone oral tablet”,10.00, “EA”
    “10mg Oxycodone tablet”,100.00,“MG”
  2. For some other observational sources (specifically clinical trial data), which we are working with now, it seems many of the trials give dosages in terms of total active ingredient dose, and we do tend to see things like this after interpreting the verbatim protocol for a given arm:
  • “RIBAVIRIN 200MG TABLET”, 1000.00, “MG”
    Side Note: FAERS data also provides some dosage in cumulative / total values.
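A small sketch of how the two oxycodone records could be reconciled to the same total ingredient amount. It assumes a single-ingredient solid product whose per-tablet strength is known (for example from DRUG_STRENGTH) and handles only the “EA” and “MG” billing units; `total_ingredient_mg` is a hypothetical helper.

```python
def total_ingredient_mg(quantity: float, billing_unit: str, mg_per_piece: float) -> float:
    """Total active ingredient (mg) of a single-ingredient solid product.

    "EA": quantity counts pieces (tablets), so multiply by the per-piece strength.
    "MG": quantity is already the total active ingredient.
    Other billing units (e.g. "G", "ML") need a concentration and are not handled.
    """
    if billing_unit == "EA":
        return quantity * mg_per_piece
    if billing_unit == "MG":
        return quantity
    raise ValueError(f"unsupported billing unit: {billing_unit!r}")


# The two oxycodone records describe the same 100 mg dispensing.
as_pieces = total_ingredient_mg(10.00, "EA", mg_per_piece=10.0)
as_mass = total_ingredient_mg(100.00, "MG", mg_per_piece=10.0)
# Ribavirin 200 mg tablets, 1000 mg total: equivalent to 5 tablets.
ribavirin_tablets = total_ingredient_mg(1000.0, "MG", 200.0) / 200.0
```

The “MG” branch is only unambiguous for single-ingredient products; for a combination product it is not clear which ingredient the total refers to.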

(Don Torok) #7

Top 20 days_supply by count from MarketScan for the following query
SELECT days_supply, count(*)
FROM drug_exposure rx
join drug_strength s ON s.drug_concept_id = rx.drug_concept_id
join source_to_concept_map m on target_concept_id=s.drug_concept_id
and m.source_vocabulary_id = 9
where s.concentration_denom_unit='mL'
group by days_supply;

days_supply          count
10           5,936,647,819
30           3,502,241,952
5            1,791,874,259
7            1,713,378,768
15           1,292,391,400
6            1,081,561,050
4              911,696,738
20             781,000,500
12             725,655,756
14             699,003,315
3              685,527,860
8              647,935,052
25             413,346,386
90             370,623,764
1              340,288,922
2              318,230,157
28             302,168,809
16             289,940,383
9              240,748,421

(Christian Reich) #8


So, maybe the entire problem can be handled at the tool level by treating mg and mL in the denominator as interchangeable? And leave the vocabulary alone? Would that work?

Yes, that’s what you want. The question is, where do you find the information that the tube contains about 36 g (36,000 mg):

  • You could Google each product when ETLing. Probably won’t scale well.
  • The source data have that information in their transaction tables. Whether the US-based claims data do, we are looking into; PharMetrics does.
  • The vocabulary has it. And it often does: there are the ordinary Clinical and Branded Drugs, but also the so-called Quantified Clinical/Branded Drugs. For example, for 1636815 “Testosterone 0.01 MG/MG Topical Gel” there is 45892525 “5000 MG Testosterone 0.01 MG/MG Topical Gel”, giving you the size of the container (5 g). Unfortunately, this is not available for all products in RxNorm. So, the above pramoxine gels/sprays don’t have those pairs.

The DRUG_STRENGTH table just provides what comes with RxNorm. However, it looks like in most cases it provides a concentration for a liquid and an amount for a solid drug.

We need to add conventions for the quantity. It might well be that we will have to introduce two different quantity fields, quantity and amount, where the former gives the number of individual “pieces”, and the latter the amount in the unit used to define the content.

We should probably put together a systematic count, so we know how severe each of these problems is.

(Christian Reich) #9


Well, they are harmonizing the units with the ingredients. Benzyl alcohol is a liquid, and they always convert its unit to mL (assuming that 1 g = 1 mL, which is probably close enough for those aqueous solutions). It’s actually nice, because that way you can be sure that if you compare different drugs containing the same ingredient, you always end up with the same units, making the comparison apples to apples.

Currently, we are not capturing this in the CDM. We will need to fix that. Do you know whether the claims data always have those three flags? I should know, and I can find out.

@DTorok: Not days_supply. Those are clean. We are discussing the quantity. Can you re-run this thing but count quantity instead?


Only if I knew the unit the quantity is referring to. If I don’t know the meaning of quantity, I won’t know which measurement I need to convert to.

A colleague provided these counts for PharMetrics+:

unit		cnt
mg		67225
mg/mL		28234
mg/mg		2816
[U]/mL		1907
[U]		1670
10*-3.eq/mL	1257
mg/{actuat}	849
mg/h		702
[U]/mg		462
10*-3.eq	363
mL/mL		208
10*-3.eq/mg	17
[U]/{actuat}	17
mg/cm2		6
{cells}/mL	2
mL		1

I don’t understand the advantage, but I may have misunderstood the approach:

  • If quantity is a measurement for “pieces” I still need somewhere the
    information of the amount of active ingredient per piece.
  • If amount is a measurement for the total amount in measurement units
    I still need somewhere the information of the amount of active
    ingredient per measurement unit.

The crucial point is not the distinction between pieces and measurement units but the link between quantity and the DRUG_STRENGTH table.

The ultimate goal is to determine the amount of active ingredients prescribed or sold.
The canonical way is to look up the total amount of drug and multiply it by the proportion of the active ingredient(s).
The total amount of drug dispensed for a specific product can be set in different ways: by prescribing a specific amount of drug (for example, a number of tablets in the US), an amount of liquid out of a bottle (?), one of several pack sizes of this product (Europe), or by manual compounding (oncology).

All the above use cases seem to work for the majority of drugs (solid forms, liquids).

For the remaining cases (especially gels, creams and sprays), we either refrain from addressing them, or we find a way to describe the amount of drug in the measurement units that are used to define the proportion of active ingredients.
Independent of that decision, we need a clear understanding of every attribute in OMOP in general and of the quantity in particular. As long as it is not clear what the quantity refers to, we cannot use it for analyses. Thus, it would be better to have missing values in the quantity field for gels but interpretable values in all other cases.

The common data model is designed to support research and to ensure that research methods can be systematically applied to produce meaningfully comparable results. In light of this, the purpose of the quantity field is not to provide a placeholder into which a data source can put any value, for example its billing units in the case of PharMetrics.
We don’t need data source specific attributes, we rather need to precisely define the commonality.
Just my personal perspective :wink:

Once we have defined how we want to provide the amount of active ingredients (probably by mapping quantity to DRUG_STRENGTH), we can discuss whether we do necessary conversions within the ETL process beforehand or within the analytics afterwards. The former might be supported by a mapping table. We may have other reference data systems for therapies which are able to provide this information.

Furthermore, the world is not PharMetrics alone. There may be other data sources which are able to provide the necessary information. But we have to request it.

(Christian Reich) #11

@klaus, friends:

All very good. Let me try a proposal:

  1. Solid products (tablets, capsules, suppositories):
  • A new field “quantity_unit_concept_id” in the DRUG_EXPOSURE table contains a unit concept of mass (typically “mg”) or a functional unit concept (“I.U.”), or

  • quantity_unit_concept_id contains NULL. This is the equivalent of “each”, and the unit is whatever the vocabulary provides as Dose Form (tablets, capsules, suppositories etc.).

  • The quantity field contains the numerical value.

If you want to calculate the total dose of the drug exposure, you either take the provided dose (5 mg), or, if the quantity_unit_concept_id is NULL, you multiply the quantity (5) by the strength of the ingredient(s) (1 mg) provided in the DRUG_STRENGTH table.

Note: The former will not work for combination products, because it is not clear which ingredient the “5 mg” refers to. For single-compound products it would be unambiguous, but only if the quantity_unit_concept_id and amount_unit_concept_id match. We would have to enforce that.

  2. Liquid products (solutions, gels etc.) which contain a drug at a given concentration.

  • If the quantity_unit_concept_id contains a unit (typically “mg” or “mL”), then this is the total amount of the solution in the DRUG_EXPOSURE record, and not the amount of the ingredient as it is the case with the solids.

  • If the quantity_unit_concept_id contains NULL, the amount of solution in each container is taken from the DRUG_STRENGTH table, from a new field called “denominator_amount”. Currently, this field is omitted, as all products are normalized to a denominator amount of 1. However, for Quantified Clinical/Branded Drug products the total volume is known, and the numerator and denominator amount fields would have to be adjusted: for example, if the Clinical Drug is Paracetamol 250 mg/mL, then the Quantified Clinical Drug for a 5 mL container would have 1250 mg / 5 mL.

To calculate the total dose for each ingredient, the amount of solution is multiplied by the concentration given in DRUG_STRENGTH. The amount is either given as quantity (in quantity_unit_concept_id) or, if the unit is NULL, as quantity * denominator_amount (in denominator_unit_concept_id).

Note: For the latter to work, the drug_concept_id has to contain a Quantified product.

We have to decide whether the denominator_unit_concept_id and the quantity_unit_concept_id may use “mg” and “mL” interchangeably. If we decide so, we would make our life a lot easier. However, it is not precise: non-aqueous liquids do not have the same density as water (which is roughly 1 g/mL at room temperature), and solutions and mixtures of liquids can show an additional volume contraction effect. If we don’t like it, we don’t have a ton of options, because we can’t obtain density coefficients for each solution. RxNorm seems to have decided that it doesn’t care. Other drug databases don’t normalize at all and just pass on whatever the manufacturer declares.

  3. Products with special dosing. These are e.g. inhalants with “actuations” (puffs) or patches with drug released over time. We could decide to treat them either as solids (where each dose is equivalent to a solid dose form), or as liquids, where the denominator_unit_concept_id is not “mL” or “mg”.
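The solid and liquid calculations in this proposal can be sketched as follows. This is a minimal illustration, assuming the proposed quantity_unit_concept_id and the new denominator_amount field existed (neither is part of the current CDM), that units are plain strings, and that mg and mL are not treated as interchangeable. `Strength`, `quantify`, `liquid_dose` and `solid_dose` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Strength:
    """One ingredient's concentration, including the proposed denominator_amount field."""
    numerator_value: float
    numerator_unit: str
    denominator_amount: float = 1.0  # 1 for normalized (non-quantified) products
    denominator_unit: str = "mL"


def quantify(conc: Strength, container_size: float) -> Strength:
    """Rescale a normalized concentration (per 1 denominator unit) to a container."""
    return Strength(conc.numerator_value * container_size, conc.numerator_unit,
                    container_size, conc.denominator_unit)


def liquid_dose(quantity: float, quantity_unit: Optional[str],
                s: Strength) -> Tuple[float, str]:
    """Total ingredient amount for a liquid DRUG_EXPOSURE record."""
    if quantity_unit is None:
        # NULL unit: quantity counts containers of denominator_amount each
        solution = quantity * s.denominator_amount
    elif quantity_unit == s.denominator_unit:
        # unit given: quantity is the amount of solution itself
        solution = quantity
    else:
        raise ValueError("quantity_unit must match the denominator unit")
    concentration = s.numerator_value / s.denominator_amount
    return solution * concentration, s.numerator_unit


def solid_dose(quantity: float, quantity_unit: Optional[str],
               amounts: Dict[str, Tuple[float, str]]) -> Dict[str, Tuple[float, str]]:
    """Total dose per ingredient for a solid product.

    NULL unit: quantity counts pieces and is multiplied by each amount_value.
    Unit given: quantity already is the dose, unambiguous only for one ingredient.
    """
    if quantity_unit is None:
        return {ing: (quantity * value, unit) for ing, (value, unit) in amounts.items()}
    if len(amounts) != 1:
        raise ValueError("a dose in quantity is ambiguous for combination products")
    ing, (_, unit) = next(iter(amounts.items()))
    if unit != quantity_unit:
        raise ValueError("quantity_unit must match amount_unit")
    return {ing: (quantity, quantity_unit)}


paracetamol = Strength(250.0, "mg")           # Clinical Drug: 250 mg/mL
paracetamol_5ml = quantify(paracetamol, 5.0)  # Quantified: 1250 mg / 5 mL
```

With these assumptions, two 5 mL containers of the quantified paracetamol product (quantity 2, unit NULL) give 2 x 1250 mg = 2500 mg, the same as 10 mL at 250 mg/mL.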

Let me know what you think. We will ask the community what they have in their data to see whether it would fly. And then we should run an experiment and try those calculations.


Excellent! Altogether, I would like to summarize this into the following 5 options:

  • (1)
    If we have the dose per piece or per actuation in the DRUG_STRENGTH table (solid forms, puffs), quantity refers to the number of pieces/actuations.

    dose for ingredient_concept_idX = quantity x amount_valueX [amount_unit_concept_idX]

  • (2)
    If we have the total volume and the concentration of the product (for example spray/gel) in the DRUG_STRENGTH table (Quantified Clinical/Branded Drug products), quantity refers to a fraction of the product (for example volume). The DRUG_STRENGTH table provides the concentration of the active ingredients:

    dose for ingredient_concept_idX = quantity x numerator_valueX [numerator_unit_concept_idX]
    If we don’t have that information in the DRUG_STRENGTH table, we need it in the transactional data, but can’t address multi-substance products:

  • (3)
    If we have the total amount of active ingredient in the DRUG_EXPOSURE table, quantity refers to the total amount of active ingredient:

    dose for ingredient_concept_id = quantity [quantity_unit_concept_id]

  • (4)
    If we have the total amount of the product (for example spray) in the DRUG_EXPOSURE table, quantity refers to this total (volume) – probably it represents a fraction of this. The DRUG_STRENGTH table holds the concentration of the active ingredient.

    dose for ingredient_concept_id = quantity x numerator_value [numerator_unit_concept_id]

  • (5)
    If we have patches with a dose release over time, quantity_unit_concept_id will be set to 0. The DRUG_STRENGTH table holds a rate and not a concentration: the denominator_unit_concept_id is always h, and the numerator_value holds the dose released per hour. The quantity refers to the number of patches:

    Dose rate for ingredient_concept_idX = numerator_valueX [numerator_unit_concept_idX]

For the above five options we need the following corresponding assumptions/conventions:

  • (Ad 1) Both the quantity and the entries in DRUG_STRENGTH refer to one piece or actuation.

  • (Ad 2) The quantity represents a fraction or a multiple of the denominator, i.e. the quantity does not have a unit.

  • (Ad 3) Both quantity and quantity_unit_concept_id refer to the ingredient_concept_id of the first entry in DRUG_STRENGTH for this drug_concept_id. Probably there is only one entry.

  • (Ad 4) The quantity refers (conceptually) to the denominator_unit_concept_id. “Conceptually” means that the quantity_unit_concept_id is equal to the denominator_unit_concept_id, or that they use either mg or mL. We can convert the numerator values based on “1000 mg = 1 mL”, subject to the inaccuracy of that density assumption. This also requires that volumes are always measured in mL and masses in mg (no µg, g, kg, etc.).

  • (Ad 5) The DRUG_STRENGTH table holds the rate always in hours, i.e. denominator_unit_concept_id = h. The quantity refers to the number of pieces and might be interpreted only together with a dosage instruction.
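The five formulas can be collected into one dispatcher, which also makes the conventions explicit. A minimal sketch, assuming unit strings instead of concept IDs, masses in mg and volumes in mL as required by (Ad 4), and the rough “1000 mg = 1 mL” conversion. `Entry`, `to_denominator_unit` and `ingredient_doses` are hypothetical names. For option (5) it returns the release rate per patch, since the total dose needs a dosage instruction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MG_PER_ML = 1000.0  # "1000 mg = 1 mL" from (Ad 4); inaccurate for non-aqueous products

Dose = Dict[str, Tuple[float, str]]


@dataclass
class Entry:
    """Simplified DRUG_STRENGTH entry (hypothetical); units are plain strings."""
    ingredient: str
    amount_value: Optional[float] = None
    amount_unit: Optional[str] = None
    numerator_value: Optional[float] = None
    numerator_unit: Optional[str] = None
    denominator_unit: Optional[str] = None


def to_denominator_unit(quantity: float, unit: Optional[str],
                        denominator_unit: Optional[str]) -> float:
    """Express quantity in the denominator unit; only the mg <-> mL swap is allowed."""
    if unit == denominator_unit:
        return quantity
    if (unit, denominator_unit) == ("mg", "mL"):
        return quantity / MG_PER_ML
    if (unit, denominator_unit) == ("mL", "mg"):
        return quantity * MG_PER_ML
    raise ValueError(f"cannot convert {unit} to {denominator_unit}")


def ingredient_doses(option: int, quantity: float, entries: List[Entry],
                     quantity_unit: Optional[str] = None) -> Dose:
    """Dose per ingredient for options (1)-(4); release rate per patch for (5)."""
    if option == 1:  # pieces/actuations x amount per piece
        return {e.ingredient: (quantity * e.amount_value, e.amount_unit) for e in entries}
    if option == 2:  # multiple or fraction of a Quantified product
        return {e.ingredient: (quantity * e.numerator_value, e.numerator_unit) for e in entries}
    if option == 3:  # total active ingredient, tied to the first entry (Ad 3)
        return {entries[0].ingredient: (quantity, quantity_unit)}
    if option == 4:  # total amount of product x concentration
        doses = {}
        for e in entries:
            amount = to_denominator_unit(quantity, quantity_unit, e.denominator_unit)
            doses[e.ingredient] = (amount * e.numerator_value, e.numerator_unit)
        return doses
    if option == 5:  # patches: numerator is released per hour (Ad 5)
        return {e.ingredient: (e.numerator_value, f"{e.numerator_unit}/h") for e in entries}
    raise ValueError(f"unknown option: {option}")


# Itch-X spray from the thread: benzyl alcohol 0.1 mL/mL, pramoxine HCl 10 mg/mL.
spray = [
    Entry("benzyl alcohol", numerator_value=0.1, numerator_unit="mL", denominator_unit="mL"),
    Entry("pramoxine hydrochloride", numerator_value=10.0, numerator_unit="mg",
          denominator_unit="mL"),
]
```

For the spray, option (4) with 30 mL (or 30,000 mg under the density assumption) yields 3 mL benzyl alcohol and 300 mg pramoxine hydrochloride.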

From an analytical perspective this is fine. Can this be realized from an ETL perspective?
Do you want me to update the proposal?

One minor question: why do we distinguish between amount and numerator in the DRUG_STRENGTH table? Amount is used if the denominator is not used. Why do we need this special treatment? We could always use the numerator.

(Christian Reich) #13


Sounds like we are consolidating on a proposal. Here are 3 decisions we need to make to get it to the finish line:

  1. 1000 mg = 1 ml. Assuming we allow mg and ml to be used interchangeably for determining the container size of a liquid drug (we have no other choice anyway): should we enforce having everything in ml, or can we live with this being somewhat untidy and allow things like “3 g of ointment”? If we want to enforce it, we have to do a scrubbing job on RxNorm and all other future drug vocabularies. If not, folks will have to “know” this conversion at the analytical level.

  2. The right way to mark the quantity as either a number of pieces or an amount. One choice is to add a field quantity_unit_concept_id to the DRUG_EXPOSURE table, which would contain NULL for pieces and a valid unit concept for an amount. That concept_id would have to match the denominator_unit_concept_id, which would have to be enforced by convention. Which means somebody will get it wrong. The alternative is to create a quantity_category_flag, with 0 = pieces and 1 = amount. That would be clean, but harder for ETL folks to fathom.

  3. Allowing or forbidding scenario 3 in Klaus’ list. In the case of a solid drug with quantity category = amount, we would get things like 5 mg. This would not work for combination drugs with multiple ingredients. Therefore, in my opinion we shouldn’t allow it at all, and quantity = amount is only valid for liquids. BTW: we have a similar problem with effective_drug_dose/dose_unit_concept_id in DRUG_EXPOSURE; it only works for single-ingredient products.
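The difference between the two designs in decision 2 can be sketched as two small checks. This assumes unit strings instead of concept IDs; `unit_is_valid`, `implied_unit` and the flag constants are hypothetical, and the mg/mL switch mirrors decision 1.

```python
from typing import Optional

PIECES, AMOUNT = 0, 1  # proposed quantity_category_flag values (design B)


def unit_is_valid(quantity_unit: Optional[str], denominator_unit: Optional[str],
                  mg_ml_interchangeable: bool = True) -> bool:
    """Design A: quantity_unit_concept_id is NULL (pieces) or matches the denominator unit.

    mg_ml_interchangeable mirrors decision 1 (1000 mg = 1 mL).
    """
    if quantity_unit is None:
        return True
    if quantity_unit == denominator_unit:
        return True
    return mg_ml_interchangeable and {quantity_unit, denominator_unit} == {"mg", "mL"}


def implied_unit(flag: int, denominator_unit: Optional[str]) -> Optional[str]:
    """Design B: with only a flag, an amount can only be read in the denominator unit."""
    if flag == PIECES:
        return None
    if flag == AMOUNT:
        if denominator_unit is None:
            raise ValueError("an amount needs a concentration with a denominator unit")
        return denominator_unit
    raise ValueError(f"unknown quantity_category_flag: {flag}")
```

Under design A, an untidy “3 g of ointment” against a mg denominator fails the check unless the data are scrubbed or g is converted first; under design B the amount can only ever be read in the denominator unit, so no conversion is possible later.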

Let me know what folks think.

(colin e.) #14


My votes RE: the 3 decisions you posit above:

  1. I think we should leave it to be somewhat untidy. Scrubbing sounds like a pretty significant housekeeping task vs. creating documentation showing folks how to handle the conversion on their own.

  2. I would prefer the “cleaner” option (qty_cat_flag) for my purposes.

  3. I agree with your opinion: We should not create a convention that will prohibit use with combination drug products.

(Don Torok) #15

Item 2) I never liked the idea of using NULL to represent something, in this case pieces. There is a UCUM value for tablet; is that close enough to

(Christian Reich) #16

Thanks @herrcerd.

@DTorok: Wait. I understand your reservations about NULL, but if we want to use the actual Dose Forms, we should use the ones from RxNorm, not UCUM (select * from concept where concept_class_id='Dose Form' and vocabulary_id='RxNorm'). However, if we let them put those in, then we have even more ways for them to screw it up. They could put “Oral tablet” where the Dose Form of the actual product is “Rectal Suppository”, if you get my drastic example. And if they can, they will.


I just want to point out that the decisions are not independent from each other:
I cannot choose both “doing the conversions at the analytical level” and “using the flag instead of the unit”: if I use the flag, I no longer have the unit and can’t convert measurements. In that case, the quantity must refer to the denominator_unit.

With respect to 3, I prefer to forbid the scenario 3 in order to support multiple ingredients.

We should include the effective_drug_dose/dose_unit_concept_id problem into the current discussion.
Here is the proposal:
We can handle compounding by adding one entry in the DRUG_EXPOSURE table per compound. This lets us use all the options discussed above to provide quantity and strength. Even compounding of multi-ingredient products would be possible.
If required, we could add a flag in the DRUG_EXPOSURE table to mark that an entry is part of a compounding. I don’t think we have to distinguish (theoretical) multiple compoundings on the same day for the same patient.
effective_drug_dose and dose_unit_concept_id can be removed.
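A sketch of how per-compound DRUG_EXPOSURE rows could be summed per ingredient for one compounding event. It assumes a hypothetical compounding flag on each row, treats a pure compound as its own ingredient at 1 mg per mg, expresses each quantity in its product's denominator unit, and uses made-up drug identifiers and records for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical DRUG_EXPOSURE rows: (person_id, day, drug, quantity, is_compounding).
# quantity is expressed in the drug's denominator unit (mg for dry, mL for wet products).
Exposure = Tuple[int, str, str, float, bool]

# Hypothetical strengths per denominator unit: drug -> {ingredient: (value, unit)}.
# A pure compound is recorded as its own ingredient at 1 mg per mg.
STRENGTH: Dict[str, Dict[str, Tuple[float, str]]] = {
    "compound_a": {"compound_a": (1.0, "mg")},
    "combo_b": {"ingredient_b1": (10.0, "mg"), "ingredient_b2": (0.5, "mg")},  # per mL
}


def compounded_totals(
    exposures: List[Exposure],
) -> Dict[Tuple[int, str, str], Tuple[float, Optional[str]]]:
    """Sum ingredient amounts over the compounding rows of a person on one day.

    Assumes each ingredient is always reported in the same numerator unit.
    """
    sums = defaultdict(lambda: [0.0, None])
    for person_id, day, drug, quantity, is_compounding in exposures:
        if not is_compounding:
            continue  # ordinary exposure, not part of the compounding
        for ingredient, (value, unit) in STRENGTH[drug].items():
            entry = sums[(person_id, day, ingredient)]
            entry[0] += quantity * value
            entry[1] = unit
    return {key: (amount, unit) for key, (amount, unit) in sums.items()}


records: List[Exposure] = [
    (1, "day_1", "compound_a", 500.0, True),  # 500 mg of a pure compound
    (1, "day_1", "combo_b", 20.0, True),      # 20 mL of a two-ingredient solution
    (1, "day_1", "combo_b", 5.0, False),      # separate exposure, not compounded
]
totals = compounded_totals(records)
```

Because every compound keeps its own row, the same quantity and strength conventions apply to each one, and the per-ingredient totals fall out of a simple grouping.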

(Don Torok) #18

Top 20 drug occurrence quantities by count from MarketScan when the drug concentration denominator is in ‘mL’

SELECT quantity , count(*) as recs
FROM drug_exposure rx
join drug_strength s ON s.drug_concept_id = rx.drug_concept_id
join source_to_concept_map m on target_concept_id=s.drug_concept_id
and m.source_vocabulary_id = 9
where s.concentration_denom_unit='mL'
group by 1 order by 2 DESC

quantity             count
NULL         5,352,704,826
120          1,755,174,696
30           1,691,853,117
100          1,561,252,470
150          1,554,596,427
60           1,503,585,520
240          1,307,392,605
10           1,289,815,720
200          1,241,314,150
15           1,213,803,384
180            911,371,962
5              876,165,683
45             656,079,965
75             638,552,544
300            587,097,428
50             527,407,543
80             523,660,735
1              441,927,397
473            357,228,590
527            339,217,568

(Christian Reich) #19

Thanks, @DTorok. Looks like the right thing: round numbers of ml or mg for the solutions and multiples of 30 (= 1 oz.) for the creams. Disconcerting that the largest single group has no quantity information, but that’s the data.

Yes, that’s what I meant.

Very nice idea. That totally nails it. We would add records to DRUG_STRENGTH for all compounds (drug_concept_id = ingredient_concept_id) with the mg concept in the denominator_unit_concept_id for all “dry” compounds and mL for the “wet” ones. We could take the knowledge of which is which from RxNorm.

Well, there is the use case where you want to specify, say, a chemotherapeutic drug product (certain NDC) but also the actual dose administered. But I guess you are right, you could do that through the quantity mechanism.

Yes, please please.


I have updated the proposal based on the following assumptions/ideas:

  • As discussed we don’t address option 3 as it would not support multiple ingredients.
  • With this exclusion we need neither the quantity_flag nor the quantity_unit_concept_id anymore.
  • If the ETL process makes use of option 4, i.e. provides the total amount of the product, it will standardize mass units to g and volumes to mL. With this convention, the ETL process is much easier and can use the quantity values interchangeably without checking the units in the DRUG_STRENGTH table. We will make the necessary conversions on the analytical side.
  • The suggested compounding approach is included.
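The ETL-side standardization and the analytical-side conversion described in the updated proposal can be sketched as follows. This assumes unit strings (“ug”, “mg”, “g”, “kg”, “uL”, “mL”, “L”), treats 1 g and 1 mL as interchangeable, and uses hypothetical helper names; the example figures are illustrative, not taken from any data source.

```python
from typing import Tuple

# Conversion factors to the standard units proposed for the ETL (g and mL).
MASS_TO_G = {"ug": 1e-6, "mg": 1e-3, "g": 1.0, "kg": 1e3}
VOLUME_TO_ML = {"uL": 1e-3, "mL": 1.0, "L": 1e3}


def standardize(value: float, unit: str) -> Tuple[float, str]:
    """ETL side: express a product quantity in g (mass) or mL (volume)."""
    if unit in MASS_TO_G:
        return value * MASS_TO_G[unit], "g"
    if unit in VOLUME_TO_ML:
        return value * VOLUME_TO_ML[unit], "mL"
    raise ValueError(f"not a mass or volume unit: {unit!r}")


def ingredient_amount(quantity_std: float, numerator_value: float,
                      denominator_unit: str) -> float:
    """Analytical side: product quantity (g or mL) x concentration.

    g and mL are treated as interchangeable (1 g ~ 1 mL). The concentration,
    given per one denominator unit, is rescaled to per g or per mL.
    """
    size_of_one_unit, _ = standardize(1.0, denominator_unit)
    return quantity_std * numerator_value / size_of_one_unit


# Illustrative figures: 3 g of an ointment at 0.01 mg/mg, 0.1 L of a 250 mg/mL solution.
ointment_g, _ = standardize(3, "g")
solution_ml, _ = standardize(0.1, "L")
ointment_dose_mg = ingredient_amount(ointment_g, 0.01, "mg")    # about 30 mg
solution_dose_mg = ingredient_amount(solution_ml, 250.0, "mL")  # about 25,000 mg
```

Keeping the ETL to two fixed units means the analytical code only has to rescale the DRUG_STRENGTH denominator, and the known density inaccuracy is concentrated in a single place.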