
Mapping accuracy - Consideration of 'Class' field

Hello Everyone,

We are currently mapping our source terms to OMOP standard terms. We always pick the right values for filters like ‘Domain’, ‘Vocab’ and ‘Standard’, but can you help us understand the importance of the ‘Class’ column/field? Is there any guideline that you follow when mapping?

For example, what’s the difference between 1 and 2 shown below, and how will it impact our final analysis results?

  1. Drug (domain) --> RxNorm (vocab) --> Ingredient (class)
  2. Drug (domain) --> RxNorm (vocab) --> Branded Drug Box (class)

a) Am I right to understand that we are usually expected to map it to the broader term?

b) Should our mapping depend on the institution that carries out this OMOP project?

I mean, will the mapping differ depending on whether the end user is a hospital or a pharmaceutical company?

A hospital will be interested in the patients, whereas a pharmaceutical company might want to know about the brands of drugs, etc.

c) How accurate should our mapping be? What will be the drawback if we don’t pay attention to ‘Class’ field?

d) Do you have any standard filters that you follow, so that we can guide the end users on how to do the mapping, or is it more a matter of experimentation? We are doing this OMOP transformation for a hospital.

Apologies if my question is basic.


@DTorok - can you please help me with this? Would be very helpful.


#1 is the ingredient of a drug.
#2 The class_id = Branded Drug Box comes from the vocabulary_id = RxNorm Extension. The RxNorm Extension vocabulary covers non-US drugs. If you have US data, then you most likely will not use class_id = Branded Drug Box.

But since you are trying to decipher the different class_ids, I will give another example:

Example: your source data has Rx CUI = 209459 or the following string term:
Acetaminophen 500 MG Oral Tablet [Tylenol]

If you were to map to class_id = Ingredient, you only get the ingredient data, which is ‘acetaminophen’.

When you map it to the exact string term, or match the Rx CUI to CONCEPT.concept_code, the class_id will be Branded Drug. This mapping keeps all the source attributes of the drug: ingredient = acetaminophen, dose = 500, dose unit = mg, form = oral tablet, and brand name = Tylenol.
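To make the code lookup concrete, here is a minimal sketch using an in-memory SQLite copy of a two-row CONCEPT table. The concept_id values and the ingredient's concept_code are made-up placeholders, not real Athena values, and `lookup_by_code` is a hypothetical helper name; only the Rx CUI 209459 and its drug name come from the example above.

```python
import sqlite3

# Toy CONCEPT table. Column names follow the OMOP CDM; the rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE concept (
           concept_id       INTEGER PRIMARY KEY,
           concept_name     TEXT,
           vocabulary_id    TEXT,
           concept_class_id TEXT,
           standard_concept TEXT,
           concept_code     TEXT)"""
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
    [
        # concept_id values and the code 'ING-PLACEHOLDER' are invented for this sketch
        (1001, "acetaminophen", "RxNorm", "Ingredient", "S", "ING-PLACEHOLDER"),
        (1002, "Acetaminophen 500 MG Oral Tablet [Tylenol]",
         "RxNorm", "Branded Drug", "S", "209459"),
    ],
)


def lookup_by_code(conn, code, vocabulary_id="RxNorm"):
    """Return (concept_id, concept_name, concept_class_id) for a source code, or None."""
    return conn.execute(
        """SELECT concept_id, concept_name, concept_class_id
           FROM concept
           WHERE vocabulary_id = ?
             AND concept_code = ?
             AND standard_concept = 'S'""",
        (vocabulary_id, str(code)),
    ).fetchone()


match = lookup_by_code(conn, 209459)
print(match)
```

Because the match is made on the code itself, the concept class falls out of the lookup rather than being something the mapper has to choose.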

On (a): no, stay as true to the source term as possible. The OHDSI-provided hierarchies in the CONCEPT_ANCESTOR table will give you all the related data if you need it. It’s easy to go from a Branded Drug to its ingredient(s) because it is a 1:1 or an appropriate 1:M relationship. You can NOT accurately go from an ingredient to any other class_id.
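A rough sketch of that roll-up, again on toy data in SQLite: the concept_ids, the combination product and the second ingredient are invented for illustration, and the real CONCEPT_ANCESTOR table has additional columns (levels of separation) and self-referencing rows that are left out here. `ingredients_of` and `branded_drugs_under` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id       INTEGER PRIMARY KEY,
        concept_name     TEXT,
        concept_class_id TEXT);
    CREATE TABLE concept_ancestor (
        ancestor_concept_id   INTEGER,
        descendant_concept_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1001, "acetaminophen", "Ingredient"),
        (1003, "caffeine", "Ingredient"),  # invented second ingredient
        (2001, "Acetaminophen 500 MG Oral Tablet [Tylenol]", "Branded Drug"),
        (2002, "Hypothetical combination tablet [BrandX]", "Branded Drug"),
    ],
)
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [(1001, 2001), (1001, 2002), (1003, 2002)],
)


def ingredients_of(conn, drug_concept_id):
    """Branded Drug -> ingredient(s): well defined, 1:1 or 1:M."""
    rows = conn.execute(
        """SELECT c.concept_name
           FROM concept_ancestor ca
           JOIN concept c ON c.concept_id = ca.ancestor_concept_id
           WHERE ca.descendant_concept_id = ?
             AND c.concept_class_id = 'Ingredient'
           ORDER BY c.concept_name""",
        (drug_concept_id,),
    ).fetchall()
    return [r[0] for r in rows]


def branded_drugs_under(conn, ingredient_concept_id):
    """Ingredient -> Branded Drugs: many candidates, so no single correct answer."""
    rows = conn.execute(
        """SELECT c.concept_name
           FROM concept_ancestor ca
           JOIN concept c ON c.concept_id = ca.descendant_concept_id
           WHERE ca.ancestor_concept_id = ?
             AND c.concept_class_id = 'Branded Drug'
           ORDER BY c.concept_id""",
        (ingredient_concept_id,),
    ).fetchall()
    return [r[0] for r in rows]


print(ingredients_of(conn, 2001))
print(ingredients_of(conn, 2002))
print(len(branded_drugs_under(conn, 1001)))
```

If the record had been mapped only to the ingredient, the second function shows why the dose, form and brand cannot be recovered afterwards: every branded product containing that ingredient is an equally valid candidate.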

The ETL team does the mapping, the end users query the data. I highly suggest the end users review the tutorials or attend the tutorials @krfeeney suggested here.


Thank you for the response @MPhilofsky. As graduates and an IT team, we map based on some educated assumptions. However, shouldn’t this mapping be done by people with expertise in clinical research or domain knowledge?


Yes, it is best if people with a medical background make those mappings.

We have a whole algorithm for drug mapping. You need a schema with fully indexed copies of the tables concept, concept_relationship, drug_strength, concept_ancestor and concept_synonym, which you can obtain from Athena.
Create the tables following the rules described here,
and then run https://github.com/OHDSI/Vocabulary-v5.0/blob/master/working/MapDrugVocab.sql

In short, a drug concept is a combination of attributes.
For example, Acetaminophen 500 MG Oral Tablet [Tylenol] consists of:
Acetaminophen - Ingredient
500 MG - Dosage
Oral Tablet - Dose Form
[Tylenol] - Brand Name

You map each attribute separately, and then the machinery finds the best match.
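As a rough illustration of what splitting a drug name into attributes can look like, here is a small regex-based sketch. It is not the logic of MapDrugVocab.sql; it only handles names shaped like the example above (ingredient, numeric dose, unit, form, optional bracketed brand), and `split_drug_name` and `DrugAttributes` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class DrugAttributes(NamedTuple):
    ingredient: str
    dose_value: float
    dose_unit: str
    dose_form: str
    brand_name: Optional[str]


# Assumed name shape: "<ingredient> <number> <unit> <dose form> [<brand>]"
PATTERN = re.compile(
    r"^(?P<ingredient>.+?)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>[A-Za-z/]+)\s+"
    r"(?P<form>.+?)"
    r"(?:\s+\[(?P<brand>[^\]]+)\])?$"
)


def split_drug_name(name: str) -> DrugAttributes:
    """Split a single-ingredient drug name into separately mappable attributes."""
    m = PATTERN.match(name.strip())
    if m is None:
        raise ValueError(f"name does not fit the assumed pattern: {name!r}")
    return DrugAttributes(
        ingredient=m["ingredient"],
        dose_value=float(m["value"]),
        dose_unit=m["unit"],
        dose_form=m["form"],
        brand_name=m["brand"],  # None when no [brand] suffix is present
    )


attrs = split_drug_name("Acetaminophen 500 MG Oral Tablet [Tylenol]")
print(attrs)
```

Real source data is far messier (combination products, missing strengths, local abbreviations), which is one reason the mapping benefits from review by someone with a medical background.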


I agree with @Dymshyts

A few examples where educated assumptions are incorrect:

  • milliliters (mL) is a quantity for normal saline in the DRUG_EXPOSURE table, but is NOT a quantity for packed red blood cells in the DEVICE_EXPOSURE table

  • expiration is an end date, but is NOT the drug_exposure_end_date

  • the values in a source’s “status” column may not have the same meaning as the CONDITION_OCCURRENCE.status column