Preparing for COVID vaccine codes in the vocabulary and modeling in the OMOP CDM

AMA is preparing to release new codes for COVID-19 vaccines in anticipation of their potential FDA approval.

As an example, they have posted codes for the Pfizer and Moderna vaccines.

So my questions: (pinging @Christian_Reich @clairblacketer and @hripcsa for their guidance but would welcome other input)

  1. How will we ensure these new COVID codes make it into the OMOP Vocabulary? Does anyone know how the AMA deliberations manifest into the input data tables that we regularly download and integrate into our vocab?
  2. How should we coordinate across the OHDSI community to ensure we all align to the Vocabulary version when we begin to support vaccine surveillance efforts? (e.g. do we agree on 1 vocab version that we’ll all adopt at the same time? and then stay on a constant cadence (e.g. quarterly/semi-annually/annually) thereafter?)
  3. In the CDM, how do we model the exposure (e.g. the vaccine code) and the administration (e.g. the vaccine administration code for first or second dose)? Will we capture each vaccine code + administration code combination as a distinct DRUG_EXPOSURE record? Presumably we will also capture NDC codes as DRUG_EXPOSURE records. This modeling decision will impact how we think about designing cohort definitions to extract ‘new vaccine users’ and distinguish first exposure from second exposure.

Given the public health impact and OHDSI’s potentially important role in supporting our regulators in their safety surveillance effort, I think this topic is worth OHDSI coming together to author clear data standards guidelines that we can embrace and apply across our community. I’m happy to work with others on this as more information comes to light.



We will add them with the normal refresh mechanism of CPT4/HCPCS. If that hits a hiccup for some reason, we’ll make sure manually that they are in there. I expect RxNorm and CVX to add them on short notice as well, and then we cross-link. Will send a note to RxNorm. Everything should be a Drug concept at the end of the day.

Will reply here when it’s done. From then on, all releases will have these concepts. If they are overwritten for some reason we will report. We will also keep the Corona Virus page up to date.

Unfortunately, that has proven unsuccessful. The reason is that there are so many parallel and unrelated vocabulary updates that we cannot seem to get onto a routine release schedule. It’s been a rolling release, and I don’t see how that will ever be different. In fact, with the addition of more and more vocabularies the situation will not get better.

Generally, vaccines are drugs, and hence they are in DRUG_EXPOSURE. The actual procedure of administration is trivial (shot in the arm or other muscle), which means it needn’t get recorded. If the source data contain both NDC and HCPCS/CPT4 codes then you will have two records on the same day. That could be suppressed at the ETL level, or you use the DRUG_ERA table, or you use the cohort definition mechanism not to double count.
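
To make the ETL-level option concrete, here is a minimal sketch in Python. It assumes simplified DRUG_EXPOSURE-like rows as dictionaries. The `vaccine_group` field is a hypothetical ETL-side label (for example, derived from the mapped ingredient) and not a CDM column, and preferring the NDC row over the CPT4/HCPCS row is an assumption of this sketch, not an OHDSI convention.

```python
from datetime import date

# Assumed ranking: lower number wins when two rows describe the same shot.
SOURCE_PRIORITY = {"NDC": 0, "CPT4": 1, "HCPCS": 1}


def suppress_same_day_duplicates(rows):
    """Keep one row per (person_id, vaccine_group, exposure start date)."""
    best = {}
    for row in rows:
        key = (
            row["person_id"],
            row["vaccine_group"],  # hypothetical grouping label, not a CDM field
            row["drug_exposure_start_date"],
        )
        rank = SOURCE_PRIORITY.get(row["source_vocabulary_id"], 2)
        # Replace the stored row only if this one has a better (lower) rank.
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    return [row for _, row in best.values()]


rows = [
    {"person_id": 1, "vaccine_group": "covid_mrna",
     "drug_exposure_start_date": date(2021, 1, 5), "source_vocabulary_id": "CPT4"},
    {"person_id": 1, "vaccine_group": "covid_mrna",
     "drug_exposure_start_date": date(2021, 1, 5), "source_vocabulary_id": "NDC"},
    {"person_id": 1, "vaccine_group": "covid_mrna",
     "drug_exposure_start_date": date(2021, 1, 26), "source_vocabulary_id": "CPT4"},
]
deduped = suppress_same_day_duplicates(rows)
```

With these three rows, the two records from 5 January collapse into one and the 26 January record is left alone.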

Whether it is the first or second dose we will only indirectly have. If that causes issues we need to talk about how to handle it. Right now, the OMOP CDM generally drops this implicit timing information in codes.
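
One way to get the dose number "indirectly" is to count a person's distinct exposure dates in order. The sketch below assumes the input is a plain list of `(person_id, exposure_date)` pairs, already restricted to the vaccine of interest. The function name is illustrative only. It also assumes every dose is visible in the data, so a shot given outside the source system would shift the numbering.

```python
from collections import defaultdict
from datetime import date


def assign_dose_numbers(exposures):
    """Return {(person_id, exposure_date): dose_number}.

    Rows repeated on the same day (e.g. an NDC row plus a CPT4 row)
    count as a single dose because dates are collected into a set.
    """
    dates_by_person = defaultdict(set)
    for person_id, exposure_date in exposures:
        dates_by_person[person_id].add(exposure_date)

    dose_numbers = {}
    for person_id, dates in dates_by_person.items():
        # Earliest date is dose 1, next distinct date is dose 2, and so on.
        for n, exposure_date in enumerate(sorted(dates), start=1):
            dose_numbers[(person_id, exposure_date)] = n
    return dose_numbers


exposures = [
    (1, date(2021, 1, 26)),
    (1, date(2021, 1, 5)),
    (1, date(2021, 1, 5)),   # same-day duplicate from a second code
    (2, date(2021, 2, 1)),
]
doses = assign_dose_numbers(exposures)
```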

Yes please. :slight_smile:

The key problem will be separation of manufacturers.

For influenza: not sure if current billing codes allow distinguishing Fluarix (GSK) from Fluzone (Sanofi) in the CPT world.
See https://www.cdc.gov/flu/professionals/acip/2020-2021/acip-table.htm

For vaccines using the same “principle” (as the Moderna and Pfizer vaccines do), having separate codes by manufacturer will be key. Does anyone know whether the AMA plans to make the codes manufacturer-specific (like in the world of drugs) or bundled (like in the world of devices)?

Generic coding at the source might be a potential issue. Most Epic immunization data are coded in the CVX vocabulary. CVX codes don’t identify the manufacturer and usually don’t distinguish between formulations. Fortunately, EHR data usually contain “billing” codes. These are CPT/HCPCS codes and it looks like the AMA will distinguish the two manufacturers.

In Epic, immunization data live in tables separate from the traditional “drug” tables. @krfeeney and others heavily involved in the covid research projects might want to suggest sites add source vaccine data to their OMOP pipeline now in anticipation of the drugs being approved for use.

@Christian_Reich have you heard when RxNorm will have codes released? I see a COVID NDC code for the Pfizer vaccine but no mapping to a standard code.

I’m curious about others’ thoughts on how the vaccine program will impact our ability to get data. Do folks have experience with other state department of health programs and how that data is or is not integrated into EMRs? I’m thinking specifically of the U.S.: our dearth of tuberculosis data may reflect that a lot of programs that are funded and deployed through our state departments of health may not get reflected in our EMR and claims data, but I’m not sure. If a lot of our vaccinations go through state departments of health in specially set-up clinics (like at Dodgers Stadium), will that impact our ability to see the data?


There are Standard RxNorm concepts:

37003436 “SARS-CoV-2 (COVID-19) vaccine, mRNA-BNT162b2 0.1 MG/ML Injectable Suspension” (Pfizer BioNTech)
37003518 “SARS-CoV-2 (COVID-19) vaccine, mRNA-1273 0.2 MG/ML Injectable Suspension” (Moderna)

NDC codes are not mapped to them yet. In particular, NDC 42796198 “MODERNA COVID-19 VACCINE - cx-024414 injection” and 42797616 “rna ingredient bnt-162b2 .23g/1.8mL INTRAMUSCULAR INJECTION, SUSPENSION” are unmapped. There are no other NDCs in the system yet. We will hunt that down and fix it.

@Christian_Reich any idea of when the mapping between NDC and RxNorm might come into the vocabulary :slight_smile: I’m waiting for it before we do an update of the vocabulary in our network.

Hi @cukarthik,
we are currently composing a hopefully comprehensive package that is supposed to cover SARS-CoV2 vaccines and monoclonal antibodies. See also this github issue.
We aim to include CVX, CPT4, HCPCS, NDC, RxNorm and maybe also early ATC codes (meant to be released in 2022).
This all means we also need to make sure the dependencies work out and go through any necessary refreshes beyond the vaccine codes themselves. The bulk of the work will probably happen during February, and we hope that by early to mid March we can release most of the vocabularies involved.
~ Mik

Well, generic coding at the source is an issue. Would @Christian_Reich, @mik, or @Dymshyts please provide some mapping guidance? In our source data, we have COVID-19 vaccine data coming across as free-text strings. To which concept_ids should I map the following?

  • Pfizer, SARS-COV2 (COVID-19) VACCINE

  • Moderna, SARS-COV2 (COVID-19) VACCINE

37003436 SARS-CoV-2 (COVID-19) vaccine, mRNA-BNT162b2 0.1 MG/ML Injectable Suspension

37003518 SARS-CoV-2 (COVID-19) vaccine, mRNA-1273 0.2 MG/ML Injectable Suspension
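
For anyone wiring this into an ETL, a minimal sketch of a custom lookup follows. The two source strings and the two concept_ids are the ones quoted in this thread. The dictionary name, the function name, and the whitespace/case normalisation are assumptions for illustration. Unrecognised strings fall back to concept_id 0, the usual OMOP convention for unmapped values.

```python
# Custom source-string -> standard concept_id lookup (illustrative only).
CUSTOM_VACCINE_MAP = {
    "pfizer, sars-cov2 (covid-19) vaccine": 37003436,
    "moderna, sars-cov2 (covid-19) vaccine": 37003518,
}


def map_vaccine_text(source_value):
    """Return the mapped concept_id, or 0 when the string is not recognised."""
    # Lower-case and collapse runs of whitespace so trivial variants still match.
    normalized = " ".join(source_value.lower().split())
    return CUSTOM_VACCINE_MAP.get(normalized, 0)


pfizer_id = map_vaccine_text("Pfizer, SARS-COV2 (COVID-19) VACCINE")
moderna_id = map_vaccine_text("  Moderna,  SARS-COV2 (COVID-19) vaccine ")
unknown_id = map_vaccine_text("Some other vaccine text")
```

Keeping the table in one place makes it easier to revisit if the official mappings or the meaning of the source strings change later.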

Only now I’ve realized why I don’t really like this mapping:

It’s valid only while a single formulation exists. If they introduce a lower dosage or something else, we are in trouble. For sure, official mappings will be fixed, but it’s additional work and a big pain to constantly review the custom mappings done here and there.

But the problem is that we don’t have any better option - CVX is very specific too (or very general).

So this is another reason to build a homemade vocabulary for vaccines.

Hi Alexander,

What would be better - using the official mappings if they could be more easily maintained, or building a homemade vocabulary?

I might be able to maintain the official mappings automatically. I have the technology to remap automatically (it’s new).

Also, out of curiosity, why would a custom vocabulary not also require updating if dosages were changing?

Probably if they introduce, let’s say, a child dose, the vaccine will have a different concentration. So we will be able to reflect it using RxNorm.
On the other hand, making a homemade vocabulary is a tempting idea as we can fully control it.
But it also requires more maintenance and effort.
For now I would stick to RxNorm, so users would be able to map their records to standard concepts.

Thank you for the concept_ids, @Dymshyts!

I have watched the source coding for data elements related to Covid evolve over time. When COVID-19 was first identified as a disease in our EHR source data, it was an internal custom text string mapped via an internal mapping table to multiple SNOMED codes. The EHR now maps to the ICD10CM codes for COVID-19 disease. We also had lab Measurement data coded to custom, internal identifiers that are now all mapped to appropriate LOINC codes. I hope the drug coding will also evolve into standard coding. If not, I will post new source values as they appear in our data :slight_smile:

Well, there’s already a standard coding:
CVX is already in OMOP Vocabulary.
CPT4 is going to be available in Athena tomorrow.
What about NDC, @Alexdavv?

And RxNorm and CVX make a standard vocabulary for vaccines.

I was talking about the source data being coded with a national terminology versus the internal custom coding the EHR is using now :slight_smile: I searched for the CVX, CPT4, NDC and RxNorm codes at our source. None were found. I only found the text strings.

Actually both. RxNorm provides specific drug products, CVX works on a high level, while a homemade vocabulary can help join everything into one system. So the mappings are not the key problem.

Sounds very interesting. Want to show it?

We’d not map to the specific dosages unless the source explicitly states it.
Moderna would be mapped to “COVID-19 mRNA-1273 spike protein vaccine” without indication of the entire formulation/dosage.

Right, it will work for official vocabularies where every other code is the entire drug product.
As for custom mapping, “Pfizer, SARS-COV2 (COVID-19) VACCINE” may change its meaning over time. Once that happens, we need to go back to all the ETL done and review it.

It was released with new codes/mappings.


Yes, sure - you can see the description on www.dynaccurate.com. We leave the funding research program in three weeks, and then we can distribute the technology. I’d be very happy to set up a test for OHDSI.

Hi @mik, I just wanted to check if you are on track for the bulk of the work in Feb and March. Unfortunately, 2022 is too late for us. Is there a way to move up the timeline? COVID vaccines are being coded as CVX, CPT, and RxNorm in EHR systems in the US, and without these mappings things are difficult on the analytics side. Thanks!