
Vaccine concept mapping improvement

The existing vaccine vocabularies are probably too messy to fix in a short time.
Alternatively, we could build a clean, well-organized hierarchy from scratch.
I have started to draw a draft hierarchy in Creately. I can talk about it more at the next meeting.


June 30 meeting summary
Rashmie presented automated hierarchy building using the Formal Concept Analysis method, based on ~30 decomposed CVX codes.

  • Suggested refinements: remove concept_name and color-code the nodes
  • Merck team will do more CVX/CPT decomposition

We also discussed the limitations of CVX, e.g. it does not provide specific ingredients, which may differ between countries.

Agenda for July 14

  • Manually created hierarchy and standard concepts for vaccines
  • General ingredient proposal (Christian, Alexander D.)
  • OHDSI vaccine vocabulary working group Mid-year review

I was hoping the vaccine vocabulary group could provide this information, if possible :slight_smile:

Knowing the drug was never approved or distributed in the US is also helpful information. Then I know to NOT map the source data to a non-US drug concept_id. I didn’t know and would have never guessed CVX would cover drugs NOT given/approved for use in the US.

I’m always interested in improving custom mappings. And the vaccine domain is one area where I had to make a lot of “best guesses” at an appropriate standard concept_id. All help is appreciated :slight_smile:


Agenda for July 28

  • General ingredient proposal discussion (Alexander D.)
  • OHDSI vaccine vocabulary working group Mid-year review (Yupeng Li)

OHDSI Vaccine Vocabulary Solution Pros and Cons

Hi everyone,

In the OHDSI vaccine vocabulary workgroup we have spent several months discussing the issues with the way vaccines are represented in the OMOP vocabulary. In our last meeting Yupeng presented five potential solutions which have been distilled from our prior discussions.

We invite you to take a look at this Excel spreadsheet and add pros and cons to each solution.

If you would like to review more details about the solutions, you can take a look at the slides here or the prior meeting recording here.

August 11 Meeting Agenda

  • Discuss 5 possible solutions to the issues with the vaccine vocabulary and the pros and cons of each
  • Prioritize the solutions based on open discussion and feedback

We created a spreadsheet in the OHDSI Teams environment with a description of each solution and space for pros and cons. If you would like to add comments or pros/cons to any solution, please feel free to edit the spreadsheet there. The Excel file can be found under the Files tab.

August 11 meeting summary

  • Adam presented solutions gathered from the Vaccine WG.
  • Discussion on managing the process of realizing solutions.
  • OHDSI vaccine workgroup goal: build a new, manually curated vaccine ontology in the OMOP vocabulary that is generated/supported by vaccine decomposition and formal concept analysis

Agenda for August 25
Discussion of newly created vaccine ontology, managing process and directions

September 8 Workgroup meeting is canceled due to symposium preparation.

Next Vaccine Vocab WG meeting will be September 22.

Please take a look at the Vaccine Vocabulary proposal document in our Teams environment if you have time. Any feedback, questions, comments, etc. would be helpful. Thanks!

Sept 22 Meeting agenda

  • Review the Vaccine Vocabulary proposal document and discuss what needs to be added before we present the proposal to the full OHDSI community.

  • Discuss options for dissemination of the proposal (Weekly OHDSI community call?)

New meeting time

Hi everyone. We are moving the meeting to 9-10am Eastern time starting tomorrow (9/22) to accommodate more schedules. My apologies for the late notice.

Currently it looks like we don’t include non-US vaccines in our CVX import like these:

My source here is the CDC:

In fact it would be useful to include all the non-US (maybe the 500 series?) codes in order to support research in LMICs that use the WHO COVID-19 Core CRF. The vaccination section of the 5 July 2021 CRF looks like this:

A prototype that a group of us is building for an Africa Open Science Platform uses OMOP. It captures information collected from the CRF using REDCap and/or other tools and ETLs the vaccination information into OMOP. We are just now working on the mapping. We are a bit stuck…


It looks like we might have a best practice we can use in a few African countries to begin with. The best practice creates both a PROCEDURE_OCCURRENCE and a DRUG_EXPOSURE.

The concept for the PROCEDURE_OCCURRENCE comes from SNOMED CT:

Administration of vaccine product against Severe acute respiratory syndrome coronavirus 2 (procedure)

This is a hierarchical concept with three children that in turn have children:

The SNOMED CT vaccination procedure does not indicate any actual vaccine products. In the best practice this would be left to the DRUG_EXPOSURE. Here we would identify the vaccine administered with its CVX code. The CVX vocabulary would include both US and non-US vaccines. The administration procedure and the drug administered would be linked through a FACT_RELATIONSHIP.

When this information is gleaned from a registry or retrospectively, as in the WHO COVID-19 Core CRF, the subject may be reporting not one PROCEDURE_OCCURRENCE but a series of occurrences that together form an intentional unit, e.g., a Sinovac vaccine followed by a Pfizer booster. Sometimes the products in these intentional units are mixed and matched, governed by availability and/or a specific zeitgeist. If the provenance indicates an “intentional unit”, we are considering creating FACT_RELATIONSHIPs across a reported series of PROCEDURE_OCCURRENCEs.

@JayGee ,

I wouldn’t create a procedure_occurrence record for the administration of the vaccine. It’s not necessary and won’t be used. You can use the drug_exposure.route_concept_id to identify the route of administration, if you have a use case for intramuscular drug exposures. The procedure of administering a drug is assumed.

Thanks. How would I capture information about what vaccine was administered in the first dose, the second dose (if applicable) and, finally, what vaccine was used as a booster? Internationally, we can’t assume separate lanes for each vaccine, so from one dose to the next as needed, the vaccine can change.

Create a clinical event record for each dose given. All you need to create a record is the Person, date and drug.
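
To make "one record per dose" concrete, here is a minimal sketch in Python. It is not an ETL implementation: the field names follow the CDM DRUG_EXPOSURE columns, but the two concept_ids are hypothetical placeholders (not real OMOP concepts) and `doses_to_records` is a helper invented for this illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DrugExposure:
    """The three fields named above: person, drug and date."""
    person_id: int
    drug_concept_id: int
    drug_exposure_start_date: date


# Hypothetical placeholder concept_ids, NOT real OMOP concepts.
SINOVAC_PLACEHOLDER = 900001
PFIZER_PLACEHOLDER = 900002


def doses_to_records(person_id, doses):
    """Build one DrugExposure per reported dose.

    `doses` is a list of (drug_concept_id, administration_date) pairs.
    """
    return [DrugExposure(person_id, concept_id, given_on)
            for concept_id, given_on in doses]


# Two primary doses of one product followed by a booster of another.
records = doses_to_records(42, [
    (SINOVAC_PLACEHOLDER, date(2021, 3, 1)),
    (SINOVAC_PLACEHOLDER, date(2021, 3, 29)),
    (PFIZER_PLACEHOLDER, date(2021, 10, 4)),
])
```

Each dose stays an independent row, so a change of product between doses needs no special handling.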

Ask the OHDSI vocabulary team to include the additional CVX codes for the non-US immunizations here. Explain your use case and give a link to your CVX source.

If you’re interested, please join the EHR Working Group, where we discuss and give guidance for the ETL of any & all source data to the CDM, discuss use cases not covered by the CDM, discuss technical needs for an ETL, and all things OHDSI/OMOP. You can sign up here.

Thanks again.

What type of clinical event are you suggesting? Just the DRUG_EXPOSURE? If so, how do you suggest I code it to indicate a first dose, second dose or a booster? CVX codes don’t include these concepts. They don’t mix up an administrative context (first dose, second dose, booster) with the drug itself. Alternatively, are you thinking I should create a second record, like an OBSERVATION, to give the DRUG_EXPOSURE additional context?

It would be great if you could give a very small (back of the envelope) example.

@mik, do we have any plans to include the non-US CVX codes in a future import? Thanks, Jay.

Please share your use case so we can help you determine how to best store these data.

Potential ideas:

  1. You can tease this out at analysis time.
  2. Or, if this is very important to have within the row of data, you can add an extension column to the CDM. See my poster here for the risks, benefits and suggestions of extending your CDM for an internal use case.

I use ‘clinical event’ as a generic term for any clinical event within the CDM. Just because a source code comes from a “drug” vocabulary does not mean the data will live in the Drug_Exposure table of the CDM. The domain_id for the standard concept_id will tell you where a source clinical event record will live in the CDM. Here’s my back of envelope/screenshot of a PowerPoint slide for data changing domains:

We can safely assume all vaccine administration data will ETL to the Drug Exposure table, but it is best practice, and required by OHDSI conventions, to ETL each row of source data to the domain (i.e., the table or field) of its standard concept_id.

Basic SQL for finding the domain_id for a standard concept_id:
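
The original query is not preserved in this thread, so the following is only a sketch of what such a lookup can look like, not a reconstruction of it. It runs the SQL against an in-memory SQLite database from Python. The table and column names (concept, concept_relationship, domain_id, standard_concept, the 'Maps to' relationship) follow the OMOP vocabulary tables; every row of data, including the 'ExampleVocab' vocabulary and all concept_ids and codes, is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT,
    domain_id        TEXT,
    vocabulary_id    TEXT,
    concept_code     TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT
);
-- Made-up rows: one source code that stays in the Drug domain and one
-- whose standard concept lives in a different domain.
INSERT INTO concept VALUES
    (1001, 'Example source vaccine code',       'Drug',        'CVX',          'X1', NULL),
    (2001, 'Example standard vaccine concept',  'Drug',        'RxNorm',       'R1', 'S'),
    (1002, 'Example code that changes domain',  'Drug',        'ExampleVocab', 'X2', NULL),
    (2002, 'Example standard observation',      'Observation', 'SNOMED',       'S2', 'S');
INSERT INTO concept_relationship VALUES
    (1001, 2001, 'Maps to'),
    (1002, 2002, 'Maps to');
""")

# Follow 'Maps to' from the source code to its standard concept and
# report that standard concept's domain_id.
DOMAIN_LOOKUP = """
SELECT std.concept_id, std.domain_id
FROM concept AS src
JOIN concept_relationship AS cr
  ON cr.concept_id_1 = src.concept_id
 AND cr.relationship_id = 'Maps to'
JOIN concept AS std
  ON std.concept_id = cr.concept_id_2
 AND std.standard_concept = 'S'
WHERE src.vocabulary_id = ?
  AND src.concept_code = ?
"""


def target_domains(vocabulary_id, concept_code):
    """Return (standard concept_id, domain_id) pairs for a source code."""
    return conn.execute(DOMAIN_LOOKUP, (vocabulary_id, concept_code)).fetchall()
```

In this toy data the second source code sits in a "drug"-like vocabulary, yet its standard concept reports the Observation domain, which is the situation described above where the data changes domain.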


The Book of OHDSI has further information on this. And the Files section on the EHR Working Group located on MS Teams also has quite a few helpful links :wink:

My preference is to map the WHO vaccination section into an information model, not grow a domain like DRUG_EXPOSURE with an additional column. In an information model you would combine OMOP clinical events to describe a DRUG_EXPOSURE and its context. That’s why with COVID-19 I was attracted to a SNOMED CT procedure through which one or more vaccines are administered. The procedure varies depending on whether the vaccine uses messenger RNA or not because vaccine administration varies with its type. That is a pretty neat SNOMED CT concept. In the community of practice we are aiming at, I don’t want to suggest we grow the CDM and I want to take every concept at face value. Without these principles, I fear it will be impossible to produce FAIR data content.

Wouldn’t the chronology of the vaccine order be based on the dates on which they occurred? So the first is the one with no prior, the second one is the one with exactly 1 prior, etc.
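
A rough sketch of that idea in Python, assuming the exposures have already been filtered to the vaccine concepts of interest. The function name `number_doses` and the tuple layout are invented for this illustration; the dose number is simply one plus the count of earlier recorded exposures for the same person.

```python
from collections import defaultdict
from datetime import date


def number_doses(exposures):
    """Assign a dose ordinal per person from exposure dates.

    `exposures` is an iterable of
    (person_id, drug_concept_id, drug_exposure_start_date) tuples.
    Returns (person_id, drug_concept_id, start_date, dose_number) tuples,
    ordered by person and then by date.
    """
    by_person = defaultdict(list)
    for person_id, concept_id, start in exposures:
        by_person[person_id].append((start, concept_id))

    numbered = []
    for person_id in sorted(by_person):
        # Sorting by date puts the exposure with no prior record first.
        ordered = sorted(by_person[person_id])
        for dose_number, (start, concept_id) in enumerate(ordered, start=1):
            numbered.append((person_id, concept_id, start, dose_number))
    return numbered
```

Note that the ordinal counts recorded exposures only: if an earlier dose was never captured in the source, the remaining doses are numbered from 1.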

If you’re trying to replicate exactly what is in that CRF, that data model does not align with the CDM as of 5.3, but maybe the EPISODE table could be leveraged to represent the sequence of exposures. However, if you get a CRF that has the date of second and third vaccines but not the first, then just using the standard CDM model will not be able to represent the absence of a first exposure but the presence of the others.
