
2024 Oncology standards maturity effort - Join the journey!

The Oncology working group needs your help!

We hope to once again harness the collaborative spirit of the OHDSI community to drive innovation and catalyze significant advancements in cancer collaboration and research.


There is a wealth of emerging and exciting oncology projects that aim to leverage OMOP to facilitate novel and impactful research. The challenge is that the breadth and depth of these multimodal data are significantly larger in scope than our current standard conventions cover.

OMOP is not alone in this regard: a comprehensive and widely adopted global standard for open oncology data does not exist. Oncology data is complex, exceptionally intertwined, often goes beyond the ‘person-centric’ model, and is constantly evolving.

Building on our previous work, we aim to mature the oncology standards in OMOP, notably by enabling more types of data sources (e.g., EMRs and registries) and international adoption.

Our approach

For more details on the points below, please see the Development Overview portion of the docs.

A global standard cannot be static; it requires a community behind it to evolve and expand over time. We believe OHDSI can continue to build and sustain the momentum needed to be that community.

To achieve international and source-agnostic interoperability, specifically the harmonization of diverse data representations, a diverse group of stakeholders, data sources, and contributors is required.

To those ends, we have adapted our processes to lower the barriers for contribution while also emphasizing transparency and novel means for asynchronous contributions.

Notably, we are moving at a faster pace than the official OHDSI vocabulary releases and consequently will be maintaining a “delta”, or “development”, version of the vocabularies. At the end of this effort we plan to fold these improvements back into the common OHDSI standards. Additionally, we will consult with the vocabulary team when applicable.

The stage has been set

More details on the points below can be found in the Github Project portion of the docs.

Substantial work has been completed to make this effort as transparent and accessible as possible:

  • An extensive and international outreach effort has been conducted to aggregate the gaps and pain points of implementing oncology data in OMOP, which is the starting point of this project. All of that feedback has been ingested and organized within a Github Project.
  • Given the complexity of the Github project, we have created extensive documentation to make it easier to understand and use.
  • The tasks have been broken down into smaller, more easily tackled chunks such as “investigating an issue”, “complete outstanding vocabulary changes that have content provided”, or “documentation of _”. The intent is to enable many small, and often asynchronous, contributions rather than singular large bodies of work. Additionally, the “task groups” are templated, requiring the same steps to be completed for each one. The point is to force the completion of documentation and validation for all newly established conventions.
  • The plan is to complete as much as we can, prioritized by use cases, in preparation for a new stable release. After that milestone is reached, we will continue to iterate and improve while adhering to a stable release schedule.

Scope of work

  • The majority of the outlined work falls into one of three buckets:
    1. Collaboratively deciding on conventions and creating proposals
    2. Investigating and/or modifying the vocabularies
    3. Creating documentation and/or validation

What we need most

  • Diverse community feedback
    • Feedback on decision points - Will this solution work for your data?
    • Any experienced issues, hurdles or ambiguities
    • Use cases, studies, ambitions
    • Source data investigations
  • Community contributions
    • Vocabulary:
      • Investigations - e.g., are there duplicate standard concepts for laterality?
      • Modifications - e.g., provide the modifications to de-standardize duplicate laterality concepts
    • Documentation
      • Identifying gaps
      • Populating content
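
As an illustration of the kind of vocabulary investigation meant above, here is a minimal Python sketch that flags standard concepts sharing the same name. It assumes the CONCEPT table has been loaded as a list of dicts keyed by the usual OMOP column names (concept_id, concept_name, standard_concept). The function name, the sample rows, and the concept IDs are hypothetical, and a real investigation would likely need looser matching than exact names.

```python
from collections import defaultdict


def find_duplicate_standard_concepts(concepts):
    """Group standard concepts ('S') by normalized name; return names with >1 concept."""
    groups = defaultdict(list)
    for c in concepts:
        if c["standard_concept"] != "S":
            continue  # only standard concepts are of interest here
        # Normalize case and whitespace so "Left" and "left " compare equal.
        name = " ".join(c["concept_name"].lower().split())
        groups[name].append(c["concept_id"])
    return {name: sorted(ids) for name, ids in groups.items() if len(ids) > 1}


# Hypothetical sample rows; these are not real concept IDs.
sample = [
    {"concept_id": 1, "concept_name": "Left", "standard_concept": "S"},
    {"concept_id": 2, "concept_name": "left ", "standard_concept": "S"},
    {"concept_id": 3, "concept_name": "Right", "standard_concept": "S"},
    {"concept_id": 4, "concept_name": "Left", "standard_concept": None},
]
duplicates = find_duplicate_standard_concepts(sample)
```

The output of a check like this is the "investigation"; the matching "modification" would be the list of concepts proposed for de-standardization.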

Getting Involved

Please see the Getting Involved section of the docs.

Next steps

  1. Starting next month, this effort will be the main focus of the “Oncology WG - Dev/Vocab” subgroup meetings. We will start each meeting in a ‘scrum-like’ way and use the remaining time to discuss proposals. Given how dispersed our community is, attendance at meetings is not a requirement for contributing.

  2. We will use this forum thread, duplicated in the Teams chat, for future announcements, most notably when new discussion items go “under review”.

Lastly, here’s a piece of symbolism intended to inspire (and pander to the nerds in the audience):


Quick update:

Due to scheduling conflicts, the orientation meeting (an overview of the project and a chance to ask questions) has moved to a new time → Jan 24th, 11AM EST - Meeting link

If you are interested and can’t make it, no worries, the recording will be provided afterwards.

@agolozar @rtmill
Do we have an agreed upon OMOP convention for storing molecular mutation data?

@PriyaDesai- Here is a very very short answer:

Similar to other cancer attributes, somatic variants are recorded as MEASUREMENT and the concepts are defined through the OMOP Genomic Vocabulary. The Genomic vocab has five classes: Gene variants, Gene DNA variant, Gene RNA variant, Gene Protein variant, and Structural variant.

A few rules:

  1. When mapping, it is important to reflect the modality with which the variant was measured. For example, if the measurement happened at the chromosome level, you should map the variant to Gene DNA variant. When the modality is not determined and all you know is that there is a change in the gene, you should map to the gene-level concept.
  2. Only variants that can be mapped should be recorded. If no mapping can be obtained, no record with measurement_concept_id=0 is needed. This is because there are many irrelevant variants in each cell, and most of them carry no consequence. Recording them would overwhelm the database with irrelevant information.
  3. Similar to metastasis, you should record the results in the value_as_concept_id field.
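
To make the rules above concrete, here is a minimal Python sketch of turning already-mapped variants into MEASUREMENT rows. The SourceVariant class, the to_measurement_rows helper, the source strings, and the concept IDs are all hypothetical placeholders, not real Genomic Vocabulary content, and only a subset of the MEASUREMENT columns is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceVariant:
    person_id: int
    measurement_date: str
    source_value: str                        # variant as written in the source data
    variant_concept_id: Optional[int]        # mapped Genomic concept, None if unmapped
    result_concept_id: Optional[int] = None  # mapped result concept, if any


def to_measurement_rows(variants):
    """Build MEASUREMENT rows, skipping variants that have no mapping (rule 2)."""
    rows = []
    for v in variants:
        if not v.variant_concept_id:
            continue  # no record with measurement_concept_id=0 is written
        rows.append({
            "person_id": v.person_id,
            "measurement_concept_id": v.variant_concept_id,
            "measurement_date": v.measurement_date,
            "value_as_concept_id": v.result_concept_id,  # rule 3
            "measurement_source_value": v.source_value,
        })
    return rows


# Hypothetical input: one mapped variant and one that could not be mapped.
variants = [
    SourceVariant(1, "2024-01-24", "GENE1:c.100A>G", 9000001, 9000100),
    SourceVariant(1, "2024-01-24", "GENE2:c.5C>T", None),
]
rows = to_measurement_rows(variants)
```

Rule 1 is not shown here: choosing between the DNA, RNA, protein, or gene-level concept happens during mapping, before rows like these are built.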

We have a tool called KOIOS that maps data in the VCF or HGVS format to OMOP concepts. You might want to consider using that if you have VCF data.

We are in the process of updating the documentation to reflect the recent changes in the Genomic Vocab; the updated docs will be available in the near future. In the meantime, we can jump into the details and go over your questions during one of the Oncology -Omics subgroup calls.