
Call for Genomic Question and/or Data

Through discussions with the CDM Genomics Working Group, we realized that instead of trying to capture every possible data point related to a patient’s genome in the CDM, it may be better to start with a research question or analytic use case that would benefit from the addition of genomic covariates. This will help us focus on getting the most important pieces of information into the data model first. Along those same lines, we also want to understand who in the community actually has genomic data readily available to be mapped. Again, we want to target our efforts towards a solution that will benefit the most collaborators, and knowing what data is currently out there will help us do that. With that said, please respond to this post with either:

  1. A research question or analytic use case that requires genomic data
  2. A data source that you own or have access to with genomic data you would like converted to the Common Data Model

We will discuss these at our next genomics meeting on March 20th; all details can be found here.

Good point, thank you @clairblacketer

  1. For an analytic use case, we can learn from other collaborative consortia outside OHDSI.
    One of the biggest examples is the GENIE project of the AACR (American Association for Cancer Research). GENIE is an international data-sharing consortium focused on generating an evidence base for precision cancer medicine by integrating clinical-grade cancer genomic data with clinical outcome data for tens of thousands of cancer patients treated at multiple institutions worldwide.
    Here’s an example of the clinical utility of GENIE data.

  2. We have targeted NGS data from cancer patients, which has been converted into a prototype of the genomic CDM.

We are using OMOP as the CDM for data standardization as part of the Healthcare Alliance for Resourceful Medicine Offensive against Neoplasms in Hematology (HARMONY), a pan-European IMI project aiming to collect, share and harmonize Big Data from multidisciplinary sources, including clinical and molecular data, in order to:
• Enable identification of novel pathways for drug development.
• Facilitate drug development pipelines and accelerate the “bench-to-bedside” process in drug development.
• Empower clinicians, policy makers and payers to improve decision-making and optimize care for patients with hematologic malignancies (HMs).

In the context of hematologic malignancies, but extendable to other conditions, some more specific use cases could be:
• Determine the genomic differences between patients belonging to different age groups and their potential impact on the disease course
• Improved molecular characterization for refining current disease classifications and prognostic systems
• Definition of genetic markers and potential novel therapeutic strategies
• Identification of clinical and molecular markers for treatment response, outcome after relapse, potential toxicity…
• Explore the potential value of the variant allele frequency as measurement of minimal residual disease
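
As a rough illustration of the last use case, variant allele frequency (VAF) is commonly computed as the fraction of reads supporting the alternate allele. The sketch below assumes read counts are already available per variant and time point; the function name and the sample numbers are hypothetical and not taken from any HARMONY data.

```python
def variant_allele_frequency(ref_reads, alt_reads):
    """Return alt_reads / (ref_reads + alt_reads), or None when there is no coverage."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return alt_reads / total


# Hypothetical read counts (ref_reads, alt_reads) for one variant at three time points.
timepoints = {
    "diagnosis": (60, 40),
    "post_induction": (95, 5),
    "follow_up": (100, 0),
}

# VAF per time point; a falling value would be the signal of interest.
vaf_by_timepoint = {
    name: variant_allele_frequency(ref, alt)
    for name, (ref, alt) in timepoints.items()
}
```

Whether a falling VAF is a usable measure of minimal residual disease is exactly the open question; the code only shows the arithmetic.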

So far we also have somatic variants obtained through targeted NGS, which fit the current G-CDM prototype (although not converted yet).

I hope this is helpful.


Hi everyone. Some genetics/genomics ideas that may be possible using claims data and might be of interest to the group:

  1. What subset of subjects with a BRCA test have a value for their result, and what can you say about those who have a test and a result versus those who just have a test?
  2. For those with a BRCA test and a value, does the evidence show that they go on to develop cancer, or have a worse prognosis, compared with those with just a test in breast, prostate, and ovarian cancer cohorts? What does the course of care look like for those with a BRCA test and a positive value and for those without a value?
  3. What is the prevalence of prophylactic surgery in people with positive BRCA1 or BRCA2 mutations?
  4. What is the prevalence of presymptomatic predictive testing for Huntington’s disease?
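
A minimal sketch of the first question, assuming the test records have already been pulled into simple tuples. The row layout, the names, and the sample values are hypothetical; in a real OMOP database this would be a query against the MEASUREMENT table.

```python
from collections import defaultdict

# Hypothetical rows: (person_id, test_name, value).
# A value of None means a test is recorded but no result is available.
measurements = [
    (1, "BRCA1", "positive"),
    (2, "BRCA1", None),
    (3, "BRCA2", "negative"),
    (3, "BRCA1", None),
    (4, "BRCA2", None),
]


def split_by_result(rows):
    """Return (persons with at least one result, persons with tests but no result)."""
    has_result = defaultdict(bool)
    for person_id, _test_name, value in rows:
        has_result[person_id] = has_result[person_id] or value is not None
    with_result = {p for p, flag in has_result.items() if flag}
    test_only = {p for p, flag in has_result.items() if not flag}
    return with_result, test_only


with_result, test_only = split_by_result(measurements)
```

The two sets could then serve as the comparison cohorts for the second question.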

Thanks to @clairblacketer for posting this thread, I’m looking forward to an interesting call on the 20th!

We’re planning to make an ETL tool for G-CDM, which would be similar to WhiteRabbit and Rabbit-In-A-Hat.
It can help you to convert your data into G-CDM :grinning:

In the cancer world, anti-EGFR therapies and the KRAS-BRAF-MAPK signaling pathway have always been an issue. We propose to evaluate co-mutations and anti-EGFR therapies in KRAS wild-type colorectal cancer patients. Almost every stage IV colorectal cancer patient receives a panel including at least KRAS/BRAF/NRAS, so we think this will be a very generalizable question for the whole community regardless of the panel type each institute uses, which is often a hurdle for these evaluations.

For our model, we adopted the following genomic data nomenclature: the Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT), for diseases and qualifier values; the Human Genome Variation Society (HGVS) mut-nomen syntax, for mutation names and locations; the Consensus Coding Sequence (CCDS) representation, for genomic regions; and the Human Genome Organization Gene Nomenclature Committee (HGNC, a.k.a. HUGO) symbols and identifiers, for common gene names. This is the same as SMART on FHIR, which is expanding its landscape.
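
To make the nomenclature concrete, here is a minimal sketch that splits a Human Genome Variation Society (HGVS) style name into fields and pairs it with an HGNC gene symbol. It assumes only simple coding-DNA substitutions; the regular expression, function names, and record layout are illustrative and are not part of our model or of any HGVS library.

```python
import re

# Handles only simple coding-DNA substitutions such as "NM_004333.4:c.1799T>A".
# Deletions, insertions, intronic offsets, and protein-level names are out of scope.
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+\.\d+):c\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_hgvs_substitution(text):
    """Return the parsed fields as a dict, or None if the text is not a simple substitution."""
    match = HGVS_SUBSTITUTION.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    return fields


def to_record(gene_symbol, hgvs_name):
    """Combine an HGNC gene symbol with the parsed HGVS fields (hypothetical record layout)."""
    parsed = parse_hgvs_substitution(hgvs_name)
    if parsed is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs_name}")
    return {"gene_symbol": gene_symbol, **parsed}


record = to_record("BRAF", "NM_004333.4:c.1799T>A")
```

A production pipeline would need a full HGVS parser and validation against the reference sequence; this only shows how the pieces of the name map to columns.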

We have whole exome sequencing data from this study here: https://clinicaltrials.gov/ct2/show/NCT00410241

A possible question is: what dose of simvastatin should I take given my SLCO1B1 genotype?


Great @Vojtech_Huser
My database only contains targeted NGS data and focuses on somatic mutations in cancer patients.
Whole exome sequencing data from the general population can add diversity to our database pool and discussion.

Also, it would be very useful to investigate how many variants or genetic notations we need to standardize, as @Christian_Reich suggested yesterday.

The paper published in NEJM yesterday can be another use case and exemplary study that integrated EHR and genomic data.
The authors used exome sequence data and electronic health records from 46,544 participants in the DiscovEHR human genetics study to identify genetic variants associated with serum levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST).
From this, they found that a loss-of-function variant in HSD17B13 was associated with a reduced risk of chronic liver disease and of progression from steatosis to steatohepatitis.

The current G-CDM model can support conducting this study.

We focused on starting from a VCF file and making a database version of it (no BAM files).
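
A minimal sketch of that idea, assuming a plain tab-separated VCF and an in-memory SQLite database. The `variant` table, its columns, and the sample records are made up for illustration and are not our actual schema.

```python
import sqlite3

# Made-up VCF content; the positions and alleles are not real variants.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tT\t50\tPASS\tDP=100\n"
    "2\t200\t.\tC\tT,A\t.\tPASS\tDP=80\n"
)


def parse_vcf(text):
    """Yield one (chrom, pos, ref, alt, qual, filter) row per alternate allele."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        chrom, pos, _id, ref, alts, qual, filt = line.split("\t")[:7]
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt, None if qual == "." else float(qual), filt)


def load_variants(conn, rows):
    """Create the hypothetical variant table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variant "
        "(chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, qual REAL, filter TEXT)"
    )
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_variants(conn, parse_vcf(SAMPLE_VCF))
variant_count = conn.execute("SELECT COUNT(*) FROM variant").fetchone()[0]
```

Multi-allelic records are split into one row per alternate allele here; per-sample genotype columns and INFO fields are ignored in this sketch.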

One problem is how to handle phased versus unphased data.
Another is capturing which kit was used to prepare the sample and which sequencing machine was used.

In genomics, sometimes old data is not re-used and, instead, a new sequencing is done on the sample.

I also spent 8+ months trying to get a code that would indicate the presence of exome data in the MEASUREMENT table.

(just a flag and detailed data to expect elsewhere)

I am pleased to be behind (the requestor for) this LOINC code:

Btw, SNOMED CT International rejected my new concept request for such a code and “referred me” to LOINC.

Wonderful, @Vojtech_Huser
Can we find ‘targeted NGS’ in LOINC, or do we need to propose a new code like you did?

As you can see in the wiki or the specification document, the current G-CDM can capture when the sequencing was done, which sequencing machine and sequencing pipelines were used, and the quality score of the sequencing or specimen (in the ‘sequencing’ table).

@SCYou and @clairblacketer, are there any future plans to collaborate with the GENIE project on integrating clinical-grade cancer genomic data into the OMOP dictionary? I think this would be very useful for cancer research studies.


Hello @priagopal, how can we collaborate with the GENIE project?

Hi @priagopal,

We have discussed this as one of the goals for the Oncology Genomic WG. It would be great to hear your thoughts or a proposal on how we can initiate this as part of the WG’s ongoing activities.