Through discussions with the CDM Genomics Working Group, we realized that instead of trying to capture every possible data point related to a patient’s genome in the CDM, it may be better to start with a research question or analytic use case that would benefit from the addition of genomic covariates. This will help us focus on getting the most important pieces of information into the data model first. Along the same lines, we also want to understand who in the community actually has genomic data readily available to be mapped. We want to target our efforts toward a solution that benefits the most collaborators, and knowing what data is currently out there will help us do that. With that in mind, please respond to this post with either:
A research question or analytic use case that requires genomic data
A data source that you own or have access to with genomic data you would like converted to the Common Data Model
We will discuss these at our next genomics meeting on March 20th; all details can be found here.
For an analytic use case, we can learn from other collaborative consortia outside OHDSI.
One of the biggest examples is the GENIE project of the AACR (American Association for Cancer Research). GENIE is an international data-sharing consortium focused on generating an evidence base for precision cancer medicine by integrating clinical-grade cancer genomic data with clinical outcome data for tens of thousands of cancer patients treated at multiple institutions worldwide. Here’s an example of the clinical utility of GENIE data.
We have targeted NGS data from cancer patients, which was converted into a prototype of the genomic CDM.
We are using OMOP as the CDM for data standardization as part of the Healthcare Alliance for Resourceful Medicine Offensive against Neoplasms in Hematology (HARMONY), a pan-European IMI project aiming to collect, share and harmonize Big Data from multidisciplinary sources, including clinical and molecular data, in order to:
• Enable identification of novel pathways for drug development.
• Facilitate drug development pipelines and accelerate the “bench-to-bedside” process in drug development.
• Empower clinicians, policy makers and payers to improve decision-making and optimize care for patients with hematologic malignancies (HMs).
In the context of hematologic malignancies, but extendable to other conditions, some more specific use cases could be:
• Determine the genomic differences between patients belonging to different age groups and their potential impact on the disease course
• Improved molecular characterization for refining current disease classifications and prognostic systems
• Definition of genetic markers and potential novel therapeutic strategies
• Identification of clinical and molecular markers for treatment response, outcome after relapse, potential toxicity…
• Explore the potential value of the variant allele frequency as a measurement of minimal residual disease
So far we also have somatic variants obtained through targeted NGS, which fit the current G-CDM prototype (although not converted yet).
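To make the variant allele frequency use case above a bit more concrete, here is a minimal sketch in Python. It assumes the usual definition of VAF as the fraction of reads at a locus that support the variant allele. The function name, the tuple layout, and the read counts are all hypothetical and are not taken from the G-CDM prototype or from HARMONY data.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a locus that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


# Hypothetical serial measurements of one variant in one patient:
# (time point label, reads supporting the variant, total reads at the locus)
timepoints = [
    ("diagnosis", 420, 1000),
    ("post-induction", 12, 1200),
    ("follow-up", 0, 900),
]

# VAF per time point, which could then be tracked as a candidate MRD signal
vaf_series = [
    (label, variant_allele_frequency(alt, total))
    for label, alt, total in timepoints
]
```

Whether a given VAF threshold is meaningful for minimal residual disease is exactly the open question in the use case; the sketch only shows how the per-time-point value could be derived from read counts.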
Hi everyone. Some genetics/genomics ideas that may be possible using claims data and might be of interest to the group:
What subset of subjects with a BRCA test have a value for their result, and what can you say about those who have both a test and a result versus those who just have a test?
For those with a BRCA test and a value, does the evidence show that they go on to develop cancer, or have a worse prognosis, compared with those with just a test in breast, prostate, and ovarian cancer cohorts? What does the course of care look like for those with a BRCA test and a positive value versus those without a value?
What is the prevalence of prophylactic surgery in people with positive BRCA1 or BRCA2 mutations?
What is the prevalence of presymptomatic predictive testing for Huntington’s disease?
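As a rough illustration of the first question above (tested subjects with versus without a recorded result), here is a minimal Python sketch. The record layout (`person_id`, `result`) and the function name are hypothetical and do not correspond to actual CDM tables or to any particular claims database.

```python
from collections import Counter


def summarize_brca_results(records):
    """Count tested subjects by whether any of their BRCA tests has a result value.

    `records` is a list of dicts, one per BRCA test claim, with hypothetical
    keys "person_id" and "result" (None when no value was recorded).
    """
    has_result = {}
    for rec in records:
        pid = rec["person_id"]
        # A subject counts as "with result" if at least one test has a value
        has_result[pid] = has_result.get(pid, False) or rec.get("result") is not None

    counts = Counter(
        "test_with_result" if flag else "test_only" for flag in has_result.values()
    )
    total = len(has_result)
    return {
        "tested": total,
        "test_with_result": counts["test_with_result"],
        "test_only": counts["test_only"],
        "fraction_with_result": counts["test_with_result"] / total if total else 0.0,
    }


# Made-up example records, for illustration only
records = [
    {"person_id": 1, "result": "positive"},
    {"person_id": 2, "result": None},
    {"person_id": 2, "result": "negative"},
    {"person_id": 3, "result": None},
    {"person_id": 4, "result": None},
]
summary = summarize_brca_results(records)
```

The two resulting groups could then serve as cohorts for the prognosis and course-of-care comparisons in the second question.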
Thanks to @clairblacketer for posting this thread, I’m looking forward to an interesting call on the 20th!
@aahc
We’re planning to make an ETL tool for G-CDM, which would be similar to WhiteRabbit and Rabbit-In-A-Hat.
It can help you convert your data into G-CDM.
In the cancer world, anti-EGFR therapies and the KRAS-BRAF-MAPK signaling pathway have always been an issue. We propose to evaluate co-mutations and anti-EGFR therapies in KRAS wild-type colorectal cancer patients. Almost every stage IV colorectal cancer patient receives a panel including at least KRAS/BRAF/NRAS, so we think this will be a very generalizable question for the whole community regardless of the panel type each institute uses, which is often a hurdle for these evaluations.
For our model, we adopted the following genomic data nomenclature: the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), for diseases and qualifier values; the Human Genome Variation Society (HGVS) mut-nomen syntax, for mutation names and locations; the Consensus Coding Sequence representation, for genomic regions; and the Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) symbols and identifiers, for common gene names. This is the same as SMART on FHIR, which is expanding its landscape.
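A minimal Python sketch of the cohort selection proposed above might look like the following. It assumes each patient's panel result has already been reduced to a set of mutated gene symbols. That structure, the function name, and the example patients are hypothetical. Only the three genes named in the post (KRAS, BRAF, NRAS) are tallied, so the count does not depend on which larger panel an institute uses.

```python
from collections import Counter

# Genes the post says almost every stage IV panel covers
CORE_GENES = ("KRAS", "BRAF", "NRAS")


def kras_wild_type_comutations(panel_results):
    """Select KRAS wild-type patients and tally co-mutations in the core genes.

    `panel_results` maps a patient id to the set of gene symbols reported
    as mutated on that patient's panel (hypothetical structure).
    """
    wild_type = {
        pid: genes for pid, genes in panel_results.items() if "KRAS" not in genes
    }
    counts = Counter()
    for genes in wild_type.values():
        for gene in CORE_GENES:
            if gene != "KRAS" and gene in genes:
                counts[gene] += 1
    return sorted(wild_type), dict(counts)


# Made-up panel results, for illustration only
panel_results = {
    "p1": {"KRAS"},
    "p2": {"BRAF"},
    "p3": set(),
    "p4": {"NRAS", "TP53"},
    "p5": {"BRAF"},
}
cohort, comutations = kras_wild_type_comutations(panel_results)
```

Linking the resulting cohort to anti-EGFR exposure and outcomes would happen on the clinical side of the CDM and is not shown here.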
Great @Vojtech_Huser
My database only contains targeted NGS and focuses on somatic mutations in cancer patients.
Whole-exome sequencing data from the general population can add diversity to our database pool and discussion.
Also, it would be very useful to investigate how many variants or genetic notations we need to standardize, as @Christian_Reich suggested yesterday.
The paper published in NEJM yesterday, which integrated EHR and genomic data, can be another use case and exemplary study.
They used exome sequence data and electronic health records from 46,544 participants in the DiscovEHR human genetics study to identify genetic variants associated with serum levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST).
Through this analysis, they found that a loss-of-function variant in HSD17B13 was associated with a reduced risk of chronic liver disease and of progression from steatosis to steatohepatitis.
The current G-CDM model supports conducting this study.
Wonderful, @Vojtech_Huser
Could we find ‘targeted NGS’ in LOINC? Or do we need to propose it again, like you did?
As you can see in the wiki or the specification document, the current G-CDM can capture when the sequencing was done, with which sequencing machine and sequencing pipelines, and the quality score of the sequencing or specimen (in the ‘sequencing’ table).
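To illustrate the kind of information described above, here is a small Python sketch of one row in such a table. Every field name below is a hypothetical placeholder chosen for readability. The actual column names and types are defined in the G-CDM specification document and may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SequencingRecord:
    """Illustrative stand-in for one row of the 'sequencing' table.

    All field names are hypothetical, not the actual G-CDM columns.
    """
    sequencing_id: int
    specimen_id: int
    sequencing_date: date          # when the sequencing was done
    sequencing_device: str         # which sequencing machine was used
    pipeline: str                  # which sequencing pipeline was used
    quality_score: Optional[float] = None  # quality of sequencing or specimen


# Made-up example row
example = SequencingRecord(
    sequencing_id=1,
    specimen_id=1001,
    sequencing_date=date(2018, 3, 1),
    sequencing_device="hypothetical-sequencer",
    pipeline="hypothetical-pipeline-v1",
    quality_score=35.0,
)
```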
@SCYou and @clairblacketer, are there any future plans to collaborate with the GENIE project on integrating clinical-grade cancer genomic data into the OMOP dictionary? I think this would be very useful for cancer research studies.
We have discussed this as one of the goals for the Oncology Genomic WG. It would be great to hear your thoughts or proposal on how we can initiate this as part of the WG’s ongoing activities.