Agenda: OMOP-CDM Extension for NGS Data
First of all, thank you all for your interest and for the discussion in support of the genomic CDM Workgroup.
These days we are thinking through the basic concepts: why we need (or whether we need) a table extension for genomic data; which elements of genomic data have to be stored in OMOP-CDM; and how the genomic CDM can be used in clinical practice.
Why is genomic data important in clinical practice?
- The patient’s genomic data are used as an indicator for determining the cancer stage and anticancer treatment according to the NCCN guidelines (Figure 1).
Figure 1. Genomic alteration data used in clinical decision
How genomic data are stored in the current OMOP-CDM
- Currently, OMOP-CDM has a structure that stores only a few well-known mutations.
In the Measurement/Observation table, each combination of [gene name + sequence variant + specimen + test method + exon number + variant type] is defined as one set with its own concept ID (Figure 2).
Figure 2. How to present a variant in the current OMOP-CDM
- This is an efficient way to store results for a limited set of typical mutations, such as those of a single gene or a panel of dozens to hundreds of specific mutations, and to record whether or not each mutation is present.
Why does OMOP-CDM need to be expanded?
Reason 1: The number of variants subject to sequencing has increased exponentially.
Figure 3. Types of next generation sequencing (NGS)
*Image source: NGS Testing: Whole Genome & Exome vs. Targeted Sequencing – 두마디 정밀의료
- With Targeted Next Generation Sequencing (Targeted-NGS), which has recently come into general use, the number of genes examined ranges from tens to hundreds, and each gene has its own target region.
- However, countless variations can occur at a single position in the sequence (for example, A): deletions (A to -), substitutions (A to T / C / G), and insertions (A to AT / ATG / ATGC / ATGCC / ATGCCTTACGGAT and so on).
- It is not feasible to define every one of these cases as its own set in advance, as is done now.
- In particular, Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) examine the entire exome and genome, respectively. They are being used to find patients who can be treated with certain drugs, because Tumor Mutation Burden (TMB)* has been found to help predict the effect of Immune Checkpoint Inhibitor (ICI) treatment (Figure 4). (DOI: 10.1200/JCO.2017.75.3384, Journal of Clinical Oncology 36, no. 7 (March 1, 2018): 633-641.)
Figure 4. TMB as an emerging biomarker in NCCN Guidelines Version 1.2019 Non-Small Cell Lung Cancer
*TMB (Tumor Mutation Burden): the number of mutations found in cancer cells.
The higher the TMB, the better the response to immunotherapy drugs such as ICIs tends to be.
- Because WES and WGS examine the entire exome and genome sequence, it is inefficient, as with targeted-NGS, to create a concept for every variation the way it is done today.
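To make the scale argument under Reason 1 concrete, here is a minimal sketch that counts how many alternative alleles could occur at a single reference base when insertions up to a given length are allowed. The function name and the insertion-length cap are assumptions made for this illustration, and the count ignores multi-base deletions and complex rearrangements, so it understates the real number.

```python
def alt_alleles_at_one_base(max_insert_len: int) -> int:
    """Count possible alternative alleles at one reference base (e.g. A).

    Illustrative only: the cap on insertion length is an assumption.
    """
    deletions = 1          # A to -
    substitutions = 3      # A to T / C / G
    # A to A + n inserted bases: 4 choices for each inserted base
    insertions = sum(4 ** n for n in range(1, max_insert_len + 1))
    return deletions + substitutions + insertions


if __name__ == "__main__":
    for cap in (1, 3, 10):
        print(cap, alt_alleles_at_one_base(cap))
```

Even with insertions capped at 10 bases, one position yields more than a million possible alleles, and a targeted panel covers many thousands of positions. Pre-defining one concept per combination therefore does not scale.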
Reason 2: Non-variant information is also required to interpret NGS testing.
- Beyond what the current Measurement/Observation table holds, there are features that should be taken into account when comparing multi-center NGS testing data.
- This is because NGS testing methods are not standardized the way other clinical lab tests are.
- These features should be recorded and standardized together so that NGS results can be interpreted across institutions, the patient’s genetic variation and cancer condition can be further classified, and the outcome of clinical care can be identified.
- Currently, most of these data can be retrieved from pathology reports or from the EMR, although with great difficulty, but the CDM offers no way to obtain well-refined NGS results.
Non-variant features to be recorded/standardized
- Sequencing platform (device & software) information
: The name and version of the sequencing platform used, as recorded by the institution.
- Reference genome
: The prior knowledge used when aligning reads to recognize changes in sequence. Because the reference genome is the basis of comparison, it is important to confirm that the same reference genome was used when comparing variants between institutions.
- Read depth (variant & total)
: Reads are the thousands of nucleic acid fragments analyzed in an NGS run. NGS amplifies the fragments and sequences them repeatedly, so that occasional errors in individual reads are compensated for and accuracy increases. Read depth is the number of times a single position was read during sequencing. In general, greater read depth means better accuracy. To make comparisons more reliable, variants whose read depth is below a threshold should be excluded during analysis.
- Genotype
: Whether the variant is of somatic or germline origin.
- Annotation information
: The clinical impact of the variant.
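As a rough illustration of how the non-variant features listed above might travel together with a variant, here is a minimal sketch of a record type with two helper checks. All field names, the platform labels, and the depth thresholds are hypothetical choices for this example. They are not part of OMOP-CDM or of any agreed genomic CDM schema, and the thresholds are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantRecord:
    # Hypothetical field names, chosen only for this sketch.
    gene: str
    change: str
    platform: str            # sequencing platform name and version
    reference_genome: str    # e.g. "GRCh37" or "GRCh38"
    variant_read_depth: int  # reads supporting the variant
    total_read_depth: int    # all reads covering the position
    genotype: str            # "somatic" or "germline"
    annotation: str          # clinical impact of the variant


def passes_depth_filter(rec: VariantRecord,
                        min_total_depth: int,
                        min_variant_depth: int) -> bool:
    """Keep a variant only if both depths reach the chosen thresholds."""
    return (rec.total_read_depth >= min_total_depth
            and rec.variant_read_depth >= min_variant_depth)


def same_reference(a: VariantRecord, b: VariantRecord) -> bool:
    """Two records are directly comparable only on the same reference genome."""
    return a.reference_genome == b.reference_genome


# Made-up records from two institutions reporting the same variant.
site_a = VariantRecord("EGFR", "p.L858R", "PlatformX v1.0", "GRCh37",
                       120, 800, "somatic", "pathogenic")
site_b = VariantRecord("EGFR", "p.L858R", "PlatformY v2.3", "GRCh38",
                       3, 40, "somatic", "pathogenic")

if __name__ == "__main__":
    print(passes_depth_filter(site_a, min_total_depth=100, min_variant_depth=5))
    print(passes_depth_filter(site_b, min_total_depth=100, min_variant_depth=5))
    print(same_reference(site_a, site_b))
```

Keeping these attributes next to each variant lets a multi-center analysis decide, per record, whether a call is reliable enough and whether two calls can be compared at all.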
Things you can do with Genomic-CDM (Use cases)
- Identifying patients for therapies
: Using variant data from NGS, you can estimate how many patients at each center are eligible for immunotherapy (e.g., Nivolumab) by calculating tumor mutational burden (TMB) from whole exome sequencing (WES) results.
- Selecting patients for clinical trials
: Using variant data and linked clinical data, you can find patients with specific mutation profiles and treatment histories and encourage them to take part in certain clinical trials (e.g., NCT02296125).
- Finding genes/variants related to outcomes
: Using variant characteristics and linked drug/treatment exposure data, you can discover the genomic characteristics of patient groups that respond poorly to drugs.
- You can also use the non-variant information in the genomic CDM for:
- confirming the comparability of genomic data between institutions, using the sequencing platform, reference genome, and genotype information.
- filtering genomic data by quality, using the variant read depth and total read depth.
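The first use case, identifying patients by TMB, could be sketched roughly as follows. The sketch assumes TMB is expressed as somatic mutations per megabase of sequenced region, which is a common reporting convention but is not specified in this document. The row layout, function names, covered-region size, and cutoff are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tmb_per_mb(somatic_variant_count: int, covered_bases: int) -> float:
    """Somatic mutations per megabase of covered sequence (assumed definition)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return somatic_variant_count * 1_000_000 / covered_bases


def high_tmb_patients(rows: Iterable[Tuple[int, str]],
                      covered_bases: int,
                      cutoff: float) -> List[int]:
    """Return sorted patient IDs whose TMB is at or above the cutoff.

    rows: (patient_id, genotype) pairs, one per variant; only rows with
    genotype == "somatic" are counted toward TMB.
    """
    counts = Counter(pid for pid, genotype in rows if genotype == "somatic")
    return sorted(pid for pid, n in counts.items()
                  if tmb_per_mb(n, covered_bases) >= cutoff)


# Made-up variant rows: patient 1 has 4 somatic variants, patient 2 has 1.
rows = [(1, "somatic"), (1, "somatic"), (1, "somatic"), (1, "somatic"),
        (1, "germline"), (2, "somatic")]

if __name__ == "__main__":
    # Hypothetical 0.2 Mb covered region and a placeholder cutoff of 10.
    print(high_tmb_patients(rows, covered_bases=200_000, cutoff=10.0))
```

Run at each center against its own data, a query of this shape would return only a patient count or ID list, so centers could compare eligibility numbers without exchanging raw sequencing data.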
The genomic CDM working group meetings have been suspended for some time, but we will resume them soon. We look forward to your participation.