Dear @Christian_Reich, @LawrenceArcher and all,
I am continuing this thread because I think my questions are mostly relevant to the handling of SNP data in the OMOP CDM.
First, let me briefly describe the type of data we are working with. We are working on a healthy birth cohort whose objective is to study how conditions in pregnancy and early childhood influence the subsequent health and development of women and children. The subjects in the cohort are mothers and their children. Among other data, we have demographic profiles, biomarker profiles, maternal and child metabolism and body composition, sleep patterns, life events and social relationships, paternal factors, imaging and omics data. The omics data include genome, transcriptome, lipidome, proteome, etc.
In the cohort, some of the women developed GDM (gestational diabetes mellitus), and we have also observed some cases of Type 2 Diabetes during follow-up. We are therefore interested not only in the variants linked to GDM, but also in the risk factors contributing to the development of postpartum Type 2 Diabetes.
We have already put some effort into mapping the demographic profiles, clinical measurements and other observations to the OMOP CDM. We would also like to map the omics data to the OMOP CDM, and we are starting this journey with the genomic data.
To give you an idea, our genomic data consist not only of array genotyping data but also of WGS (whole genome sequencing) data for some of the subjects. In other words, what we have are the germline variants of the subjects. The question, then, is how to map these germline variants to the OMOP CDM (in particular, using the OMOP Genomic vocabulary). If you remember, we described the difficulties we were facing in the previous meeting in Dec 2023. We are happy to see the latest release of the OMOP Genomic vocabulary, which resolved some of the issues we raised then.
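As a minimal illustration of one possible lookup key, the Python sketch below turns a VCF-style SNV record into an HGVS genomic (g.) expression. It assumes GRCh38 coordinates and assumes that matching against OMOP Genomic concepts is done on such expressions; the accession table is a small hand-picked subset, the function name is hypothetical, and indels are deliberately out of scope.

```python
# Minimal sketch (hypothetical helper): build an HGVS genomic (g.) expression for a
# single-nucleotide variant from VCF-style fields. Only SNVs are handled; indels
# need normalisation and different HGVS syntax (del, ins, dup, delins).

# Assumed GRCh38 RefSeq accessions; a small subset, extend as needed.
GRCH38_ACCESSIONS = {
    "1": "NC_000001.11",
    "7": "NC_000007.14",
    "X": "NC_000023.11",
}

NUCLEOTIDES = {"A", "C", "G", "T"}


def snv_to_hgvs_g(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Return an expression such as 'NC_000007.14:g.140753336A>T'."""
    chrom = chrom.removeprefix("chr")
    ref, alt = ref.upper(), alt.upper()
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES or ref == alt:
        raise ValueError(f"not a simple SNV: {ref}>{alt}")
    accession = GRCH38_ACCESSIONS[chrom]
    # VCF POS and HGVS g. coordinates are both 1-based, so SNVs need no shift.
    return f"{accession}:g.{pos}{ref}>{alt}"
```

For example, `snv_to_hgvs_g("chr7", 140753336, "A", "T")` gives `NC_000007.14:g.140753336A>T`. Array genotyping data reported on the forward strand could go through the same function once each probe has been resolved to CHROM/POS/REF/ALT.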
So far, we have established a preliminary workflow for mapping our genomic data to OMOP Genomic, together with a non-standard genomic vocabulary that we created internally. We shared this experience with the OHDSI APAC community at the 4 April meeting. After receiving feedback, we re-evaluated our approach and have come up with some questions that we would like to raise.
I hope the above description gives you an overview of what we have; the questions and points of discussion below revolve around it. I am still learning and could be wrong in certain aspects, so please correct me if that is the case. Thanks so much.
Question 1: Do you foresee that OMOP Genomic can also be used for germline mutations? If yes, what is the best approach to record a homozygous REF genotype (i.e. no variant allele present)? If not, is there any plan to distinguish germline mutations from somatic mutations? I recently realized that the LOINC vocabulary contains concepts that can be used to record the different genotypes (via the field value_as_concept_id). For example, concept_id 36660258 can take the answers “A/A (homozygous)”, “G/A (heterozygous)” or “G/G (wild type)”. I am wondering whether a similar approach could be used to record germline mutations with the OMOP Genomic vocabulary. Any advice is highly appreciated.
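For illustration, a minimal Python sketch (not our actual pipeline) of how a diploid VCF GT call could be rendered into the same three-way answer strings as in the LOINC example above, so that a homozygous REF call is recorded explicitly rather than inferred from the absence of a record. It assumes the genotypes come from a VCF GT field; the function name is hypothetical, and mapping the resulting strings to answer concept_ids is not shown.

```python
import re
from typing import Optional


def gt_to_answer(gt: str, ref: str, alts: list[str]) -> Optional[str]:
    """Render a diploid VCF GT call as a LOINC-style answer string.

    With REF='G' and ALT=['A']: '0/0' -> 'G/G (wild type)',
    '0/1' -> 'G/A (heterozygous)', '1/1' -> 'A/A (homozygous)'.
    Returns None for missing or non-diploid calls.
    """
    fields = re.split(r"[/|]", gt)  # accept unphased '/' and phased '|'
    if len(fields) != 2 or "." in fields:
        return None
    idx = sorted(int(f) for f in fields)  # REF (index 0) first; phase is ignored
    alleles = [ref] + alts
    a1, a2 = alleles[idx[0]], alleles[idx[1]]
    if idx == [0, 0]:
        label = "wild type"
    elif idx[0] == idx[1]:
        label = "homozygous"
    else:
        label = "heterozygous"
    return f"{a1}/{a2} ({label})"
```

For example, `gt_to_answer("1|0", "G", ["A"])` gives `G/A (heterozygous)`.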
Question 2: Related to the first question, I have a question about the approach to “Result recording” described in this document ([Genomic Variants in the OMOP CDM – August 8, 2023](https://ohdsiorg.sharepoint.com/:b:/r/sites/Workgroup-Oncology/Shared Documents/Oncology -Omics Subgroup/Genomic WG Goals and Status Reports/Genomic Variants in the OMOP CDM.pdf?csf=1&web=1&e=uXrzr2)). The document recommends that, for “Result recording”, the field value_as_concept_id MUST be filled with “Positive” (concept_id=9191), “Negative” (concept_id=9189) or “Equivocal” (concept_id=4172976). How is this defined? What do “Positive” and “Negative” mean in this case? Is it “positive” for the presence of the mutation, or “positive” in terms of an observed phenotype or drug effect?
Point of Discussion 1: I watched the video recording ([2024 Onc WG Genomic Meeting Series_4th Tuesday-20240326_090842-Meeting Recording](https://ohdsiorg.sharepoint.com/:v:/r/sites/Workgroup-Oncology/Shared Documents/Oncology -Omics Subgroup/Recordings/2024 Onc WG Genomic Meeting Series_4th Tuesday-20240326_090842-Meeting Recording.mp4?csf=1&web=1&e=wI6TNI)) regarding VRS. We are glad to know that there is an ongoing discussion on this topic. In fact, representing genomic data using the GA4GH VRS schema is currently being discussed within our local community. Is there any plan to integrate VRS into KOIOS (@LawrenceArcher) for mapping to the OMOP Genomic vocabulary? I think this would be very helpful in further strengthening the capability of KOIOS to map to OMOP Genomic standard concepts.
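For readers less familiar with VRS: its computed identifiers rest on the GA4GH sha512t24u digest (SHA-512, truncated to the first 24 bytes, then base64url-encoded). The Python sketch below shows only that digest step. It does not perform the object normalisation and canonical serialisation that the VRS specification requires before digesting, so it does not by itself produce a full `ga4gh:VA.` identifier.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """GA4GH truncated digest: base64url of the first 24 bytes of SHA-512."""
    digest = hashlib.sha512(blob).digest()[:24]
    # 24 bytes encode to exactly 32 base64 characters, so there is no '=' padding.
    return base64.urlsafe_b64encode(digest).decode("ascii")
```

In VRS, `blob` would be the canonical serialisation of the Allele (or of its location) rather than an arbitrary byte string.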
Point of Discussion 2: From the document ([Genomic Variants in the OMOP CDM – August 8, 2023](https://ohdsiorg.sharepoint.com/:b:/r/sites/Workgroup-Oncology/Shared Documents/Oncology -Omics Subgroup/Genomic WG Goals and Status Reports/Genomic Variants in the OMOP CDM.pdf?csf=1&web=1&e=uXrzr2)), I learned that the variants to be included are “Clinically relevant variants (driver mutations, frequent correlates, mutations relevant for drug effect)”, which is great. We are wondering whether the OMOP Genomic vocabulary could be extended to include the ACMG SF v3.2 list (PMID: 37347242). I am not sure whether anyone has mentioned this before, but we think it is a good source: it consists of 81 genes recommended as the minimum list of gene-phenotype pairs for opportunistic screening, to facilitate the identification and/or management of risks for selected genetic disorders through established interventions aimed at preventing or significantly reducing morbidity and mortality (PMID: 27854360). Based on our current observations, this could add approximately 40,000 risk variants to the OMOP Genomic vocabulary.
Sorry for the lengthy write-up and questions.
We would be happy to proceed further either through (1) a meeting for discussion or (2) simply communicating through this forum.
Thanks very much for your input, support and advice.
Best regards,
Erwin