
Genetic Data?

Can anyone share an example of CDM documentation where they are pulling across genetic data? We are about to get some data here, but I have zero experience with it (I have never seen it firsthand) and wanted to see some examples of how to handle it.

@rimma could you share anything you’ve done?

Additionally, I know there are many people who have been discussing this or have an interest in this topic (e.g. @rimma and @ZPGoldman). I’m probably not the right person to spearhead this conversation (since I don’t even have the data myself), but I wanted to at least tee up the idea of a group getting together to discuss it.

Tagging @mvspeybr & @Christian_Reich

@ericaVoss:

The data aren’t that hard, really, as far as I can tell:

You’ve got genetic aberrations: deletions and insertions (with and without frameshift), duplications or higher copy numbers, and translocations, all on the DNA level. Each of these is at a location, which is within a gene, which is coding or not. You also have expression on the mRNA level. All of those have a likelihood, but usually they get called: a threshold gets applied, turning the likelihood into a binary yes or no (at which point an aberration called "no" disappears from the data).
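To make the "calling" step concrete, here is a minimal sketch, assuming each candidate aberration arrives with a likelihood between 0 and 1. The `Candidate` class, the 0.9 threshold, and the sample records are all hypothetical and only there for illustration; real variant callers use their own scores and cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str          # gene symbol the aberration falls in (illustrative)
    kind: str          # e.g. "deletion", "insertion", "duplication"
    likelihood: float  # assumed caller confidence, 0.0 to 1.0


def call_variants(candidates, threshold=0.9):
    """Apply the threshold: keep candidates at or above it, drop the rest."""
    return [c for c in candidates if c.likelihood >= threshold]


# Made-up example data, not real results.
candidates = [
    Candidate("BRCA1", "deletion", 0.97),
    Candidate("TP53", "insertion", 0.42),
    Candidate("EGFR", "duplication", 0.91),
]

called = call_variants(candidates)
print([c.gene for c in called])
```

Note that the TP53 candidate is simply absent from `called`: once the threshold is applied, there is no record that it was ever considered.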

That’s about it.

The vocabulary has to know about the genes that the aberration or variant is either in or affecting. And, if it is good, also the pathway or network information. But that’s a nice-to-have.

There is a standard for reporting the DNA-level aberrations: VCF. There are also standards for reporting the actual sequence and mRNA-level changes, which are much simpler.
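For anyone who has not seen VCF before, here is a rough sketch of reading its data lines with plain Python. It relies only on the fixed leading columns (CHROM, POS, ID, REF, ALT) and guesses the aberration type from the REF and ALT lengths. The helper names and the sample lines are made up, and the frameshift label only means something if the variant sits in a coding region, which this sketch does not check.

```python
def parse_vcf_line(line):
    """Pull the first five fixed, tab-separated columns out of a VCF data line."""
    chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    # ALT may list several alternate alleles separated by commas.
    return {"chrom": chrom, "pos": int(pos), "id": vid,
            "ref": ref, "alt": alt.split(",")}


def classify(ref, alt):
    """Rough type guess from allele lengths (assumes a coding region)."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic/structural"  # e.g. <DUP>, breakend notation
    if len(ref) == len(alt):
        return "substitution"
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    # A length change that is not a multiple of 3 shifts the reading frame.
    frame = "frameshift" if abs(len(alt) - len(ref)) % 3 else "in-frame"
    return f"{kind} ({frame})"


# Made-up sample lines; header lines start with '#'.
lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.",
    "chr1\t22345\t.\tAT\tA\t50\tPASS\t.",
    "chr2\t500\t.\tC\tCGGA\t50\tPASS\t.",
    "chr3\t900\t.\tG\t<DUP>\t50\tPASS\t.",
]

records = [parse_vcf_line(l) for l in lines if not l.startswith("#")]
results = [classify(r["ref"], a) for r in records for a in r["alt"]]
print(results)
```

For real files a dedicated VCF library is a better choice, since the INFO and genotype columns carry most of the useful detail.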

What exactly do you need?

@Christian_Reich - well I don’t really know what we need. We know we are getting genetic data and that is about it. I’m just looking for who I can poke when we have questions. :smile:

Erica,

If you find that drug-gene associations are relevant, the PharmGKB hosts
several downloadable files that associate various alleles with
pharmacokinetic or pharmacodynamic effects. I have some experience with
these. Also, some collaboration has occurred between one of the major
genomic research networks (eMERGE) and OHDSI - I would ping George
Hripcsak to learn more about the current status and what tools came out
of it (e.g.,


and
http://www.ohdsi.org/web/wiki/lib/exe/fetch.php?media=resources:ohdsi_poster_levine_2015.pdf.)

best,
-R

Thanks, Rich. So far we have been using OHDSI’s clinical standardization to improve sharing of clinical phenotypes within eMERGE, but we haven’t placed genomic data into the model at this point.

George

You mean peek? :smile:


The NIH BTRIS warehouse has data on participants in the ClinSeq study, so we have some experience with designing a Cognos prompt page for a genomic report, having a data model, and responding to users of the query page.

We only planned to put a yes/no flag into the OMOP CDM structure.

I requested a SNOMED CT procedure code for ‘exome sequencing’ (so that I can record a yes/no flag) in procedure_occurrence. They rejected the request and referred me to LOINC to request it there. :frowning:

(So the LOINC request is filed and pending. SNOMED CT has nice concept request tracking with a link, but LOINC does not have a similar feature.)

See the details here:
https://uscrs.nlm.nih.gov/request/newconcept/view.xhtml?id=271026&returnUrl=%2Frequest%2FsearchRequests.xhtml (link may require UMLS login)
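As a rough sketch of what the yes/no flag could look like, the snippet below builds one procedure_occurrence-style row per sequenced participant and then checks for it. The column names follow my understanding of the OMOP CDM v5 procedure_occurrence table and should be confirmed against the spec. The concept IDs are left at 0 (the OMOP convention for "no matching concept") because no standard code exists yet, and the helper names are hypothetical.

```python
from datetime import date

# Placeholder: 0 means "no matching concept" until a standard code exists.
EXOME_SEQ_CONCEPT_ID = 0
SOURCE_VALUE = "exome sequencing"


def exome_flag_row(row_id, person_id, performed_on):
    """Build one row that records 'this person had exome sequencing'."""
    return {
        "procedure_occurrence_id": row_id,
        "person_id": person_id,
        "procedure_concept_id": EXOME_SEQ_CONCEPT_ID,
        "procedure_date": performed_on.isoformat(),
        "procedure_type_concept_id": 0,  # placeholder, also unmapped
        "procedure_source_value": SOURCE_VALUE,
    }


def had_exome_sequencing(rows, person_id):
    """The yes/no flag: a row exists for the person, or it does not."""
    return any(
        r["person_id"] == person_id
        and r["procedure_source_value"] == SOURCE_VALUE
        for r in rows
    )


# Made-up example: person 101 was sequenced, person 102 was not.
rows = [exome_flag_row(1, 101, date(2016, 3, 14))]
print(had_exome_sequencing(rows, 101), had_exome_sequencing(rows, 102))
```

Matching on the source value is only a stopgap; once a standard concept is available, the lookup would go through procedure_concept_id instead.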

An old link to a video demo of our query prompt is in this two-year-old post.

NantHealth is currently working on capturing genetic data in the OHDSI common data model (CDM). An overview of specifics can be found here http://www.gpscancer.com/.

We will be covering the full set of genetic information produced by GPS Cancer tests over several iterations, starting with quantitative targeted proteomics and gene mutation analysis.

Our immediate interests in collaboration with the OHDSI community are focused on expanding vocabularies to support genetic data and developing conventions for capturing genetic data in the CDM. We are conducting consultations with our legal department to determine the extent of information we can share.

@ZPGoldman, is there a specification available for your data model? I’m trying to aggregate all the proposed models. Also, I encourage you to join the workgroup that Claire is starting: Genomic Data in the CDM
