OHDSI Home | Forums | Wiki | Github

Gene Translocations

I’m dealing withe collecting data regarding if a specimen was tested for various genetic translocations. For example
http://atlasgeneticsoncology.org/Anomalies/t0103ID1002.html

The t(1;3) is not in any of the Athena vocabularies. I have many other examples and it is expanding. Right now we are only capturing if the test was done. So I guess I just map it to Yes and No?

If we start capturing results how might I go about that? I see some biomarker/ genomics work seems to be happening but slowly.

We need to have some vocabulary for this genetic stuff.
Any ideas which we can use?

I’m wondering what concept do you use now to encode if the enetic translocation test was done?

You can blame on my laziness, @jliddil1 .
It’s time to wake up from hibernation for genomic WG along with oncology WG.

We proposed genomic extension model for OMOP-CDM and its applications, which is published in JMIR.

I ask for making official OHDSI github for further discussion, summarization of requirements, and formal materialization of the genomic extension model. Could you do this, @mvanzandt, @Dymshyts or @Christian_Reich ?

Lazy. That is certainly the first attribute that comes to mind when thinking of you, @SCYou. :slight_smile:

What do you want me to call it?

1 Like

Genomic-CDM would be great, just like Radiology-CDM repo, Thank you!

Specifically what do you mean? I’ve been googling the genetic tests and biomarkers. I like the mCode approach. http://standardhealthrecord.org/guides/mcode/.

To @jliddil1,

Although not all translocation mutations have 'concept_id’s,
currently you can store the translocation with Yes or No using measurement table like other SNPs.

The contents below are about the various methods and discussion to describe the translocation.

Translocation as “t(A;B)(p1;q2)”

The International System for Human Cytogenetic Nomenclature (ISCN) is used to denote a translocation between chromosomes. The designation t(A;B)(p1;q2) is used to denote a translocation between chromosome A and chromosome B. The information in the second set of parentheses, when given, gives the precise location within the chromosome for chromosomes A and B respectively—with p indicating the short arm of the chromosome, q indicating the long arm, and the numbers after p or q refers to regions, bands and subbands seen when staining the chromosome with a staining dye. The translocation is the mechanism that can cause a gene to move from one linkage group to another.
(Reference: Chromosomal translocation - Wikipedia)

HGVS Nomenclature

The basics of the current HGVS recommendations are explained in a slide presentation from Johan den Dunnen (last updated Apr.2017).
(Reference: http://www.hgvs.org/varnomen/HGVS-basics2017.pdf)

Translocation in HGVS Nomenclature

There was a discussion in HGVS2013 about the method to describe translocations.

“Describing translocations by extending HGVS sequence variation nomenclature”

Translocation in Genomic-CDM

Although it’s not an official way, the translocation can be stored using a Variant_occurrence table of the Genomic-CDM.

↓ Example of storing a translocation data

So I have been using yes/no for if the test shows the translocation. Then there are translocations such as t(11;16) that aren’t in any of the current Athena dictionaries. What approach do I take for these instances. In the Immuno-oncology space the number of biomarkers/ genetic tests is expanding faster than things like LOINC can keep up with.

JD

You are right.

In the era of Next Generation Sequencing (NGS) rather than Sanger Sequencing, ‘variant’ is hard to be described with ‘concept_id’. Because the lesser-known variants are much more than the famous variants.

Basically, the ‘variant’ is a myriad of combinations of 1) POSITIONs in DNA where sequence changes occur and 2) NUCLEOTIDEs(A,T,G,C) before and after modification.

It would be better to separate and record ‘gene’ and ‘change’ information than to define the infinite number of ‘gene + change’ cases as concept_id.

About 40,000 human gene lists can be standardized through the HUGO Gene Nomenclature Committee (HGNC).

The ‘change’ information is to create a rule that records in string data type according to the Human Genome Variation Society (HGVS) nomenclature.

At present, many translocation variants will not have concept_id in Athena.

All you can do is request a new concept_id or temporarily assign your own concept_id.

Alternatively, your data can be stored using the Genomic-CDM if you need although it’s not an official way yet. :wink:

Let me read all this and attempt to get my head around it. I need to see how to implement the Genomic CDM and Oncology internally in our systems. We are moving to a semantic analytic tool that we could utilize this in potentially.

I meant something like this actually:

So, we can ingest this 40000 genes and ‘changes’ into the OHDSI vocabulary.

Okay, So my genetics training is weak at best. But let me try to use some examples:
So currently LOINC has a code for:
t(6;9).
We only ask if a test for t(6;9) is done and the results (pos/neg)

Concept Name: t(6;9)(p22;q34)(DEK,NUP214) fusion transcript [Presence] in Blood or Tissue by Molecular genetics method

Concept Code: 394804000, Concept ID: 3008670,
http://atlasgeneticsoncology.org/Anomalies/t0609ID1014.html

So how would this look in the proposed schema? What do you use as a concept Code and ID?

For t(8;22) there is not a LOINC or other code
http://atlasgeneticsoncology.org/Anomalies/t0822p11q11ID1224.html

I assume the gene is this:
https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:17012

So for t(6;9)(p22;q34)(DEK,NUP214)

The gene is DEK,NUP214 if I read things correctly

The change is t(6;9)(p22,q34)

The thing I am now realizing is that for example we only ask was a t(6;9) test done. Not the specific gene information.

You got it: GitHub - OHDSI/Genomic-CDM: Repository for development of the genomic module of the CDM..

This should go into the CDM definition in the new repository. The problem is this: There are some well-known chromosomal aberations, and LOINC has a good list. And there is an infinite amount of possible aberrations, for which such a notation would be useful. And there is a gray zone in the middle, aberrations that appear non-nonsense and are under research. The CDM needs to deal with these.

Because the involved genes may be irrelevant or not defined. Often, these translocation create a specific aberation that fuses two independent genes, or combines a transcription factor binding site (promotor or repressor region) with a gene for which it was not designed. But again, in many cases these translocations are not precisely defined, and become disease attributes in their own right.

Does that help? :slight_smile:

We do have to figure that out in the above mention Genomic WG. There is no shortcut here, @jliddil1.

t