
Fusion genes mapping approach


(Ana Heredia) #1

Hi!
We are working on modelling a data source on acute lymphoblastic leukemia. Some of the variables in the source record the presence or absence of fusion genes analysed by molecular biology. We found standard concepts in the Measurement domain for some of these anomalies but not for all of them, so we wonder whether anyone else is dealing with the same issue, and what people think of the possible approaches for mapping the ones with no suitable standard concept.
Examples of the concepts we could find are:
3002279 (t(11;19)(q23;p13.3)(MLL,MLLT1) fusion transcript [Presence] in Blood or Tissue by Molecular genetics method)
42868761 (Del(1)(p32p32)(STIL,TAL1) fusion transcript [Presence] in Blood or Tissue by Molecular genetics method)

However, the source also contains information on NUP98/RAP1 and NUP214/ABL1 fusion genes, for which we could not find a suitable concept. Some of the approaches we’ve thought of are as follows. Which would be the preferred one?

  • Option 1: create a custom concept “NUP98, RAP1 fusion transcript [Presence] by molecular biology” within the Measurement domain, use 0 as measurement_concept_id, and 4181412 (Present)/4132135 (Absent) as value_as_concept_id
  • Option 2: create a custom concept “NUP98, RAP1 fusion transcript [Presence] by molecular biology” within the Measurement domain, use a generic concept such as 4233623 (Molecular genetic test) as measurement_concept_id and 4181412 (Present)/4132135 (Absent) as value_as_concept_id
  • Option 3: map it as an observation with observation_concept_id 4054260 (Nuclear pore alteration), value_as_string “NUP98/RAP1” and 4181412 (Present)/4132135 (Absent) as qualifier_concept_id. Also create a custom concept “NUP98, RAP1 fusion transcript [Presence] by molecular biology”
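To make the three options concrete, here is a minimal Python sketch of the row each one would produce for a single NUP98/RAP1 result. The standard concept IDs are the ones quoted above. The custom concept ID (2000000001) is hypothetical, and placing the custom concept in the `*_source_concept_id` column is an assumption based on the usual OMOP convention for non-standard local concepts; the post itself does not say where the custom concept goes.

```python
# Concept IDs quoted in the original post.
PRESENT = 4181412                  # Present
ABSENT = 4132135                   # Absent
MOLECULAR_GENETIC_TEST = 4233623   # Molecular genetic test
NUCLEAR_PORE_ALTERATION = 4054260  # Nuclear pore alteration

# Hypothetical local concept for "NUP98, RAP1 fusion transcript [Presence]
# by molecular biology" (local concepts conventionally use IDs >= 2,000,000,000).
CUSTOM_NUP98_RAP1 = 2_000_000_001


def status(present: bool) -> int:
    """Map a present/absent flag to the Present/Absent concept."""
    return PRESENT if present else ABSENT


def option1(present: bool) -> dict:
    # No standard concept: measurement_concept_id is 0; the custom concept
    # is kept as the source concept (assumed placement).
    return {
        "table": "measurement",
        "measurement_concept_id": 0,
        "measurement_source_concept_id": CUSTOM_NUP98_RAP1,
        "value_as_concept_id": status(present),
    }


def option2(present: bool) -> dict:
    # Generic standard concept; the specific fusion survives only in the source concept.
    return {
        "table": "measurement",
        "measurement_concept_id": MOLECULAR_GENETIC_TEST,
        "measurement_source_concept_id": CUSTOM_NUP98_RAP1,
        "value_as_concept_id": status(present),
    }


def option3(present: bool) -> dict:
    # Observation with the fusion partners as a string and presence as the qualifier.
    return {
        "table": "observation",
        "observation_concept_id": NUCLEAR_PORE_ALTERATION,
        "observation_source_concept_id": CUSTOM_NUP98_RAP1,
        "value_as_string": "NUP98/RAP1",
        "qualifier_concept_id": status(present),
    }


for build in (option1, option2, option3):
    print(build.__name__, build(True))
```

Laid out this way, the trade-off is easier to see: in options 1 and 2 the specific fusion is only recoverable through the non-standard source concept, so a query on standard concepts either misses it entirely (option 1) or cannot tell it apart from any other molecular genetic test (option 2).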

Many thanks in advance for your help.


(Manlik Kwong) #2

Hi - I’ve been experimenting with importing omics data for canine lymphoma patients from an ongoing study into OMOP. Option 3 makes sense to me. However, when I thought about this from the end-user perspective, it seemed to make more sense to create a custom vocabulary and concepts for our data that capture the actual gene variant and its attributes (chromosome, position, allele frequency, effect, coding, etc.), and then use the observation and measurement tables accordingly. That way, results are searched and discussed in terms of the variant name rather than how the data was generated or sourced.
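A minimal Python sketch of that idea, assuming one local concept per distinct variant, with allele frequency stored as the measurement's numeric value. The vocabulary name `LOCAL_VARIANT`, the starting ID, the name format, and the example variant are all hypothetical; only the column names come from the OMOP CONCEPT and MEASUREMENT tables.

```python
from dataclasses import dataclass

LOCAL_ID_START = 2_000_000_000  # assumed start of the local concept ID range


@dataclass(frozen=True)
class Variant:
    # Attributes listed in the post; field names and formats are illustrative.
    gene: str
    chromosome: str
    position: int
    effect: str
    coding: str

    @property
    def name(self) -> str:
        return f"{self.gene} {self.coding} (chr{self.chromosome}:{self.position})"


def build_vocabulary(variants):
    """Assign one hypothetical local concept per distinct variant name."""
    concepts = {}
    for v in variants:
        if v.name not in concepts:
            concepts[v.name] = {
                "concept_id": LOCAL_ID_START + len(concepts) + 1,
                "concept_name": v.name,
                "domain_id": "Measurement",
                "vocabulary_id": "LOCAL_VARIANT",  # hypothetical vocabulary
                "concept_code": f"{v.chromosome}:{v.position}:{v.coding}",
            }
    return concepts


def measurement_row(person_id, variant, allele_frequency, concepts):
    """One measurement per observed variant; allele frequency is the numeric value."""
    return {
        "person_id": person_id,
        "measurement_concept_id": 0,  # no standard concept assumed
        "measurement_source_concept_id": concepts[variant.name]["concept_id"],
        "value_as_number": allele_frequency,
    }


def find_by_variant(rows, concepts, name):
    """Search results by variant name rather than by the test that produced them."""
    wanted = concepts[name]["concept_id"]
    return [r for r in rows if r["measurement_source_concept_id"] == wanted]


v = Variant("GENE_A", "1", 12345, "missense", "c.100A>G")  # made-up example
vocab = build_vocabulary([v])
rows = [measurement_row(1, v, 0.42, vocab), measurement_row(2, v, 0.10, vocab)]
print(len(find_by_variant(rows, vocab, v.name)))  # 2
```

Keying the concept on the variant itself, not on the assay, is what lets the lookup in `find_by_variant` stay the same regardless of which lab method produced the result.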

I’m just starting down this road, learning and experimenting as I go. I’ve linked this dataset to the patients’ OMOP EHR data, loaded about 66,000 data points representing some 1,000 different genes, and am starting to run queries to see how the approach affects both how the SQL is written and how it performs on the integrated EHR and omics data.
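As a rough illustration of the kind of integrated query meant here, this is a sketch using Python's built-in `sqlite3` and a heavily trimmed subset of OMOP columns. It is not the poster's actual query. It finds everyone carrying a named variant, looked up through the CONCEPT table, together with their recorded conditions. All concept IDs, names, and data are made up, and the index is one plausible choice given that variant lookups filter on the source concept column.

```python
import sqlite3

# Trimmed, illustrative subset of OMOP tables; real CDM tables have many more columns.
SCHEMA = """
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
CREATE TABLE measurement (person_id INTEGER, measurement_source_concept_id INTEGER,
                          value_as_number REAL);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
-- Variant lookups filter on this column, so index it.
CREATE INDEX idx_meas_src ON measurement (measurement_source_concept_id);
"""

# Each person carrying the named variant, with their recorded conditions.
VARIANT_CONDITIONS = """
SELECT DISTINCT m.person_id, cc.concept_name
FROM measurement m
JOIN concept vc ON vc.concept_id = m.measurement_source_concept_id
JOIN condition_occurrence co ON co.person_id = m.person_id
JOIN concept cc ON cc.concept_id = co.condition_concept_id
WHERE vc.concept_name = ?
ORDER BY m.person_id
"""


def variant_conditions(conn, variant_name):
    return conn.execute(VARIANT_CONDITIONS, (variant_name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up concepts and data for illustration only.
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    (2000000001, "GENE_A c.100A>G"),
    (2000000002, "GENE_B c.200C>T"),
    (2000000100, "Example condition"),
])
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", [
    (1, 2000000001, 0.42),
    (2, 2000000002, 0.30),
])
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?)", [
    (1, 2000000100),
    (2, 2000000100),
])
print(variant_conditions(conn, "GENE_A c.100A>G"))  # [(1, 'Example condition')]
```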

I’d be interested in seeing what others suggest.


(Christian Reich) #3

Friends:

This is very apropos. The Oncology WG is addressing this very subject as we speak. Please join. We think we have a representation for simple mutations (indels and the like), but we still need the fusion proteins and the translocations.

Generally, we decided to go with option 1 and build a concept for each variant that is relevant for oncology. The latter part is a bit of a problem: “relevant for oncology”. That turns out to be harder than expected, because there is no commonly agreed criterion for making that decision. How do you guys do that?

