How to incorporate NanoString data into the CDM using HGNC

We want to incorporate NanoString data into the CDM and wonder if genes in the HGNC vocabulary can be used for this.

NanoString outputs mRNA gene expression values from analyses of blood and tissue (FFPE) samples, i.e. one column of genes and one column of expression values. Mutations, alterations, and deletions are not analyzed, so the sequence used for the mRNA analysis is of less importance and does not need to be captured. Gene expressions are normalized against standard genes in the assays, so results of different analyses can always be compared.
So the data we need to incorporate is “just” the mRNA gene expressions, and since HGNC genes are in the Measurement domain, we wondered if we could use these standard concepts for our data by “just” inputting the gene expressions.
This approach will of course omit the source of the test (blood or tissue), but for starters this is OK.
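
To make the idea concrete, here is a minimal sketch of how the two-column output could be turned into MEASUREMENT-style rows. The column names (`person_id`, `measurement_concept_id`, `measurement_date`, `value_as_number`, `measurement_source_value`) are MEASUREMENT table fields; everything else is an assumption for illustration: the `GENE_TO_CONCEPT_ID` lookup, its concept ids (placeholders in the range above 2,000,000,000 that is used for local concepts), the helper name `to_measurement_rows`, and the sample values.

```python
from datetime import date

# Hypothetical lookup from HGNC gene symbol to a concept_id.
# The ids are placeholders, not real vocabulary entries.
GENE_TO_CONCEPT_ID = {"EGFR": 2_000_000_001, "MYC": 2_000_000_002}


def to_measurement_rows(person_id, measurement_date, expressions):
    """Turn (gene symbol, normalized expression) pairs into MEASUREMENT-like dicts."""
    rows = []
    for symbol, score in expressions:
        rows.append({
            "person_id": person_id,
            # Unmapped genes get concept_id 0; the symbol is kept as source value.
            "measurement_concept_id": GENE_TO_CONCEPT_ID.get(symbol, 0),
            "measurement_date": measurement_date,
            "value_as_number": float(score),
            "measurement_source_value": symbol,
        })
    return rows


# Made-up example values, for illustration only.
rows = to_measurement_rows(
    1, date(2021, 1, 15), [("EGFR", 23.4), ("MYC", 7.1), ("XYZ1", 1.0)]
)
for row in rows:
    print(row)
```

As in the approach described above, the sample type (blood or tissue) is not captured in these rows.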

Hope you can help :slight_smile:

@Mikailgo

Question: And the second column essentially is a real number? How would you analyze this stuff? Give me all patients where EGFR is above 20 (the normalized expression score)?
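
A query of that kind could look roughly like the sketch below. It uses an in-memory SQLite table that only mimics three MEASUREMENT columns; the EGFR concept id, the helper name `persons_above`, and the rows are made-up placeholders, and a real CDM database would have the full table and a real concept.

```python
import sqlite3

EGFR_CONCEPT_ID = 2_000_000_001  # hypothetical placeholder concept id

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE measurement (
           person_id INTEGER,
           measurement_concept_id INTEGER,
           value_as_number REAL
       )"""
)
# Toy data, for illustration only.
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [
        (1, EGFR_CONCEPT_ID, 23.4),
        (2, EGFR_CONCEPT_ID, 12.0),
        (3, EGFR_CONCEPT_ID, 31.9),
        (3, 2_000_000_002, 50.0),  # a different gene, ignored by the query
    ],
)


def persons_above(conn, concept_id, threshold):
    """Return person_ids with at least one score above the threshold for a gene."""
    cur = conn.execute(
        """SELECT DISTINCT person_id
           FROM measurement
           WHERE measurement_concept_id = ? AND value_as_number > ?
           ORDER BY person_id""",
        (concept_id, threshold),
    )
    return [row[0] for row in cur.fetchall()]


result = persons_above(conn, EGFR_CONCEPT_ID, 20)
print(result)
```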

Yes, the output is a real number.
That could be one approach. Others have developed subtypes of cancer based upon the expression of specific genes, but I guess this should be done outside of the CDM.
Recently, some authors (PMID:33059196) have used a machine learning approach to discover which genes contributed the most to certain outcomes. I think this will be one of the approaches we will use: combining mRNA gene expressions with historical health data to predict, e.g., risk of recurrence, using the OHDSI tools.

So, we have a concept for the variant (unspecified) of all genes, for example MYC Gene Amplification. It is a descendant of the generic MYC (MYC proto-oncogene, bHLH transcription factor) Variant. We have such generic “variant of a gene” for all genes. We could easily extend that to the mRNA expression (amplification), or just to the ones you have in your system.

The value: We would have to add a convention about this expression score. Not sure how the analytic models would pick that up, with a fixed threshold or some relative measure. But that would be a nice piece of research to correlate this with outcome or treatment effect.

Perfect! We will definitely go forward with this approach and let you know if it works.
From what I can see, it's around 90 genes that have the unspecific variant concepts (Unspecific variants). For now, we can just make internal custom unspecific variant concepts for our genes of interest, and we can update you along the way on whether the approach you described works. Thank you for the help :slight_smile:

@Mikailgo If you have the breakpoints established within the assay, you can bring not just numbers, but also categorical interpretations using the value_as_concept_id.
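
As a rough sketch of what that could mean in practice: given assay breakpoints, each score is mapped to a category concept that goes into `value_as_concept_id` alongside the number in `value_as_number`. The breakpoints, the category concept ids, and the function name `interpret` are all hypothetical placeholders; the real values would come from the assay and the vocabulary.

```python
from bisect import bisect_right

# Hypothetical breakpoints on the normalized expression score.
BREAKPOINTS = [10.0, 20.0]
# Hypothetical placeholder concept ids for low, intermediate, high.
CATEGORY_CONCEPT_IDS = [2_000_000_101, 2_000_000_102, 2_000_000_103]


def interpret(score):
    """Map a score to a category concept id.

    A score exactly on a breakpoint falls into the upper category.
    """
    return CATEGORY_CONCEPT_IDS[bisect_right(BREAKPOINTS, score)]


def with_interpretation(score):
    """Return the two value fields of a MEASUREMENT-like row."""
    return {"value_as_number": score, "value_as_concept_id": interpret(score)}


print(with_interpretation(5.0))
print(with_interpretation(15.0))
print(with_interpretation(25.0))
```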

Wouldn’t it be better as a property of the SPECIMEN table? At least once we have an external key linking Measurement to Specimen?
