Data modeling NGS/CGP data

rkwpnw · March 24, 2025, 10:07pm

Are there best practices for modeling NGS/CGP data across the Measurement, Specimen, and other tables? Is the OMOP Extension concept the standard for NGS in Athena?

Alexdavv · March 24, 2025, 10:54pm

Hi Roshanthi!

There was an attempt to model rich genome data using the extension to the CDM, but it didn’t gain widespread adoption.

More recently, the Onco WG made another attempt using the closed-world content modeling approach. You can learn about it here and here. It is a standard now, but we’re looking for resources to maintain and expand it.

And yes, that OMOP Extension concept is the standard since SNOMED and other vocabularies didn’t have one.

BTW, what problem do you want to solve?

rkwpnw · March 24, 2025, 11:18pm

Thanks @Alexdavv. I am trying to figure out how to map the ordered NGS tests (part 1) and then how to model the gene- and variant-level results (part 2). I believe the current practice is to extract gene- and variant-level information into HGNC and ClinVar concepts. We have about 12K patients with these types of oncology tests in our CDM.

jmethot · March 31, 2025, 2:31pm

Hi @rkwpnw. The conventions aren’t fully developed, but I believe the intention is that for somatic variant results you store each variant in the MEASUREMENT table using concepts from the OMOP Genomic vocabulary. For example, if you have somatic results containing an EGFR T790M mutation call, you would store a MEASUREMENT record with measurement_concept_id=19576591.

If you have stored a primary cancer diagnosis in your CONDITION_OCCURRENCE table according to this convention, then you would use the measurement_event_field_concept_id and measurement_event_id fields of the MEASUREMENT record to link the variant’s MEASUREMENT record to the diagnosis.
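To make the linkage concrete, here is a minimal Python sketch of one such MEASUREMENT row. It only shows a subset of the MEASUREMENT columns, and the `MeasurementRow` class, the `variant_measurement` helper, and all identifier values other than the concept ID quoted above are hypothetical. In particular, `FIELD_CONCEPT_PLACEHOLDER` is a stand-in: you would replace it with the Athena concept that represents condition_occurrence.condition_occurrence_id.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Placeholder only (assumption): replace with the Athena concept that
# identifies the condition_occurrence.condition_occurrence_id field.
FIELD_CONCEPT_PLACEHOLDER = 0


@dataclass
class MeasurementRow:
    """Illustrative subset of the MEASUREMENT columns, not the full table."""
    measurement_id: int
    person_id: int
    measurement_concept_id: int
    measurement_date: date
    measurement_event_id: Optional[int] = None
    measurement_event_field_concept_id: Optional[int] = None


def variant_measurement(measurement_id: int, person_id: int,
                        variant_concept_id: int, measurement_date: date,
                        condition_occurrence_id: int,
                        field_concept_id: int) -> MeasurementRow:
    """Build a variant row that points back at the primary diagnosis row."""
    return MeasurementRow(
        measurement_id=measurement_id,
        person_id=person_id,
        measurement_concept_id=variant_concept_id,
        measurement_date=measurement_date,
        # The event id holds the primary key of the linked record ...
        measurement_event_id=condition_occurrence_id,
        # ... and the field concept says which table/field that key is from.
        measurement_event_field_concept_id=field_concept_id,
    )


# Hypothetical ids; 19576591 is the measurement_concept_id from the example above.
row = variant_measurement(
    measurement_id=1, person_id=42, variant_concept_id=19576591,
    measurement_date=date(2025, 3, 1), condition_occurrence_id=555,
    field_concept_id=FIELD_CONCEPT_PLACEHOLDER,
)
```

The point of the pair of fields is that measurement_event_id alone is ambiguous; the field concept tells a reader which table the id belongs to.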

I don’t know of a convention for representing the value of the OMOP Genomic MEASUREMENT record. I think some people just use the existence of the record as “positive” while others store a concept representing “positive” in the value_as_concept_id field.
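Since both practices may show up in the same data, a reader of the table probably has to tolerate either one. A rough Python sketch of that, not an agreed convention: the `variant_is_positive` helper is hypothetical, rows are plain dicts for illustration, and `POSITIVE_CONCEPT_ID = 9191` is my assumption for the “Positive” concept, so please verify it in Athena before relying on it.

```python
# Assumption: concept id used for "Positive"; verify in Athena.
POSITIVE_CONCEPT_ID = 9191


def variant_is_positive(row: dict) -> bool:
    """Interpret a genomic MEASUREMENT row under either practice.

    - No value stored (missing, None, or 0): the record's existence
      is taken to mean the variant was found.
    - A value stored: it counts as positive only if it is the
      "positive" concept.
    """
    value = row.get("value_as_concept_id")
    if value in (None, 0):
        return True
    return value == POSITIVE_CONCEPT_ID


# Hypothetical rows using the measurement concept from the example above.
existence_only = {"measurement_concept_id": 19576591}
explicit_positive = {"measurement_concept_id": 19576591,
                     "value_as_concept_id": POSITIVE_CONCEPT_ID}
```

Treating a missing value as positive is itself a design choice; if your source can report tested-but-negative variants, you would want the explicit value instead.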

Note that currently all variants stored in this way are assumed to be somatic. Representation of germline variants is an outstanding problem.