Genomic Data in the CDM

Christian_Reich · October 7, 2020, 5:26pm

Very timely message, and we are clearly behind in disseminating what the Oncology WG has done. It will all be visible at the Symposium.

But take a look at Athena and the latest vocabulary release. It essentially did what you are suggesting:

All genomic concepts have domain_id='Measurement'.
We incorporated the HGNC canonical human genes (vocabulary_id='HGNC'), but declared them as variants of those genes, since the intact gene is not a finding.
We built genomic variants based on a number of collections (JAX, ClinVar, CIViC, CGI, CAP, NCIt), rather than on the totality of all possible variants. We are working with other collections (OncoKB and COSMIC).
If the variant is defined at the molecular level, the HGVS notation is the concept_code, with the reference sequence provided by the source vocabularies. These variants are defined at the genomic, transcript, or protein level.
Otherwise, there are less precise variants (e.g. protein expressions).
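
To make the points above concrete, here is a rough sketch in Python of how one might pick these concepts out of a CONCEPT table and tell the HGVS level from the concept_code. Everything in it is an assumption for illustration: the rows, concept ids, codes, and the 'EXAMPLE_VARIANTS' vocabulary_id are made up and are not taken from Athena, and the table is a cut-down stand-in with only five columns. The only HGVS fact it relies on is that the description after the colon starts with g. (genomic), c. (coding DNA on a transcript reference), or p. (protein).

```python
import sqlite3

# Made-up rows shaped like a reduced CONCEPT table. None of these ids,
# names or codes come from Athena; 'EXAMPLE_VARIANTS' is a hypothetical
# vocabulary_id used only for this sketch.
ROWS = [
    (1, "GENE1 gene variant", "Measurement", "HGNC", "0001"),
    (2, "GENE1 genomic variant", "Measurement", "EXAMPLE_VARIANTS", "NC_000000.0:g.12345A>G"),
    (3, "GENE1 transcript variant", "Measurement", "EXAMPLE_VARIANTS", "NM_000000.0:c.100A>G"),
    (4, "GENE1 protein variant", "Measurement", "EXAMPLE_VARIANTS", "NP_000000.0:p.Lys34Glu"),
    (5, "GENE1 protein expression", "Measurement", "EXAMPLE_VARIANTS", "EX-0005"),
]


def build_demo_db():
    """Create an in-memory database holding the made-up concept rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept ("
        "concept_id INTEGER, concept_name TEXT, domain_id TEXT, "
        "vocabulary_id TEXT, concept_code TEXT)"
    )
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def hgvs_level(concept_code):
    """Map an HGVS-style code to 'genomic', 'transcript' or 'protein'.

    Returns None when the code has no 'reference:description' shape or uses
    a prefix other than g., c. or p. (other HGVS prefixes are ignored here).
    """
    _, sep, description = concept_code.partition(":")
    if not sep:
        return None
    levels = {"g.": "genomic", "c.": "transcript", "p.": "protein"}
    return levels.get(description[:2])


def hgnc_codes(conn):
    """Concept codes of the gene-level concepts (vocabulary_id='HGNC')."""
    rows = conn.execute(
        "SELECT concept_code FROM concept "
        "WHERE domain_id = 'Measurement' AND vocabulary_id = 'HGNC' "
        "ORDER BY concept_id"
    )
    return [code for (code,) in rows]


def variant_levels(conn):
    """HGVS level per non-HGNC Measurement concept (None if less precise)."""
    rows = conn.execute(
        "SELECT concept_code FROM concept "
        "WHERE domain_id = 'Measurement' AND vocabulary_id <> 'HGNC' "
        "ORDER BY concept_id"
    )
    return {code: hgvs_level(code) for (code,) in rows}


if __name__ == "__main__":
    demo = build_demo_db()
    print(hgnc_codes(demo))
    for code, level in variant_levels(demo).items():
        print(code, "->", level or "less precise variant")
```

Against a real vocabulary download the same two WHERE clauses would be the starting point, with the actual vocabulary_id values looked up in Athena rather than assumed.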

For example, start with this concept and click through from there: https://athena.ohdsi.org/search-terms/terms/35955862

Thoughts?