Genomic Data in the CDM

@Mi-So:

Very timely message, and we are clearly behind in disseminating what the Oncology WG has done. It will all be visible at the Symposium.

But take a look at Athena and the latest vocabulary release. It essentially did what you are suggesting:

  • All genomic concepts are domain_id='Measurement'
  • We incorporated the HGNC canonical human genes (vocabulary_id='HGNC'), but declared them as variants of those genes (as the intact gene is not a finding).
  • We built genomic variants from a number of collections (JAX, ClinVar, CIViC, CGI, CAP, NCIt), rather than from the totality of all possible variants. We are also working with other collections (OncoKB and COSMIC).
  • If the variant is defined at the molecular level, the HGVS notation is the concept_code, with the reference sequence provided by the source vocabularies. These variants are defined at the genomic, transcript or protein level.
  • Otherwise, there are less precise variants (e.g. protein expression).
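
To make the HGVS point concrete, here is a rough Python sketch of how one might tell the three levels apart from a concept_code. It assumes the code has the usual HGVS shape of reference sequence, colon, coordinate-type prefix (g., c., n., r. or p.) and then the change. The function name hgvs_level, the grouping of c./n./r. under "transcript", and the example strings are illustrative assumptions, not taken from the vocabulary tables, so check real concept_code values in Athena before relying on it.

```python
import re
from typing import Optional

# HGVS coordinate-type prefix -> level named in the post.
# Grouping "c", "n" and "r" under "transcript" is an assumption of this sketch.
_LEVELS = {
    "g": "genomic",
    "c": "transcript",
    "n": "transcript",
    "r": "transcript",
    "p": "protein",
}

# <reference sequence>:<prefix>.<change>, e.g. "NM_004333.4:c.1799T>A"
_HGVS = re.compile(r"^(?P<ref>[^:\s]+):(?P<kind>[gcnrp])\.(?P<change>\S+)$")


def hgvs_level(concept_code: str) -> Optional[str]:
    """Return 'genomic', 'transcript' or 'protein', or None if the code
    does not look like an HGVS expression (e.g. a less precise variant)."""
    match = _HGVS.match(concept_code.strip())
    if match is None:
        return None
    return _LEVELS[match.group("kind")]


if __name__ == "__main__":
    # Illustrative strings only; not copied from the vocabulary.
    for code in (
        "NC_000007.13:g.140453136A>T",
        "NM_004333.4:c.1799T>A",
        "NP_004324.2:p.Val600Glu",
        "BRAF overexpression",
    ):
        print(code, "->", hgvs_level(code))
```

Anything that returns None here would fall into the "less precise" bucket from the last bullet.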

For example, start with this concept and click your way through from there: https://athena.ohdsi.org/search-terms/terms/35955862

Thoughts?
