We are mapping a cancer genomics dataset that includes the concept of a sample derived from a specimen, and we are not sure how to map this to the OMOP CDM. Our specimen table maps well to the OMOP SPECIMEN table, but we also have a ‘sample’ that describes a nucleic acid (or protein) sample derived from the specimen (with types such as DNA / RNA / ctDNA / protein).
This sample object is important because it is the physical object sent to the sequencing centre, and it is what links the generated sequencing metadata and files back to the Person (i.e. the results that come back from the sequencing centre are all labelled with the sample id, not the specimen id).
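To make the linkage problem concrete, here is a minimal sketch of the source-side chain described above: sequencing results carry only a sample id, so reaching the person means walking sample → specimen → person. It assumes a simple relational source; the tables `specimen`, `sample` and `sequencing_result`, their columns, and the demo rows are all hypothetical and not taken from the actual dataset.

```python
import sqlite3
from typing import Optional

# Hypothetical source schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE specimen (
    specimen_id TEXT PRIMARY KEY,
    person_id   INTEGER NOT NULL
);
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id),
    sample_type TEXT NOT NULL  -- e.g. 'DNA', 'RNA', 'ctDNA', 'protein'
);
CREATE TABLE sequencing_result (
    result_id   INTEGER PRIMARY KEY,
    sample_id   TEXT NOT NULL REFERENCES sample(sample_id),
    file_uri    TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database: one person, one specimen, two samples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO specimen VALUES ('SP1', 42)")
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [("SA1", "SP1", "DNA"), ("SA2", "SP1", "RNA")],
    )
    # Results from the sequencing centre reference the sample, not the specimen.
    conn.execute(
        "INSERT INTO sequencing_result VALUES (1, 'SA2', 'file:///demo/sa2.vcf')"
    )
    return conn


def person_for_sample(conn: sqlite3.Connection, sample_id: str) -> Optional[int]:
    """Walk sample -> specimen -> person; return None if the sample is unknown."""
    row = conn.execute(
        """
        SELECT sp.person_id
        FROM sample AS sa
        JOIN specimen AS sp ON sp.specimen_id = sa.specimen_id
        WHERE sa.sample_id = ?
        """,
        (sample_id,),
    ).fetchone()
    return row[0] if row else None
```

Because only the sample id comes back from the sequencing centre, whatever target model is chosen has to preserve this sample-to-specimen hop somewhere, or the results cannot be attached to a person.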
Does anyone have examples of modelling a specimen derived from a specimen, or other suggestions on how to set this up?
It depends on what you want to do. Let me unpack what you said a little:
You said you are OK with the specimen. I assume that refers to a biopsy or surgical specimen of the cancer, correct?
You said you have a “derived specimen”. What happens to a specimen, and which processes are applied to it, is not covered in OMOP.
Why not? Because for the OMOP use cases we model what happens to a patient, not what an organization does or does not do. In other words, the method used to derive a genomic marker or variant is not captured; only the result matters. For example, full-genome sequencing produces DNA aberrations, transcriptomics gives you RNA variants, immunohistochemistry or advanced proteomics gives you protein variants, FISH gives you structural variants, and so on. These results can then be used in our methods and analytics.
Do you have another use case? Please come to the Oncology WG and bring it up. We want to know.
This is not part of the standard OMOP CDM. You would need it if you wanted to do the open genomic research typical of bioinformatics use cases, such as marker discovery, validation or GWAS. For OHDSI-type studies, where you want to understand the effect of markers on disease progression and treatment, all you need are the genomic variants in the MEASUREMENT table, represented with the OMOP Genomics vocabulary, and you are good to go.
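For the OHDSI-type route, each reported variant ends up as a MEASUREMENT row for the person. The sketch below assumes the variant concept is placed in `measurement_concept_id`, and that the concept lookup and the `measurement_type_concept_id` value come from your own vocabulary-mapping step; `VariantCall`, `to_measurement_row` and the example variant string are hypothetical, and the exact Oncology WG conventions (e.g. how `value_as_concept_id` is used) should be checked before relying on it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Mapping


@dataclass(frozen=True)
class VariantCall:
    """One variant reported by the sequencing centre (hypothetical shape)."""
    sample_id: str       # used upstream to resolve person_id via the specimen
    variant: str         # e.g. 'BRAF p.V600E' (illustrative source notation)
    reported_on: date


def to_measurement_row(
    call: VariantCall,
    person_id: int,
    concept_lookup: Mapping[str, int],
    type_concept_id: int,
) -> Dict[str, object]:
    """Build a MEASUREMENT-shaped dict (subset of columns) for one variant call.

    Unmapped variants get concept_id 0 (the OMOP convention for
    'no matching concept'); the original string is kept in
    measurement_source_value so nothing is lost.
    """
    return {
        "person_id": person_id,
        "measurement_concept_id": concept_lookup.get(call.variant, 0),
        "measurement_date": call.reported_on.isoformat(),
        "measurement_type_concept_id": type_concept_id,
        "measurement_source_value": call.variant,
    }
```

Note that the sample id does not appear in the output row: in this route it is only needed to find the person, which is the point made above.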
Well, with visits we had a similar challenge and created the visit_detail table.
You either make an entity hierarchical or create a new drill-down table (one level down only).
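As one illustration of the “hierarchical entity” choice, a self-referencing parent column on SPECIMEN lets a sample row point at the specimen it was derived from, and a recursive query walks back to the original specimen. This is a sketch only: `parent_specimen_id` is a hypothetical extension column, not part of any OMOP CDM release, and only a minimal subset of SPECIMEN columns is shown.

```python
import sqlite3
from typing import List

DDL = """
CREATE TABLE specimen (
    specimen_id         INTEGER PRIMARY KEY,
    person_id           INTEGER NOT NULL,
    specimen_concept_id INTEGER NOT NULL,
    -- HYPOTHETICAL extension column; not in the OMOP CDM.
    parent_specimen_id  INTEGER REFERENCES specimen(specimen_id)
);
"""


def lineage(conn: sqlite3.Connection, specimen_id: int) -> List[int]:
    """Return specimen ids from the given one up to its root ancestor.

    Assumes the parent links contain no cycles.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE chain(specimen_id, parent_specimen_id, depth) AS (
            SELECT specimen_id, parent_specimen_id, 0
            FROM specimen
            WHERE specimen_id = ?
            UNION ALL
            SELECT s.specimen_id, s.parent_specimen_id, c.depth + 1
            FROM specimen AS s
            JOIN chain AS c ON s.specimen_id = c.parent_specimen_id
        )
        SELECT specimen_id FROM chain ORDER BY depth
        """,
        (specimen_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

Unlike a one-level drill-down table, this allows arbitrary depth (e.g. tissue → DNA extract → library), at the cost of recursive queries and a non-standard column that standard OHDSI tools will ignore.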
If the current model does not have the complexity you need, you can:
A) force it anyway (with caveats)
B) extend columns (preferred over C)
C) extend tables (if B is impossible)
Or you do it the “FHIR way” (non-relational).
(Or add a JSON/YAML/XML column: the hybrid way.)
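A rough sketch of the hybrid option, assuming the derived samples are serialized as JSON into a single extra text column on SPECIMEN. The column name `derived_json` and the helper functions are hypothetical; parsing is done in Python with the standard `json` module rather than relying on SQLite's JSON functions.

```python
import json
import sqlite3
from typing import Dict, List

DDL = """
CREATE TABLE specimen (
    specimen_id  INTEGER PRIMARY KEY,
    person_id    INTEGER NOT NULL,
    -- HYPOTHETICAL free-form column; not in the OMOP CDM.
    derived_json TEXT
);
"""


def add_specimen(
    conn: sqlite3.Connection,
    specimen_id: int,
    person_id: int,
    samples: List[Dict[str, str]],
) -> None:
    """Store a specimen with its derived samples serialized as JSON."""
    conn.execute(
        "INSERT INTO specimen VALUES (?, ?, ?)",
        (specimen_id, person_id, json.dumps(samples)),
    )


def sample_ids(conn: sqlite3.Connection, specimen_id: int) -> List[str]:
    """Return the sample ids stored in a specimen's JSON column (or [])."""
    row = conn.execute(
        "SELECT derived_json FROM specimen WHERE specimen_id = ?",
        (specimen_id,),
    ).fetchone()
    if row is None or row[0] is None:
        return []
    return [s["sample_id"] for s in json.loads(row[0])]
```

The trade-off is that the database no longer enforces keys or concept validity for the samples, and every query that needs them has to parse the JSON.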
A biomarker from a specimen of a specimen is very much a “patient” thing. That is the whole premise of precision oncology. If my choice of drug depends on a secondary construct derived from the specimen, it had better be in OMOP.