On our last call, the excellent work on ETLs from SDTM drew attention to the need for a standard vocabulary for biomarkers in OMOP. Of the gaps in the CDM that these ETLs exposed, standardizing the representation of biomarkers stands out to me as the most important, and as the one with the greatest benefit to analyses of both trial and observational data. By comparison, representing trial arms and drugs not yet in RxNorm seems less challenging to accommodate and less likely to benefit other areas of OHDSI.
This recent fine paper that Patrick co-authored has a very helpful breakdown of the current impediments to trial replication caused by data missing from EHR and claims sources. It adds to our understanding of the types of trial data that are potentially available in EHRs but cannot yet be represented in a standard way. In other words, it points to types of concepts and concept relationships, common to trials and EHRs, that might be mappable with a minimal extension to the CDM.
Among these, I think a biomarker vocabulary would do the most to increase the number of OMOP targets that SDTM trial data can be mapped to.
The idea of a biomarker vocab is a bit different from the other domains in the CDM, because it is as much about the relationships between concepts as it is about the coverage of concepts in the domain. I suggest we consider using the Human Phenotype Ontology (HPO) for this. The HPO is the product of a very large and very mature biocuration process that annotates relationships between concepts based on research evidence. It is already widely used by many researchers, and it has established linkages with standard OMOP vocabs whose concepts can function as biomarkers.
This paper describes recent work annotating LOINC concepts for lab results with HPO terms. Similar work is underway for radiologic results as represented in RadLex, which Chan and Kwangsoo have proposed for their Radiology CDM extension. Most obviously, the HPO has a strong connection to genomic data, in which it is rooted, and would be an important complement to the oncology extension of the CDM.
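To make the lab-result idea a bit more concrete, here is a minimal sketch of how such an annotation might be applied to a measurement, assuming the annotation keys on a test plus its interpreted outcome (low, normal, or high against the reference range). The test code `serum_potassium`, the `ANNOTATIONS` table, and the helper functions are hypothetical illustrations, not the published annotations or any existing OHDSI tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical annotation table: (lab test code, interpreted outcome) -> HPO term label.
# Codes and labels are illustrative placeholders, not copied from the published annotations.
ANNOTATIONS = {
    ("serum_potassium", "H"): "Hyperkalemia",
    ("serum_potassium", "L"): "Hypokalemia",
}


@dataclass
class Measurement:
    """A lab result with its reference range (loosely modeled on an OMOP measurement row)."""
    test_code: str
    value: float
    range_low: float
    range_high: float


def interpret(m: Measurement) -> str:
    """Classify a result as low (L), normal (N) or high (H) against its reference range."""
    if m.value < m.range_low:
        return "L"
    if m.value > m.range_high:
        return "H"
    return "N"


def to_hpo(m: Measurement) -> Optional[str]:
    """Return the annotated HPO term for this result, or None if no annotation applies."""
    return ANNOTATIONS.get((m.test_code, interpret(m)))
```

For example, `to_hpo(Measurement("serum_potassium", 6.1, 3.5, 5.1))` returns `"Hyperkalemia"`, while a value inside the range returns `None`. The point is that the phenotype is carried by the relationship between the test, its outcome, and the HPO term, rather than by the lab concept alone.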
Juan has already done extensive work annotating standard OMOP vocabs with HPO terms, so there is much to build on and the fit with standard vocabs is good. There is also a natural relationship between the process of biocuration and the relationships that the HPO encodes: the evidence for determining whether a relationship holds comes from trials. This suggests a virtuous circle, in which the same researchers and organizations who want to use the HPO for ETLing their trial data also help drive the extension of its biocuration activities.
Adding the HPO to the OMOP CDM, including its relationships to standard OMOP concepts, would open new possibilities for phenotyping and for relating clinical data to knowledge bases used in the life sciences. Both of those impacts are potentially large and worthwhile. Perhaps the biggest impact would be a significant extension of the community's ability to identify valid clinical endpoints in analyses and predictive models.
I would be happy to reach out to Peter Robinson, a leader of HPO activities and related algorithm development, to explore this idea.
I am eager to know whether others, particularly those in the trials WG, are interested in this. I think this work has a good chance of receiving external funding support because of its broad impact and the central role the HPO plays in many national and international research support efforts involving ontologies and knowledge bases.