New working group: Clinical Trials

Dear ClinTrials WG members, you should have received an email from me today with notes from our meeting on 11 October, to resume group activity. Great call, and plans made for the WG focus. We will have 2 or 3 calls in 2019 to get things restarted (invites will follow soon), and our fortnightly calls will resume from January 2020. If you didn’t get that email, it means I don’t have you on my WG members list. Please email me at sonia.araujo@iqvia.com so I may add you to that list. Thanks, Sonia.

@EmmaVos, @Ajinkya_Patale, @jliddil1, @tom_snelling - please email me at sonia.araujo@iqvia.com with your name and email address so I may add you to the WG’s distribution list. Thanks, Sonia

Great, I am also interested in this group.

I’d like to join as well.

forum response to the proposed agenda

- Actions from last meeting, incl conclusion on group approach for SDTM->OMOP conversion
- Refine use case for testing this approach
- Discuss feasibility / relevance of using clinical trial data from Gates Foundation as test bed to showcase / prove our SDTM->OMOP conversion approach
- Discuss any other clinical trial data sources for this purpose

possible sources are: (each repository offers several studies, not just one example listed below)


NIDA request approval process is instant.

Mike Gurley drew the Oncology WG’s attention to the ICAREdata project. Its relevance to this WG probably has less to do with the SDTM-to-OMOP ETL work than with the possible uses of ETLed RCT data we might be interested in down the road, similar to some of the things in my long post above. But if those working on the ETLs aren’t familiar with it, they may want to check out the GitHub repo of the outfit supporting the project (the Standard Health Record Collaborative) to see if there’s useful code there.

On our last call the excellent work on ETLs from SDTM drew attention to the need for a standard vocabulary for biomarkers in OMOP. Of the gaps in the CDM that must be filled to do these ETLs, standardizing the representation of biomarkers stands out to me as the most important and the one with the greatest benefit to analyses of both trial and observational data. By comparison, representing trial arms and drugs not yet in RxNorm seems less challenging to accommodate and less likely to benefit other areas of OHDSI.

This recent fine paper that Patrick co-authored has a very helpful breakdown of the current impediments to trial replication due to the absence of data in EHR and claims sources. It adds to our understanding of the types of trial data that are potentially available in EHRs but cannot yet be represented in a standard way. In other words, it suggests types of concepts and concept relationships that are common to trials and EHRs and that might be mappable with a minimal extension to the CDM.

Among these, I think biomarkers will help to maximize the targets in OMOP that can be mapped to from trial data in SDTM.

The idea of a biomarker vocab is a bit different from the other domains in the CDM because it is as much about the relationships between concepts as it is about the coverage of the concepts in the domain. I suggest we consider the use of the Human Phenotype Ontology (HPO) for this. The HPO is the object of a very large and very mature biocuration process annotating relationships between concepts based on research evidence, it is already widely used by many researchers, and it has established linkages with standard OMOP vocabs that can function as biomarkers.

This paper describes recent work annotating LOINC concepts for lab results with HPO terms. Similar work is underway for radiologic results as represented in RadLex, which has been proposed by Chan and Kwangsoo for their Radiology CDM extension. Most obviously, the HPO has a strong connection to genomic data, in which it is rooted, and would be an important complement to the oncology extension of the CDM.

Juan has already done extensive work annotating standard OMOP vocab with HPO. So there is much to build on already and the fit with standard vocabs is good. There is also a natural relationship between the process of biocuration and the relationships that the HPO encodes: the evidence for determining whether a relationship holds comes from trials. A virtuous circle that assists in the extension of the HPO’s biocuration activities could be arranged, driven by the same researchers and organizations who want to use it for ETLing their trial data.

Adding the HPO to the OMOP CDM including its relationships to standard OMOP concepts would add new possibilities for phenotyping and for relating clinical data to knowledge bases used in life sciences. Both of those impacts are potentially large and worthwhile. Perhaps the biggest impact would be a significant extension of the community’s ability to identify valid clinical endpoints in analyses and predictive models.

I would be happy to reach out to Peter Robinson who is a leader of HPO activities and related algorithm development, to explore this idea.

I am eager to know whether others, particularly those in the trials WG, think the group might be interested in this. This is work I think has a good chance of receiving external funding support because of its broad impact and the central role the HPO plays in many national and international research support efforts involving ontologies and knowledge bases.

For COVID19 studies (including observational studies and registries), I propose this workgroup takes a lead in guiding current PI teams to standardize their data directly into the OMOP CDM rather than SDTM, i.e. to move away from native->SDTM->OMOP and go native->OMOP.

This is separate from guidance for EHR data. I mean advanced CRF data (in addition to all the EHR guidance we will now have after the studyathon).

Good morning! I am Qin Ryan, a new member of OHDSI, introduced by Dr. Ana Szarfman. I am a hematologist/oncologist who works at FDA reviewing the efficacy and safety of new therapies, with experience in claims data analysis. I am also a cell and molecular biologist. It is my honor to join your team to work on COVID-19 data. Presently, I am still trying to navigate OHDSI, but I would love to contribute.

Hi all! I am Ru, an MD/PhD student at the University of Pittsburgh-Carnegie Mellon University working with the REMAP-COVID team. I am interested in joining this working group; would you be able to kindly add me to the listserv or point me in the right direction? Thanks!

Sonia, can you invite the first author (from Denver) of this publication to present at some future WG meeting?

Hi there, I am working on converting disease-specific registry data into the OMOP CDM. It’s not really clinical trial data, but even less EHR data. Would I be a good fit for this working group?

Cheers, Tina

Hi Tina. Thanks for reaching out again. I remembered a discussion on this already and looked up the forum thread. Only to realise that you actually kicked off that discussion: Registry data to OMOP CDM Work Group

Sadly, there is no progress on the proposed clinical study group. On the flip side, there has been a lot of progress in the clinical trial and the UKB working group. I can imagine that especially the conventions developed in the UKB (which is a large British registry) can be interesting.

Could you tell a little more about the challenges you are facing? Then I can loop you into the right conversation.


I recently joined OHDSI and work in personalized healthcare data science at Roche. I would like to join the Clinical Trial working group as I am currently involved in a pilot at Roche to map some study data (SDTM) to OMOP CDM. Kindly let me know if I need to reach out via email to be invited to the WG. Thanks!

@waqarali - welcome to OHDSI! Could you please email Sonia Araujo at sonia.araujo@iqvia.com and she will add you to the CT WG. See you there!

Welcome! As @gregk noted, reaching out to Sonia is a great place to start. I would also suggest making sure you have access to our OHDSI MS Teams environment so you can attend the workgroup meetings and engage in the asynchronous discussions. You can request OHDSI access using this link, and then request to be part of different workgroups, studies or chapters with this link. Thanks!

Great, I will email Sonia then. Thanks for the links for access as well!

Hello @sonia, I would really appreciate joining the CT WG. Thanks!

Hi Maxim, thanks for your reply!

The challenges, in general, are that the OMOP CDM focuses on EHR/trial data and is based on clinical/medical concepts. Registry data is quite different, since it often includes patient-reported outcomes (PRO), ergo “smooth” data like quality of life or burden of disease. Plus, registry data mostly doesn’t include any kind of standard terminology/coding, which makes it difficult to map. E.g. you cannot load codes into Usagi to derive OMOP concept_ids (or the corresponding tables, domains etc). So mapping each field to the OMOP CDM representation is a lot of work. A lot. :smiley:

I think in the end, most of the data can be formatted in OMOP speak (clinical data that is collected during each visit), so I actually just wanted to work in a group that has already “translated” data that is not from EHR or the like into the OMOP CDM. To share a burden :wink: and to learn from others’ experiences. So maybe I just need a group of people who have actively done the ETL design (that would already be the most helpful step) of non-EHR data. Or even EHR data. Is there a guideline (apart from the Book of OHDSI or the ETL tutorials) for how to map data that has no source codes, just strings/text/measurements etc.?
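To make the field-by-field mapping problem described above a bit more concrete, here is a minimal Python sketch of a hand-curated lookup from free-text registry values to concept_ids. It is only an illustration under assumptions, not an OHDSI guideline: the field names, the `CURATED_MAP` table and the helper functions are hypothetical, and the concept_ids are placeholders in the range above 2,000,000,000 that OMOP reserves for local concepts. The only convention taken as given is that concept_id 0 marks a value with no matching concept.

```python
from typing import Dict, List, Tuple

# OMOP convention: concept_id 0 means "no matching concept".
UNMAPPED_CONCEPT_ID = 0

# Hypothetical, hand-curated lookup: (registry field, normalized text) -> concept_id.
# The IDs are placeholders in the local-concept range, NOT real OMOP concepts.
CURATED_MAP: Dict[Tuple[str, str], int] = {
    ("pain_level", "mild"): 2000000001,
    ("pain_level", "severe"): 2000000002,
    ("mobility", "uses wheelchair"): 2000000003,
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match one key."""
    return " ".join(text.strip().lower().split())


def map_value(field: str, raw_value: str) -> int:
    """Return the curated concept_id, or 0 when the value is not yet mapped."""
    return CURATED_MAP.get((field, normalize(raw_value)), UNMAPPED_CONCEPT_ID)


def map_records(records: List[dict]) -> List[dict]:
    """Flatten registry records into one row per (person, field) with a concept_id."""
    rows = []
    for rec in records:
        for field, raw in rec["values"].items():
            rows.append({
                "person_id": rec["person_id"],
                "source_field": field,
                "source_value": raw,  # keep the original text for traceability
                "concept_id": map_value(field, raw),
            })
    return rows


def unmapped_values(rows: List[dict]) -> List[Tuple[str, str]]:
    """List distinct (field, value) pairs that still need manual curation."""
    return sorted({
        (r["source_field"], r["source_value"])
        for r in rows
        if r["concept_id"] == UNMAPPED_CONCEPT_ID
    })


if __name__ == "__main__":
    sample = [{"person_id": 1,
               "values": {"pain_level": "SEVERE", "mobility": "walks unaided"}}]
    rows = map_records(sample)
    print(rows)
    print(unmapped_values(rows))
```

Keeping the original text in `source_value` and collecting the unmapped pairs gives a worklist for the manual curation step, which is where most of the effort described above would go.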


This really is a good discussion. More and more RWD/RWE work is looking at blended sources. Figuring out how to tag all the data from these sources is indeed quite a challenge. There is no one-size-fits-all answer at this point.