Requirements for Clinical Study/Trial data in OHDSI

Hello. I’m working on representing data from completed clinical trials in OHDSI. My goal is a common representation with which to harmonize the data for further study. After I posted in Implementors, @Christian_Reich suggested starting a proposal for extending the CDM to support clinical data. I thought I’d start with a discussion to gauge the level of interest, and perhaps find some collaborators and experts.

I have a history in software development, including some brushes with ETL and data modeling, but I’m new to clinical data. I’ve explored OHDSI enough to have mapped our lab data into the measurements and observation tables. I need to work on visits, death, and others. I suspect an incomplete mapping will serve my needs, though I can’t deny a desire to see a thorough job completed.

What’s the level of interest? Is there some prior work you know about I should look at?


Hi Chris,

A few of us at SHYFT are also interested in this topic and would be open to discussing it further. There was another thread in the forums on converting CT data that stalled out of the gates, but we’re happy to try to get this going again.

Can you clarify your use cases that you are considering? Are you thinking about being able to compare CT to RWE datasets?

cc: @Astern, @anna_corning

Josh, folks,

I’m interested in studying different sets of completed clinical data which likely have different representations, using similar methods. Since these are completed, and often older, they don’t always adhere to standards. If many studies are uniformly represented, then the same methods can be easily applied to many sets of data/trials.

If RWE stands for Real World… Examples/Experience(?), there is an element of that: when an analysis provides a way to predict a phenotype, we want to be able to use a model trained on clinical trial data to classify a new patient.

This looks like the start:


I have a small request. Once upon a time we originally set up the COHORT table to include the different arms of clinical trials. This may need to be revisited, but things related to any purposeful study or measurement plan, not only clinical trials, could go here. The UC system is using COHORT to represent enrollment in VBC and reporting programs with quality measurement targets. I would very much like this to be a continuum with clinical trials so that OMOP adopters might internalize the continuum between evidence generation and reporting & dissemination…

@CRoeder, @Daniella_Meeker, @Vojtech_Huser: Do you want to sit down and think it through, and then bring in other people? You can also ping @mitrarocca, @krfeeney and @chunhua, who are working on linking OHDSI with clinical trials.

I’m still learning the data and issues involved. I have a small list so far. Does this group get traction with a thread of issues as they come up?

UC: Colorado? I’m still meeting other folks on campus working on OHDSI.

re: using COHORT for clinical trials. I haven’t crossed that bridge yet, but it might be nice to use a single OHDSI instance for multiple trials. Of course I’d want to make sure the person_ids don’t overlap. I wondered if COHORT could be part of the solution for identifying the trials.
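To make the COHORT idea concrete, here is a minimal sketch, not an agreed convention. It assumes each trial arm gets its own cohort_definition_id and that enrollment and exit dates fill cohort_start_date and cohort_end_date. The four column names are the standard COHORT columns; the ARM_COHORTS lookup, the study and arm names, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortRow:
    """One row shaped like the OMOP COHORT table."""
    cohort_definition_id: int
    subject_id: int          # the person_id of the participant
    cohort_start_date: date  # assumed here to be the enrollment date
    cohort_end_date: date    # assumed here to be completion or withdrawal


# Hypothetical lookup: one cohort definition per (study, arm).
ARM_COHORTS = {
    ("STUDY_A", "treatment"): 1,
    ("STUDY_A", "placebo"): 2,
}


def enrollment_row(study_id, arm, person_id, enrolled, exited):
    """Build the COHORT row for one participant's time in one arm."""
    if exited < enrolled:
        raise ValueError("exit date precedes enrollment date")
    return CohortRow(ARM_COHORTS[(study_id, arm)], person_id, enrolled, exited)


def subjects_in_arm(rows, study_id, arm):
    """Return the sorted person_ids enrolled in the given arm."""
    cohort_id = ARM_COHORTS[(study_id, arm)]
    return sorted(r.subject_id for r in rows if r.cohort_definition_id == cohort_id)


rows = [
    enrollment_row("STUDY_A", "treatment", 11, date(2010, 1, 5), date(2010, 7, 1)),
    enrollment_row("STUDY_A", "placebo", 12, date(2010, 1, 6), date(2010, 3, 2)),
    enrollment_row("STUDY_A", "treatment", 10, date(2010, 2, 1), date(2010, 8, 1)),
]
treatment_subjects = subjects_in_arm(rows, "STUDY_A", "treatment")
```

Cross-over designs would need more than one row per person, so this only covers the simple parallel-arm case.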

I proposed concepts for capturing enrollment here

The same thread covers prior trials-and-OMOP efforts.

There is even a proposal here


It should be moved to GitHub.

It is not clear how to distribute a set of conventions.
Do you want to propose new tables or just new conventions?

The concepts for withdrawal and enrollment seem useful.

Mapping person IDs might be more involved. It wasn’t clear to me what you meant by a person identified by NCT0000123456. Are you suggesting a composite person_id that has a prefix related to the study, or having the study id as a foreign key in the person table, perhaps even making a compound key of person_id and study_id?

I’m dealing with completed studies that have anonymized data, so a correlation between a study participant and a real, fully identified person in an EHR isn’t an issue, but I am considering hosting the data for many studies in the same OMOP instance. A quick and dirty solution would be to just use the person id from the study in OMOP, but I’d need to avoid ID collisions. Including the study_id in some way would avoid that. A study table and id is appealing, though introducing a compound (person_id, study_id) key to the schema may not be.
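One way to avoid collisions without a compound key is to hand out a surrogate person_id per (study_id, study-local id) pair during ETL. This is only a sketch under that assumption: the PersonIdRegistry class and the "STUDY_A"/"STUDY_B" identifiers are hypothetical, and keeping the pair in person_source_value is one option, not an established convention.

```python
from itertools import count


class PersonIdRegistry:
    """Hands out one surrogate person_id per (study_id, source id) pair."""

    def __init__(self, start=1):
        self._next_id = count(start)
        self._ids = {}

    def person_id(self, study_id, source_person_id):
        """Return a stable integer id, creating it on first use."""
        key = (study_id, str(source_person_id))
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]

    @staticmethod
    def person_source_value(study_id, source_person_id):
        """Keep the study and original id traceable in a single string."""
        return f"{study_id}:{source_person_id}"


registry = PersonIdRegistry()
a = registry.person_id("STUDY_A", 1001)
b = registry.person_id("STUDY_B", 1001)   # same local id, different study
a_again = registry.person_id("STUDY_A", 1001)
```

Here a and b differ even though both studies used 1001 locally, and repeated lookups return the same id. In a real ETL the mapping would have to be persisted rather than held in memory.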

I’d like to join this group too; apologies to Vojtech and Christian for taking so long to respond to this thread.

I agree that withdrawal and enrollment are useful information to be included about retention status.

I’d like to add visit_ID because recurring visits are common in clinical trials; visit_ID can help refer to a specific visit in the protocol and link with study calendar, adverse events, etc.

Another relevant field is adverse events.

What is the scope of this project? Does this cover only completed trials? How about supporting pragmatic trials? If it is the latter, we need to add more data fields.

Could we plan for a f2f discussion during the upcoming OHDSI meeting for this group? I will be there 10/17-19.

Visit ID would help here too.
I’m looking at the *_occurrence tables to model various events.

I don’t remember those times; they might have been in old fairy tale times. :slight_smile: But I still think it is a good idea. The problem is that the current COHORT selects from existing data, and doesn’t prescribe anything the way a trial protocol or CRF does. So, alternatively, you start your own clinical trial tables, and you add some more specifics you need for describing the (completed) clinical trial well:

  • Design, with arms and complex rules (cross-over)
  • Defined visits and what goes in
  • Mapping of defined visits to the visits and their content that actually happened, including missing values
  • Sites
  • what else?
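The third bullet, mapping defined visits to the visits that actually happened, could look roughly like the sketch below. It assumes planned and actual visits can be matched by a shared visit name; PlannedVisit, ActualVisit, reconcile, and the example visit names and ids are all hypothetical, and only visit_occurrence_id is an existing CDM column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedVisit:
    """A visit the protocol defines for an arm."""
    study_id: str
    arm: str
    visit_name: str
    day_offset: int  # protocol day relative to enrollment


@dataclass(frozen=True)
class ActualVisit:
    """A visit that was recorded for a participant."""
    person_id: int
    visit_name: str
    visit_occurrence_id: int


def reconcile(planned, actual, person_id):
    """Map each planned visit name to the recorded visit_occurrence_id,
    or None when the participant has no matching visit (a missing value)."""
    seen = {v.visit_name: v.visit_occurrence_id
            for v in actual if v.person_id == person_id}
    return {p.visit_name: seen.get(p.visit_name) for p in planned}


planned = [
    PlannedVisit("STUDY_A", "treatment", "Screening", -14),
    PlannedVisit("STUDY_A", "treatment", "Baseline", 0),
    PlannedVisit("STUDY_A", "treatment", "Week 4", 28),
]
actual = [
    ActualVisit(1, "Screening", 501),
    ActualVisit(1, "Week 4", 502),
    ActualVisit(2, "Baseline", 503),
]
mapping = reconcile(planned, actual, person_id=1)
```

For person 1 the Baseline entry comes back as None, which makes the missed visit explicit instead of leaving it implied by an absent row.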

Like what?

Sounds like a good idea.

I’m very interested in this, if you’re creating a list of names for future communications.

All, is there any update on this topic? I would like to explore the possibility of mapping from SDTM to OMOP, and was wondering whether anyone has already done that.

@tanvir - I’m not sure what was discussed at the OHDSI meeting last year. I’d also be interested to know if there is still interest in this topic and where it currently stands from the other OHDSI leaders in this thread.

At the end of last year, I did a proof of principle conversion on the CDISC SDTM demo data set with Odysseus and we were able to convert the data to OMOP CDM v5. It wasn’t perfect OOB because there were some concepts that couldn’t be translated to the standard OMOP vocabularies (e.g., the investigational treatment from the trial is Xanomeline which never made it out of CT).

I’d be happy to talk about our experience.

@Eldar and @Dymshyts anything you’d want to add based on our experience on this?

I don’t think there is any update. The OHDSI sub-community working with claims or EHR data is much larger than the sub-community working with clinical trials data.

Well, a small update: the proposal was moved to GitHub.

Do you have any RCT data for conversion?
I’m interested in this, too.

Yes. At the NIH Clinical Center, all we do is clinical studies. We have about 2k ongoing studies and about 10k past studies.

See here: https://btris.nih.gov

I posted about the problem of marking when a patient enters and exits a trial, and of a standard way of referring to a study (NCT id).
Another problem is with CRF data (case report forms); with the new SURVEY table, that is mitigated.

Interested parties should not only “join the subworkgroup” but also post to the forum or to our GitHub CDM issue a description of the 3 most pressing problems they face with clinical study/trial data and how they propose to solve them (within the CDM, using a new convention or a new table).

Great, @Vojtech_Huser!
I do know that NIH is doing lots of tremendous clinical trials. I just wondered whether we could access those data.

At this European OHDSI, I’ll discuss with @schuemie how to implement recursive partitioning for heterogeneous causal effects in the OHDSI study pipeline.
In the previous paper using this algorithm, you can see how it analyzes the causal effect of an intervention and finds subgroups that benefit from the intervention.

I can acquire some published RCT data, too.
If we can gather clinical trial data on similar topics, that would be awesome.