Mapping Clinical Trial & Public Research Datasets to OMOP – Experiences?

Hello OHDSI community,

I’m interested in connecting with others who have mapped — or are currently mapping — clinical trial datasets and/or public research datasets to the OMOP CDM.

We are currently working on mapping study-specific data from the Anti-Amyloid in Asymptomatic Alzheimer’s Disease (A4) secondary prevention trial to OMOP and would welcome insight from anyone who has tackled similar work.

Example: A4 Study Dataset

The A4 study includes 1,147 cognitively unimpaired individuals (ages 65–85) randomized to placebo (n=583) or solanezumab (n=564), followed for 240 weeks across 67 international sites. All participants entered with a Clinical Dementia Rating – Global (CDR-G) of 0.

Questions for the Community

  1. CDR and other cognitive scales
  • Have you mapped CDR (including box-level granularity) to standard vocabularies?
  • Did you use LOINC concepts for total scores and custom concepts for box-level components?
  • How did you handle derived composites (e.g., cognition vs function)?
  2. Clinical trial–specific variables
  • How are others representing:
    • Randomization arm
    • Study visit structure (e.g., week 0–240 schedule)
    • Analysis-derived variables (e.g., progression defined as change at two consecutive visits)?
  • Are you using OBSERVATION, MEASUREMENT, FACT_RELATIONSHIP, or custom extensions?
  3. Public datasets
  • Has anyone mapped datasets like ADNI or other longitudinal research cohorts to OMOP?
  • Are there shared mapping specifications we should align with?
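The clinical trial questions mention analysis-derived variables such as progression defined as a change at two consecutive visits. Here is a minimal Python sketch of how such a flag might be derived from visit-level CDR-G scores. It assumes, hypothetically, that "change" means CDR-G above 0 at two consecutive scheduled visits. The actual A4 definition may differ, and the record type and function name are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class VisitScore:
    # Hypothetical visit-level record: study week and CDR-Global score
    week: int
    cdr_global: float


def first_confirmed_progression(visits, threshold=0.0):
    """Return the week of the first of two consecutive visits whose CDR-G is
    above `threshold`, or None if no such pair exists.

    Assumption (hypothetical): progression = score > threshold at two
    consecutive visits, with visits ordered by study week.
    """
    ordered = sorted(visits, key=lambda v: v.week)
    # Walk adjacent visit pairs in schedule order
    for prev, curr in zip(ordered, ordered[1:]):
        if prev.cdr_global > threshold and curr.cdr_global > threshold:
            return prev.week
    return None


if __name__ == "__main__":
    participant = [
        VisitScore(0, 0.0),
        VisitScore(48, 0.5),
        VisitScore(96, 0.0),   # reverts, so week 48 is not confirmed
        VisitScore(144, 0.5),
        VisitScore(192, 0.5),  # confirms progression starting at week 144
    ]
    print(first_confirmed_progression(participant))  # → 144
```

The sketch covers only the derivation. Where the resulting flag should be stored (OBSERVATION, MEASUREMENT, linked via FACT_RELATIONSHIP, or an extension) is the open question above.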

Motivation

Our goal is to:

  • Enable standardized analytics across trial and observational datasets
  • Explore reproducible phenotyping of progression in pre-symptomatic Alzheimer’s disease
  • Evaluate whether cognitive and functional subcomponents (e.g., CDR memory box vs community affairs) behave differently across amyloid strata

If you have experience mapping these types of datasets, or know of prior forum threads on the topic, we would greatly appreciate your input.

Happy to share our mapping decisions and lessons learned as this progresses.

Thanks in advance,
Ben

@terimsippel @sazimian @Robert_Barrett @Haeun_Lee

Hi Ben,

The first thing I would suggest is joining the Clinical Trials Workgroup. It is a place where people gather every other Friday to discuss and develop conventions for converting clinical trials data to OMOP. The next meeting will take place next Friday, Feb 27.

Currently, the team is working on converting TB clinical trials data in SDTM format to the OMOP CDM. Previously, we worked on converting the PHUSE dataset, which is a synthetic dataset related to Alzheimer’s disease, also in SDTM format. Unfortunately, this dataset has not been fully converted.

You can also review the existing CT WG conventions; perhaps you will find answers to some of your questions there. For concept mapping questions, you may want to reach out to the terminology professionals on the forum.

Hi Ben,

  1. LOINC has a code for the CDR sum of individual scores, 72088-8, which corresponds to OHDSI standard concept_id 42869843. The standard concepts for the cognition and function elements of the total score depend on the question asked.
  2. Trial-specific variables: as Philip mentioned, many of these topics are discussed in the Clinical Trials Workgroup. Most of the proposals for mapping clinical trial data use existing fields and domains. The Vocabulary Team has fixed some vocabulary gaps and continues to take suggestions from this workgroup to accommodate unique features of clinical trial data.
    Come to the CTWG meeting on Feb 27 at 11 AM EST! We meet on the 2nd and 4th Friday of each month. You can find the link in the Workgroup Schedule on the OHDSI webpage (Upcoming Workgroup Calls – OHDSI).
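To make the LOINC point concrete: the CDR sum of boxes is the sum of the six box scores (memory, orientation, judgment and problem solving, community affairs, home and hobbies, personal care), each scored 0–3, giving a total of 0–18. The minimal sketch below wraps that total as a MEASUREMENT-style row using concept_id 42869843 from the reply above. The column names follow the OMOP CDM MEASUREMENT table, but only a subset is shown. The box key names, person_id, date, and source value label are hypothetical. Box-level values would need their own (possibly custom) concepts and are not covered here.

```python
from dataclasses import dataclass
from datetime import date

# Standard concept for LOINC 72088-8 (CDR sum of individual scores), per the reply above
CDR_SUM_OF_BOXES_CONCEPT_ID = 42869843

# The six CDR boxes; these key names are illustrative, not vocabulary codes
CDR_BOXES = (
    "memory",
    "orientation",
    "judgment_problem_solving",
    "community_affairs",
    "home_hobbies",
    "personal_care",
)


@dataclass
class MeasurementRow:
    # Subset of OMOP CDM MEASUREMENT columns; required fields such as
    # measurement_id and measurement_type_concept_id are omitted for brevity
    person_id: int
    measurement_concept_id: int
    measurement_date: date
    value_as_number: float
    measurement_source_value: str


def cdr_sum_of_boxes_row(person_id, visit_date, boxes):
    """Validate the six box scores, sum them, and return a MEASUREMENT-style row."""
    missing = [b for b in CDR_BOXES if b not in boxes]
    if missing:
        raise ValueError(f"missing CDR boxes: {missing}")
    for name in CDR_BOXES:
        if not 0 <= boxes[name] <= 3:
            raise ValueError(f"{name} out of range 0-3: {boxes[name]}")
    total = sum(boxes[b] for b in CDR_BOXES)
    return MeasurementRow(
        person_id=person_id,
        measurement_concept_id=CDR_SUM_OF_BOXES_CONCEPT_ID,
        measurement_date=visit_date,
        value_as_number=total,
        measurement_source_value="CDR sum of boxes (derived)",  # hypothetical label
    )


row = cdr_sum_of_boxes_row(
    person_id=1,                   # hypothetical
    visit_date=date(2015, 6, 1),   # hypothetical
    boxes={"memory": 0.5, "orientation": 0.0, "judgment_problem_solving": 0.5,
           "community_affairs": 0.0, "home_hobbies": 0.0, "personal_care": 0.0},
)
print(row.value_as_number)  # → 1.0
```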

In general:

For events and CRF data that you also expect to appear in routine care, you want to map everything to concepts that make the data “converge”.

Some examples are in my AoU paper below. For example, if an AoU participant is asked about alcohol intake during research and that intake later changes (and the EHR data has the update), the record would show the change under the same concept.

In your case, for example, CDR-G data would appear under the same concepts in the A4 data as in EHR data.
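The convergence idea can be sketched as a source-to-concept lookup in which a code from the A4 CRF and a code from an EHR extract both resolve to one standard concept, so grouping by concept_id pulls both records together. This is a minimal illustration. The source codes are hypothetical, and the CDR-G concept_id is a deliberate placeholder (-1) to be replaced by the real standard concept from the OHDSI vocabularies. Unmapped codes fall under concept_id 0, the OMOP convention for "no matching concept".

```python
from collections import defaultdict

# Placeholder, NOT a real concept_id: look up the actual standard concept for CDR-Global
HYPOTHETICAL_CDR_GLOBAL_CONCEPT_ID = -1

# Hypothetical source-to-concept lookup: (source system, source code) -> standard concept
SOURCE_TO_CONCEPT = {
    ("A4_CRF", "CDGLOBAL"): HYPOTHETICAL_CDR_GLOBAL_CONCEPT_ID,
    ("EHR", "CDR global score"): HYPOTHETICAL_CDR_GLOBAL_CONCEPT_ID,
}


def map_records(records):
    """Group raw (system, code, value) records by standard concept_id.

    Codes missing from the lookup go under concept_id 0
    ('no matching concept' in OMOP).
    """
    by_concept = defaultdict(list)
    for system, code, value in records:
        concept_id = SOURCE_TO_CONCEPT.get((system, code), 0)
        by_concept[concept_id].append((system, value))
    return dict(by_concept)


records = [
    ("A4_CRF", "CDGLOBAL", 0.0),
    ("EHR", "CDR global score", 0.5),
    ("EHR", "unmapped local code", 1.0),
]
grouped = map_records(records)
print(grouped[HYPOTHETICAL_CDR_GLOBAL_CONCEPT_ID])  # → [('A4_CRF', 0.0), ('EHR', 0.5)]
```

A query on the one concept then returns trial and routine-care observations side by side. That shared concept is what the "converge" goal amounts to in practice.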

For research-only data (too granular to appear in routine care and the EHR), this only matters once you have two or more studies (e.g., the A4 study and some other “A5” and “A6” studies); see the whole promise of common data elements (CDEs).
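For research-only variables, one way to see where harmonization starts to matter is to count, per standard concept, how many studies use it. Concepts shared by two or more studies are candidate common data elements. The sketch below is a minimal illustration under that assumption. “A5” and “A6” are the placeholder studies from the post, and the integer concept_ids are made up.

```python
from collections import defaultdict


def shared_concepts(study_concepts, min_studies=2):
    """Return {concept_id: sorted study names} for concepts used by at least
    `min_studies` studies.

    `study_concepts` maps a study name to the set of standard concept_ids
    that its mapped data uses.
    """
    studies_by_concept = defaultdict(set)
    for study, concepts in study_concepts.items():
        for concept_id in concepts:
            studies_by_concept[concept_id].add(study)
    return {
        cid: sorted(studies)
        for cid, studies in studies_by_concept.items()
        if len(studies) >= min_studies
    }


# Hypothetical inputs: made-up concept_ids per study
usage = {
    "A4": {101, 102, 103},
    "A5": {101, 104},
    "A6": {101, 103},
}
# 101 (all three studies) and 103 (A4 and A6) qualify; 102 and 104 do not
print(shared_concepts(usage))
```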

2018 work on converting clinical trial data between CDISC SDTM and OMOP CDM:
https://www.researchgate.net/publication/340416860_Converting_clinical_trial_data_between_CDISC_SDTM_and_OMOP_CDM

The largest study ever mapped to OMOP is called All of Us. 🙂

Also relevant may be “Learning important common data elements from shared study data: The All of Us program analysis” (PMC).