REDCap data to OMOP-CDM

sshah988 · June 2, 2020, 12:16am

Hi all, I am very new to the OMOP-CDM model and the OHDSI community. I work as a machine learning analyst at the University of Chicago. We are currently using the OMOP-CDM for our clinical trials data, and we are interested in storing REDCap data in the OMOP-CDM as well. I was unable to find any good ideas for this. Does anyone have ideas, or has anyone already implemented something of this sort? I would really appreciate some help.

Thanks,
Sameep

krfeeney · June 2, 2020, 12:50am

@sshah988, great topic! I was just talking to our friends at Montefiore today. They have developed an OMOP to REDCap pipeline.

Send me an email at Kostka(at)ohdsi.org and I’ll get you connected.

mkwong · June 2, 2020, 12:52am

I would think the same approach applies to REDCap as when mapping EHR data to OMOP: export your REDCap data to an electronic, structured document (or documents), then write an ETL to parse, map, and load the data into your OMOP CDM.
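
To make the parse, map, load idea concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the export columns (`record_id`, `visit_date`, `chf`, `heart_rate`), the concept IDs, and the `stg_mapped` staging table are made up, and an in-memory SQLite database stands in for the real CDM.

```python
import csv
import io
import sqlite3

# Hypothetical mapping; 1001 and 2001 are placeholder IDs, not real OMOP concepts.
FIELD_TO_CONCEPT = {
    ("chf", "1"): 1001,          # categorical: field plus answer code
    ("heart_rate", None): 2001,  # numeric: field only, value kept as a number
}

def parse_export(csv_text):
    """Parse a flat CSV export into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def map_row(row):
    """Turn one exported row into zero or more mapped tuples."""
    mapped = []
    for field, value in row.items():
        if field in ("record_id", "visit_date") or value == "":
            continue
        concept_id = FIELD_TO_CONCEPT.get((field, value))
        number = None
        if concept_id is None:
            # No categorical match: try the field as a numeric measurement.
            concept_id = FIELD_TO_CONCEPT.get((field, None))
            if concept_id is not None:
                number = float(value)
        if concept_id is None:
            continue  # unmapped; a real ETL would log this for review
        mapped.append((row["record_id"], row["visit_date"], concept_id, number))
    return mapped

def load(conn, rows):
    """Insert mapped tuples into a (hypothetical) staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_mapped "
        "(record_id TEXT, event_date TEXT, concept_id INTEGER, value_as_number REAL)"
    )
    conn.executemany("INSERT INTO stg_mapped VALUES (?, ?, ?, ?)", rows)
    conn.commit()

EXPORT = "record_id,visit_date,chf,heart_rate\n1,2020-06-01,1,72\n2,2020-06-02,0,\n"
conn = sqlite3.connect(":memory:")
load(conn, [m for row in parse_export(EXPORT) for m in map_row(row)])
loaded = conn.execute(
    "SELECT concept_id, value_as_number FROM stg_mapped ORDER BY concept_id"
).fetchall()
```

The second record maps to nothing here (a "0" answer and an empty heart rate), which is the kind of case you would want to decide on explicitly in a real mapping.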

MK

SELVA_MUTHU_KUMARAN · June 2, 2020, 1:34am

In our case, we mapped our REDCap data (survey data) to the observation table. May I check with you guys whether that’s how it is usually done? I feel our data, which contains survey questions and surveys, is a better fit for the observation table.

mkwong · June 2, 2020, 2:37am

Hi,

Not everything goes into the observation table. It depends greatly on the nature of the data in REDCap. For example, data on whether the patient has congestive heart disease goes into the condition_occurrence table. Similarly, if your REDCap data is a heart rate or blood pressure value, it goes into the measurement table. When defining the data mapping, I typically look at concept.domain_id to tell me which table the data should be mapped to and stored in.
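
As a rough illustration of routing by concept.domain_id, a small Python sketch follows. The domain and table names are the usual OMOP ones, but the concept IDs in `CONCEPT_DOMAIN` are placeholders standing in for a lookup against the real concept table, and falling back to the observation table for unlisted domains is an assumption of this sketch rather than a rule.

```python
# Standard OMOP domain_id values and the clinical event tables they route to.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Measurement": "measurement",
    "Observation": "observation",
    "Drug": "drug_exposure",
    "Procedure": "procedure_occurrence",
    "Device": "device_exposure",
}

# Placeholder stand-in for "SELECT domain_id FROM concept WHERE concept_id = ?".
# The IDs are made up for illustration.
CONCEPT_DOMAIN = {
    1001: "Condition",    # e.g. a mapped heart disease answer
    2001: "Measurement",  # e.g. a mapped heart rate value
    3001: "Observation",  # e.g. a mapped history-of answer
}

def target_table(concept_id, default="observation"):
    """Return the table a mapped concept should be stored in."""
    domain_id = CONCEPT_DOMAIN.get(concept_id)
    # Assumption: anything without a listed domain falls back to observation.
    return DOMAIN_TO_TABLE.get(domain_id, default)

routes = {cid: target_table(cid) for cid in (1001, 2001, 3001)}
```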

MK

FrankFox · June 3, 2020, 11:58am

Hi All,
I am also working on mapping survey data to the OMOP-CDM - on the EHDEN.eu project. I am mapping an ICHOM standard set (i.e. Survey results) to the OMOP-CDM and hope you can help me with the following questions.
• In the conventions for the OBSERVATION table it says that “Valid Concepts of the VALUE_AS_CONCEPT field are not enforced, but typically belong to the ‘Meas Value’ domain”. When using the OBSERVATION table for survey responses, these will then typically be from the ‘Observation’ domain. Do I understand this correctly?

• When using USAGI to generate mappings of question-responses to existing standard concepts, the suggested mappings vary between domains. For example, the two questions below are establishing the presence of prior heart or lung disease and these are USAGI’s best-guess.

| Source Description | Target Concept_ID | Target Description | Vocabulary | Domain |
| -- | -- | -- | -- | -- |
| Other heart disease | 45879053 | Other heart disease | LOINC | Meas Value |
| Chronic lung disease | 4188164 | History of chronic lung disease | SNOMED | Observation |

My question: What criteria should be used for selecting the best match for a survey question-answer? Would you focus on description? Are Vocabulary & Domain important?

• If all the question-answers are stored in the OBSERVATION table – including those that might also be expected to be in the CONDITION_OCCURRENCE table (such as the two above), will this make the information difficult to find for others?

• A related question: How would one distinguish the source of the answer to the above two questions? e.g. a patient response, a link to an EHR record, a medical professional assessment on-the-spot.

I would be happy if you can point me to information that might help me here.
Thank you.
Frank

Christian_Reich · June 3, 2020, 3:46pm

There is a tricky point here, @FrankFox. Surveys are good for collecting medical facts. But then the question is how they are represented. They can be represented in the SURVEY (not OBSERVATION) table, and you can have all the question-answer pairs you need. In fact, we can just upload your survey and make them standard concepts. The problem with that is that only people familiar with the survey will look for it, or even know it’s there. Remember, we are doing remote network research, and nobody can see behind the firewall. So, what needs to happen is that the survey question-answer pairs need to be mapped to the actual facts and put into the CONDITION_OCCURRENCE, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE and OBSERVATION tables, respectively. So, in USAGI, you need to do the following: take a question-answer pair (not separately, USAGI cannot do that) and define the Domain. In that Domain, find the right concept.

For example, let’s say your survey has the question “What chronic diseases are you suffering from?” and the answers are “Type 2 Diabetes”, “High blood pressure” and “Depression”. You then upload these pairs (e.g. “What chronic diseases are you suffering from - Type 2 Diabetes”) and find the Standard Concept for each (here, Type 2 diabetes mellitus).

As for distinguishing the source of an answer: that is in the Type Concept. So, e.g., the PROCEDURE_OCCURRENCE record will have in procedure_type_concept_id the value 581412 - Procedure Recorded from a Survey. We are in the process of consolidating these, so this particular Concept is going away soon. But the same principle will apply.
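
A small Python sketch of those two steps, under stated assumptions: the " - " separator for the combined question-answer string is just one possible convention, `question_answer_pairs` and `tag_provenance` are hypothetical helper names, and the `procedure_concept_id` of 0 is a placeholder. The type concept 581412 is the one quoted above, which is due to be consolidated.

```python
# Type concept quoted in this thread: "Procedure Recorded from a Survey".
SURVEY_PROCEDURE_TYPE = 581412

# OMOP type-concept field per clinical event table.
TYPE_FIELD = {
    "condition_occurrence": "condition_type_concept_id",
    "drug_exposure": "drug_type_concept_id",
    "procedure_occurrence": "procedure_type_concept_id",
    "observation": "observation_type_concept_id",
}

def question_answer_pairs(question, answers):
    """Build one source term per question-answer pair, ready to feed to USAGI."""
    return [f"{question} - {answer}" for answer in answers]

def tag_provenance(table, record, type_concept_id):
    """Return a copy of the record with the table's type concept field set."""
    tagged = dict(record)
    tagged[TYPE_FIELD[table]] = type_concept_id
    return tagged

pairs = question_answer_pairs(
    "What chronic diseases are you suffering from?",
    ["Type 2 Diabetes", "High blood pressure", "Depression"],
)

# procedure_concept_id 0 is a placeholder, not a real mapping result.
procedure = tag_provenance(
    "procedure_occurrence",
    {"person_id": 1, "procedure_concept_id": 0},
    SURVEY_PROCEDURE_TYPE,
)
```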

Works?

sshah988 · June 3, 2020, 3:49pm

@krfeeney I have sent you an email from my uchicago email. Thank you for the help.

sshah988 · June 3, 2020, 3:53pm

@mkwong Does REDCap have an option to directly export data into electronic structured document format?

gregk · June 3, 2020, 3:57pm

@Christian_Reich on EHDEN - and just about everywhere else - we still use OMOP CDM v5.3.1. The SURVEY table was introduced in CDM v6.

btw, found this post from way back, could be relevant for this discussion

github.com/OHDSI/CommonDataModel

SURVEY data in OMOP CDM - updated

opened 09:35AM - 23 Nov 17 UTC

closed 07:19PM - 25 Oct 18 UTC

ColinOrr2006

Proposal Accepted

# Adding Patient Reported Outcome data to CDM

- Requestor: Colin Orr and Catherine Kerr, ICON plc
- Revising party: Joshua Ransom, Anna Corning, Emelly Rusli, Rayhnuma Ahmed, Aaron Stern; SHYFT Analytics
- Discussion: [here](http://forums.ohdsi.org/t/linking-patient-survey-responses-together/2413/19)

## Background

ICON plc is currently engaged in a project with [ICHOM (International Consortium for Health Outcomes Measurement)](http://www.ichom.org/). ICHOM's mission is to unlock the potential of value-based healthcare by defining global Standard Sets of outcome measures that really matter to patients for the most relevant medical conditions and by driving adoption and reporting of these measures worldwide.

ICHOM brings together patient representatives, clinician leaders, and registry leaders from all over the world to develop Standard Sets, comprehensive yet parsimonious sets of outcomes and case-mix variables for specific medical conditions that ICHOM recommends all providers track. Each Standard Set focuses on patient-centered results, and provides an internationally-agreed upon method for measuring each of these outcomes. ICHOM believes that standardized outcomes measurement will open up new possibilities to compare performance globally, allow clinicians to learn from each other, and rapidly improve the care provided to patients.

ICHOM Standard Sets include baseline conditions and risk factors to enable meaningful case-mix adjustment globally, ensuring that comparisons of outcomes will take into account the differences in patient populations across not just providers, but also countries and regions. They also include high-level treatment variables to allow stratification of outcomes by major treatment types. A comprehensive data dictionary, as well as scoring guides for patient-reported outcomes, is provided for each Standard Set.
## Proposal

ICON plc is developing a platform to ingest, store and analyse the patient outcome measures and is using the OMOP Common Data Model to store the data. The current CDM satisfies many of the requirements, but there are some gaps, specifically:

- We need to store data relating to each Patient Reported Outcome (PRO) questionnaire that is completed by a patient. Examples of this type of data are: timestamp of when the questionnaire was completed, did the patient complete it with assistance, role of person who completed the questionnaire, etc. We also need to store the attributes related to the timing of the survey in relation to the treatment the patient received - for example, 'baseline', or 'six month follow-up'. This is additional contextual data that allows us to compare outcomes over time.

To store this data, we propose introducing a new SURVEY table. Each row in the table represents an instance of a completed survey and serves to link a number of survey questions and answers together. Individual questions and their answers are stored as name-value pairs in the OBSERVATION table. The OBSERVATION table requires some additional columns in order to maintain the relationship with the patient questionnaire (SURVEY) as described below.

### SURVEY table

The SURVEY table is used to store an instance of a completed survey or questionnaire. It captures details of the individual questionnaire such as who completed it, when it was completed and to which patient treatment or visit it relates (if any). Each SURVEY has a SURVEY_CONCEPT_ID, a concept in the CONCEPT table identifying the questionnaire e.g. EQ5D, VR12, SF12. Each questionnaire should exist in the CONCEPT table. Each SURVEY can be optionally related to a specific patient visit in order to link it to a specific patient assessment or treatment.

| Field | Required | Type | Description |
| -- | -- | -- | -- |
| SURVEY_OCCURRENCE_ID | Yes | integer | Unique identifier for each completed survey |
| SURVEY_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of survey. |
| PERSON_ID | Yes | integer | A foreign key identifier to the Person in the PERSON table about whom the survey was completed |
| VISIT_OCCURRENCE_ID | No | integer | A foreign key to the visit_occurrence table during which the survey was completed |
| RESPONSE_TO_VISIT_OCCURRENCE_ID | No | | A foreign key to the visit in the visit_occurrence table during which treatment was carried out that relates to this survey. |
| SURVEY_START_DATE | No | date | Date on which the survey was started |
| SURVEY_START_DATETIME | No | Timestamp | Date and time on which the survey was started |
| SURVEY_END_DATE | Yes | Date | Date on which the survey was completed |
| SURVEY_END_DATETIME | No | Timestamp | Date and time on which the survey was completed |
| ASSISTED_CONCEPT_ID | No | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies indicating whether the survey was completed with assistance or not (Yes / No) |
| ASSISTED_SOURCE_VALUE | No | varchar(100) | Source value representing whether patient required assistance to complete the survey. Example: “Completed without assistance”, “Completed with assistance”. |
| RESPONDENT_TYPE_CONCEPT_ID | No | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the respondent type. Example: Research Associate, Patient |
| RESPONDENT_SOURCE_VALUE | No | varchar(100) | Source code representing role of person who completed the survey. |
| TIMING_CONCEPT_ID | No | integer | A foreign key that refers to a timing Concept identifier in the Standardized Vocabularies. Example: 3 month follow-up, 6 month follow-up, … |
| TIMING_SOURCE_VALUE | No | varchar(100) | Text string representing the timing of the survey. Example: Baseline, 6-month follow-up |
| COLLECTION_METHOD_CONCEPT_ID | No | varchar(10) | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the data collection method (e.g. Paper, Telephone, Electronic Questionnaire) |
| COLLECTION_METHOD_SOURCE_VALUE | No | varchar(100) | The collection method as it appears in the source data. |
| SURVEY_SOURCE_VALUE | No | varchar(100) | The survey name/title as it appears in the source data. |
| SURVEY_SOURCE_IDENTIFIER | No | varchar(100) | Unique identifier for each completed survey in source system |
| VALIDATED_SURVEY_CONCEPT_ID | No | Integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the validation status of the survey. |
| SURVEY_VERSION_NUMBER | No | Varchar2(20) | Version number of the questionnaire / survey used. |
| PROVIDER_ID | | | |

### OBSERVATION table

Patient responses to survey questions are stored in the OBSERVATION table. Each record in the OBSERVATION table represents a single question/response pair and is linked to a specific SURVEY / questionnaire in the SURVEY_OCCURRENCE_ID. Each response record is the response to a specific question identified by the OBSERVATION_CONCEPT_ID. This concept ID is a unique question contained in the CONCEPT table. An individual survey question can have multiple responses to a question (e.g. which of these items relate to you, a, b, c, …?). Each response is stored as a separate record in the OBSERVATION table.

The question / answer observation record is linked to the patient questionnaire used for collecting the data using two new fields in the OBSERVATION table: DOMAIN_ID and DOMAIN_OCCURRENCE_ID. DOMAIN_ID for any survey related observations contains the text ‘Survey’ and DOMAIN_OCCURRENCE_ID contains the SURVEY_OCCURRENCE_ID of the specific survey. This domain construct can be used for other observation groupings.

The OBSERVATION table can also store survey scoring results.
Many validated PRO questionnaires have scoring algorithms (many of which are proprietary) that return an overall patient score based on the answers provided. Survey scores are identified by their OBSERVATION_CONCEPT_ID and are linked back to the scored survey using the same DOMAIN construct described.

In the name/value pair model, the name (question) is stored as OBSERVATION_CONCEPT_ID and the value (answer) is stored as VALUE_AS_CONCEPT_ID where the answer is categorical and is defined as a concept in the concept table, VALUE_AS_NUMBER where the answer is numeric, VALUE_AS_STRING where the answer is a free text string, or VALUE_AS_DATETIME.

**_Amendments required to the OBSERVATION table are as follows_**

| Change | Field | Required | Type | Description |
| -- | -- | -- | -- | -- |
| New | DOMAIN_OCCURRENCE_ID | No | integer | A foreign key to SURVEY table |
| New | DOMAIN_ID | No | | ‘Survey’ |
| New | VALUE_AS_DATETIME | No | Timestamp | The observation result stored as a datetime value. This is applicable to observations where the result is expressed as a point in time. |

### Other Considerations

- Extensions to the concept table include the survey and response data that is not currently contained in the standard libraries. All custom extensions to the concept table have been stored in the negative address space so as not to conflict with the currently defined standard. These extensions are not included in the definition of this proposal but should be considered for future work.
- There is no formal definition of the relationship between a questionnaire/survey and the questions presented on that survey. There is an implicit relationship created when survey/response data is stored. If an explicit relationship is required, this can be achieved using the FACT_RELATIONSHIP table.

### Use Cases

The example below describes the data to be stored for a question on the HOOSPS (Hip Disability and Osteoarthritis Outcome Score) patient questionnaire.
The question asks the degree of difficulty in descending stairs due to the patient's hip problem. The patient answers "Moderate". The CONCEPT table contains domain data for the survey HOOSPS, question (HPS1) plus all the potential values that a patient can respond with.

### CONCEPT table - example

| CONCEPT_ID | CONCEPT_NAME | DOMAIN_ID | VOCABULARY_ID | CONCEPT_CLASS_ID | STANDARD_CONCEPT | CONCEPT_CODE |
| -- | -- | -- | -- | -- | -- | -- |
| -2020 | HPS1 | Metadata | Domain | Domain | | ICHOM generated |
| -2021 | None | HPS1 | ICHOM Observation | PRO Measure | S | 0 |
| -2022 | Mild | HPS1 | ICHOM Observation | PRO Measure | S | 1 |
| -2023 | Moderate | HPS1 | ICHOM Observation | PRO Measure | S | 2 |
| -2024 | Severe | HPS1 | ICHOM Observation | PRO Measure | S | 3 |
| -2025 | Extreme | HPS1 | ICHOM Observation | PRO Measure | S | 4 |
| -3501 | HOOSPS | Metadata | ICHOM Survey | Domain | | ICHOM generated |

The patient response is captured as a code 2 (in this instance) in the questionnaire. The CONCEPT_ID is determined by finding a match in the concept table for the code (2) for the specific question (identified by HPS1) in column DOMAIN_ID and the response value (2) in the column CONCEPT_CODE.
### SURVEY table - example

| Column | Value | Comment |
| -- | -- | -- |
| SURVEY_OCCURRENCE_ID | 19073 | |
| SURVEY_CONCEPT_ID | -3501 | Concept for HOOSPS survey |
| PERSON_ID | 21405 | |
| VISIT_OCCURRENCE_ID | | |
| RESPONSE_TO_VISIT_OCCURRENCE_ID | 13403 | |
| SURVEY_START_DATE | | |
| SURVEY_END_DATE | 2016-07-14 | |
| ASSISTED_CONCEPT_ID | -3601 | Concept for "Completed without assistance" |
| ASSISTED_SOURCE_VALUE | Complete w/o assistance | |
| RESPONDENT_TYPE_CONCEPT_ID | 3611 | Concept for "Patient-reported" |
| RESPONDENT_SOURCE_VALUE | P-REP | Source system value for "Patient-reported" |
| TIMING_CONCEPT_ID | -3621 | Concept for "BASELINE" timing |
| TIMING_SOURCE_VALUE | Baseline | |
| COLLECTION_METHOD_CONCEPT_ID | -3631 | Concept for "Electronic questionnaire" |
| COLLECTION_METHOD_SOURCE_VALUE | E-QUEST | Source system value for "Electronic questionnaire" |
| SURVEY_SOURCE_VALUE | HOOSPS | |
| SURVEY_SOURCE_IDENTIFIER | HS001234 | |
| VALIDATED_SURVEY_CONCEPT_ID | -3701 | Concept for "Validated survey" |
| SURVEY_VERSION_NUMBER | | |
| PROVIDER_ID | | |

### OBSERVATION table - example

| Column | Value | Comment |
| -- | -- | -- |
| OBSERVATION_ID | 794657 | |
| PERSON_ID | 21405 | |
| OBSERVATION_CONCEPT_ID | -2020 | Concept for HPS1 |
| OBSERVATION_DATE | 2016-07-14 | |
| OBSERVATION_DATETIME | | |
| OBSERVATION_TYPE_CONCEPT_ID | XXX | CONCEPT_ID to indicate PRO survey response |
| VALUE_AS_CONCEPT_ID | -2023 | Concept for "Moderate" |
| VALUE_AS_STRING | | |
| VALUE_AS_NUMBER | | |
| VALUE_AS_DATETIME | | |
| PROVIDER_ID | | |
| VISIT_OCCURRENCE_ID | | |
| OBSERVATION_SOURCE_VALUE | degree of difficulty in …. | |
| OBSERVATION_SOURCE_CONCEPT_ID | | |
| DOMAIN_ID | Survey | |
| DOMAIN_OCCURRENCE_ID | 19073 | |
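
The lookup and linkage described in the quoted issue can be sketched in Python as below. This follows the proposal as written in the issue (including its DOMAIN_ID / DOMAIN_OCCURRENCE_ID columns and negative custom concept IDs), which is not necessarily what a released CDM version contains; the function names are hypothetical and the values are copied from the example tables.

```python
# Custom concepts from the issue's CONCEPT table example (negative ID space).
CONCEPTS = [
    {"concept_id": -2020, "concept_name": "HPS1", "domain_id": "Metadata", "concept_code": "ICHOM generated"},
    {"concept_id": -2021, "concept_name": "None", "domain_id": "HPS1", "concept_code": "0"},
    {"concept_id": -2022, "concept_name": "Mild", "domain_id": "HPS1", "concept_code": "1"},
    {"concept_id": -2023, "concept_name": "Moderate", "domain_id": "HPS1", "concept_code": "2"},
    {"concept_id": -2024, "concept_name": "Severe", "domain_id": "HPS1", "concept_code": "3"},
    {"concept_id": -2025, "concept_name": "Extreme", "domain_id": "HPS1", "concept_code": "4"},
]

def question_concept_id(question_code):
    """Find the concept for the question itself (stored as Metadata)."""
    for c in CONCEPTS:
        if c["concept_name"] == question_code and c["domain_id"] == "Metadata":
            return c["concept_id"]
    raise KeyError(question_code)

def answer_concept_id(question_code, answer_code):
    """Match the question in DOMAIN_ID and the response code in CONCEPT_CODE."""
    for c in CONCEPTS:
        if c["domain_id"] == question_code and c["concept_code"] == str(answer_code):
            return c["concept_id"]
    raise KeyError((question_code, answer_code))

def build_observation(survey, observation_id, question_code, answer_code):
    """Create the OBSERVATION row and link it back to its SURVEY row."""
    return {
        "observation_id": observation_id,
        "person_id": survey["person_id"],
        "observation_concept_id": question_concept_id(question_code),
        "observation_date": survey["survey_end_date"],
        "value_as_concept_id": answer_concept_id(question_code, answer_code),
        "domain_id": "Survey",
        "domain_occurrence_id": survey["survey_occurrence_id"],
    }

survey = {
    "survey_occurrence_id": 19073,
    "survey_concept_id": -3501,
    "person_id": 21405,
    "survey_end_date": "2016-07-14",
}
observation = build_observation(survey, 794657, "HPS1", 2)
```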

Andrew · June 3, 2020, 4:12pm

Should there be a path for this in cases where it is justified because the measures are reliable, standardized, and widely used?

A rule of thumb that roughly defines the criteria used to support that justification might be an origin in a curated source that promotes thorough input and development, plus breadth of use. That would prevent the vocabulary from becoming littered with little-used or poorly conceived concepts, but allow people who want to use widely used survey/scale measures to do so in a standard way. ICHOM, CDE on the NIH Portal, PROMIS, PhenX, widely used psychology/psychiatry/neurology scales, etc. might all be good examples of things that meet that rule of thumb.

Dima is working on this in the Psychiatry workgroup for psychiatry/psychology/neurology/neuropsychology scales and LOINC or SNOMED. It would probably be good to have a standard approach that goes beyond those psych use cases.

Christian_Reich · June 3, 2020, 4:41pm

Understood. We are working on management of public local vocabularies (like a survey), including a mapping tool. Till then: request things in the Forum. Doesn’t cost you anything.

mkwong · June 3, 2020, 5:40pm

I am told REDCap does have export capability. The last time I did anything like this was in 2017 and dumped a REDCap project out to CSV (I think). Presently, we are about to do a demonstration project involving exporting REDCap project data to a structured electronic format for me to then map to OMOP concepts and link survey findings with EHR records in veterinary medicine. I built an OMOP (v5.2) database adapted for veterinary EHR records. Regardless, this should work the same for human data via REDCap.

MeganBranda · June 3, 2020, 5:55pm

Hello, REDCap’s capabilities are set at the institution level. If your institution has the API enabled, you get a token (or key code) specific to the REDCap project, and there are a number of R packages (just google "r package redcap" and you will get a couple) that will export the data into a standard format for you to analyze and merge with any other data. If your institution does not have the API enabled for REDCap, then you are stuck doing it the old-fashioned way: logging in, selecting export, and downloading the data. They do have standard code to go with the export to apply labels and formats.
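
If R is not your thing, the same token-based export can be sketched in Python with only the standard library. This is a sketch, not a tested client: the API URL is institution-specific (the one in the usage comment is a placeholder), the token is a placeholder, and `export_payload` / `export_records` are hypothetical helper names. The form fields (`token`, `content`, `format`, `type`) are the ones the REDCap record export expects.

```python
import json
import urllib.parse
import urllib.request

def export_payload(token):
    """Form fields for a flat JSON export of all records in one project."""
    return {
        "token": token,       # project-specific API token
        "content": "record",  # export records (not metadata)
        "format": "json",
        "type": "flat",       # one row per record
    }

def export_records(api_url, token, timeout=30):
    """POST the export request and return the decoded list of records."""
    data = urllib.parse.urlencode(export_payload(token)).encode("utf-8")
    request = urllib.request.Request(api_url, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Usage (placeholder URL and token; not executed here):
# records = export_records("https://redcap.example.edu/api/", "YOUR_PROJECT_TOKEN")
```

Keep the token out of source control; it grants access to the whole project.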

sshah988 · June 3, 2020, 6:04pm

Thanks

mgurley · June 3, 2020, 6:13pm

I think when a REDCap project operates as a longitudinal registry, collecting manually chart-abstracted data and possibly also incorporating pulls of data from EHR/claims data feeds, it should NOT be treated as a “survey” but rather thought of as a source system that follows the normal conventions of vocabulary mapping and ETL, using the type concept fields (*_type_concept_id) to designate the provenance of the data as non-EHR or non-claims. That means needing to map every REDCap instrument, question and possible choice to standardized OMOP vocabulary values, and letting OMOP standard ETL practices dictate the table destination. You will also need to make ad hoc decisions about populating dates, meaning deciding how the dates collected in the REDCap project relate to the dates populating the OMOP clinical event tables. REDCap has a data dictionary format that is exportable to CSV.
I am working on a project that maintains mappings for a REDCap data dictionary outside of the REDCap data dictionary itself. You can also partially manage the mappings to standardized vocabularies within the REDCap setup itself, by using the field_annotation capabilities of REDCap to designate a standardized vocabulary and using the choice values per REDCap data point as the standardized vocabulary's native codes. I have a Ruby script that allows this to be curated and managed across updates of a REDCap data dictionary. It is almost ready to be shared, if there is any interest.
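
To illustrate the in-REDCap variant (this is not the Ruby script mentioned above), here is a Python sketch that reads a data dictionary CSV and pulls out the vocabulary named in the field annotation together with the choice codes. The `VOCABULARY=...` annotation syntax is an invented convention for this example, and the choice codes `C1` and `C2` are placeholders rather than real vocabulary codes. The column headers and the `code, label | code, label` choice format follow the REDCap data dictionary export.

```python
import csv
import io

CHOICES_COLUMN = "Choices, Calculations, OR Slider Labels"

def parse_choices(raw):
    """Split 'code, label | code, label' into a {code: label} dict."""
    choices = {}
    for part in raw.split("|"):
        if not part.strip():
            continue
        code, _, label = part.partition(",")  # labels may contain commas
        choices[code.strip()] = label.strip()
    return choices

def parse_annotation(raw, key="VOCABULARY"):
    """Read 'VOCABULARY=...' from the annotation (hypothetical convention)."""
    for token in raw.split():
        name, sep, value = token.partition("=")
        if sep and name == key:
            return value
    return None

def source_codes(dictionary_csv):
    """List one row per annotated choice: field, vocabulary, code, label."""
    rows = []
    for field in csv.DictReader(io.StringIO(dictionary_csv)):
        vocabulary = parse_annotation(field.get("Field Annotation") or "")
        if vocabulary is None:
            continue  # field not annotated with a vocabulary
        for code, label in parse_choices(field.get(CHOICES_COLUMN) or "").items():
            rows.append({
                "field": field["Variable / Field Name"],
                "vocabulary_id": vocabulary,
                "source_code": code,
                "label": label,
            })
    return rows

DICTIONARY = (
    '"Variable / Field Name","Field Type","Field Label",'
    '"Choices, Calculations, OR Slider Labels","Field Annotation"\n'
    '"chronic_dx","checkbox","What chronic diseases are you suffering from?",'
    '"C1, Type 2 Diabetes | C2, High blood pressure","VOCABULARY=SNOMED"\n'
    '"comments","notes","Comments","",""\n'
)
rows = source_codes(DICTIONARY)
```

The resulting rows are in the shape you would join against the concept table (vocabulary_id plus concept_code) to find standard concepts.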

Andrew · June 3, 2020, 7:15pm

Mike we are interested. Thanks very much for your willingness to share.

FrankFox · June 3, 2020, 7:54pm

Thank you. That is all useful.
If I’m understanding you correctly, @Christian_Reich, intimate familiarity with the source data is necessary to decide whether a data element is an OBSERVATION, MEASUREMENT, CONDITION, or PROCEDURE. The SURVEY_CONDUCT table is only a ‘master’ record for the survey instance.
I am only at the proof-of-concept stage, without RWD, so I should get by with using only the SURVEY_CONDUCT and OBSERVATION tables for now.

Yes, @Andrew it would make life easier if the most common surveys were already available. Some elements exist (e.g. HAQII) but only sporadically.

Vojtech_Huser · June 4, 2020, 3:43pm

So REDCap is used for a research study. Btw, a Clinical Study WG within OHDSI is dealing with all aspects pertaining to a study.
See this post here Registry data to OMOP CDM Work Group

Christian_Reich · June 5, 2020, 3:23am

Correct. As @mgurley said.

The question is what the use case is. If you want to do a nominal job and say “I did it, I put REDCap into OMOP”, you are fine. But you probably want to use it for analytics. What kind of analytics? Only specialty analytics that make sense with respect to the very data asset you are converting? Then an OMOP conversion is probably not necessary. Or do you want to allow the data to play in the network with the OHDSI tool stack and methods? Then you better do the full job and create real Conditions, Drugs, Procedures, Devices or Visits.