
Help with some REDCap to OMOP logic

Hello everyone,

I have created a web app that compares form survey questions/answers/custom text to standard SNOMED terms (the vocabulary can be expanded, just keeping it SNOMED for simplicity at the moment). This was developed to go from REDCap data to the OMOP CDM. Now, there’s an interface to submit the REDCap data dictionary and it returns the top 5 similar standard terms. Once someone goes in and maps all these survey questions/answers to a standard SNOMED concept they can export it to a CSV/XLSX.

Once we have this mapping CSV/XLSX, we can import the file into the desktop companion app, perform the ETL, and either generate SQL files or write directly to a database. We made it a desktop app to avoid the many security implications of being on the public web, since at this point we need the actual patient answers (PHI).

I’m now working on the ETL logic and have come across some rather difficult challenges. The first is that the ‘person’ table requires race and gender. I have added a field in the desktop app where the user can specify the key name that holds this data for the person. However, the values for this data can vary wildly in REDCap. We aimed to make this as flexible as possible and not just for our use case. This presents a challenge that requires yet another stage of mapping unless I have all this mapped data at once. Is this the usual case? Do you map all of your data at once and then perform the ETL?

Any sort of guidance or tips would be greatly appreciated.

Thanks all!


Looks like a nice solution to a chronic problem.

However, when you map questions and answers, do you map them separately? Or in conjunction? For most domains, we need pre-coordinated concepts, except in Observation and Measurement (only in a limited way since value_as_concept_id shouldn’t take any random concept). Are you doing this pre-coordination in your tool?

What is the problem? You need to map, or create a de-novo race or gender concept.

The questions and answers are parsed and split into their own concepts and mapped individually. This is the first step of the process, but at the moment you can import any of your REDCap data dictionaries and map each dictionary individually.

The issue is that I didn’t necessarily require any mappings to take place. Instead I was under the assumption that the data could be partially mapped and somehow converted into the OMOP CDM in chunks. This does not appear to be the case. So in my web tool, it’s pretty open-ended: you can map any REDCap data dictionary to standard SNOMED concepts, in any order. You can then export these mappings (which just adds a JSON object to the field_annotation column in the REDCap data dictionary you are mapping).

When it comes to the desktop tool, you import one of these “mapped data dictionary” files which may or may not have the mappings like “birthdate” or “gender” or “race”, yet.

Hopefully that makes sense. Thanks again!


I understand, but that is the crux. You don’t want to just create a concept for whatever folks created in REDCap. Because it is not analyzable outside the institution owning the REDCap data. You could do that, but why? Analyze the data in REDCap and you are good.

The reason the question-answer pairs are so toxic is that there is no logic of what goes into the question, and what goes into the answer. You could have:

Q “High blood pressure” - A “Yes”
Q “Blood pressure” - A “High”
Q “High measurement value” - A “Blood pressure”.

For the analyst, this is horrid. Also, it violates the rule that each fact should be represented by one and only one concept. And finally, this is a Condition. Conditions must be fully pre-coordinated.
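To make the point concrete, here is a minimal sketch of pre-coordination: all three question-answer pairs above resolve to the same single concept, and anything that is not a recognized pair is left unmapped. The lookup table, the `precoordinate` function, and the concept ID are illustrative assumptions; the ID is a placeholder, not a real OMOP concept_id.

```python
# Placeholder value, NOT a real OMOP concept_id. Replace it with the
# standard Condition concept chosen during mapping.
HYPERTENSION_CONCEPT_ID = -1

# Hypothetical lookup: each (question, answer) pair is mapped as a unit,
# so the same clinical fact always lands on the same concept.
PRECOORDINATED = {
    ("high blood pressure", "yes"): HYPERTENSION_CONCEPT_ID,
    ("blood pressure", "high"): HYPERTENSION_CONCEPT_ID,
    ("high measurement value", "blood pressure"): HYPERTENSION_CONCEPT_ID,
}


def precoordinate(question, answer):
    """Return the concept for a question-answer pair, or None if unmapped."""
    key = (question.strip().lower(), answer.strip().lower())
    return PRECOORDINATED.get(key)


pairs = [
    ("High blood pressure", "Yes"),
    ("Blood pressure", "High"),
    ("High measurement value", "Blood pressure"),
]
concepts = {precoordinate(q, a) for q, a in pairs}
```

A negative answer such as ("High blood pressure", "No") returns None here, which is deliberate: the absence of a condition should not be written as the condition.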

So, if you want to do a fair ETL job, you have to take on the problem of concept mapping properly, I am afraid.

@Christian_Reich Interesting. Thanks for the info.

So would you say that I need to take a step back and ensure that all the data gets mapped first before doing any sort of ETL? I’m not sure how to best require that from a programming standpoint. Perhaps, that is something I would leave up to the user? It just seems like if things weren’t mapped properly then it could pretty much break the ETL process.

I could do a check to see if the birthdate, gender, race, etc… concepts are mapped and exist in the mapping data. I’m not entirely sure. Any more thoughts?

Edit: Actually, I think this is what I can do to alleviate this issue. Right before the ETL process, I can add a check for all the minimum required mappings (gender, birthdate, race, etc.) to see whether those concept IDs exist, like 8507 for Male and 8532 for Female, at least to get the 2 mandatory tables (person and observation_period) populated. What do you think of that?
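Roughly, the check could look like the sketch below. It assumes the mappings have been loaded into a dict keyed by REDCap field name; that layout, the field names, and `find_mapping_problems` are hypothetical. Only the gender concept IDs 8507 and 8532 are taken from above.

```python
# Hypothetical pre-ETL check. The data layout is an assumption:
#   mappings:   {redcap_field_name: {answer_value: concept_id}}
#   field_keys: {"gender": "<redcap field>", "race": ..., "birthdate": ...}
REQUIRED_FIELDS = ("gender", "race", "birthdate")
KNOWN_GENDER_CONCEPT_IDS = {8507, 8532}  # Male, Female


def find_mapping_problems(mappings, field_keys):
    """Return a list of problems; an empty list means the ETL can start."""
    problems = []
    for required in REQUIRED_FIELDS:
        redcap_field = field_keys.get(required)
        if not redcap_field:
            problems.append(f"no REDCap field selected for {required}")
            continue
        if required == "birthdate":
            continue  # dates are parsed later, not mapped to concepts
        answer_map = mappings.get(redcap_field, {})
        if not answer_map:
            problems.append(
                f"{required}: field '{redcap_field}' has no mapped answers"
            )
            continue
        if required == "gender" and not (
            set(answer_map.values()) & KNOWN_GENDER_CONCEPT_IDS
        ):
            problems.append(
                f"gender: field '{redcap_field}' maps to no known gender concept"
            )
    return problems


example_mappings = {"sex": {"1": 8507, "2": 8532}}
example_keys = {"gender": "sex", "race": "race_q", "birthdate": "dob"}
example_problems = find_mapping_problems(example_mappings, example_keys)
```

Returning a list of problems rather than raising on the first one would let the checklist in the UI show everything that is still missing at once.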

Thanks again!


To help illustrate my point, here is what I am thinking. You can add multiple files that contain the mapped data (on the right). On the left, there exists a checklist of the minimum requirements for the person table and observation_period table. I haven’t quite figured out how to handle the birth year and observation dates best.

Well, you are building a GUI-based ETL tool. Prepare for a rough and long ride. :slight_smile:

Yes, you first have to map, and then based on the resulting standard concept you can decide where to put the record and how to treat it. Folks often create a large staging table, before they distribute records to the various clinical fact tables in OMOP (CONDITION_OCCURRENCE, OBSERVATION, etc.)
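As a rough sketch of that staging step, assuming each staged row already carries the standard concept it was mapped to and that you can look up each concept's domain: the row layout, `route_staged_rows`, the `unrouted` bucket, and the concept IDs in the example are all made up for illustration. The domain-to-table pairs follow the OMOP CDM table names.

```python
from collections import defaultdict

# OMOP domain of the standard concept -> clinical fact table.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Observation": "observation",
    "Measurement": "measurement",
    "Procedure": "procedure_occurrence",
    "Drug": "drug_exposure",
}


def route_staged_rows(staged_rows, concept_domains):
    """Group staged rows by target table, based on each concept's domain.

    staged_rows:     list of dicts with at least a "concept_id" key
    concept_domains: {concept_id: domain name}
    Rows with an unknown concept or domain go to a hypothetical
    "unrouted" bucket for manual review instead of being dropped.
    """
    routed = defaultdict(list)
    for row in staged_rows:
        domain = concept_domains.get(row["concept_id"])
        table = DOMAIN_TO_TABLE.get(domain, "unrouted")
        routed[table].append(row)
    return dict(routed)


# Placeholder concept IDs (1001, 1002, 9999), not real OMOP concepts.
staged = [
    {"person_id": 1, "concept_id": 1001},
    {"person_id": 1, "concept_id": 1002},
    {"person_id": 2, "concept_id": 9999},
]
domains = {1001: "Condition", 1002: "Measurement"}
routed = route_staged_rows(staged, domains)
```

The point of the staging table is that the target table is decided only after mapping, from the domain of the resulting standard concept, not from the REDCap form the answer came from.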

Thanks for the info. Typically, I struggle most with the frontend. However, with this project I am struggling most with all the backend and ETL logic.