
Extending OMOP For Surveillance Data

Hi there,

I’m extending the OMOP model to cater for surveillance data by creating new tables to accommodate the questions from the questionnaires. Is this the right way to go?

Have you considered using the survey_conduct table? I’m using this to manage and save survey responses.

Creating new tables to accommodate your needs is a valid approach if the survey_conduct table is not adequate. However, you should connect with other users who record questionnaire information to see whether you can reach a consensus on how to store it. Then, if there is a large enough need in the community for keeping questionnaire information, and there is consensus on tables, documentation, and implementation rules, it can be presented to the CDM Working Group.

I’m new to OMOP and we are faced with a similar situation: hundreds of questions and thousands of responses, all of which are “Y”, “N”, or “NA”.
I’m considering using some of the standard Y/N responses I’ve found in some of the , but I’m not sure I can mix and match a custom concept with some of these responses.
My alternative is to create custom concepts for the questions, but only one set of custom standard Y/N responses, which I can then map to each question. That saves me from having to create a whole set of responses for every question.
Any guidance is appreciated!

@W0lfgang None of the OHDSI analytic programs are going to know anything about the custom concepts you will need to create to correspond to questions. If you use a CDM table such as Observation to store the question-answer pairs, those records will be mostly ignored. However, the Data Quality Dashboard and Achilles may flag these records as erroneous for not being in the correct domain or for missing foreign keys, depending on your implementation. As for the response concepts, use existing concepts for Yes, No, and NA; mixing these with your custom concepts is fine.

Thanks @DTorok. If I understand that correctly, I can create a new custom concept for each question but use existing concepts for the Y, N, and NA answers.
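A minimal Python sketch of that pattern, storing each question-answer pair as an observation row with a custom question concept and a shared standard answer concept. Assumptions to check before relying on it: the custom question IDs sit above 2,000,000,000 (the range OMOP reserves for local concepts); 4188539 and 4188540 are assumed to be the standard SNOMED "Yes" and "No" concepts; and the "Not applicable" ID is only a placeholder, so look all of these up in Athena. The question codes and the helper name `to_observation` are hypothetical, and only a subset of the observation columns is shown.

```python
from datetime import date

# Hypothetical local question concepts; OMOP reserves IDs above 2,000,000,000 for local use.
QUESTION_CONCEPTS = {
    "Q1": 2_000_000_001,  # e.g. "Do you take three meals a day?"
    "Q2": 2_000_000_002,
}

# Assumed standard answer concepts: verify in Athena before use.
YES_CONCEPT_ID = 4188539
NO_CONCEPT_ID = 4188540
NOT_APPLICABLE_CONCEPT_ID = 0  # placeholder: 0 means "no matching concept" until the real ID is looked up

ANSWER_CONCEPTS = {"Y": YES_CONCEPT_ID, "N": NO_CONCEPT_ID, "NA": NOT_APPLICABLE_CONCEPT_ID}


def to_observation(person_id: int, question: str, answer: str, answered_on: date) -> dict:
    """Build a partial observation row for one question-answer pair."""
    return {
        "person_id": person_id,
        "observation_concept_id": QUESTION_CONCEPTS[question],  # custom concept for the question
        "observation_date": answered_on,
        "value_as_concept_id": ANSWER_CONCEPTS[answer],  # shared standard concept for the answer
        "observation_source_value": question,  # keep the raw question code for debugging
        "value_source_value": answer,  # keep the raw answer as well
    }


row = to_observation(42, "Q1", "Y", date(2021, 6, 1))
print(row["observation_concept_id"], row["value_as_concept_id"])  # 2000000001 4188539
```

The point of the design is that only the question side needs new concepts: one small, shared answer mapping serves every question.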

As @DTorok said, using custom concepts has implications for the standard OHDSI tools. If at all possible, review your questions and answers so that they map to standard concepts, rather than only using standard concepts for the answers. For example, in a splenic tumor survey the question:

Q5:Final histopathologic diagnosis of Splenic Hematoma? Respond with “1” for yes or “0” for no.

A “1” response (or yes) maps to the OMOP concept_id = 4094979 (SNOMED Splenic hematoma). This goes into the condition_occurrence table. If the response is “0” or no - nothing is added to the OMOP database.
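A small Python sketch of that rule, assuming rows are plain dicts and showing only a few condition_occurrence columns. Using the survey date as condition_start_date is an assumption (the survey may not tell you when the diagnosis was actually made), and the `"Q5=1"` source-value format and the `map_q5` name are hypothetical.

```python
from datetime import date
from typing import Optional

# SNOMED "Splenic hematoma", the concept_id given in the example above.
SPLENIC_HEMATOMA_CONCEPT_ID = 4094979


def map_q5(person_id: int, response: str, survey_date: date) -> Optional[dict]:
    """Turn a Q5 response into a partial condition_occurrence row, or None."""
    if response.strip() == "1":
        return {
            "person_id": person_id,
            "condition_concept_id": SPLENIC_HEMATOMA_CONCEPT_ID,
            # Assumption: the survey date stands in for the (unknown) diagnosis date.
            "condition_start_date": survey_date,
            # Hypothetical source-value format that keeps the question and raw answer.
            "condition_source_value": "Q5=1",
        }
    # "0" (no), or any other response: nothing is added to the database.
    return None


print(map_q5(42, "0", date(2021, 6, 1)))  # None
```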

The survey_conduct table is also used to track the same question and the source value of the answer. Its date/time fields also allow me to report survey completeness metrics, such as how long it took the user to complete the survey.
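A sketch of that duration metric, computed from the survey_start_datetime and survey_end_datetime columns of survey_conduct. Rows are modeled as plain Python dicts for illustration, and the function name `completion_minutes` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def completion_minutes(survey_conduct: dict) -> Optional[float]:
    """Minutes between survey start and end, or None if either timestamp is missing."""
    start = survey_conduct.get("survey_start_datetime")
    end = survey_conduct.get("survey_end_datetime")
    if start is None or end is None:
        return None  # unfinished or incompletely recorded survey
    return (end - start).total_seconds() / 60


record = {
    "survey_conduct_id": 1,
    "survey_start_datetime": datetime(2021, 6, 1, 9, 0),
    "survey_end_datetime": datetime(2021, 6, 1, 9, 12, 30),
}
print(completion_minutes(record))  # 12.5
```

Returning None rather than 0 for a missing timestamp keeps unfinished surveys distinguishable from very fast ones when aggregating.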

Hey, could you please tell me more about this?

Hey @mkwong. Thanks for the detailed response. One more thing: will I have to add more columns to the Observation table, possibly to link it with the survey_conduct table?

In my implementation, no. The entries in observation/condition_occurrence are linked through person_id, visit_occurrence_id (an outpatient-type visit or equivalent), and date/time, all of which should match the survey_conduct record. I often also use observation_source_value (extended to VARCHAR(255) or more) to provide clues about where the data came from. While ATLAS and other standard tools won’t know what to do with the source value, I find it helpful to have that information available when debugging and reporting.
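A minimal sketch of linking answers back to their survey without extra columns, by matching on person_id, visit_occurrence_id, and date. It assumes at most one survey_conduct record per person, visit, and date (if two share a key, the later one wins), matches on dates rather than full timestamps for simplicity, and uses hypothetical helper names.

```python
from datetime import date
from typing import Dict, List, Tuple


def link_key(row: dict, date_field: str) -> Tuple[int, int, date]:
    """Keys shared by a survey_conduct row and its answer rows: person, visit, date."""
    return (row["person_id"], row["visit_occurrence_id"], row[date_field])


def link_to_survey(observations: List[dict], surveys: List[dict]) -> Dict[int, List[dict]]:
    """Group observation rows under the survey_conduct_id whose keys they match.

    Observations with no matching survey_conduct record are left out of the result.
    """
    by_key = {link_key(s, "survey_start_date"): s["survey_conduct_id"] for s in surveys}
    linked: Dict[int, List[dict]] = {}
    for obs in observations:
        survey_id = by_key.get(link_key(obs, "observation_date"))
        if survey_id is not None:
            linked.setdefault(survey_id, []).append(obs)
    return linked


surveys = [{"survey_conduct_id": 7, "person_id": 1, "visit_occurrence_id": 10,
            "survey_start_date": date(2021, 6, 1)}]
observations = [
    {"observation_id": 100, "person_id": 1, "visit_occurrence_id": 10,
     "observation_date": date(2021, 6, 1)},
    {"observation_id": 101, "person_id": 1, "visit_occurrence_id": 11,
     "observation_date": date(2021, 6, 2)},
]
linked = link_to_survey(observations, surveys)
print({k: [o["observation_id"] for o in v] for k, v in linked.items()})  # {7: [100]}
```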

When someone takes a survey, entries are created in person, visit_occurrence, care_site, location, survey_conduct, observation, condition_occurrence, measurement, etc. Basically, create a complete record set for each survey taker. If you use a unique ID across multiple surveys (e.g. daily patient-reported symptoms/observations/outcomes in Long COVID-19 monitoring via an SMS text-based survey/data collection solution), you get a single person record, multiple visit_occurrence records, and so forth.
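A sketch of the "one person, many visits" bookkeeping, assuming each submission carries a stable respondent ID. The `Submission` and `RecordSet` classes are a hypothetical staging structure, not CDM tables; the raw answers kept on each visit would later be mapped to survey_conduct, observation, condition_occurrence, and similar rows.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Submission:
    respondent_id: str  # hypothetical: the unique ID a respondent reuses across surveys
    submitted_on: date
    answers: Dict[str, str]


@dataclass
class RecordSet:
    person_id: int
    visits: List[dict] = field(default_factory=list)


def build_record_sets(submissions: List[Submission]) -> Dict[str, RecordSet]:
    """Create one person per respondent ID and one visit per submission."""
    people: Dict[str, RecordSet] = {}
    next_visit_id = 1
    # Process submissions in date order so visit IDs follow the timeline.
    for sub in sorted(submissions, key=lambda s: s.submitted_on):
        if sub.respondent_id not in people:
            people[sub.respondent_id] = RecordSet(person_id=len(people) + 1)
        people[sub.respondent_id].visits.append({
            "visit_occurrence_id": next_visit_id,
            "visit_start_date": sub.submitted_on,
            "answers": sub.answers,  # staging only; mapped to CDM rows in a later step
        })
        next_visit_id += 1
    return people


subs = [
    Submission("sms-001", date(2021, 6, 1), {"Q1": "Y"}),
    Submission("sms-002", date(2021, 6, 2), {"Q1": "N"}),
    Submission("sms-001", date(2021, 6, 3), {"Q1": "N"}),
]
for respondent, rs in build_record_sets(subs).items():
    print(respondent, rs.person_id, [v["visit_occurrence_id"] for v in rs.visits])
```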

Again, to maintain compatibility with existing OHDSI tools, don’t add new fields unless you really need to. Any time you add tables, or add fields to standard tables, things break or you eliminate the value of the standard tools.

@mkwong Adding a column to an existing CDM table should not “break” any of the standard tools. Do you have an example where this happened? Thanks.

Sorry, I didn’t mean it will break tools; the data just won’t be recognized or used in cohort discovery or other analyses.

This is very similar to what we have been discussing in the Registry WG, where we looked at the mapping of UKBiobank data.

@Cjohn @W0lfgang Could you give some more context? A lot of ETL design choices depend on the use case.
e.g. do you want to use Atlas for study definition? Are you combining this with EHR data? Do you want to collaborate in cross-institutional studies using OMOP?

Hi @MaximMoinat, think of it this way: a questionnaire where the respondent answers Yes, No, or Not applicable (if the question does not apply), e.g. “Do you take three meals a day?”
The end goal is to map at least 80% into OMOP for standard and quality analysis. Realistically, most (if not all) questions do not map directly to OMOP. What would be the ideal solution for this? Should I drop OMOP and use a different model? If so, which one? Or how do I extend it to accommodate the variables?

Hi @MaximMoinat, our use case is very similar to @Cjohn’s. I don’t anticipate that most of our questions are in the existing OMOP model, hence the need for custom questions. We will also be using OMOP to capture EHR data and want to use the same model so we can potentially combine the survey data with the EHR data. At this point, data usage and consumption will be through user-written analysis tools and code (R, Jupyter). There is also a future desire to collaborate in cross-institutional studies, which is why we want an open industry standard like OMOP.


A lot of good questions and answers here. But surveys like that (“do you take 3 meals a day?” - “no”) are ultimately not OMOPable, even if you can somehow squish it into some table. The reasons are:

  • OMOP follows the closed world assumption - all records are fully normalized to a reference table (containing anything that can happen to you), and if something happened you have a record, and if not you don’t. Surveys don’t work that way. They often have a closed set of answers, but these are not in themselves referenceable (what does “no” mean?)
  • OMOP usually records clinical entities, i.e. information about your disease. Not eating 3 meals a day is not pathological. And we even have problems reaching consensus on things like smoking.
  • The OMOP model assumes the ability to separate data from analysis. In other words, you should be able to ask a question to the data without knowing its content. Surveys sometimes are standardized and public, but mostly they are private, and therefore completely obscure to an outside query.
  • You could do the job and convert question-answer pairs to proper OMOP concepts and place them into the appropriate tables as @mkwong suggests. But that is a huge amount of work with some questionable value, and hence hardly anybody does it. For example, what is the condition_start_date for the survey question “Final histopathologic diagnosis of splenic hematoma”?

I know I might derail @MaximMoinat’s registry WG, but we should seriously think about how to handle all this. But I doubt we will come up with a satisfactory answer, other than “put the survey into a table of your choice and OMOP the EHR”.


Hi All,

I am the analytic lead for a new NCI cancer prevention study, the Connect for Cancer Prevention Study (https://www.cancer.gov/connect-prevention-study/#:~:text=The%20Connect%20for%20Cancer%20Prevention,and%20how%20to%20prevent%20it.). We have created 5 questionnaires with about 6,000 variables/responses, documented at https://github.com/Analyticsphere/ConnectMasterAndSurveyCombinedDataDictionary. We are interested in using EHR, registry, GWAS, and imaging data in combination with survey data to answer questions on cancer etiology, early cancer detection, and treatment.

We were not able to harmonize to other questionnaires at the outset of questionnaire creation as most questions are original or heavily modified from previous questionnaires to meet the needs of our researchers and due to a lack of staff (I was the only member of my team for the first two years of questionnaire and dictionary development).

We would like to see our questionnaire harmonized with those of other studies that use questionnaires, including All of Us, and have started internal discussions of this process within NCI.

Given the limitations Christian listed, but the desire to be able to use questionnaire data with EHR data and other observational data, is it time for a Questionnaire working group? And yes, I am volunteering.




Sounds like you have a heavy lift, there. Harmonizing questionnaires for the sake of doing that is very hard - you can never make the decision whether something is equivalent unless you know the scientific use case - how it is supposed to address a scientific question.

Do you have those needs? Do you know what their use cases are?