Recommendations for questionnaires / surveys

The-Alchemist · December 13, 2023, 10:50pm

NOTE: I’m using the term survey and questionnaire interchangeably in the conversation below.

Not to beat a dead horse, but I’ve been reading about CDM v6 and questionnaires / surveys on these forums. We’re trying to “future proof”, to a certain extent, and would like to follow best practices ahead of time rather than fixing everything later.

LOINC seems to store each survey, question, and answer as a separate concept, and uses the concept_relationship table to link them. The relationships used are Panel contains and Has Answer.

To use a concrete example, here are the links for the LOINC Kansas City Cardiomyopathy Questionnaire (KCCQ):

survey itself: Athena
one of the questions, “Bothered by fatigue over the past 2 weeks” : Athena
one of the answers, “Extremely bothersome”: Athena

We’re thinking of modeling our surveys, questions, and answers similarly.
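
To make the idea concrete, here is a minimal sketch of walking from a survey to its questions and answers through concept_relationship. It uses an in-memory SQLite database with a trimmed-down schema; the concept IDs, the second answer, and the `survey_items` helper are made up for illustration and are not real vocabulary content.

```python
import sqlite3

# Hypothetical concept IDs and a reduced schema, for illustration only.
# Real IDs and names would come from the OMOP vocabulary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
""")
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    (1, "Kansas City Cardiomyopathy Questionnaire"),
    (2, "Bothered by fatigue over the past 2 weeks"),
    (3, "Extremely bothersome"),
    (4, "Not at all bothersome (placeholder)"),  # invented second answer
])
conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", [
    (1, 2, "Panel contains"),  # survey -> question
    (2, 3, "Has Answer"),      # question -> answer
    (2, 4, "Has Answer"),
])


def survey_items(conn, survey_concept_id):
    """Return (question, answer) name pairs for one survey concept."""
    return conn.execute("""
        SELECT q.concept_name, a.concept_name
        FROM concept_relationship pc
        JOIN concept q ON q.concept_id = pc.concept_id_2
        JOIN concept_relationship ha
          ON ha.concept_id_1 = q.concept_id
         AND ha.relationship_id = 'Has Answer'
        JOIN concept a ON a.concept_id = ha.concept_id_2
        WHERE pc.concept_id_1 = ?
          AND pc.relationship_id = 'Panel contains'
        ORDER BY q.concept_id, a.concept_id
    """, (survey_concept_id,)).fetchall()


for question, answer in survey_items(conn, 1):
    print(f"{question} -> {answer}")
```

Custom surveys modeled the same way would only need their own concept rows and the same two relationship types.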

We’ve also been reading a lot of info on the forums, GitHub, the web, etc. about how surveys are done in the CDM.

There are some inconsistencies (is the table called survey or survey_conduct?), but that’s a minor issue from my perspective, as we can rename a table really easily. We’re more interested in recommendations on how to store questionnaire questions and answers.

A few questions:

1. Some people seem to recommend just using your own custom schema until v6 is finalized.
2. Since Survey Data in CDM · Issue #90 · OHDSI/CommonDataModel · GitHub has been merged, is it safe to assume that the general schema will remain the same, and can be safely used for internal purposes? We’d prefer doing this, instead of reinventing the wheel.
3. Are the relationships that LOINC uses (i.e., Has Answer, Panel contains) canon? Should we continue doing this?
4. Should responses to surveys go in the response table, measurement, observation, or condition? Or a combination of them? Or does it depend on the question answered?
5. We want to perform statistical operations such as histograms and averages for survey responses. For example, “Degree of difficulty experienced due to your hip while descending stairs in the last week [HOOS]” has the answers None, Mild, Moderate, Severe, and Extreme. We’d map these to 0 - 4 to get an average. What’s the best way to map these categories to numbers? concept_relationship with a custom relationship?
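
For the last question, this is roughly the computation we have in mind, as a small sketch. The 0 - 4 scores are the mapping described above; the `HOOS_SCORES` dictionary, the `summarize` helper, and the sample responses are made up for illustration, and where the mapping should live (code, a lookup table, or concept_relationship) is exactly the open question.

```python
from collections import Counter
from statistics import mean

# Ordinal answer -> numeric score. Kept in code here only for illustration;
# it could equally be stored in a lookup table.
HOOS_SCORES = {"None": 0, "Mild": 1, "Moderate": 2, "Severe": 3, "Extreme": 4}


def summarize(responses):
    """Return (average score, histogram of answers) for a list of answers."""
    scores = [HOOS_SCORES[r] for r in responses]
    return mean(scores), Counter(responses)


# Invented sample responses to one question.
responses = ["None", "Mild", "Mild", "Severe", "Extreme"]
average, histogram = summarize(responses)
print(average)
print(dict(histogram))
```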

Anyway, any recommendations would be greatly appreciated, thank you!

Christian_Reich · December 26, 2023, 10:10pm

Thanks @The-Alchemist. Sounds like it’s time to make some gold out of dirt, and for all folks involved in surveys and questionnaires to sit down and get this sorted. You may want to apply a top-down approach using USE CASES.

Looks like you have one already! Keep it coming. Make a list. It will guide you to make the right decisions.

Let me try answering a few, and rephrase or pose others:

1. You are free to create a new standard. While it is not yet in the CDM, it can be an Expansion to it, gathering experience.
2. I would not assume anything is settled. You guys have to come together and do that.
3. Ditto.
4. Ditto. You may even want to consider a new table. SURVEY_CONDUCT isn’t for the content, it is for metadata so the data can be queried the right way.
5. This is an analytic. It needs no modelling or mapping. Unless you think you want to standardize all answers that are degrees of something to a numerical value - then make that part of your Expansion.

You also need to ask yourselves:

6. Which surveys are we going to standardize and put into the vocabulary? What are the rules for that? What are the mechanisms for that?
7. Are we going to have standard answers? In other words, are we going to have one “Yes”, or 500 “Yesses”, one for each question that can be answered that way?
8. Surveys are EAV, which means they have questions and answers, or variables/attributes and values. Except for the rather undefined OBSERVATION table and the very narrowly defined MEASUREMENT table, in the OMOP CDM we have pre-coordinated facts (“Extreme difficulty descending stairs in the last week”). That is how all our tools and methods work, and there really is no need to keep question and answer separate. So, what are you going to do about that? Are you going to pre-coordinate question/answers, map them and put them into the actual clinical entity tables, so they can be picked up?
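
To illustrate what pre-coordination would mean in practice, here is a rough sketch, not a proposal for the actual mechanism. The `PRECOORDINATED` lookup and the `precoordinate` helper are hypothetical, and the single mapping entry just reuses the example above.

```python
# Hypothetical lookup from an EAV (question, answer) pair to one
# pre-coordinated fact. In a real Expansion this would be vocabulary
# content, not a Python dictionary.
PRECOORDINATED = {
    (
        "Degree of difficulty experienced due to your hip while "
        "descending stairs in the last week [HOOS]",
        "Extreme",
    ): "Extreme difficulty descending stairs in the last week",
}


def precoordinate(responses):
    """Split (question, answer) pairs into mapped facts and leftover pairs."""
    facts, unmapped = [], []
    for pair in responses:
        fact = PRECOORDINATED.get(pair)
        if fact is None:
            unmapped.append(pair)  # stays as a plain question/answer pair
        else:
            facts.append(fact)     # candidate for a clinical entity table
    return facts, unmapped
```

Anything that lands in the unmapped list is what still needs a decision: extend the mapping, or keep it as question and answer.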

Happy to help.

Sanjay_Udoshi · January 3, 2024, 1:26am

For those familiar with REDCap, here is an exported list of supported “standard” Questionnaires/Surveys.
Re: Use Cases, it’s a bigger problem than I realized.

Comprehensive Clinical Surveys DB.xls (1.5 MB)

Vojtech_Huser · January 5, 2024, 7:34pm

AllOfUs has surveys, and one can also look at all the representation choices they have made. For example, on Q7: they have only one Yes, and they used a LOINC code for it (but in the vocabulary there are also examples of not doing it; see concept ID 1332792).

Some parts of my paper here are relevant for that: Table - PMC

See this file
https://github.com/lhncbc/CRI/raw/master/AoU/CDE/Registered_Tier/S2_AOU_CRF_values.xlsx

Here is their approach:

Their study design tried to re-use existing questions, but they extended where truly needed (PPI vocabulary). No research should skip this smart stage 1: try to re-use non-copyrighted questions as much as possible. AoU did it really well.
If an element was fully captured by existing vocabulary, they used it; if not, they created vocabulary entries for it.
They were able to store everything using existing core OMOP tables, but they put the necessary infrastructure into the terminology layer, e.g., CRF name and CRF sections were represented in the vocabulary. See Athena for an example.

Here are the relationships in the PPI vocab (has PPI parent code)
(is the CRF level and it has children for each section)

sections of CRF here

questions in section Mood