Where to map binary (yes/no) results to in the OMOP CDM?

I am working on an ETL of ICU data collected through a questionnaire. While mapping in Rabbit in a Hat, I realized I need some guidance on where to put boolean fields (stored as 1 and 0 in the database) that are essentially observations derived from measurements.
Some examples:

  1. When sepsis was suspected, did the patient have leukopenia? (Leukopenia is defined as having a leukocyte count less than 4*10^9/L)
  2. When sepsis was suspected, was the arterial pCO2 value lower than 32 mm Hg (4.3 kPa)?
  3. Did the patient get kidney replacement therapy in the first 24 hours of being admitted to the ICU?

Do these binary/boolean fields belong in the OBSERVATION table, mapped to value_as_string or value_as_number? Or does the first one belong in the CONDITION_OCCURRENCE table? Or do the first two maybe even belong in the MEASUREMENT table? And do I map the third to PROCEDURE_OCCURRENCE as procedure_source_value?

@DanPutt:

This is a problem we are discussing right now: What to do with the survey information, and where to put it.

You may have to create more complex logic than Rabbit in a Hat supports: you need to combine values from different fields and map them over. Surveys are not a one-to-one mapping job. Reason: what constitutes a question and what constitutes an answer is arbitrary. In OMOP, we prefer to pre-coordinate these things so there is no ambiguity preventing us from standardizing the information.

This is mostly a vocabulary mapping problem. In essence, the question could be construed as:

  • “Leukopenia?” If the answer is yes, you write the Condition Concept Leukopenia into CONDITION_OCCURRENCE. If no, you write NOTHING.
  • “White blood cell count < 4B/L?” You write the Measurement Concept Leukocytes [#/volume] in Blood into measurement_concept_id, the operator into operator_concept_id (< if the answer is yes, >= if it is no), 4,000,000,000 into value_as_number and per liter into unit_concept_id.
  • You do both.

The same is true for the other examples.
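
To make the "do both" option concrete, here is a minimal Python sketch of how the leukopenia field could be turned into CDM rows. Everything named here is an assumption for illustration: the function map_leukopenia_answer, the two dataclasses, and especially the concept IDs, which are deliberately negative placeholders and must be replaced with the standard concepts looked up in the vocabulary. The threshold goes into value_as_number and a "no" answer is written with a >= operator. Required fields such as dates and type concepts are left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical placeholder IDs (negative on purpose so they cannot be mistaken
# for real concepts). Replace them with IDs looked up in the OMOP vocabulary.
LEUKOPENIA = -1             # Condition concept "Leukopenia"
LEUKOCYTES_IN_BLOOD = -2    # Measurement concept "Leukocytes [#/volume] in Blood"
OP_LESS_THAN = -3           # operator "<"
OP_GREATER_OR_EQUAL = -4    # operator ">="
UNIT_PER_LITER = -5         # unit "per liter"

THRESHOLD_PER_LITER = 4e9   # leukopenia cut-off from the questionnaire


@dataclass
class ConditionOccurrence:
    person_id: int
    condition_concept_id: int
    condition_source_value: str


@dataclass
class Measurement:
    person_id: int
    measurement_concept_id: int
    operator_concept_id: int
    value_as_number: float
    unit_concept_id: int
    measurement_source_value: str


def map_leukopenia_answer(
    person_id: int, answer: Optional[int]
) -> List[Union[Measurement, ConditionOccurrence]]:
    """Map the survey field (1 = yes, 0 = no, None = unanswered) to CDM rows."""
    if answer is None:
        return []  # unanswered question: write nothing

    source_value = f"sepsis_leukopenia={answer}"  # hypothetical source field name
    rows: List[Union[Measurement, ConditionOccurrence]] = [
        Measurement(
            person_id=person_id,
            measurement_concept_id=LEUKOCYTES_IN_BLOOD,
            # yes -> count < threshold, no -> count >= threshold
            operator_concept_id=OP_LESS_THAN if answer == 1 else OP_GREATER_OR_EQUAL,
            value_as_number=THRESHOLD_PER_LITER,
            unit_concept_id=UNIT_PER_LITER,
            measurement_source_value=source_value,
        )
    ]
    if answer == 1:
        # Only a "yes" produces a condition record; a "no" writes no condition.
        rows.append(ConditionOccurrence(person_id, LEUKOPENIA, source_value))
    return rows
```

Calling map_leukopenia_answer(42, 1) gives one Measurement row and one ConditionOccurrence row, while map_leukopenia_answer(42, 0) gives only the Measurement row. The pCO2 question could follow the same pattern with its own measurement concept, threshold and unit.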

What we are still trying to figure out is whether and how to represent the original questions and answers. People running surveys like to keep all the detail and relate it to the other clinical data to draw inferences. That makes sense, but it is utterly non-standard and cannot be done remotely without tacit knowledge of those details.

Please join us in the debate.
