Observation table issue

Hi, all

I’m Jae Hyeong Cho, from Ajou University School of Medicine.

I’m converting the National Health Insurance Service-National Sample Cohort (NHIS-NSC) data to the OMOP CDM v5.0.1.
NHIS-NSC contains data from biennial health examinations, including self-reported previous medical history, family history, alcohol use, exercise and smoking history.
However, I’m not sure how to populate the observation table.

If we have the patient’s answer to the question ‘Do you have a previous medical history of stroke?’, how should I put this answer into the observation table?

CONCEPT_ID : 4077982
CONCEPT_NAME : History of cerebrovascular accident
DOMAIN_ID : Observation
VOCABULARY_ID : SNOMED

OBSERVATION_CONCEPT_ID : 4077982
  1. Value_as_string: ‘Yes’ or ‘No’
  2. Value_as_string : ‘Y’ or ‘N’
  3. Value_as_string : 0 or 1
  4. Value_as_concept_id : 45877994 (concept_name: ‘Yes’, domain_id: Meas Value, vocabulary_id: LOINC)
    Value_as_concept_id : 45878245 (concept_name: ‘No’, domain_id: Meas Value, vocabulary_id: LOINC)

Among these, which one would be the most suitable answer?


Welcome to the family.

You write an Observation record as follows:

observation_id - that’s a running record number. you can start with 0 or whatever you want
person_id - your patient ID
observation_concept_id - 4077982 (the SNOMED concept you found)
observation_date - the date.
observation_type_concept_id - you pick that from the available Type Concepts. If none of them fits, let us know and we will add one:

45905771 Observation Recorded from a Survey
38000276 Problem list from EHR
38000277 Lab observation numeric result
38000278 Lab observation text
38000279 Lab observation concept code result
38000280 Observation recorded from EHR - sounds like this is the right one
38000281 Observation recorded from EHR with text result
38000282 Chief complaint
43542355 Referral Record
44786633 HRA Observation Numeric Result
44786634 HRA Observation Text
44814721 Patient reported

value_as_number - empty
value_as_string - empty, unless it was a questionnaire and you want to record exactly what the patient said.
value_as_concept_id - empty. You don’t have to encode “Yes”, because it is implicit
qualifier_concept_id - empty
unit_concept_id - empty
provider_id - the doctor who did the observation
visit_occurrence_id - the visit, during which this observation was taken
observation_source_value - empty
observation_source_concept_id - if it is coded in your data, put the code in
unit_source_value - empty
qualifier_source_value - empty
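
To make the field list above concrete, here is a minimal Python sketch of one such record. The person_id, observation_id and date values are made up for illustration, and the dataclass is just a convenient container for this example, not an official OHDSI structure. It uses type concept 38000280, the one suggested above; swap in another from the list if it fits your data better.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class ObservationRecord:
    # Required fields, following the list above.
    observation_id: int
    person_id: int
    observation_concept_id: int
    observation_date: date
    observation_type_concept_id: int
    # Fields left empty (None) for a plain "history of" observation.
    value_as_number: Optional[float] = None
    value_as_string: Optional[str] = None
    value_as_concept_id: Optional[int] = None
    qualifier_concept_id: Optional[int] = None
    unit_concept_id: Optional[int] = None
    provider_id: Optional[int] = None
    visit_occurrence_id: Optional[int] = None
    observation_source_value: Optional[str] = None
    observation_source_concept_id: Optional[int] = None
    unit_source_value: Optional[str] = None
    qualifier_source_value: Optional[str] = None


HISTORY_OF_CVA = 4077982         # SNOMED: History of cerebrovascular accident
OBSERVATION_FROM_EHR = 38000280  # Type Concept: Observation recorded from EHR

record = ObservationRecord(
    observation_id=1,                   # running record number
    person_id=1001,                     # hypothetical patient ID
    observation_concept_id=HISTORY_OF_CVA,
    observation_date=date(2010, 1, 1),  # hypothetical examination date
    observation_type_concept_id=OBSERVATION_FROM_EHR,
)

row = asdict(record)  # plain dict, ready to hand to whatever loader you use
```

Note that value_as_concept_id stays empty here: the "Yes" is implicit in the existence of the record.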

Let us know.

Looping in @SCYou and @Rijnbeek because they discussed this in today’s PLP meeting.

Would it not be more consistent to use ‘History of clinical finding in subject’ (4214956), which is what is most often used in the vocab itself to encode history? So in that case:

observation_concept_id = 4214956 (History of clinical finding in subject)
value_as_concept_id = 4310996 (Ischemic stroke)

Similarly, you can encode:

  • Past history of procedure (4215685)
  • No history of procedure (4166732)
  • Family history of clinical finding (4167217)
  • No family history of (4051104)

I also found:

14732006 “No history of” this looks like a general one you could use for procedures, conditions etc?

There are also some more specific standard codes (probably limited?):
160254003 “No history of cardiovascular system disease”

@Christian_Reich @Rijnbeek @schuemie Thanks for the response!

I’ve already found concept_ids such as 4050816 (FH: Hypertension) and 4053372 (No FH: Hypertension). Actually, I can convert almost every answer in the questionnaire, which is composed of ‘yes’ or ‘no’ questions.

But I cannot find a concept_id for the question: ‘Do you have a family history of liver disease?’
I found a concept_id only for the answer ‘yes’ (4144266, ‘FH: Liver disease’). I cannot find a concept_id for the answer ‘no’. So I will create observations only for those who answered ‘yes’ to this question. In other words, the information for ‘no’ will be lost in the CDM.

In the raw Korean National Health Insurance data, there are a lot more questions and answers (actually, I think too many) than in the sample cohort data. We should think about a standard way to put ‘yes’ or ‘no’ answers from a health questionnaire into the observation table.

Furthermore, the NHIS database contains information on quantity or frequency for economic status (from tax), smoking, exercise, and alcohol drinking (from the questionnaire). I hope we can use this information in OHDSI tools in the future. And I think we need to standardize the conversion process for this important health information.


You are totally right. That’s the way to do it.

This is tricky, guys. Generally, we don’t record negative data. Data are either positive, or just not there. You don’t say a Patient didn’t have XYZ.

You could put them in there as survey questionnaires. You can then answer them with Yes (4188539) and No (4188540). But we would have to add the survey questions to the vocabularies. Can you provide them?

Would be interesting to see those. Can you show us what they look like?
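
If the survey route were taken, the raw answers could be normalized to the Yes (4188539) and No (4188540) concepts roughly like this. It is only a sketch: the function name is made up, and treating 1 as yes and 0 as no is an assumption about the source coding that you would have to check against the NHIS documentation.

```python
YES_CONCEPT_ID = 4188539  # Yes
NO_CONCEPT_ID = 4188540   # No

# ASSUMPTION: "1" means yes and "0" means no in the source data.
YES_TOKENS = {"yes", "y", "1"}
NO_TOKENS = {"no", "n", "0"}


def answer_to_concept_id(raw_answer):
    """Map a raw yes/no survey answer to a value_as_concept_id.

    Returns None when the answer is not a recognizable yes/no,
    so the caller can decide whether to skip or flag the row.
    """
    normalized = str(raw_answer).strip().lower()
    if normalized in YES_TOKENS:
        return YES_CONCEPT_ID
    if normalized in NO_TOKENS:
        return NO_CONCEPT_ID
    return None
```

The observation_concept_id for each question is deliberately left out here, because the survey questions would first have to be added to the vocabularies.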

@Christian_Reich I put the explanation for the columns in NHIS DB (original explanation is Korean)


That’s for smoking and exercise. We actually have an active subgroup to model them in a consistent fashion. I wouldn’t put them in a survey. Plus, the patients never tell the truth about smoking and alcohol. :smile:

What about the history questions?

Thanks for the response, @Christian_Reich !
In the sample cohort of NHIS, the medical history columns and the relevant concept_ids are:


If this is all there is, then I would not put it in as a survey. I would just encode those with a Yes, and leave the Nos alone. Think about it: All of us have filled out those questionnaires sitting at the doctor’s office. Most of the time the doctor never looks at them, and most patients don’t really know what “family history” means, what the various family members were suffering from, where the “family” ends (is your great cousin twice removed still family?) etc. So, the No answers are probably not so useful.

Bottom line: Use

                                    observation_concept_id    value_as_concept_id
Previous history of stroke          4214956                   381316
Previous history of heart disease   4214956                   321588
Previous history of HTN             4214956                   316866
Previous history of DM              4214956                   201820
Previous history of pulTBC          4214956                   253954
Family history of liver disease     4167217                   194984
Family history of HTN               4167217                   316866
Family history of stroke            4167217                   381316
Family history of heart disease     4167217                   321588
Family history of DM                4167217                   201820
Family history of cancer            4167217                   443392
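
As a rough illustration of the table above, the mapping can be kept as a lookup and applied only to the Yes answers. The source column names (hx_stroke, fhx_liver, ...) and the function name are hypothetical, not the real NHIS-NSC variable names; the concept_ids are the ones from the table.

```python
HISTORY_OF_CLINICAL_FINDING = 4214956         # History of clinical finding in subject
FAMILY_HISTORY_OF_CLINICAL_FINDING = 4167217  # Family history of clinical finding

# Hypothetical source column -> (observation_concept_id, value_as_concept_id)
HISTORY_MAP = {
    "hx_stroke":  (HISTORY_OF_CLINICAL_FINDING, 381316),
    "hx_heart":   (HISTORY_OF_CLINICAL_FINDING, 321588),
    "hx_htn":     (HISTORY_OF_CLINICAL_FINDING, 316866),
    "hx_dm":      (HISTORY_OF_CLINICAL_FINDING, 201820),
    "hx_pultbc":  (HISTORY_OF_CLINICAL_FINDING, 253954),
    "fhx_liver":  (FAMILY_HISTORY_OF_CLINICAL_FINDING, 194984),
    "fhx_htn":    (FAMILY_HISTORY_OF_CLINICAL_FINDING, 316866),
    "fhx_stroke": (FAMILY_HISTORY_OF_CLINICAL_FINDING, 381316),
    "fhx_heart":  (FAMILY_HISTORY_OF_CLINICAL_FINDING, 321588),
    "fhx_dm":     (FAMILY_HISTORY_OF_CLINICAL_FINDING, 201820),
    "fhx_cancer": (FAMILY_HISTORY_OF_CLINICAL_FINDING, 443392),
}


def history_observations(person_id, observation_date, answers):
    """Build observation rows for the Yes answers only.

    `answers` maps a source column name to True (Yes) or False (No).
    No answers and unmapped columns produce no row.
    """
    rows = []
    for column, answered_yes in answers.items():
        if not answered_yes or column not in HISTORY_MAP:
            continue
        observation_concept_id, value_as_concept_id = HISTORY_MAP[column]
        rows.append({
            "person_id": person_id,
            "observation_date": observation_date,
            "observation_concept_id": observation_concept_id,
            "value_as_concept_id": value_as_concept_id,
            "observation_source_value": column,
        })
    return rows


# Example: one patient who reported a stroke and a family history of cancer.
rows = history_observations(
    1001, "2010-01-01",
    {"hx_stroke": True, "fhx_liver": False, "fhx_cancer": True},
)
```

Storing the source column in observation_source_value is a choice made for this sketch so the rows can be traced back to the questionnaire; leave it empty if you prefer.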

Hello Everyone,

I have a few questions about the observation table. This thread answered some of them, but I am listing my questions here to make sure that my understanding is right.

  1. Observation_event_id and obs_event_field_concept_id - What type of values go into these fields? Any examples, please? I am not able to make sense of them.

  2. Observation_concept_id - concept id for the question

  3. Value_as_concept_id - concept id for the answer

  4. Observation_source_value - could be empty, but I decided to store variable names (I mean the column names in Excel which indicate the questions). We would like to store this for easy retrieval of records… Is there any other field in this table that is more suitable for this?

  5. Value_as_string - the response stored as a string value (ex: Single, Married, Divorced, 195, 86 etc). Responses to all survey questions go here. Ours was a paper-based survey/questionnaire.

  6. value_as_number - option number (ex: 1. Yes, 2. No, etc., so this field will have only 1 and 2)

  7. Observation_source_concept_id - ??? What can be the value for this? It’s all patient responses to survey questions. Should it just be ‘0’?

  8. unit_source_value and unit_concept_id - only applicable for questions like weight, height etc.

  9. Qualifier value - What does this field signify? What is the use of this? What value goes into this?