OHDSI Home | Forums | Wiki | Github

Scalable logic to create custom visit_ids

Hello Everyone,

Our source_visit_ids are bit messed up as it was extracted from two different hospital systems and had some tech issues which resulted in lack of unique identifier. So, we would like to create our own visit_occurrence_ids to avoid all this confusion and keep it clean.

If it’s only for visit_occurrence table, I can just populate a normal sequence number for visit_occurrence_ids,

But the challenge is when I have to identify the corresponding lab, drug info for each visit?

may I check how does one link that newly generated visit_occurrence_ids to existing labs, drugs and diganosis info.

If I go by date interval, I have to check one row against multiple rows of that patient and then assign the newly created visit_occurrence_id to identified row (in lab/drug/diag table). Don’t know this appraoch feels inefficient and may/may not work for big data.

Or infact, do you compare one row against all others rows of the patient and assign the visit_id? (by making use of your computational resources)

I did read multiple OMOP implementation papers but looks like not many went through this challenge.

Any advice or tips from CDM implementers who had similar scenario and what was the logic that you followed (which we can try to replicate).


You might consider the following:

visit_occurrence_id in my implementation is an AUTO_INCREMENT integer ID - so new rows are automatically assigned a unique value. The source visit identifier regardless of where it comes from goes into the source_value field if you need a way to run queries against the original EHR source.

In your case - if there are overlapping numbers from the two source institutions - pre-pend a site id (or assign one) to the source visit id - something like value_source = “.”

Hi @mkwong ,

Once you generate the Auto-increment id in visit_occurrence table, how would you use them other tables like measurement, drug_exposure, condition_occurrence etc.

How would you identify corresponding lab, drugs and diagnosis for specific visit_occurrence_id?

ex: I have created a auto_increment id. Now if I want to identify all the lab/drug/diagnosis associated with visit_occurrence_id = 1, how do I identify them (efficiently)?

The other tables have person_id and visit_occurrence_id. Using both identifies the patient and the visit in which the measurement, observation, procedure, drug_exposure, and device_exposure was documented. For example a patient imported would generate an auto_increment person_id and entry in the person table. Each visit or encounter the patient had with the hospital would produce a single visit_occurrence and therefore visit_occurrence_id tied to the admission and discharge dates (per visit_occurrence table fields). Any documentation or data captured during that visit/encounter would use both the person_id and visit_occurrence_id - thus linking a group of measurements, procedures, observations, conditions, etc to a patient and a visit/encounter.

Basically your ETL program will have keep track of both patient and visit/encounter as it is building and populating the measurements, observations. conditions, etc tables.


When you mean visit_occurrence_id, do you mean the auto-generated id that we created?

let’s say for example

I have imported a visit_record

person_id = 1
visit_occurrence_id = 1
visit_start_date =  21/02/2019
visit_end_date = 27/02/2019
visit_type = Inpatient

Now we go to the diagnosis table (raw diagnosis data has only below info)

person_id = 1
condition_start_date = 22/02/2019
condition_end_date = 22/03/2019
condition_source_value = 'Unspecificed DM'

Would it be posible for you to show an example/query on how you did this?

If I have to rely on dates to join between these two tables, I am afraid it may not be efficient. YOu can see that my raw diagnosis data doesn’t have visit_information

Additionally, in our raw data, I see the below scenario where one person has multiple records for the same visit_id. Since, we don’t have any other information like case_number etc, we only have visit and person_id info in the raw visit table. For example,

person_id = 21
visit_id = 101
visit_start_date = 2014-10-29 09:44:00
visit_end_date = 2014-10-29 09:44:00

person_id = 21
visit_id = 101 #let’s take only this record of this person has lab/drug/diag info
visit_start_date = 2014-10-29 10:30:00
visit_end_date = 2014-10-29 10:39:00

you can see how we have same person and same visit id but different time intervals. If I group them just by person and visit_id, I will get duplicate ids. But if I group them using all three person, visit and time information, I get unique VISIT_ids. But when I go to the diagnosis/labs/drugs table, I may not be able to find the right visit which has lab/drug/diag info (because lab/drug/diag table has different date information)


Sure - the ETL that brings in the raw visit information (second block) would match the condition_start_date to the best visit_occurrence record - in this case visit_occurrence_id = 1 (person_id is already matched). Assuming subsequent visit_occurrence records do not overlap with the condition_end_date.

So you end up with a condition entry of:
condition_occurrence.person_id = 1
condition_occurrence.visit_occurrence_id = 1

Given the data as described, the only way I can think of doing this is by matching dates and in fact how I link different data sources such as invoice transactions and lab results to a person and visit. Yes - there will be overlaps and situations where you have to choose among two different visits, but these are few and I use the strategy of earliest match (time distance basically) to assign data to visits.

For what seems to be overlapping visits - I’m inclined to merge the two taking the earliest date as the start and latest as the end unless there is more information from clinicians or folks responsible for entering this information. It often comes down to 2 different systems and what the person entering the data understood as the patient’s encounter dates (admit and discharge) - example of someone coming through the ED - which could be a separate EHR system, then get admitted to the hospital (enter data into the hospital’s EHR system), and then discharged - creating two overlapping encounters.

I’d like to hear from others regarding how they dealt with these issues.