Clearer definitions + examples for the mandatory tables

rmfranken · November 7, 2024, 8:12am

Hi there,

I’ve been struggling for some weeks trying to understand and apply the OMOP CDM. Specifically, I’m trying to write SPARQL queries against a graph database containing patient data, in order to extract the data needed to populate the OMOP SQL tables.

In doing so, I have come across the table “Observation period”, and it is not clear to me what kind of data is expected to be in that table. The definition states:

Table Description: This table contains records which define spans of time during which two conditions are expected to hold: (i) Clinical Events that happened to the Person are recorded in the Event tables, and (ii) absence of records indicate such Events did not occur during this span of time.

I am trying to understand how one decides to “start” an observation. It’s clear to me that an observation should probably end when a patient dies (or maybe when they are healthy again?), and it should not start before a patient is born (or maybe 9 months before, in case of womb-related stuff? No idea), but is that a correct interpretation?

So my question: What is an observation period? The other columns in the table do not help much in trying to figure out what it means: as the only other meaningful property is period_type_concept_id which is allowed to be filled with an enormous list of possible concepts related to billing/costs. Does this mean an observation period MUST be linked to some kind of claim? So for a patient with lung cancer, the observation period is the moment they were first diagnosed until they were successfully treated/died, and all “events” related to that diagnosis (and accompanying insurance claim) should be part of that observation period?

I’m sorry if the question, or questions like it have been asked before - but for the mandatory tables, I would expect a few examples or clearer definitions.

Then:
What’s the difference between an observation period and a visit_occurrence? They both occur during a given time period, so does an observation period consist of multiple related visit_occurrences?

We have in our graph DB a concept “Administrative case” (administrative artifact for billing according to national healthcare billing guidelines) and a “Healthcare encounter” (an interaction between an individual and a specific unit or service of a healthcare provider institute, e.g. emergency, intensive care unit, for the purpose of providing healthcare service(s) or assessing the health status of an individual).

To me the administrative case seems close to an observation period, and the healthcare encounter to a visit_occurrence. But I’m not sure, as the definitions leave quite a lot of space (on both sides).

I hope to gain some better insight into the definitions of both concepts, and preferably some examples of what does, and what does NOT, constitute visit occurrences and observation periods.

Kind regards,
Robin

Chris_Knoll · November 8, 2024, 3:06pm

The concept of the OBSERVATION_PERIOD is a common source of confusion but is one of the most important tables that exist in the CDM. This combination of confusing and important makes the challenges associated with it more severe, so you’re in good company to raise this question.

Let’s go through a hypothetical example that can illustrate the importance of this table:

Consider a patient P1 who engages with a healthcare provider, and the patient must subscribe in order to receive service. Let’s say this person subscribed between 2010 and 2014, but then moved to another country for 2 years (2015-2016). This person then returns and picks up a subscription from 2017-2021. As long as the person has the subscription, they can go to a doctor in the practice, and the practice will record conditions, drugs, procedures and visits associated with P1 in their OMOP CDM.

Question 1: Will this person have any record of conditions/drugs/procedures/visits between 2015-2016?
Answer: No. This person was out of the country and did not have a subscription so this person will not appear in the data for this period of time.

Question 2: If you’re going to do an observational research study that is going to include this person, which years would be valid to consider ‘at risk’ or ‘for follow up’ for observations of drugs/procedures/conditions?
Answer: Only 2010-2014 and 2017-2021 are the valid years in which this person was ‘at risk’ of having any of these clinical events observed in this system. This is when the person was subscribed.

Question 3: How does the CDM maintain this information about when people are ‘at risk’ of recording an observation of drug/procedure/condition?
Answer: The OBSERVATION_PERIOD table
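
To make that concrete, here is a minimal Python sketch of how P1’s two spans could look as OBSERVATION_PERIOD rows, and how a study might check whether a date falls inside observed time. The column names follow the CDM table; the exact day boundaries (1 January and 31 December), the person_id of 1, and the helper `is_observed` are assumptions made up for illustration, and period_type_concept_id is left out for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ObservationPeriod:
    person_id: int
    observation_period_start_date: date
    observation_period_end_date: date


# P1 subscribed 2010-2014 and 2017-2021; the exact days are assumed.
P1_PERIODS = [
    ObservationPeriod(1, date(2010, 1, 1), date(2014, 12, 31)),
    ObservationPeriod(1, date(2017, 1, 1), date(2021, 12, 31)),
]


def is_observed(periods, person_id, on):
    """True if `on` falls inside any observation period of the person."""
    return any(
        p.person_id == person_id
        and p.observation_period_start_date <= on <= p.observation_period_end_date
        for p in periods
    )


print(is_observed(P1_PERIODS, 1, date(2012, 6, 1)))  # inside the first span
print(is_observed(P1_PERIODS, 1, date(2015, 6, 1)))  # abroad, no subscription
```

The second call is the point of the example: a date in 2015 is not ‘at risk’ time for P1, so the absence of records there says nothing about P1’s health.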

For some types of source data, defining the observation period is straightforward. For example, claims data has enrollment (which acts like a subscription), so you will know a start date and end date for when a given person could have health data recorded for them. If they are not enrolled, the health care provider will not record anything for them.

In other types of sources, it’s harder. For example, consider a walk-in clinic with an isolated electronic-health-record database, where a person could visit that walk-in clinic or go to some other one. A given person could go to any number of different walk-in clinics, so you don’t know if they were ‘at risk’ of being recorded in one system vs. another, and you have to make some choices about how to represent the OBSERVATION_PERIOD for this type of person. If a network of clinics could come together and share information about a person who could visit any of those clinics, you might have more visibility into whether the person was at risk of having drugs/procedures/conditions recorded in your data. Then you can build your observation period table to represent that. But sometimes you just don’t know, and you have to make difficult choices about what the observation period table represents.

So, I hope you can see the importance of the Observation Period table with respect to observational research: you shouldn’t make a statement that someone is at risk of developing a condition if you wouldn’t be able to see that diagnosis appear in your patient data.

Putting the OMOP CDM aside, let’s consider your graph database that has patient-level data. Let’s say you are going to answer a research question with your data, such as the proportion of people with a condition in a given year. How do you know someone is present in a given year? If they have a visit, does that count? Does that mean a person must have a visit in your data in order to exist? What about those people who are very healthy and go to the doctor every 2-4 years? Are you possibly excluding healthy people from your denominator? What if some of these people aren’t in the data during the year for which you are trying to estimate the proportion? Are you including people you shouldn’t?
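
One way to picture the denominator problem is a small Python sketch that counts only people whose observed time overlaps the year in question. Everything here is made up for illustration: the three people, their periods, the diagnosis dates, and the rule ‘observed at any point in the year’ (a real study might require observation for the full year instead).

```python
from datetime import date

# (person_id, period_start, period_end): hypothetical observation periods
periods = [
    (1, date(2010, 1, 1), date(2014, 12, 31)),
    (1, date(2017, 1, 1), date(2021, 12, 31)),
    (2, date(2012, 1, 1), date(2020, 12, 31)),
    (3, date(2016, 3, 1), date(2019, 12, 31)),
]

# (person_id, condition_start_date): hypothetical diagnoses
conditions = [(2, date(2015, 5, 10)), (3, date(2018, 2, 1))]


def observed_in_year(periods, year):
    """People with at least one observation period overlapping the year."""
    first, last = date(year, 1, 1), date(year, 12, 31)
    return {pid for pid, start, end in periods if start <= last and end >= first}


def proportion_with_condition(periods, conditions, year):
    """Return (numerator, denominator) counts for the given year."""
    denominator = observed_in_year(periods, year)
    numerator = {
        pid for pid, when in conditions
        if when.year == year and pid in denominator
    }
    return len(numerator), len(denominator)


print(proportion_with_condition(periods, conditions, 2015))
print(proportion_with_condition(periods, conditions, 2018))
```

In 2015 only person 2 is observed, so persons 1 and 3 stay out of the denominator even though they exist elsewhere in the data. In 2018 all three are observed and count towards it, whether or not they had a visit that year.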

So the idea of knowing ‘who is in the data’ is an important question, and in CDM world, the OMOP CDM’s Observation Period table does this. In your graph database, how are you doing this? If you can identify how you’d do it in your graph database approach, then you can probably map your graph-database-logic to OBSERVATION_PERIOD table records.

-Chris

rmfranken · November 13, 2024, 11:00am

Thank you Chris for your excellent and detailed answer. The last 2 paragraphs in particular are making me question exactly this. I think the model I am using is very “visit centric” rather than “patient centric” - so indeed, someone only exists IF they are at a hospital - which I agree is problematic for studying people as a whole.

I will do some sparring with the people that modeled the graph database, and use your answer as fuel!

rmfranken · November 14, 2024, 7:54am

Actually - rereading your example. Is it fair to say there would be 2 records in the observation_period table? One record for 2010-2014 and another for 2017-2021?

In my source system, we only have inpatients (i.e., the ‘administrative case’ only covers the period where the patient stays at the hospital).
An observation period would be a collection of administrative cases (hospital+ambulatory), where ambulatory is missing. It seems that the closest thing to an observation period would be to take the minimal/maximal date among all administrative cases for each patient, so as to have a single observation period. But this is still not optimal, as you don’t know if a person has left the country, or changed hospital, or left for ambulatory care or other clinics for which we don’t have data (i.e. the patient would be inaccurately present in the data for periods of “breaks” like 2015-2016 in your example).

Chris_Knoll · November 14, 2024, 1:55pm

That’s correct! There are 2 periods of ‘continuous observation’ across those two time windows, so you’d have a record representing each.

That’s an approach, and you’re right, you’re open to some error in assuming a person is in the data between 2 dates when it could be that they were not at risk of being recorded in the system of record (i.e., they left the country, etc.). It’s a trade-off, and you have to either accept some level of uncertainty or figure out a way to know if a person was ‘at risk’ of having an event recorded.
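
As a rough Python sketch of that min/max approach: collapse each person’s administrative cases into a single span from the earliest start to the latest end. The tuple layout, the function name `min_max_periods` and the sample dates are assumptions for illustration, not your actual graph model.

```python
from datetime import date


def min_max_periods(cases):
    """Collapse (person_id, start, end) cases into one span per person."""
    spans = {}
    for person_id, start, end in cases:
        if person_id in spans:
            lo, hi = spans[person_id]
            spans[person_id] = (min(lo, start), max(hi, end))
        else:
            spans[person_id] = (start, end)
    return spans


# Hypothetical inpatient stays: person 1 has two stays years apart.
cases = [
    (1, date(2012, 4, 25), date(2012, 5, 2)),
    (1, date(2019, 9, 10), date(2019, 9, 14)),
    (2, date(2020, 1, 5), date(2020, 1, 6)),
]
spans = min_max_periods(cases)
print(spans[1])
```

Note that person 1’s span runs from the first admission in 2012 to the last discharge in 2019, so the years between the two stays are treated as observed. That is exactly the error described above.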

While the min/max approach has been put into practice, I’ve thought that maybe a small padding of time around the min/max could be OK. A thought experiment: you have the min date of 4/25. Is there absolutely no way that the person might have gone in on 4/24? Just a single day earlier? That’s reasonable to me. What about a week? A month? 6 months? A year? I think as you get further away from a visit, you have less certainty that they would have come into your system if they needed care. How much uncertainty you are willing to accept is what would push you towards ‘exactly the min visit’ vs. ‘6 months prior to the min visit’ as the start of the Observation Period. If there are other pieces of information like drug dispensings, prescription pickups etc., maybe you can incorporate that information as well.
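
The padding idea could be sketched in Python like this. The helper `pad_span` is hypothetical, the 30-day default is an arbitrary placeholder rather than a recommendation, the year in the sample dates is invented, and padding the end as well as the start is just one reading of ‘around the min/max’.

```python
from datetime import date, timedelta


def pad_span(start, end, pad_days=30):
    """Widen a (start, end) span by pad_days on each side."""
    pad = timedelta(days=pad_days)
    return start - pad, end + pad


# Min visit on 4/25 with one day of padding starts the period on 4/24.
print(pad_span(date(2012, 4, 25), date(2019, 9, 14), pad_days=1))
```

Keeping `pad_days` as a parameter makes the amount of accepted uncertainty an explicit, documented ETL choice rather than something buried in the query.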