Observation_Period table - how do you generate this table at your site?

Hi, @Gowtham_Rao,
How observation_period is populated may depend on the design choices of the ETL (hence why you might consider it ambiguous?) but the purpose of the table is very specific: it defines the time intervals that a person is considered under direct observation of a healthcare provider. In claims database terms, that would be their enrollment. In hospital systems, it might only be when the person is inside their walls; when they enter they begin observation, when they leave, they stop. The hospital doesn’t know what happens to the person between visits in this example.

Periods don’t overlap because if you want to associate a patient event to an observation period, you only want to get the single observation period associated with that event. If you had overlaps, then you could have one event associated to multiple periods, and that would complicate some analytics in the OHDSI tool stack.
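As a rough illustration of why non-overlapping periods matter, the lookup below returns at most one period per event. It is only a sketch: the `ObservationPeriod` tuple and the `period_for_event` helper are hypothetical names, not part of any OHDSI tool.

```python
from datetime import date
from typing import NamedTuple


class ObservationPeriod(NamedTuple):
    # Hypothetical stand-in for one row of OBSERVATION_PERIOD
    person_id: int
    start: date
    end: date


def period_for_event(periods, person_id, event_date):
    """Return the single period containing event_date, or None if the event falls in a gap."""
    matches = [
        p for p in periods
        if p.person_id == person_id and p.start <= event_date <= p.end
    ]
    if len(matches) > 1:
        # Overlapping periods would make the association ambiguous
        raise ValueError("overlapping observation periods for this person")
    return matches[0] if matches else None
```

With overlapping periods the same event would match more than one row, which is the complication described above.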

That’s true, but from an insurance provider’s perspective, how are they supposed to track care if no one is telling them to pay for it? No one proposes that we don’t use the clinical data for a person when uninsured, but the question to you is: how does anyone get this data?

That’s a rule that the particular EHR system has adopted as the specification of ‘a person is being observed’. Another system may decide the observation period for a person is the earliest known date of a person to the latest known event of a person (or maybe the current date if they know the person isn’t dead).

The importance of the OBSERVATION_PERIOD table is that you cannot say that a person has not been diagnosed with X or not been exposed to drug Y in the past 5 years if you cannot assert that the person was under continuous observation for that time period (the events could have happened in the ‘gaps’). For claims, some have adopted the idea that if a person is paying for coverage, they’ll ask the insurance provider to pay for their meds and visits to a doctor. So that’s where the payer plan period contributes to building an observation period, which may be the source of the confusion. But in other systems, they could just flat out tell you that person P1 has an entry date of StartX and an exit date of EndX, and that would be your observation period in that CDM.

Consider this: let’s say you’re in the US paying for coverage, and you have been for 7 years. Then you decide to go to another country where they aren’t going to honor your US coverage. So, you cancel it and start another plan in the foreign lands. After 3 years, you return to the US and pick up your old US plan again. You have a 3 year gap in there. You got medical coverage from somewhere, but how’s the US system supposed to know about it? So, you’re just a person with a 3 year gap in their observation period. That’s all this table is built to serve.

Thank you @Chris_Knoll for the thoughtful response. That helps reduce the ambiguity.

Would it be somewhat accurate then to say that the observation period is mostly there to qualify missing information? If there is a significant period of missing information, then we cannot assume we have complete longitudinal information on the person.

In the presence of a significant missing period, the answer to the question “does the person have diabetes mellitus in the past 5 years” is ‘don’t know’ because of missing data. If there is no significant missing period in the past five years and the person did not have care for the condition diabetes, then the person most probably does not have diagnosed diabetes mellitus.

I.e., disease present: yes, no, or unknown.

No problem, @Gowtham_Rao . It would be accurate to say that an OP is to qualify missing information, or you could look at it as qualifying where information is present. Either way is fine.

To your example about T2DM: do you need to have 5 years of continuous observation to determine if a person had diabetes in prior history? No, because you might find a record a year prior, and so they have it. Requiring 5 years wasn’t really necessary here. In fact, gaps in your observations wouldn’t matter either if you found a diagnosis somewhere in your fragmented history. However, if you are going to assert that they did NOT have the disease for the past 5 years, you do need to know that you have information for 5 years in order to determine that they didn’t have something. Subtle, yes? Not sure if that’s addressing your comments, so apologies if I’m not following you. But hopefully you have a better appreciation for the observation period table now :smile:
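The asymmetry between proving presence and asserting absence can be sketched as a three-valued check. This is a minimal illustration under stated assumptions: `condition_status` is a hypothetical helper, periods are plain `(start, end)` tuples for one person, and "continuous observation" is taken to mean that one period covers the whole lookback window.

```python
from datetime import date, timedelta


def condition_status(periods, diagnosis_dates, index_date, lookback_days):
    """Return 'yes', 'no', or 'unknown' for a lookback window ending at index_date.

    periods: non-overlapping (start, end) date tuples for one person.
    diagnosis_dates: dates on which the condition was recorded.
    """
    window_start = index_date - timedelta(days=lookback_days)

    # One record inside the window is enough to say 'yes'; gaps do not matter.
    if any(window_start <= d <= index_date for d in diagnosis_dates):
        return "yes"

    # Absence can only be asserted if a single period covers the entire window.
    if any(start <= window_start and end >= index_date for start, end in periods):
        return "no"

    # No record, and observation was not continuous: events may sit in a gap.
    return "unknown"
```

A person with no diagnosis and only two years of observation before the index date would come back as 'unknown' for a five-year question, not 'no'.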

You are correct. The OBSERVATION_PERIOD is more important to determine whether missing records are due to the fact that nothing happened, or due to the fact that nobody was recording, even though the patient might have been super busy in the system. You need that for things like wash-out periods (no drug in a given time), as in @Chris_Knoll’s example, and for incidence/prevalence calculations. If you were only interested in information present you wouldn’t need the table.

Note that the present (record there if something happens) and absent (no record there if nothing happened) axioms are rarely explicitly defined in observational research, but still expected to be true in an unspoken way. We get into those debates all the time whether it is ok or not to “throw away” data that are outside the OBSERVATION_PERIOD. The answer is a very strong YES.

Thank you both

In our organization we are interested in quality of care delivered: care given vs care not given. The numerator portion of a quality measure definition goes something like “was X done in the past Y time”, e.g., was HbA1c tested in the past three months, or was a colonoscopy performed in the past ten years.

When checking the data - the answer could be

– no - because of missing information, or
– no - because of missing care.

If it is a no (absence of something) then we have to act on it. If it is a no because of missing care - then we have to work with our health plan members and their care providers. (quality of care challenge)
If it is a no because of missing information - then that’s a technology or documentation discussion (data challenge).

So, in this quality measure use case – observation period is useful to go down the path of either missing care or missing data.

Thank you

Only thing I would say to the above is around this point:

I’d say that you can’t say ‘no because of missing data’ because you can’t assert that something didn’t happen without knowing that you were definitely observing them. So shouldn’t that be ‘unknown – because of gaps in observation’?

I don’t want to take this thread off topic; this does sound like you could be using the OBSERVATION_PERIOD to identify periods of ‘missing information’. Is that what you are doing in this case? And if so, how are you defining these observation periods for your patients in your EHR?

Agree. Unknown is more accurate.

I think this topic deserves more discussion. In addition to the community sharing best practices for how to populate the observation_period table, it might be good to work toward including a more formal representation in the CDM of the assumptions made about data completeness. Codes mapped to a taxonomy of justifications for confidence in data completeness within observation periods could inform interpretation of analytic results. Codes associated with a low likelihood of completeness could trigger flags that signal the need for cautious interpretation.

For example, EHR data might support relatively high confidence in completeness within brief observation periods during hospital stays or in the ED, low confidence for multiyear periods within primary care clinics, and somewhere in between for months-long periods within specialty care settings.

For claims data, periods of complete insurance loss might be the end of a spectrum of coverage rather than a binary indicator of expected completeness.

If we can encode our knowledge about our data’s risk of incompleteness, analytic routines could inform the interpretation of analytic results in ways that might be especially useful for users with less intimate knowledge of the source data. A rough taxonomy of expected completeness might be useful, and a very precise one might be hard to create.

@Andrew et al:

Time for a proposal in the CDM WG, don’t you think?

Only caution I will put out (and have done so in many Forum postings): The default of an OBSERVATION_PERIOD record should be “data should be expected at high likelihood”. If we create low-likelihood Observation Periods, we will kill the CDM, because then every single record has to be joined to the OBSERVATION_PERIOD in order to look for your flags of confidence. The performance of any quantitative query (e.g. with a denominator, like incidence) will go down the toilet. So, invent something neat and backwards compatible! :smile:

@Christian_Reich There’s a chance that a slightly better defined version of this idea will get discussed and improved on Michael Kahn’s data quality call. If anything emerges from that discussion that seems worth the CDM WG’s time I’ll be sure to put in a proposal. Your advice on how linking to flags could affect performance is very helpful.


Absolutely. Ping as many good folks as you can. The other ones are here and here.

I don’t know what the solution would be, but there are folks who are thinking about many locations (places patients used to live in the past) without destroying the current standard with one location per person. Maybe we can steal from there.

I have a question regarding observation_period and end dates in condition_occurrence, device_exposure and drug_exposure. When we cut records using observation periods, should we check only start dates? What if end_date is outside of observation_period?


Right now, both dates have to be inside. But this is in heavy debate in the THEMIS group. We will get some answer soon.

I would like to clarify, can we cut end_date using observation_period_end_date?
Or right now we should just drop record if start date is inside but end_date is outside of period?

I know this will cause anxiety, but the answer is DROP. We only take data that fits the Standard Model. It’s not an attic for all data that might be good for some questions.
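The DROP rule as stated in this thread (both dates inside, otherwise the record goes) might look like the sketch below. `keep_record` is a hypothetical helper, and requiring both dates to fall inside the same period is an assumption; the thread notes the convention was still under debate in THEMIS.

```python
from datetime import date


def keep_record(record_start, record_end, periods):
    """Return True only if both record dates fall inside one observation period.

    periods: (start, end) date tuples for the person.
    A missing end date is treated as equal to the start date (assumption).
    """
    end = record_end if record_end is not None else record_start
    return any(
        p_start <= record_start and end <= p_end
        for p_start, p_end in periods
    )
```

Note that the record is dropped rather than trimmed: the end date is not cut back to `observation_period_end_date`.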

Hope you guys can help.
We have EHR data and we are concerned about the gaps.
Do we have to stop the current observation period and create a new one when there’s a significant gap (30, 60, maybe more days) between the events?
Note, we don’t have an enrollment period specified, so we calculate the observation periods just by the event dates.
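For concreteness, the gap-based splitting being asked about could be sketched as follows. This only illustrates the option in the question; the thread does not settle whether splitting is the right choice, and `periods_from_events` and the `max_gap_days` threshold are hypothetical.

```python
from datetime import date


def periods_from_events(event_dates, max_gap_days):
    """Group event dates into periods, starting a new period after a long gap.

    Returns a list of (start, end) date tuples in chronological order.
    """
    periods = []
    for d in sorted(event_dates):
        if periods and (d - periods[-1][1]).days <= max_gap_days:
            # Close enough to the previous event: extend the current period
            periods[-1] = (periods[-1][0], d)
        else:
            # First event, or gap too long: open a new period
            periods.append((d, d))
    return periods
```

With a 60-day threshold, events on 2020-01-01, 2020-01-20 and 2020-06-01 would give two periods, the second containing only the June event.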

The conventions state, “records which uniquely define the spans of time for which a Person is at-risk to have clinical events recorded within the source systems, even if no events in fact are recorded (healthy patient with no healthcare interactions)”.

Colorado uses the first clinical event start_date or start_datetime from one of the following tables: Condition, Measurement, Visit, Drug, Procedure, Observation, or Device table as the OBSERVATION_PERIOD.observation_period_start_date. The OBSERVATION_PERIOD.observation_period_end_date = the last clinical event end_date or end_datetime from one of the following tables: Condition, Measurement, Visit, Drug, Procedure, Observation, Device, or Death table. We believe our patients are “at risk” to have an event during this time period. HOWEVER, our EHR data isn’t HMO data, so our patients do visit Providers outside our system.
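A minimal sketch of this first-event/last-event heuristic, assuming the events from all the listed tables have already been pooled into `(start_date, end_date)` tuples for one person. The `observation_period` function is hypothetical and is not the actual Colorado ETL.

```python
from datetime import date


def observation_period(events, death_date=None):
    """Derive one (start, end) period from pooled clinical events.

    events: list of (start_date, end_date_or_None) tuples from all domains.
    death_date: optional date from the Death table, considered for the end only.
    """
    starts = [start for start, _ in events]
    if not starts:
        return None  # no clinical events, so no period can be derived

    # Fall back to the start date when a record has no end date
    ends = [end if end is not None else start for start, end in events]
    if death_date is not None:
        ends.append(death_date)

    return min(starts), max(ends)
```

This yields a single period per person, so any long stretch without events stays inside the period instead of becoming a gap.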

I’m interested to hear how other EHR data holders & ETL’ers define this. Tagging @DTorok, @karthik, @esholle, @sblyman, @roger.carlson, @samart3, @Sgp6a, @burrowse, @mgurley, @QI_omop

The method we use for deriving the observation period with our outpatient-based EHRs is similar to the one @MPhilofsky describes. We also ensure we exclude any patient-reported health events when looking for min/max clinical events to derive the observation period for a person. We have methods to identify patient-reported events and these events can have dates that occur many years ago (even prior to when the EHR system was in existence). In some of our analytic cases we want to include these historical events, therefore we include them in the CDM data but not within the observation period time frame.
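Excluding patient-reported events from the min/max calculation, while still keeping them in the data, could be sketched like this. The `patient_reported` flag and the dict layout are assumptions made for illustration; how such events are actually identified is site-specific.

```python
from datetime import date


def period_excluding_patient_reported(events):
    """Derive (start, end) from events that were not patient-reported.

    events: list of dicts with 'event_date' (date) and 'patient_reported' (bool).
    Patient-reported events stay in the data; they are only ignored here.
    """
    observed = [e["event_date"] for e in events if not e["patient_reported"]]
    if not observed:
        return None
    return min(observed), max(observed)
```

A patient-reported surgery dated decades before the EHR existed would therefore remain queryable but would fall outside the derived period.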

Our approach is similar: the earliest and latest event dates (or death) across the domains.

@MPhilofsky we’ve been very lazy about it - just setting an arbitrary point as the date before which we don’t have reliable EHR data and using getdate() as the end period. For everyone. Obviously this gets in the way of a lot of higher-order analytics that sit on top of the platform, but for direct SQL queries executed on an ad hoc basis it hasn’t really stood in our way.

As we mature in our use of the OHDSI ecosystem I think we’ll likely adopt a similar heuristic to what you’ve described here, which seems sensible and straightforward.