Help required to design observation period logic

Hello Everyone,

I would like a few suggestions on how to design the observation_period logic.

Currently we do the usual: for each patient, we take the minimum visit start date across all of their visits and the maximum visit end date across all of their visits.
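To make the min/max rule concrete, here is a minimal Python sketch. The tuple layout and the function name `observation_periods_from_visits` are made up for illustration; a real ETL would read these rows from visit_occurrence.

```python
from datetime import date

# Hypothetical visit rows: (person_id, visit_start_date, visit_end_date).
visits = [
    (1, date(2015, 3, 1), date(2015, 3, 4)),
    (1, date(2018, 7, 10), date(2018, 7, 10)),
    (2, date(2020, 1, 5), date(2020, 1, 9)),
]


def observation_periods_from_visits(visits):
    """Return {person_id: (earliest visit start, latest visit end)}."""
    periods = {}
    for person_id, start, end in visits:
        if person_id in periods:
            cur_start, cur_end = periods[person_id]
            periods[person_id] = (min(cur_start, start), max(cur_end, end))
        else:
            periods[person_id] = (start, end)
    return periods


periods = observation_periods_from_visits(visits)
for person_id, (start, end) in sorted(periods.items()):
    print(person_id, start, end)
```

Note that this produces exactly one span per person, so any gap between visits is treated as observed time.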

Though this works fine, I see that Atlas misses patients whose events don’t fall within their observation periods. I also understand that Atlas allows us to ignore observation periods for inclusion criteria, which is great.

As the observation period criterion is fundamental, I understand Atlas treats it as mandatory.

Now the only option I have is to design observation period logic that lets me select all patients in cohort entry events (similar to ignoring observation periods), because I don’t wish to drop records from the analysis.

Let’s say my observation_period_start_date is 01/01/1899 and my observation_period_end_date is 31/12/9999

Now, can’t this type of observation period logic be useful in studies where we just do a retrospective analysis of our data, apply some ML algorithms and get insights? @Chris_Knoll did touch upon this in this post, but I felt creating a separate thread would be better.

For example, if we have to find the 28-day mortality of a patient, I can just rely on domain-table fields such as visit_start_date and other clinical measurements/data to do the analysis. Am I right?

Does the observation period play any role during analysis when we have all the other data we need? Or could you give a layman’s explanation of what kinds of studies require an observation period? I understand it’s used for selecting patients, and that with my approach the amount of time a patient spends in a cohort would be measured in centuries. But can a cohort time of centuries have any impact during analysis? When I do a study, we usually pick a cohort based on certain criteria, collect their covariates and do the analysis.

So, with all your experience of conducting studies in healthcare and transforming EHR data to the CDM, can you please help a beginner like me on this? When and for what kinds of studies (an example, please) should we actually be concerned about observation periods? What logic do you usually follow? Do you always have to compromise by letting go of certain records when they don’t fall within the observation periods?

Hi Akshay,

You set your observation period to span several centuries in order to include all the data you have available.
But think of the observation period variable as something that lets you be more specific about your results. The way medicine is practiced changes over time, so depending on the decade, for example, you may find a different mortality rate (death within 28 days) because new medicines come to the market, new surgeries are performed, and so on.
Another aspect of working with data is that in the early days of a registry or database, the entries may not be as reliable as they are once data has been collected for several years. So you may not want to include these early years in your study and may therefore start your observation period later.

Hope that helps you understand how to use the observation period variable :wink:

Best wishes,

Hi @tburkard,

Let me make sure I got it right. Forgive me if my English isn’t that great and if my questions are basic.

Though you suggest it is okay to set an observation period like 1899 to 2099, you feel that in cases like the ones you described above it may not make sense to set the observation period in centuries. Am I right?

Because in this case, let’s say drug A, which was used to treat patients with condition C between 2000 and 2010, took 10 days to cure the condition.

On the other hand, let’s say drug B, which has been used to treat patients with condition C between 2010 and 2020, takes only 5 days to cure the condition.

In this case, though the condition is the same, the patients’ data (visit_occurrence, drug_exposure, etc.) might be different.

So, considering the above example, if I set my observation period start and end dates to 31st Dec 1999 and Jan 1st 2021 for each patient and select continuous observation of at least 0 days, then I will have all the patients in my cohort.

On the other hand, if I rely on the min(visit start) and max(visit end) date for each patient and select continuous observation of 0 days, I will still be able to have all records. Right?

I understand the former can be useful when there are records in EHR without being accompanied by visits.

But somehow I feel I am not clear about the example I drafted. Does my example make sense? Would you be able to correct my understanding?

Hi @Akshay,


Start with the definition of the table: uniquely define the spans of time for which a Person is at-risk to have clinical events recorded within the source systems, even if no events in fact are recorded (healthy patient with no healthcare interactions).

The Observation Period is dependent on your data. For EHR data, many data holders believe their Observation Period starts with the first date a Person has an interaction* with their healthcare system and ends with the date of the data extraction. This logic assumes a Person would return to their institution if they required care. See the discussion here. However, your logic may differ from the above depending on the way data is recorded in your EHR.

*Depending on your particular EHR instance, an “interaction” may be defined as a Visit, if a Visit is always the first interaction in your health system. Colorado defines “interaction” as any clinical event since pre-visit activities generate data in our EHR.
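As a rough illustration of the "first interaction to extraction date" logic described above, here is a minimal Python sketch. The extraction date, the sample data and the function name `ehr_observation_period` are all assumptions made for the example; "interaction" is taken here to mean any clinical event date, and your own definition may differ.

```python
from datetime import date

# Assumed extraction date for the example.
EXTRACTION_DATE = date(2021, 1, 1)

# Hypothetical dates of any clinical event (visit, drug, lab, ...), per person.
event_dates = {
    1: [date(2016, 5, 2), date(2014, 11, 20), date(2019, 8, 1)],
    2: [date(2020, 2, 14)],
}


def ehr_observation_period(dates, extraction_date):
    """Start at the first recorded interaction, end at the data extraction."""
    return (min(dates), extraction_date)


ehr_periods = {
    person_id: ehr_observation_period(dates, EXTRACTION_DATE)
    for person_id, dates in event_dates.items()
}
print(ehr_periods[1])
```

This only makes sense if you believe a Person would come back to your institution when they need care, which is the assumption stated above.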

It’s not possible. Electricity was sparse in 1899 and there definitely wasn’t any EHR system in use.

I suggest you take a close look at those records. Is this a problem with the Observation Period? Or is this a problem with the data? Many EHR systems use default dates instead of leaving certain dates NULL, for example birthdates in 1850 or drug dispenses on Jan. 1, 1970.
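One way to take that close look is to flag dates that look like system defaults before they feed into the Observation Period. The sketch below is only an illustration: the sentinel list and the earliest plausible date are assumptions that you would need to replace with whatever your own EHR actually uses.

```python
from datetime import date

# Assumed placeholder values; check which defaults your source system writes.
SENTINEL_DATES = {date(1970, 1, 1)}
# Assumed cutoff: anything earlier is treated as implausible for this data.
EARLIEST_PLAUSIBLE = date(1900, 1, 1)


def is_suspicious(d, sentinels=SENTINEL_DATES, earliest=EARLIEST_PLAUSIBLE):
    """True if the date matches a known default or predates the cutoff."""
    return d in sentinels or d < earliest


# Hypothetical sample of dates pulled from the source.
sample = [date(1850, 6, 1), date(1970, 1, 1), date(2015, 3, 1)]
flagged = [d for d in sample if is_suspicious(d)]
print(flagged)
```

Flagged records still need a manual review, since a real event can legitimately fall on a date that also happens to be a default.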

Hello @MPhilofsky - Thanks for the link. Am reading it.

This was just an example. Maybe I shouldn’t have gone so far back in years. Haha

For the end of the observation period, a more complex logic was just discussed in the EHR WG meeting. A patient who has been silent for a very long time should not have end date = extraction date, since most likely the patient moved out of the service area.


You should further explain your idea on how to derive an Observation Period end date. I think it would be useful to share and receive feedback from the community :slight_smile:

Hey, @MPhilofsky, I really like this definition of the observation period. I’ve struggled to put the idea of an observation period into words, but this makes it very clear. I guess I needed to read the table definition more closely!

But I think recent trends with ‘observations outside of observation periods’ go against this definition of the observation period. If the observation period is the period of time a person is at risk of clinical events, then how can we have clinical events outside of the observation period (as recent Themis proposals have suggested)? If a person has a clinical event, they must have been at risk of a clinical event, so shouldn’t every clinical event be bounded by some observation period?


“At risk” is not enough. We need to require that there is a reasonable expectation that a clinical event is being recorded. In particular, we expect Conditions, Drugs and Procedures. Everything else might be missing. We will get to that in the CDM WG, as we are going through all the tables.

The records outside the Observation Period are those where something was recorded. But the problem is the opposite: what about the white space? Axiom 2 of observational data is: if there is no record, nothing happened. We have no such belief. But we need to be able to rely on that as well.

I think “at risk” implies that the event would be recorded if it occurred.

The Person and Observation Period tables are the only required tables in the CDM. The conventions for the Person table are fairly straightforward. The Observation Period table is vague, at best. I look forward to the additional guidance the CDM WG will provide for EHR data.

@Christian_Reich As we have seen on the forums, this is a highly debated topic. I hope the discussion isn’t limited to a 9am EST WG call.

I’m not sure. Our data doesn’t have events outside of Observation Periods :slight_smile: Does yours? How did you decide on the start & end date for the Observation Period?
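For anyone wanting to answer that question for their own data, a quick check could look like the sketch below. The data layout and the function name `events_outside_observation` are made up for the example; in practice you would run the equivalent query against your CDM tables.

```python
from datetime import date

# Hypothetical observation periods: {person_id: [(start, end), ...]}.
observation_periods = {
    1: [
        (date(2015, 1, 1), date(2016, 12, 31)),
        (date(2019, 1, 1), date(2020, 12, 31)),
    ],
}

# Hypothetical clinical events: (person_id, event_date).
events = [
    (1, date(2015, 6, 1)),   # inside the first period
    (1, date(2018, 3, 15)),  # in the gap between the two periods
    (2, date(2020, 1, 1)),   # person has no observation period at all
]


def events_outside_observation(events, observation_periods):
    """Return the events not covered by any observation period of that person."""
    outside = []
    for person_id, event_date in events:
        spans = observation_periods.get(person_id, [])
        if not any(start <= event_date <= end for start, end in spans):
            outside.append((person_id, event_date))
    return outside


outside = events_outside_observation(events, observation_periods)
print(outside)
```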

From what I understand (and please don’t take this explanation as support of decisions):
We’re using claims data, so Observation Periods are created based on enrollment in a plan. However, we may be getting genomic data about the person, and we’re asserting that someone’s genetic makeup was as of birth. So we’ll have an observation of this genetic information dated as of the month/year of their birth, but the observation period does not cover that.

I think we see things like this in our data. I could be mistaken, but I’m pretty sure this is a plausible way data gets into our system.

Initially, I thought that the requirement for ‘ignore observation periods’ in cohort definitions was to support events outside of observation periods. Now that I think about it more, the use case of making inclusion/exclusion rules that ignore observation periods isn’t about allowing events that do not exist in an observation period. Rather, if you have an exclusion rule, we want to be able to look at a different observation period in the past that may contain an exclusion event. The old way (where inclusion rules only apply to the same observation period as the index) didn’t allow you to exclude people based on an event that happened in an earlier, separate observation period. Now you can, and now I feel good about that function.

I still think people are asking for events to be allowed outside of observation period, but the cohort definition function to ignore observation periods isn’t supporting that perspective, just allowing you to find observations that may appear in different observation periods.
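To illustrate the difference described above, here is a small Python sketch. It is not how Atlas is implemented; the function names and data are hypothetical, and it only mirrors the behaviour as described in this post: by default an exclusion event must fall in the same observation period as the index, while ignoring observation periods lets an event from an earlier, separate period count.

```python
from datetime import date


def in_any_period(d, periods):
    """True if the date falls inside at least one observation period."""
    return any(start <= d <= end for start, end in periods)


def has_prior_exclusion_event(index_date, exclusion_dates, periods,
                              ignore_observation_period):
    """Check for an exclusion event before the index date."""
    if ignore_observation_period:
        # Look across every observation period the person has.
        return any(
            d < index_date and in_any_period(d, periods)
            for d in exclusion_dates
        )
    # Otherwise only look inside the period that contains the index date.
    for start, end in periods:
        if start <= index_date <= end:
            return any(start <= d < index_date for d in exclusion_dates)
    return False


# Hypothetical person with two separate observation periods.
periods = [
    (date(2010, 1, 1), date(2011, 12, 31)),
    (date(2015, 1, 1), date(2016, 12, 31)),
]
index_date = date(2015, 6, 1)
exclusion_dates = [date(2010, 5, 1)]  # event in the earlier period

same_period_only = has_prior_exclusion_event(
    index_date, exclusion_dates, periods, ignore_observation_period=False
)
across_periods = has_prior_exclusion_event(
    index_date, exclusion_dates, periods, ignore_observation_period=True
)
print(same_period_only, across_periods)
```

In this example the person would be excluded only when observation periods are ignored, because the exclusion event sits in the earlier period.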