Observation Period Flavors (First THEMIS-Focus Group 2, now discussed everywhere)

  1. Incomplete or inconsistent periods of observation.  For example, when in a claims dataset a patient is under continuous observation for medical coverage but only has prescription coverage for part of that time.  Under the current model, the patient would either have a shorter period of complete observation or a longer period of mixed observation status.  Currently, conversions can leverage the Payer Plan Period table to specify the types of coverage provided within an observation period, but in practice this is informal and non-standard.  Users new to a dataset have no consistent way to check for the types of observation a patient is under or how to define them (or even know if they need to). 

A short-term solution might be to leverage the Payer Plan Period table using new standard concepts for coverage type (in one of the concept id fields), and make it a best practice to always check the coverage type prior to any new analysis to confirm the types of events expected to be under observation in a given period. This could also be borrowed for combined datasets where the type of events under observation may also change over time but is not related to payer coverage (e.g. a combined EHR and claims dataset where a patient’s observation period for each data type may not align perfectly). We could set up defined payer_plan_periods that need to be filled out for claims ETLs and EMR ETLs to help standardize these entries. We’ll use the payer_concept_id field to standardize the following for claims data: medical, drug, medical/drug coverage. Then for EMR, we could use: patient in network (to define all current visit data for that patient), data in network (to define all data related to patients in network), etc.

A better, longer term solution would be to modify the CDM itself to allow for different ‘flavors’ of Observation Period to be captured. This could be done with a new table similar to Payer Plan (i.e. observation_period_detail?), but without the confusion of forcing that domain to do double duty or support cases where the types of events observed are not based on payer coverage. So within a patient’s higher-level observation period, there would be a sub-domain to capture observation period type (what events are expected to be captured) and duration using standard concepts and having standard methods to allow naive users of a dataset to determine when a patient is under observation and eligible for a given analysis. The standards proposed in the payer_plan_period will move to the observation_period_detail table.
If we don’t want to add a new table to the CDM, and we do not like the payer_plan_period option, we can leverage the period_type_concept_id field available in the observation_period table. Please see specs below (from https://github.com/OHDSI/CommonDataModel/wiki/OBSERVATION_PERIOD):

We can set up standards for using this field to define the different types of observation_periods a patient can have. Again, use the following for claims data: medical, drug, medical/drug coverage. Then for EMR data, we could use: patient in network (to define all current visit data for that patient), data in network (to define all data related to patients in network), etc.
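As a rough illustration of the "check the coverage type before analysis" practice, here is a minimal sketch. It assumes coverage is recorded in payer_plan_period rows with a payer_concept_id; the three concept IDs are hypothetical placeholders (no such standard concepts are claimed to exist), and the helper name `has_drug_coverage` is made up for this example.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical placeholder concept IDs for coverage type (assumption, not real concepts).
MEDICAL_ONLY = 1
DRUG_ONLY = 2
MEDICAL_AND_DRUG = 3


@dataclass(frozen=True)
class PayerPlanPeriod:
    person_id: int
    payer_plan_period_start_date: date
    payer_plan_period_end_date: date
    payer_concept_id: int


def has_drug_coverage(periods, person_id, start, end):
    """Return True if a single period with drug coverage spans [start, end]."""
    drug_types = {DRUG_ONLY, MEDICAL_AND_DRUG}
    return any(
        p.person_id == person_id
        and p.payer_concept_id in drug_types
        and p.payer_plan_period_start_date <= start
        and end <= p.payer_plan_period_end_date
        for p in periods
    )


# Medical coverage for two years, prescription coverage only in the second year.
periods = [
    PayerPlanPeriod(1, date(2015, 1, 1), date(2016, 12, 31), MEDICAL_ONLY),
    PayerPlanPeriod(1, date(2016, 1, 1), date(2016, 12, 31), DRUG_ONLY),
]

covered_2016 = has_drug_coverage(periods, 1, date(2016, 3, 1), date(2016, 6, 30))
covered_2015 = has_drug_coverage(periods, 1, date(2015, 3, 1), date(2015, 6, 30))
```

In this sketch a drug-exposure analysis would accept the 2016 window and reject the 2015 window, even though the patient is "observed" for medical claims in both.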

  2. For events which fall outside a patient’s observation period, the current standard places them in the Observation table regardless of original domain and they are considered to be patient history (or else they are excluded from the conversion altogether). This covers some uses rather well, but in other cases may be too limited when more complete information about an historical event is needed. Would a new domain type concept id to classify ‘historic’ events, stored in the appropriate event domains, be desirable or useful?

Along the same lines as this post, I would like some guidance for our situation. I’m sure others have this same dilemma.

How should we define an Observation Period for our EHR data?

We do have a “network”, but it is a network of care providers/care sites and not coverage. We aren’t an HMO, so we don’t have all the healthcare data for a person. We have data for patients that are only seen once at our facility (I’m thinking EMS transport)/transport from outside facility/one time referral to a specialist, patients that are seen yearly by a specialist at one of our facilities, but receive primary care elsewhere, or maybe a patient who receives all their care within our network. But the problem is, we just don’t know which patients do what. Our data doesn’t contain that information. My current, unvetted idea is to put all the data about every person (minus the ‘restricted’ persons/events) into the CDM. The observation_period_start_date would be the first datetime from any table. There wouldn’t be an observation_period_end_date because we could still receive person data at any time. Problem is the observation_period_end_date is a required field. We do daily loads, so we could update the observation_period_end_date with the current date. However, since we currently don’t capture every clinical event, that information also needs to be in the CDM. Possibly the Metadata table?

@plpo, friends:

This subject comes up all the time and is the subject of many debates. We need to nail this for good. I have a very strong opinion, and that is to keep the observation periods as high quality as possible, even if we lose data. Three reasons:

1) Data presence should be reliable, and data absence should be reliable

Observational data are analyzed with the assumption that two (unspoken) axioms are true:

A. If some relevant medical event happens there is a record,
B. If there is no record nothing happened. Which means, we don’t have negative data in there (like “patient didn’t have an MI”).

Only if these two axioms hold do our typical calculations of incidence and prevalence rates work. And we use them all the time: 80% of all studies calculate rates and ratios of rates (relative risks). Axiom A data are in the numerator, Axiom B data (patients without data) are in the denominator.

Axiom A holds true if (i) the patient is observed and (ii) the event is worthy of recording: MIs are worthy, an itch on the back is not, and everything else is in between. Axiom B holds true if the patient is observed. Either misclassification will distort the rates.
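To make the distortion concrete, here is a small arithmetic sketch with made-up numbers (10 events, 1000 truly observed person-years, 500 person-years wrongly counted as observed). The figures and the function name are purely illustrative.

```python
def incidence_rate(events, person_years):
    """Events per person-year of observation."""
    return events / person_years


# All 10 events recorded, 1000 person-years genuinely under observation.
true_rate = incidence_rate(10, 1000.0)

# Axiom B violated: 500 unobserved person-years are counted as event-free time,
# so the denominator grows while the numerator stays the same.
diluted_rate = incidence_rate(10, 1500.0)

# Axiom A violated: 3 of the 10 events happened but left no record.
undercounted_rate = incidence_rate(7, 1000.0)
```

Both violations push the estimated rate below the true one, which is why time without a reliable "camera" should not silently count as observed time.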

2) Medical History information is low quality

In most cases, that information is obtained from the patient, not from some record. And everybody who ever went to the doctor (i.e. everybody) knows the fidelity of the answers to the questions from the doctor. As a result, physicians attach very little significance to these listings of prior problems, using them at best in order not to miss a potential differential diagnosis. Family history is even worse (“Do you have any diseases in your family?” - “Oh yes, my husband has trouble falling asleep lately”). That kind of thing.

3) We need not follow the hoarding reflex any longer

Folks feel they have to preserve every little snippet of data for every patient. However, we now have more than one billion patients in the OHDSI network. We can afford to lose some data if we can improve the quality in return. Observational research has a reproducibility and quality problem, not a sample size problem.

Therefore, I would recommend the following:

  • We create observation periods where we can make the best assumption to have axiom A, and more importantly axiom B, hold true for all three main data tables: CONDITION_OCCURRENCE, DRUG_EXPOSURE and PROCEDURE_OCCURRENCE.
  • The rest goes either in the trash, or into some place where we keep medical history information. The separation will allow folks to use these data if really wanted (e.g. for the infamous exclusion criterion use case), but it will not participate in the general cohort definitions by default.

To use some additional flags or PAYER_PLAN_PERIOD to indicate if the Drug camera or the Condition/Procedure camera is on makes no sense to me. For each query, we would have to join this table, only to exclude the exact same data I want to relegate to history anyway. The vast majority of all cohort definitions today would break, and their definition would become a lot more complicated and performance would go down the drain.


I wouldn’t worry about that, @MPhilofsky. Specialty clinics are not expected to cover all primary healthcare issues, only the complex ones. So, nobody would create the incidence of febrile viral infections in the winter from these data. In your case, you have a mixture of specialty and primary care, and the data would be treated the same way.

The only thing we don’t have is a way to declare these things explicitly. That’s something we should think about in the metadata table.

  1. Data presence should be reliable, and data absence should be reliable

Thank you @Christian_Reich for stating this so clearly. This is my issue with the desire to include “old” information from before the observation period. The absence of data is particularly hard to explain. People often want to include a few extra bits of information to get a few more people with an exposure of interest, forgetting that a lot of similar people are missing.

As an analogy, the issue is very much the same as with outcomes. Just as missing data in the outcomes period can be informative, data outside the observation period can be informative. This can bias outcomes associated with the exposure if the exposure definition period extends outside a contiguous observation period for some people.

Note that this is a different issue from forcing everyone to have the same duration of observation period before an index date (for example, forcing everyone to have a 12 month look-back period when some have valid, longer ones available). In this case, for people with different length observation periods, it is better to use all the information. See Gilbertson, et al. (I am ignoring the situation when applying a definition that requires a specific duration of observation period in which case you should use that.)

Observation periods can be constructed dynamically using the payer_plan_period, or through other modifications to the observation period table. We support that in our application and our data model, and we prefer that flexibility for our purposes. But I can also attest to the fact that dicing the data in this way can affect performance with large datasets. The OHDSI approach is to place information in the CDM that is ready to be analyzed and to consider analytical performance. This isn’t always explicitly stated with the OMOP CDM, so it is important that you clarified this. I think any change needs to be considered in the context of the breaking changes it would introduce.


Turns out the same discussion is popping up everywhere. I am consolidating. Please let’s have it here.

Since it is by far the hottest THEMIS issue, I suggest we nail it at the next face-to-face on 8-9 March 2018, hosted by Amgen in Thousand Oaks.


Copy from the other conversation:

From email:


Related to this thread . . .

@Ajit_Londhe just pointed out a scenario where we are getting death records outside of the OBSERVATION_PERIOD for a claims database. @clairblacketer, @Ajit_Londhe, and I were going back and forth on what to do. Personally, I think I agree with the approach from above, although I guess it really isn’t history, but rather future. :) If we know they are no longer enrolled but know they died in the future, we are missing a snippet of time that might explain what truly happened to the patient.

We actually know this about everybody. :(


@Christian_Reich - other than being depressing on the matter, would you agree that it would be appropriate to delete a death that occurs let’s say 30+ days after enrollment ends?

If we move forward with the option that I outlined, which Peter advocated for, then no data that falls outside of the observation period would need to be deleted (in which case to your scenario, the death record would be preserved, even though it falls outside of the observation period). A given source would need to consider at ETL-time how you want to think about an observation period in this context, but I don’t think it would require death deletion in any circumstance.


Dear all

We are converting EMR data to CDM.
The Observation_period table is produced by the following method.

“Periods of continuous enrollment is calculated by combining monthly records and
recording the observation_period_startdate for the first period as the enrollment start date
and observation_period_enddate for the last period as the enrollment end date. If the
time between the end of one enrollment period and the start of the next is 30
days or less, treat this as continuous enrollment.”
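The quoted rule can be sketched roughly as follows. This is only an illustration of the gap-merging idea, not the actual ETL code; the function name `merge_enrollment` and the `max_gap_days` parameter are assumptions made for this example.

```python
from datetime import date


def merge_enrollment(periods, max_gap_days=30):
    """Merge (start, end) date pairs whose gap is at most max_gap_days.

    The gap is measured from the end of one period to the start of the next.
    """
    merged = []
    for start, end in sorted(periods):
        if merged and (start - merged[-1][1]).days <= max_gap_days:
            # Close enough to the previous period: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap too large (or first period): start a new observation period.
            merged.append((start, end))
    return merged


monthly = [
    (date(2017, 1, 1), date(2017, 1, 31)),
    (date(2017, 2, 1), date(2017, 2, 28)),
    (date(2017, 4, 15), date(2017, 4, 30)),  # 46 days after 28 Feb
]

strict = merge_enrollment(monthly)                    # 30-day rule
lenient = merge_enrollment(monthly, max_gap_days=180)  # wider window
```

With the 30-day rule the example yields two observation periods; widening the window merges everything into one, which shows how sensitive the result is to the chosen gap.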

After the conversion, the Achilles_heel report flags “events outside observation periods”.

We have checked the data to resolve this error.

In Korea, there are cases where a doctor prescribes medication without a visit registration, or where a patient comes back to the hospital a few days after a treatment or examination was ordered and the treatment or examination is then carried out without a visit registration.

An event that comes without a visit registration is recorded with a date before or after the registered visit.

That is, the patient really did visit the hospital for this event, but because the visit registration was not recorded, an Achilles_heel error occurs.

In order to include these data in the observation period, we first try to generate observation_period using the drug, examination, treatment, and enrollment tables in the EMR.
Then, if the time between the end of one enrollment period and the start of the next is 6 months or less, we treat this as continuous enrollment.

First question: should we use other tables (drug, examination, treatment, etc.) as well as the enrollment tables to create observation_period?
Second question: should the window period be 6 months? What length would be appropriate?
When we looked into existing practice, we found that the claims database used a 1-year window based on insurance registration. Among the other databases, CCAE used 32 days and the GE database used 12 months.

We would like to hear from OHDSI members whether this method is suitable for solving the invalid observation_period problem in Korea.


I think you are trying to use the rules for claims data, instead of those for EHR data. In EHR data, unless your healthcare system has a specific mechanism for “enrolling” patients with an institution, you don’t know if you “have” a patient in active treatment or not during the time when nothing happens. Those axiom B situations could mean the patient is healthy and happy like a fish in water, or gone with the wind. There is nothing you can do. Most EHR systems define the observation period as the time between the first occurrence of something (Condition, Drug, Procedure etc.) and the last occurrence. Everything in-between is part of the observation period.
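A minimal sketch of that first-to-last convention, assuming the event dates of one patient have already been collected from the clinical tables (the function name and the sample dates are made up for illustration):

```python
from datetime import date


def ehr_observation_period(event_dates):
    """Return (start, end) spanning the first and last recorded event, or None."""
    dates = list(event_dates)
    if not dates:
        return None  # no events, so no observation period can be derived
    return min(dates), max(dates)


# Dates pooled from conditions, drugs, and procedures for one patient.
patient_events = [
    date(2014, 5, 2),    # condition
    date(2012, 11, 20),  # drug
    date(2017, 3, 9),    # procedure
]

period = ehr_observation_period(patient_events)
```

Everything between the two dates counts as observed, including long stretches without any record.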


Thank you for your suggestion.
However, we have a “missing observation period” problem.

The term “enrollment” doesn’t mean insurance enrollment in this case. Dahye and our team used “visit” information as the enrolled period because we can know for certain whether a patient was observed only while he/she is visiting the hospital.

Especially in Korea, patients can freely go to any hospital they want.
I myself visit 4 hospitals independently: usually A for colds, B for dental care, C for surgery or emergencies, and D for physical examinations. This changes very frequently, and hospital A will never know that I had surgery last year. Likewise, C won’t know that I had a light fever right after surgery/discharge if I decide to go to the nearest hospital, A, for the fever.

This “blank period”, during which a patient is not visiting a given hospital, could last one month or 10 years.

That’s why we’re not using the normal EHR rules.

  • “first occurrence of something to last occurrence of something”

Here are some additional questions you may be able to answer:

  • Do other countries’ EHR systems have similar problems with blank observation periods?
  • If they do, how do the “EHR rules” affect research results?
  • Which method is better for building the observation period table for “blank” data?


This is a very good question. Actually two:

  1. How short Observation Periods can be (longitudinal or horizontal separation of data)
  2. What do I do if I know I capture only a fraction of the data (vertical separation of data)

For 1): The idea with EHRs is that if you were sick you would return to the same hospital you went to last time, which is probably the one closest to you. I know this is not a given. But the likelihood is there, so people make this approximation. They have nothing else to hang on to. Which means, if there are no data then you are healthy enough not to be in the hospital (even though you might be in a different one). For use cases that look at events inside a visit this is probably fine (people rarely get referred from one hospital to another just like that). For use cases that cover longer times (like studies with long-term follow up) this might create “Axiom B” errors: you underestimate the rate of events because you would wrongly interpret the whitespace as a time without events.

The idea of making mini-Observation Periods around each visit is dangerous, because you need the whitespace in-between for the correct prevalence assessment. Otherwise it looks like a patient is “always” in the hospital, or “always” has an asthma exacerbation, because during an Observation Period they do, and outside you are not supposed to look. You can also never define a washout period. In the extreme case, you have one-day Observation Periods: Everything has a prevalence of 100%. So, don’t throw away the whitespace.
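The extreme case can be shown with a toy calculation. The sketch below uses invented data (one patient with three visit days in 2017, each with an asthma exacerbation) and compares the share of observed days that carry the event under the two ways of building Observation Periods.

```python
from datetime import date, timedelta


def observed_days(periods):
    """Expand (start, end) periods into the set of calendar days they cover."""
    days = set()
    for start, end in periods:
        for offset in range((end - start).days + 1):
            days.add(start + timedelta(days=offset))
    return days


def share_of_observed_days_with_event(event_days, periods):
    """Fraction of observed days on which the event is recorded."""
    observed = observed_days(periods)
    return len(observed & set(event_days)) / len(observed)


visits = [date(2017, 1, 1), date(2017, 6, 15), date(2017, 12, 31)]

# One period from first to last visit: the whitespace is kept.
whole_period = [(visits[0], visits[-1])]
# One-day mini periods around each visit: the whitespace is thrown away.
mini_periods = [(d, d) for d in visits]

with_whitespace = share_of_observed_days_with_event(visits, whole_period)
without_whitespace = share_of_observed_days_with_event(visits, mini_periods)
```

With the whitespace kept, the event occupies 3 of 365 observed days; with one-day periods it occupies every observed day, i.e. 100%.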

That’s fine. I haven’t been in a hospital for 10 years thank God. But if something happened I would have gone.

For 2): This is really a problem for the Metadata and Annotation Workgroup. They need to solve the problem of how to capture the fact that there is only partial information available.


According to your explanation, the observation period runs from the first occurrence of something (Condition, Drug, Procedure etc.) to the last occurrence, so there will be one observation_period per patient.

Thank you for your detailed explanation.

A proposal for new period_type_concept_id concepts was considered in this discussion. What was the result: will new concepts be added, or was the idea rejected?

Thank you in advance