Observation_Period table - how do you generate this table at your site?

I am analyzing the CCAE dataset in the IMEDS lab.

My discussion item is about observation_period table.

Also, another thread brought up this problem: differences in how you generate this table affect the results of any given SQL script (e.g., Regenstrief operating on a 36-month window vs. allowing a maximum of 30 days of silence before closing the observation period).

The relationship to PAYER_PLAN_PERIOD is also of great importance. What if your EHR data have no payer data to work with at all?

The table is defined as:

“The Observation Period table is designed to capture the time intervals in which data are being recorded for the Person. An Observation Period is the span of time when a Person is expected to have clinical events recorded.”

For a country like the UK, the observation periods for all citizens will be something like from BIRTH to DEATH, since they are “insured” all the time. For the US, the table is mostly derived from payor_plan tables and is more complicated.

I see that the CCAE dataset sometimes has drug data without a visit. I looked at the ETL for the IMEDS CCAE dataset, and it does not clearly define what events are considered when deriving this table.

The CCAE IMEDS ETL document says:

Patient Status during an Observation Period in the Truven claims data
is available from the Enrollment tables. Enrollment Detail (ccaeT) includes records that indicate a person’s enrollment for each month for the period covered by the claims data.

Enrollment entries are consolidated by combining records that indicate
continuous enrollment over a period of time start and end dates to cover the
period. If enrollment data indicates a person’s coverage did not extend through
the entire duration covered by claims data, then multiple Observation Periods
are recorded to capture all periods of coverage with corresponding start and
end dates.

The consolidation is done through the following steps:

 - Records for each person are sorted in an ascending order of dtStart (Start Date).
 - Periods of continuous enrollment consolidated by combining monthly records and
recording the Start Date (dtStart) for the first period as the Observation
start date and end date for the last period as the Observation end date. If the
time between the end of one enrollment period and the start of the next is 32
days or less, treat this as continuous enrollment.

No records are added to cover gaps in coverage.
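
To make the consolidation steps above concrete, here is a minimal Python sketch of how I read them. It is not the actual ETL code: the function name `consolidate_enrollment`, the (start, end) tuple layout, and the reading of "time between" as the plain difference in days between one record's end date and the next record's start date are all my assumptions.

```python
from datetime import date

def consolidate_enrollment(records, max_gap_days=32):
    """Merge (start, end) enrollment records for one person into observation periods.

    Records are sorted by start date; a record is folded into the current
    period when it starts no more than max_gap_days after that period ends.
    """
    periods = []
    for start, end in sorted(records):
        if periods and (start - periods[-1][1]).days <= max_gap_days:
            # Continuous enrollment: extend the current period if needed.
            prev_start, prev_end = periods[-1]
            periods[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap too long (or first record): open a new observation period.
            periods.append((start, end))
    return periods

# Hypothetical monthly enrollment records with April and May missing.
monthly = [
    (date(2010, 1, 1), date(2010, 1, 31)),
    (date(2010, 2, 1), date(2010, 2, 28)),
    (date(2010, 3, 1), date(2010, 3, 31)),
    (date(2010, 6, 1), date(2010, 6, 30)),
]
observation_periods = consolidate_enrollment(monthly)
```

With these made-up records the person ends up with two observation periods (January to March, and June). Note that only enrollment records go in; no labs, visits or diagnoses are involved in this reading.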

But it does not state whether, when it “combines monthly records”, it considers only labs as records or any lab OR visit OR diagnosis as a record. What is meant by event/record is crucial.

There is also an associated Achilles_heel report that looks at “events
outside observation periods”.

The observation_period table is not listed in the Standardized Derived Elements chapter of the CDM v5 specs. But in most cases it is derived, and we should be clear about exactly how it is derived and have it derived consistently across sites.

No EHR has a tab for “entering patient observation periods”.
E.g., “When did you approximately move-in to central Wisconsin?”

I am looking at “silent patients”, i.e. patients who seem to be healthy (no events, but with insurance coverage), and the rules that generate observation_periods are very important for that
(since I don’t have access to a generally all-insured population (like the UK’s CPRD) :frowning: )


Not sure what others will say, but here is the logic I would apply:

We are using the data under the general axioms that (i) if a clinical event happens to the patient, it is in the data and (ii) everything that didn’t happen is absent from the data. That’s what we base calculations of incidence and prevalence numbers on. We don’t have “Did not have AMI” (even though there are a few codes like that). So, the observation period designates the time interval where these are true, or reasonably true. And that should direct us how we should compile those times, which ought to be different for each data source.

In your examples:

  • The PAYER_PLAN_PERIOD (no idea how the infamous “payor” came about, nobody would say “drivor”) probably cannot be created from EHR records - most of them have no payer and payer plan information.
  • The CCAE ETL uses enrollment records from the ccaet table. If lab, visit or diagnosis records fall outside of this period - too bad, they get truncated, because data outside the enrollment period would satisfy axiom (i) but not axiom (ii).
  • In CDM5, derived tables are defined as derived from other CDM5 tables, not from source tables. Derivation from source tables is called ETL.
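
A rough illustration of the truncation in the second bullet, as a Python sketch. The function name `split_by_observation` and the data are made up for the example; a real ETL would do this in SQL against the event tables.

```python
from datetime import date

def split_by_observation(event_dates, periods):
    """Split event dates into those inside any observation period and those outside."""
    inside, outside = [], []
    for d in event_dates:
        if any(start <= d <= end for start, end in periods):
            inside.append(d)
        else:
            outside.append(d)
    return inside, outside

# Hypothetical person: enrolled for calendar year 2010 only.
periods = [(date(2010, 1, 1), date(2010, 12, 31))]
events = [date(2009, 11, 15), date(2010, 3, 2), date(2011, 1, 20)]
kept, dropped = split_by_observation(events, periods)
```

Here only the March 2010 event is kept. The other two are real events (axiom (i) holds for them), but around them we cannot claim that silence means nothing happened.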

But you are right. The EHR data have a fundamental problem with the observation period, in particular with axiom (ii) - we don’t know if something not in the data never happened, or the patient just went to another doctor outside “central Wisconsin”.

Does that help?

This is a fundamental challenge in using observational data and I think OHDSI has a responsibility to offer logical conventions as it is so central to our work.

My 1 cent: We need a Workgroup that focuses on this to generate recommended strategies.

My 2 cents:

For claims, yes, I would just go with the coverage period.

For a single EHR, it’s about what is reasonable. If it is an inpatient-based CDM, then I would pretty much limit it to hospitalizations plus 30 days. In outpatient, your rules may vary based on demographics (e.g., older patients are seen more frequently, young females are seen more than young males, etc.), but I would give at least a year after the last event.

If HIE, I would do the type of analysis we did at Regenstrief in terms of silent periods, then decide the right level of conservatism in terms of reasonable persistence of observable period.
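
To give a flavor of what a silent-period analysis could look like, here is a rough Python sketch. It is not the actual Regenstrief analysis: the function names, the data layout (a dict of event dates per person) and the choice of the 90th percentile are all assumptions for illustration.

```python
from datetime import date
from statistics import quantiles

def inter_event_gaps(event_dates):
    """Days between consecutive distinct event dates for one person."""
    ordered = sorted(set(event_dates))
    return [(b - a).days for a, b in zip(ordered, ordered[1:])]

def gap_threshold(events_by_person, percentile=90):
    """Gap length in days at the given percentile of all observed gaps.

    Needs at least two gaps overall; percentile must be between 1 and 99.
    """
    gaps = [g for dates in events_by_person.values() for g in inter_event_gaps(dates)]
    cut_points = quantiles(gaps, n=100, method="inclusive")
    return cut_points[percentile - 1]

# Hypothetical event dates for two persons.
events_by_person = {
    1: [date(2012, 1, 1), date(2012, 1, 31), date(2012, 7, 1)],
    2: [date(2012, 3, 1), date(2012, 3, 11)],
}
threshold = gap_threshold(events_by_person, percentile=90)
```

A lower percentile gives a more conservative persistence window (shorter observation periods, more gaps); a higher one is more permissive. Where to set it is the judgment call.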


Let me send a request to the Glorious Leader to give us a slot on the OHDSI session. Or do you think we should branch that out?


You mean the all group meeting? Sounds like the right place to start.

Glorious leader. GL. Nice.

I’m adding to this post to give the GL some food for thought, and get some insights if anyone else is mulling over this:

  1. Given:
    a. SAFTINet project stores both clinical and claims data, linked at the patient level, in the same OMOP instantiation.
    b. It is possible for us to have only clinical data, but it is never possible for us to have only claims data.
    c. We want to remain consistent with OMOP conventions to the greatest extent possible.

  2. Question under consideration: What is the best way to create the observation_period dates to support use of Achilles and our other research analytic needs?

  3. Proposal 1: Create both “clinically-derived” and “Insurer-enrollment-plan-derived” observation periods
    a. Clinically-derived observation period –
    i. Alternative 1a:
    1. Obs_period_start_date = Date that clinical source data is first available from the “clinical data source” and
    2. Obs_period_end_date = Date that clinical source data is last available from the “clinical data source”
    a. Consequence-this would not allow us to consider evidence of care about which we are aware via claims data, such as a medication refill.
    b. Requires that we know the source of the data. (We will have this information and we’ve added this field to all tables due to our use case of combined clinical and claims sources in a single instantiation)
    c. When necessary, we could combine the use of the clinically-derived and claims-derived observation periods in an analysis, which offsets consequence above.
    b. Insurer-enrollment-derived observation period=
    i. Alternative 1b: (using OMOP convention)
    1. start date = start date of enrollment (payment plan period start date) and
    2. end date = end date of enrollment (payment plan period end date)
    c. This proposal would violate this CDM V5 convention: “each person can have more than one valid observation-period record, no two observation periods can overlap in time for a given person”

  4. Proposal 2: Use a “combined clinical and claims data”’ observation period
    a. Obs_period_start_date (~ end_date) – first (or last) date of any “clinical or claims” data being recorded, but not including payment plan period dates.
    b. This essentially mirrors the concept of the clinically-derived observation period but allows us to use all the information available – ‘I only hear it if it happens in within range’ but includes claims signals in addition to clinical signals as ‘in range’. This is in contrast to the Insurer-enrollment-derived obs_period – that is used for the assumption of “I’m listening from X time to Y time, and if I don’t hear it (a claim) nothing happened”
    c. This proposal does not violate the CDM V5 convention – 3c, above

  5. Proposal 3: use enrollment/payment plan period (3b above)-
    a. I don’t have this information for all persons in a dataset: some patients have only clinical data, some have clinical and claims, so I would only have observation periods for some persons. I also only have Medicaid claims, so if a person changes from Medicaid to private insurance, I have payer data for some periods but not all periods.

  6. Proposal 4: use clinical when only clinical data is available and claims when (clinical &claims) is available – would not work- see 5a for reasoning.

For our use case of combining clinical and claims data –I’m leaning towards Proposal 2, as it best represents why we combined clinical and claims data from a research perspective. But I’m wondering if others in the community have thought this through and if/how this decision impacts the use of Achilles.

Thanks, Lisa Schilling

Just for reference, here is how the table is generated for GE dataset in IMEDS lab.
A 12M window is used.
So we have 1M, 12M and 36M used in the community.


Person status during an Observation period in the GE EHR data is captured from the ACTIVITY_F table.
Enrollment entries are consolidated by combining records that indicate continuous enrollment over a period of time start and end dates to cover the period. If enrollment data indicates a person’s coverage did not extend through the entire duration covered by claims data, then multiple Observation Periods are recorded to capture all periods of coverage with corresponding start and end dates.
The consolidation is done through the following steps:

  • Records for each person are sorted in an ascending order of dtStart (Start Date).
  • Periods of continuous enrollment consolidated by combining records and recording the ACTIVITY_DATE for the first period as the Observation start date and end date for the last period as the Observation end date. If the time between the end of one enrollment period and the start of the next is 12 months or less, treat this as continuous enrollment.
  • No records are added to cover gaps in coverage.
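
To show how much the choice of window matters, here is a small Python sketch that builds periods from the same activity dates under the three windows mentioned above. The function name `periods_from_events`, the dates, and the approximation of 1M, 12M and 36M as 30, 365 and 1095 days are my assumptions, not part of either ETL document.

```python
from datetime import date

def periods_from_events(event_dates, max_gap_days):
    """Group one person's event dates into periods, splitting wherever
    the silence between consecutive events exceeds max_gap_days."""
    periods = []
    for d in sorted(event_dates):
        if periods and (d - periods[-1][1]).days <= max_gap_days:
            # Close enough to the previous event: extend the current period.
            periods[-1] = (periods[-1][0], d)
        else:
            # First event, or silence too long: start a new period.
            periods.append((d, d))
    return periods

# Hypothetical activity dates for one person.
activity = [date(2008, 1, 10), date(2008, 2, 5), date(2008, 9, 1), date(2010, 6, 15)]
counts = {
    label: len(periods_from_events(activity, days))
    for label, days in [("1M", 30), ("12M", 365), ("36M", 1095)]
}
```

For this made-up person the 1M window yields three observation periods, the 12M window two, and the 36M window one, so any query with a continuous-observation requirement would behave differently across the three sites.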

I’m not sure if this has been subsequently answered but we have been looking at this particular nuance.

We haven’t cracked an approach but we certainly have some learnings we would be able to share.

Our approach was to follow observations chronologically for each patient and disease grouping. It is more likely that gaps or absence of a patient occurs within specific diagnostic groups.

OBSERVATION_PERIOD - this table is ambiguous. Obviously, this is different from the PAYER_PLAN_PERIOD, something that seems relevant for health-economic studies.

This is my understanding from current documentation - OBSERVATION_PERIOD:

  1. No two periods may overlap. Periods are derived from payer-coverage files or clinical encounter information.
  2. There may be gaps in periods - clinical data captured within the gap period is not to be used.

It is not uncommon for individuals to drop health plans and be uninsured for extended periods of time. But they can still seek care when uninsured - e.g. ‘self pay’. Clinical care delivered to individuals is relevant to research. So do we propose that we don’t use the clinical data for a person when uninsured?

The EHR-data-based observation_period is based on episodes of care – triggered on the first date of a health care service, and ending some reasonable number of days after (30 days?). But I can’t understand the value of using this arbitrary date.

I think a person, once born, is at risk to have clinical events recorded - irrespective of whether they are insured or uninsured (self-pay), or captured in an EHR or not.

Has OBSERVATION_PERIOD been discussed in other workgroups? How are others using this?

Thank you

Hi, @Gowtham_Rao,
How observation_period is populated may depend on the design choices of the ETL (hence why you might consider it ambiguous?) but the purpose of the table is very specific: it defines the time intervals that a person is considered under direct observation of a healthcare provider. In claims database terms, that would be their enrollment. In hospital systems, it might only be when the person is inside their walls; when they enter they begin observation, when they leave, they stop. The hospital doesn’t know what happens to the person between visits in this example.

Periods don’t overlap because if you want to associate a patient event to an observation period, you only want to get the single observation period associated with that event. If you had overlaps, then you could have one event associated to multiple periods, and that would complicate some analytics in the OHDSI tool stack.
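
As a sketch of what that means in practice (Python here just for illustration; the function names are made up and this is not code from the OHDSI tool stack): you can check a person's periods for overlaps, and then look up the one period an event belongs to.

```python
from datetime import date

def find_overlaps(periods):
    """Return pairs of consecutive periods (sorted by start) that overlap in time.

    An empty result means no two periods overlap at all.
    """
    ordered = sorted(periods)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]

def period_for_event(event_date, periods):
    """Return the single period containing event_date, or None if it falls in a gap."""
    matches = [p for p in periods if p[0] <= event_date <= p[1]]
    if len(matches) > 1:
        raise ValueError("event falls into more than one observation period")
    return matches[0] if matches else None

# Hypothetical person with two non-overlapping periods and a gap in between.
periods = [
    (date(2005, 1, 1), date(2007, 12, 31)),
    (date(2011, 1, 1), date(2015, 6, 30)),
]
in_first = period_for_event(date(2006, 5, 1), periods)
in_gap = period_for_event(date(2009, 1, 1), periods)
```

With non-overlapping periods the lookup is unambiguous: the 2006 event maps to the first period and the 2009 event maps to nothing.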

That’s true, but from an insurance provider’s perspective, how are they supposed to track care if no one is telling them to pay for it? No one proposes that we don’t use the clinical data for a person when uninsured, but the question to you is: how does anyone get this data?

That’s a rule that the particular EHR system has adopted as the specification of ‘a person is being observed’. Another system may decide the observation period for a person is the earliest known date of a person to the latest known event of a person (or maybe the current date if they know the person isn’t dead).

The importance of the OBSERVATION_PERIOD table is that you cannot say that a person has not been diagnosed with X or not exposed to drug Y in the past 5 years if you cannot assert that the person was under continuous observation for that time period (the events could have happened in the ‘gaps’). For claims, some have adopted the idea that if they are paying for coverage, they’ll ask the insurance provider to pay for their meds and visits to a doctor. So that’s where you might be confusing the payer plan period contributing to building an observation period. But in other systems, they could just flat out tell you that person P1 has an entry date of StartX and an exit date of EndX, and that would be your observation period in that CDM.

Consider this: let’s say you’re in the US paying for coverage, and you have been for 7 years. Then you decide to go to another country where they aren’t going to honor your US coverage. So, you cancel it and start another plan in the foreign lands. After 3 years, you return to the US and pick up your old US plan again. You have a 3-year gap in there. You got medical coverage from somewhere, but how’s the US system supposed to know about it? So, you’re just a person with a 3-year gap in their observation period. That’s all this table is built to serve.

Thank you @Chris_Knoll for the thoughtful response. That helps reduce the ambiguity.

Would it be somewhat accurate then to say - the observation period is mostly to qualify missing information. If there is significant missing information period - then we cannot assume we have complete longitudinal information on the person.

In the presence of a significant missing period, the answer to the question “did the person have diabetes mellitus in the past 5 years?” is ‘don’t know’, because of missing data. If there is no significant missing period in the past five years and the person did not have care for the condition diabetes, then the person most probably does not have diagnosed diabetes mellitus.

I.e disease present – yes, no or unknown.

No problem, @Gowtham_Rao . It would be accurate to say that an OP is to qualify missing information, or you could look at it as qualifying where information is present. Either way is fine.

To your example about T2DM: do you need to have 5 years of continuous observation to determine if a person had Diabetes in prior history? No, because you might find a record a year prior, and so they have it. Requiring 5 years wasn’t really necessary here. In fact, gaps in your observations wouldn’t matter either if you found a diagnosis somewhere in your fragmented history. However, if you are going to assert that they did NOT have the disease for the past 5 years, in this case, you do need to know that you have information for 5 years in order to determine that they didn’t have something. Subtle, yes? Not sure if that’s addressing your comments, so apologies if I’m not following you. But hopefully you have a better appreciation for the observation period table now :smile:
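
If it helps, the asymmetry can be written down as a small Python sketch. Everything here is illustrative: the function name `condition_status`, the 'yes'/'no'/'unknown' labels, and the assumption that the diagnosis dates passed in are already restricted to records inside observation periods.

```python
from datetime import date, timedelta

def condition_status(diagnosis_dates, periods, index_date, lookback_days):
    """Classify history of a condition in the lookback window as 'yes', 'no' or 'unknown'."""
    window_start = index_date - timedelta(days=lookback_days)
    # Presence: any diagnosis in the window is enough; gaps do not matter.
    if any(window_start <= d <= index_date for d in diagnosis_dates):
        return "yes"
    # Absence: only assertable if a single period covers the whole window.
    if any(start <= window_start and end >= index_date for start, end in periods):
        return "no"
    # No diagnosis found, but observation was not continuous.
    return "unknown"

index = date(2015, 1, 1)
five_years = 5 * 365  # approximate lookback in days
continuous = [(date(2008, 1, 1), date(2015, 12, 31))]
fragmented = [(date(2008, 1, 1), date(2011, 1, 1)), (date(2013, 1, 1), date(2015, 12, 31))]

status_a = condition_status([], continuous, index, five_years)
status_b = condition_status([], fragmented, index, five_years)
status_c = condition_status([date(2014, 1, 1)], fragmented, index, five_years)
```

The three hypothetical cases come out as 'no' (nothing found, fully observed), 'unknown' (nothing found, but a gap in observation) and 'yes' (diagnosis found, gap irrelevant).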



You are correct. The OBSERVATION_PERIOD is more important to determine whether missing records are due to the fact that nothing happened, or due to the fact that nobody was recording, but the patient might have been super busy in the system. You need that for things like wash-out periods (no drug in a given time) as in @Chris_Knoll’s example and incidence/prevalence calculations. If you were only interested in information present you wouldn’t need the table.

Note that the present (record there if something happens) and absent (no record there if nothing happened) axioms are rarely explicitly defined in observational research, but still expected to be true in an unspoken way. We get into those debates all the time whether it is ok or not to “throw away” data that are outside the OBSERVATION_PERIOD. The answer is a very strong YES.
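
On the incidence/prevalence point, the denominator is where the table earns its keep: only time inside observation periods counts as person-time. A minimal Python sketch, with made-up function names and assuming each person's periods do not overlap:

```python
from datetime import date

def observed_days(periods, window_start, window_end):
    """Days of observation (both ends inclusive) that fall inside the study window."""
    total = 0
    for start, end in periods:
        lo, hi = max(start, window_start), min(end, window_end)
        if lo <= hi:
            total += (hi - lo).days + 1
    return total

def person_years(periods_by_person, window_start, window_end):
    """Total observed person-years in the window, using 365.25 days per year."""
    days = sum(
        observed_days(p, window_start, window_end) for p in periods_by_person.values()
    )
    return days / 365.25

# Hypothetical person observed during calendar year 2010 only.
periods = [(date(2010, 1, 1), date(2010, 12, 31))]
days_in_window = observed_days(periods, date(2010, 7, 1), date(2011, 6, 30))
```

In this example only the second half of 2010 contributes to the denominator; the first half of 2011 is outside observation and adds nothing, no matter what records exist there.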

Thank you both

In our organization we are interested in the quality of care delivered - care given vs. care not given. The numerator portion of a quality measure definition goes something like: was X done in the past Y time, e.g., was HbA1c tested in the past three months, or was a colonoscopy performed in the past ten years?

When checking the data - the answer could be

– no - because of missing information, or
– no - because of missing care.

If it is a no (absence of something) then we have to act on it. If it is a no because of missing care - then we have to work with our health plan members and their care providers. (quality of care challenge)
If it is a no because of missing information - then that’s a technology or documentation discussion (data challenge).

So, in this quality measure use case – observation period is useful to go down the path of either missing care or missing data.

Thank you

Only thing I would say to the above is around this point:

I’d say that you can’t say ‘no because of missing data’ because you can’t assert that something didn’t happen without knowing that you were definitely observing them. So shouldn’t that be ‘unknown – because of gaps in observation’?

I don’t want to take this thread off topic, but this does sound like you could be using the OBSERVATION_PERIOD to identify periods of ‘missing information’. Is that what you are doing in this case? And if so, how are you defining these observation periods for your patients in your EHR?

Agree. Unknown is more accurate.

I think this topic deserves more discussion. In addition to the community sharing best practices for how to populate the observation_period table, it might be good to work toward including a more formal representation in the CDM of the assumptions made about data completeness. Codes mapped to a taxonomy of justifications for confidence in data completeness within observation periods could inform interpretation of analytic results. Codes associated with a low likelihood of completeness could trigger flags that signal the need for cautious interpretation.

For example, EHR data might support relatively high confidence in completeness within brief observation periods during hospital stays or in the ED, low confidence for multiyear periods within primary care clinics, and somewhere in between for months-long periods within specialty care settings.

For claims data, periods of complete insurance loss might be the end of a spectrum of coverage rather than a binary indicator of expected completeness.

If we can encode our knowledge about our data’s risk of incompleteness, analytic routines could inform the interpretation of analytic results in ways that might be especially useful for users with less intimate knowledge of the source data. A rough taxonomy of expected completeness might be useful, and a very precise one might be hard to create.

@Andrew et al:

Time for a proposal in the CDM WG, don’t you think?

Only caution I will put out (and have done so in many Forum postings): The default of an OBSERVATION_PERIOD record should be “data should be expected at high likelihood”. If we create low-likelihood Observation Periods, we will kill the CDM, because then every single record has to be joined to the OBSERVATION_PERIOD in order to look for your flags of confidence. The performance of any quantitative query (e.g. with a denominator, like incidence) will go down the toilet. So, invent something neat and backwards compatible! :smile:

@Christian_Reich There’s a chance that a slightly better defined version of this idea will get discussed and improved on Michael Kahn’s data quality call. If anything emerges from that discussion that seems worth the CDM WG’s time I’ll be sure to put in a proposal. Your advice on how linking to flags could affect performance is very helpful.


Absolutely. Ping as many good folks as you can. The other ones are here and here.

I don’t know what the solution would be, but there are folks who are thinking about many locations (places patients used to live in the past) without destroying the current standard with one location per person. Maybe we can steal from there.