

One of the items discussed in THEMIS, which perhaps didn’t get its own forum post, was whether we allow events to fall outside of observation periods.

Here is what we are currently recommending:

1. Events CAN fall outside of the observation period.
2. Payer plan period should be used to capture coverage (including partial coverage, e.g. Medicare Part D) and can overlap with the observation period.
3. Every patient must have at least one observation period.

1. Do not use time outside of an observation period for identifying people.
2. To ensure quality, do not use events outside of an observation period for an analysis.
3. If patients do not meet the criteria for an observation period (e.g. they have partial Medicare Part D coverage), create an alternative CDM that allows them to fall in OBSERVABLE_TIME.

Update the CDM Wiki entry for OBSERVATION_PERIOD to reflect this discussion in the conventions.


Copying here from Github:

So, we now have two problems with the Drug Eras:

  • Drugs outside the Observation Period
  • Patient reported drugs

I agree with @Ajit_Londhe that each of them should be left out: the former has no reliable capture, the latter no precise timing. Either way, things like persistence windows will be meaningless.


I would say that as long as the DRUG_START is before the end of the OBSERVATION_PERIOD, it should be included. If a drug starts on the last day of your OBS_PERIOD but continues for 60 days, it shouldn’t be excluded because of that.
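The inclusion rule proposed here can be written as a small predicate that checks only the start date. This is a minimal Python sketch assuming a single observation period represented by plain `datetime.date` values; the function name `drug_start_in_period` is hypothetical and not part of any OHDSI tool.

```python
from datetime import date


def drug_start_in_period(drug_start: date, op_start: date, op_end: date) -> bool:
    """Keep an exposure if it starts inside the observation period.

    The exposure end date is deliberately ignored, so a 60-day supply
    started on the last observed day is still counted.
    """
    return op_start <= drug_start <= op_end


op_start, op_end = date(2020, 1, 1), date(2020, 12, 31)
# Starts on the last observed day and runs 60 days past it: kept.
print(drug_start_in_period(date(2020, 12, 31), op_start, op_end))  # True
# Starts after the period has ended: dropped.
print(drug_start_in_period(date(2021, 1, 5), op_start, op_end))  # False
```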

From a SQL programming perspective, I don’t see how an event can fall outside the OBSERVATION_PERIOD. I checked the standard SQL code for OBSERVATION_PERIOD: it uses MAX(COALESCE(condition_end_date, condition_start_date)), MAX(COALESCE(drug_exposure_end_date, drug_exposure_start_date)) and the maximum of the other transaction dates. So in theory, all events should fall within the OBSERVATION_PERIOD. The only exception to this is Death. Am I correct, or am I missing something?
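The argument in this post can be illustrated with a simplified derivation: if the period spans from the earliest start date to the latest COALESCE(end, start) of a set of events, those events cannot fall outside it, while a record that is not fed into the derivation (here, death) can. This is a minimal Python sketch; the tuple layout and `derive_observation_period` are assumptions for illustration, not the ETL SQL referred to above.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical event record: (start_date, end_date or None).
Event = Tuple[date, Optional[date]]


def derive_observation_period(events: List[Event]) -> Tuple[date, date]:
    """Earliest start date to latest COALESCE(end_date, start_date)."""
    start = min(s for s, _ in events)
    end = max(e if e is not None else s for s, e in events)
    return start, end


events = [
    (date(2015, 3, 1), None),  # condition with no end date
    (date(2016, 6, 10), date(2016, 7, 9)),  # drug exposure with an end date
]
op_start, op_end = derive_observation_period(events)
death_date = date(2019, 2, 14)  # death is not among the inputs
print(op_start, op_end)  # 2015-03-01 2016-07-09
print(op_start <= death_date <= op_end)  # False: death falls outside
```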

There is data outside observation time in many databases. I think in CPRD, data captured before the patient joined the practice, which is deemed less reliable, is outside the observation period. In IPCI (Netherlands GP data), a lot of patient history is outside the observation period because it was captured before moving to a new data capture system.

Are there recommendations from the community/THEMIS about when it should or should not be used in analysis?

We don’t want to program network studies that include data outside the observation period in some databases but not others, so the decision to include it or not should only depend on the study question, right?

Also, is it correct to assume that data outside the observation period is unreliable, i.e. we might see some things, but we can’t assume that if something occurred we will see it in the data?

This use of observation period has become confused since the idea of capturing ‘less reliable’ data in CPRD (and some other data sources). @MPhilofsky made a good comment in another thread that an OP represents the period of time where a person is ‘at risk’ of observing an event. By this definition, we shouldn’t see any events outside of observation periods, because the events occurring imply that they were at risk of observing them.

I’m not sure I buy the idea that observation periods represent the ‘current system’ versus a prior system. When you move a person into a new practice, they were still under observation for events in the prior system, so why doesn’t an observation period represent that period of time they were ‘at risk’ of observing events?

Another reason for data outside of observation periods is to record historical (as in ‘history of’) events. But I’ve argued in other threads that these ‘history of’ observations record when you learned about the historical fact, not when the actual fact occurred (i.e. history of heart disease records the date you learned the patient has that history, not a date backfilled 5, 10 or 15 years into the past when they had heart disease).

Apologies if this post sounds frantic, but this issue has come up a lot, and ‘bending the rules’ around observation period (to say data outside OP is unreliable, for example) has led to a lot of challenges in studies and tools: events outside of OP have no notion of ‘continuous observation’, so we can’t use an event outside an OP and then figure out how much follow-up time there was after the event (for example).


@Adam_Black: What @Chris_Knoll said.

Except I would make it even stronger. During the OP we have the Closed World Assumption, which means the patient is not just at risk, but is definitely being observed. We know everything. If there is no record of something, it didn’t happen. Only with that assumption can you calculate rates. Without it, you have no denominator, and the numerator ain’t so great either. Obviously, it is not 100%, as we know. In reality, we have sensitivity (something that happened is not recorded) and specificity (something recorded didn’t really happen) issues.
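Under the Closed World Assumption, the denominator of a rate is observed person-time, and only events inside that time count toward the numerator. This is a minimal Python sketch with hypothetical tuple inputs and a hypothetical function name; it ignores details such as washout, prior outcomes and stratification that a real incidence analysis would need.

```python
from datetime import date
from typing import List, Tuple


def incidence_rate_per_1000py(
    periods: List[Tuple[int, date, date]],  # (person_id, op_start, op_end)
    events: List[Tuple[int, date]],  # (person_id, event_date)
) -> float:
    """Events per 1,000 person-years, counting only observed time."""
    # Denominator: observed person-time only (both ends inclusive).
    person_days = sum((end - start).days + 1 for _, start, end in periods)
    # Numerator: only events inside one of that person's periods.
    observed_events = sum(
        1
        for pid, d in events
        if any(p == pid and start <= d <= end for p, start, end in periods)
    )
    return 1000 * observed_events / (person_days / 365.25)


periods = [(1, date(2020, 1, 1), date(2020, 12, 31)),
           (2, date(2020, 1, 1), date(2020, 12, 31))]
events = [(1, date(2020, 6, 1)),  # inside person 1's period: counted
          (2, date(2021, 3, 1))]  # after person 2's period: not counted
rate = incidence_rate_per_1000py(periods, events)
```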

Outside the OP we did allow folks to have records, either directly or through “history of”. But they cannot be used for incidence, prevalence, follow-up, PLP, PLE, all that. The only thing that has a use case here is your typical exclusion criterion.

What are you trying to do? What’s your use case?


In the network studies I’ve worked on, I have seen a non-trivial amount of all of the following:

  • events before a person’s first observation period
  • events between observation periods (for people with more than one)
  • events after a person’s last observation period

This can cause tricky questions like:

  • Say I want to create a cohort of people taking drug X with no history of condition Y. Do I use historical records of condition Y outside of observation for exclusion? I can do this easily enough if I’m writing code myself, but if I’m using a cohort for condition Y, then this cohort won’t incorporate records outside the observation period. That is, if the rule for cohorts is that the interval from cohort start date to end date must be during observation. (Is this an explicit rule for OMOP cohorts?)
  • Say I want to do survival analysis: what do I do with outcomes that occur after the observation period end? If we include them, then we also need to extend that person’s time at risk. But then what about the people without the outcome? If we only go up to observation end for them, we introduce a bias where people with the outcome have longer observation periods than those without.
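The three event positions listed at the start of this post (before the first period, between periods, after the last period) can be detected per person with a check like this. A minimal Python sketch, assuming each person has at least one observation period (as the THEMIS convention requires) and that periods do not overlap; `classify_event` is a hypothetical helper.

```python
from datetime import date
from typing import List, Tuple

Period = Tuple[date, date]  # (observation_period_start, observation_period_end)


def classify_event(event_date: date, periods: List[Period]) -> str:
    """Place an event relative to one person's observation periods."""
    ordered = sorted(periods)
    if any(start <= event_date <= end for start, end in ordered):
        return "inside"
    if event_date < ordered[0][0]:
        return "before first"
    if event_date > ordered[-1][1]:
        return "after last"
    return "between"


periods = [(date(2010, 1, 1), date(2012, 12, 31)),
           (date(2015, 1, 1), date(2018, 12, 31))]
for d in [date(2005, 5, 5), date(2011, 1, 1), date(2013, 6, 1), date(2020, 1, 1)]:
    print(d, classify_event(d, periods))
```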

Also tagging @tduarte @MaximMoinat for awareness


That seems to be the only use case where folks are not opposed to using the outsiders. Reason: if there were a misclassification and the 2nd Assumption does not hold (that if there is no record, it didn’t happen), the exclusion criterion wouldn’t be “that badly wrong”. Why? Cases are rare compared to non-cases, and statistics in this case are on your side. And you probably prefer to use the exclusion, even though imperfect, rather than dropping it for reasons of religious purity.

No. Because who knows what you want to use the cohort for. The definition is that it contains patients for whom a condition holds for a certain time. Theoretically, that condition could be “patient should not have an active Observation Period”.

I would most certainly restrict it to the Observation Period, whether the patients are cases or not. The Kaplan-Meier method takes care of the fact that there are different follow-up times per patient. If you go outside the OP pretending you can still see the patient, you can be almost certain to bias the result towards longer survival.
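Restricting follow-up to the observation period means an outcome recorded after the period end is treated as censoring, exactly like having no outcome at all. This is a minimal Python sketch of that rule feeding a plain product-limit (Kaplan-Meier) estimate; the helper names and the one-period-per-person setup are assumptions, and a real study would use an established survival package.

```python
from datetime import date
from typing import List, Optional, Tuple


def follow_up(index: date, op_end: date, outcome: Optional[date]) -> Tuple[int, bool]:
    """Days of follow-up and event flag, censored at observation period end."""
    if outcome is not None and index <= outcome <= op_end:
        return (outcome - index).days, True
    # No outcome, or an outcome recorded after op_end: censor at op_end.
    return (op_end - index).days, False


def kaplan_meier(data: List[Tuple[int, bool]]) -> List[Tuple[int, float]]:
    """Product-limit survival estimate at each distinct event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, event in data if event}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, event in data if u == t and event)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


index = date(2020, 1, 1)
data = [
    follow_up(index, date(2020, 12, 31), date(2020, 4, 10)),  # event at day 100
    follow_up(index, date(2020, 6, 29), date(2021, 2, 1)),  # outcome after OP end
    follow_up(index, date(2020, 12, 31), None),  # no outcome
]
curve = kaplan_meier(data)
```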

I know you hate dropping scarce and valuable data. The best database would contain patients from birth to death with everything recorded nicely. But we don’t have that. Still, our methods rely on the fact that we have a Closed World, at least for some period of time.


Thanks @Christian_Reich

Interesting! I was operating under the assumption that an OMOP cohort could only include time during which someone was in observation. Unless I’m mistaken, this is the way cohorts are operationalised in ATLAS/Capr, @Chris_Knoll @Adam_Black? I can see that allowing cohort start/end dates to be outside of observation gives us flexibility (and lets users decide on the behaviour they want rather than having it prescribed), but allowing this does give us a lot more edge cases for software to test against. (I suppose it also has quite important implications for a phenotype library where there is a definition of condition Y that could be used either for exclusion criteria or for a study index date/outcome…)

For survival analysis, that does make sense to me. One interesting case is hospitals with linkage to regional/national death records. In this case they have the particular event of death, which could happen many years into the future (really any time up to the date they are mapping the data). I have seen cases where they either leave this out of observation or create a whole new one-day observation period just for this event, neither of which really helps for survival analysis. They could extend the observation period to the date of mapping for those who didn’t have a death record, under the assumption that everyone has the potential for a death record, but I’m not sure whether this introduces its own problems?


Great discussion. Thanks for the insight. One thing I recently learned: in Europe, conditions are often recorded only once, when they first occur, while in the US they are recorded regularly for as long as they are still occurring. This has implications for “no prior history of” criteria when the first occurrence of a condition is prior to observation_period_start_date. In the US, chronic conditions get coded at each visit, while in Spain a chronic condition gets coded once and then not again, even if the patient lives with the condition for many years.

@edburn’s survival use case is interesting.

Weirdly, a person could be at risk to die only on the single day they actually die. Tricky :thinking:

When linking to national death registries, the observation period for the single event of death is different from the observation period for other events, since all deaths should be captured by that system.

To summarize:

  • Use of OOP data seems ok for inclusion criteria (e.g. exclude prior history of… )
  • Generally not recommended to use OOP data for outcomes because time at risk is unknown
  • Cohort index dates should be inside an OP (?)

OOP = outside observation period
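The first and third points of the summary can be combined into one eligibility check: the index date must be observed, while the exclusion may look at all history, including OOP records. A minimal Python sketch under stated assumptions: one observation period per person, a ‘prior history’ window that runs from any time up to and including index, and a hypothetical function name; this is not how CIRCE/ATLAS evaluates criteria.

```python
from datetime import date
from typing import List, Tuple


def eligible(index: date, op: Tuple[date, date], condition_y_dates: List[date]) -> bool:
    """Index inside the OP; exclude on any prior record of condition Y."""
    op_start, op_end = op
    # Summary point 3: the index date must lie inside the observation period.
    if not op_start <= index <= op_end:
        return False
    # Summary point 1: the exclusion may use OOP records too (assumed window:
    # any record on or before index, with no lower bound).
    return not any(d <= index for d in condition_y_dates)


op = (date(2018, 1, 1), date(2022, 12, 31))
print(eligible(date(2020, 5, 1), op, [date(2015, 3, 3)]))  # False: OOP history excludes
print(eligible(date(2020, 5, 1), op, [date(2021, 1, 1)]))  # True: Y only after index
```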

Thanks for sharing your experience and recommendations.

That is how cohorts are constructed using the tools (specifically CIRCE): cohort time is limited to observed time. If you use events outside observation periods, how do you know when to exit the cohort ‘at end of observation’? Note: this applies to entry events, when you are ‘in’ the cohort. You can create inclusion criteria that ignore observed time. This is the difference between a cohort and ‘a data record with a start and end date’. Cohorts need to mean something, and saying that people definitely meet certain criteria for a duration of time when you don’t know whether they are being observed for that period is a paradox. So is the example of a ‘cohort of people without observation periods’: that’s not a cohort, that’s just a group of people.
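A simplified picture of the rule described above: an entry event only starts cohort time if it falls inside an observation period, and the cohort end can never run past the end of that period. A minimal Python sketch; `cohort_era` and `planned_end` (for example, the end implied by an exit strategy) are hypothetical names, and this is not the CIRCE implementation.

```python
from datetime import date
from typing import List, Optional, Tuple

Period = Tuple[date, date]


def cohort_era(entry: date, periods: List[Period], planned_end: date) -> Optional[Tuple[date, date]]:
    """Return (start, end) of cohort time, or None if entry is unobserved."""
    for op_start, op_end in periods:
        if op_start <= entry <= op_end:
            # Exit no later than the end of the observation period.
            return entry, min(planned_end, op_end)
    # Entry event outside every observation period: no cohort time.
    return None


periods = [(date(2019, 1, 1), date(2020, 6, 30))]
print(cohort_era(date(2020, 5, 1), periods, date(2020, 9, 1)))  # clipped at 2020-06-30
print(cohort_era(date(2021, 1, 1), periods, date(2021, 2, 1)))  # None
```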

This notion of putting ‘history of’ information before an observation period is flawed: if I have my patient record, and in 2010 I’m getting some medication, and in 2016 I report some history of a condition, does that information get moved backwards before 2010 (to put it before an observation period), making it look like it was in the record as of 2010? I say absolutely not. I argue that the CDM can only maintain a record of when you observed something happen, not when it actually happened, because we can’t really be sure about latent disease conditions in a person; we can only record when they were diagnosed. Some of these dates are very similar: you probably took the drug very close to the time it was prescribed. Other things you may not know for sure, but that’s ok, because if you have an undiagnosed medical condition, you’re not going to get any intervention for it, since no one knows it exists. Once it is identified (i.e. observed), medical decisions are made based on this new information. I don’t know why ‘history of’ information is treated differently (i.e. you learned about it in 2016, but put it in your data in 2004).

Thanks @Chris_Knoll, that all makes a lot of sense and is hard to argue with (at least for the vast majority of use cases).

Seems like this is a case where the CDM specification doesn’t currently quite match the implementation in tools. The reason I’m rather keen to work out what the rule is for a cohort comes from current work on tools where a cohort can be an input and we would like to validate that input. We originally had the validation rule that cohorts needed to be within observation (as per CIRCE/ATLAS/Capr), but were then faced with various custom cohorts where this was not the case.

The current table description says:

The COHORT table contains records of subjects that satisfy a given set of criteria for a duration of time. The definition of the cohort is contained within the COHORT_DEFINITION table. It is listed as part of the RESULTS schema because it is a table that users of the database as well as tools such as ATLAS need to be able to write to. The CDM and Vocabulary tables are all read-only so it is suggested that the COHORT and COHORT_DEFINTION tables are kept in a separate schema to alleviate confusion.

I guess it comes down to whether an edit like this would be acceptable:

The COHORT table contains records of subjects that satisfy a given set of criteria for a duration of time during which they are under observation as defined in the OBSERVATION_PERIOD table. The definition of the cohort is contained within the COHORT_DEFINITION table. It is listed as part of the RESULTS schema because it is a table that users of the database as well as tools such as ATLAS need to be able to write to. The CDM and Vocabulary tables are all read-only so it is suggested that the COHORT and COHORT_DEFINTION tables are kept in a separate schema to alleviate confusion.
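With the amended description, a tool that accepts a cohort as input could validate it with a containment check: each row’s cohort_start_date and cohort_end_date must both fall inside a single observation period for that subject. A minimal Python sketch over in-memory tuples; in practice this would more likely be a join between the COHORT and OBSERVATION_PERIOD tables, and the function name is hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

CohortRow = Tuple[int, date, date]  # subject_id, cohort_start_date, cohort_end_date
Period = Tuple[date, date]  # observation_period_start_date, observation_period_end_date


def invalid_cohort_rows(rows: List[CohortRow], periods: Dict[int, List[Period]]) -> List[CohortRow]:
    """Rows whose start and end are not both inside one observation period."""
    return [
        (sid, start, end)
        for sid, start, end in rows
        if not any(op_s <= start and end <= op_e for op_s, op_e in periods.get(sid, []))
    ]


periods = {1: [(date(2020, 1, 1), date(2020, 12, 31))]}
rows = [
    (1, date(2020, 2, 1), date(2020, 3, 1)),  # valid
    (1, date(2020, 11, 1), date(2021, 2, 1)),  # ends after observation
    (2, date(2020, 1, 1), date(2020, 1, 31)),  # subject has no observation period
]
bad = invalid_cohort_rows(rows, periods)
```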

Sounds like the better text, indeed.

No opposition there at all. I cannot think of a good reason why cohorts should live outside the Period. Well, except for static conditions: Age, sex, that kind of thing. I could want a cohort of women. Don’t even know if you can do that in Atlas.

I agree. One-day Observation Periods created to make space for a single record that will be there 100% of the time also violate the Closed World assumption.

Cohorts have an important characteristic of time, and thus far (in CIRCE) they have been focused on observations that signal entry into a cohort. There’s no ‘event for age’ in the CDM, but you can do something more cumbersome with Observation Period criteria that would let you enter the cohort on Jan 1 of a given year if you are the required age. There is talk about entering cohorts based on age range (so you are in the cohort between age X and Y), but so far it hasn’t materialized. While age does have a time context, does gender? What’s the entry event for female (i.e. when you say ‘a cohort of females’)? You could argue ‘from their observation start’, in which case you can do that in Atlas: the cohort entry event is Observation Periods where gender is Female.
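The Atlas approach described above (entry event: observation periods where gender is Female, exit at observation end) amounts to roughly the following. A minimal Python sketch over in-memory tuples; the function name is hypothetical, and 8507/8532 are the standard OMOP gender concepts for male/female.

```python
from datetime import date
from typing import Dict, List, Tuple

FEMALE = 8532  # standard OMOP gender_concept_id for FEMALE

Period = Tuple[date, date]


def female_cohort(
    persons: List[Tuple[int, int]],  # (person_id, gender_concept_id)
    observation_periods: Dict[int, List[Period]],
) -> List[Tuple[int, date, date]]:
    """One cohort era per observation period of each female person."""
    return [
        (pid, op_start, op_end)
        for pid, gender_concept_id in persons
        if gender_concept_id == FEMALE
        for op_start, op_end in observation_periods.get(pid, [])
    ]


persons = [(1, 8532), (2, 8507)]
ops = {1: [(date(2010, 1, 1), date(2012, 12, 31)), (date(2015, 1, 1), date(2019, 12, 31))],
       2: [(date(2011, 1, 1), date(2020, 12, 31))]}
cohort = female_cohort(persons, ops)
```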

I think we are still confusing ‘groups of people’ with ‘cohorts’, though. And I definitely don’t want to get into it here, so I’ll just stop typing… but an exercise for the reader: when is a group of people a cohort?

No objection to this text either, but my only hesitation is that it makes cohorts a special case of the application of the Observation Period table, when the community should treat the Observation Period table as something so fundamental that it literally declares when the universe for your patients exists. With that in mind, having to say ‘during which they are under observation as defined in the OP table’ would be redundant: we should be saying that about all the data in the CDM, every table, etc. Unfortunately, an exception was made to that rule when we allowed events outside of OP, so maybe now we have to specify when OP is being applied and when it isn’t, which is a little annoying and definitely prone to error (I can’t tell you how many times people have checked CIRCE logic and found differences, and it was always because they weren’t handling Observation Period correctly).