
Observation Period Flavors (First THEMIS-Focus Group 2, now discussed everywhere)

@plpo, friends:

This subject comes up all the time and is the subject of many debates. We need to nail this for good. I have a very strong opinion, and that is to keep the observation periods as high quality as possible, even if we lose data. Three reasons:

1) Data presence should be reliable, and data absence should be reliable

Observational data are analyzed with the assumption that two (unspoken) axioms are true:

A. If some relevant medical event happens there is a record,
B. If there is no record, nothing happened. This means we don’t have negative data in there (like “patient didn’t have an MI”).

Only if these two axioms hold do our typical calculations of incidence and prevalence rates work. And we use them all the time: 80% of all studies calculate rates and ratios of rates (relative risks). Axiom A data are in the numerator, Axiom B data (patients without data) are in the denominator.

Axiom A holds true if (i) the patient is observed and (ii) the event is worthy of recording: MIs are worthy, an itch on the back is not, and everything else is in between. Axiom B holds true if the patient is observed. Either misclassification will distort the rates.
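
To make the denominator effect concrete, here is a minimal sketch with made-up numbers (the counts and the helper name `incidence_rate_per_1000_py` are assumptions for illustration, not from any real database). It shows what happens to a rate when unobserved time is counted as observed, i.e. when Axiom B is violated:

```python
def incidence_rate_per_1000_py(events, person_years):
    """Incidence rate expressed per 1000 person-years."""
    return 1000 * events / person_years


# Hypothetical cohort: 20 recorded events over 1000 truly observed person-years.
true_rate = incidence_rate_per_1000_py(events=20, person_years=1000)

# Same 20 recorded events, but the observation periods were padded with
# another 1000 person-years during which nothing could have been recorded.
padded_rate = incidence_rate_per_1000_py(events=20, person_years=2000)

print(true_rate, padded_rate)
```

The numerator stays the same while the denominator doubles, so the rate is cut in half without anything having changed for the patients.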

2) Medical History information is low quality

In most cases, that information is obtained from the patient, not from some record. And everybody who ever went to the doctor (i.e. everybody) knows the fidelity of the answers to the questions from the doctor. As a result, physicians attach very little significance to these listings of prior problems, using them at best to avoid missing a potential differential diagnosis. Family history is even worse (“Do you have any diseases in your family?” - “Oh yes, my husband has trouble falling asleep lately”). That kind of thing.

3) We need not follow the hoarding reflex any longer

Folks feel they have to preserve every little snippet of data for every patient. However, we now have more than one billion patients in the OHDSI network. We can afford to lose a few data if we can improve the quality in return. Observational research has a reproducibility and quality problem, not a sample size problem.

Therefore, I would recommend the following:

  • We create observation periods where we can make the best assumption to have axiom A, and more importantly axiom B, hold true for all three main data tables: CONDITION_OCCURRENCE, DRUG_EXPOSURE and PROCEDURE_OCCURRENCE.
  • The rest goes either in the trash, or into some place where we keep medical history information. The separation will allow folks to use these data if really wanted (e.g. for the infamous exclusion criterion use case), but it will not participate in the general cohort definitions by default.

To use some additional flags or PAYER_PLAN_PERIOD to indicate if the Drug camera or the Condition/Procedure camera is on makes no sense to me. For each query, we would have to join this table, only to exclude the exact same data I want to relegate to history anyway. The vast majority of all cohort definitions today would break, and their definition would become a lot more complicated and performance would go down the drain.

I wouldn’t worry about that, @MPhilofsky. Specialty clinics are not expected to cover all primary healthcare issues, only the complex ones. So, nobody would create the incidence of febrile viral infections in the winter from these data. In your case, you have a mixture of specialty and primary care, and the data would be treated the same way.

The only thing we don’t have is a way to declare these things explicitly. That’s something we should think about in the metadata table.

  1. Data presence should be reliable, and data absence should be reliable

Thank you @Christian_Reich for stating this so clearly. This is my issue with the desire to include “old” information from before the observation period. The absence of data is particularly hard to explain. People often want to include a few extra bits of information to get a few more people with an exposure of interest, forgetting that a lot of similar people are missing.

As an analogy, the issue is very much the same as with outcomes. Just as missing data in the outcomes period can be informative, data outside the observation period can be informative. This can bias outcomes associated with the exposure if the exposure definition period extends outside a contiguous observation period for some people.

Note that this is a different issue from forcing everyone to have the same duration of observation period before an index date (for example, forcing everyone to have a 12-month look-back period when some have valid, longer ones available). In this case, for people with observation periods of different lengths, it is better to use all the information. See Gilbertson, et al. (I am ignoring the situation when applying a definition that requires a specific duration of observation period, in which case you should use that.)

Observation periods can be constructed dynamically using the payer_plan_period, or through other modifications to the observation period table. We support that in our application and our data model, and we prefer that flexibility for our purposes. But I can also attest to the fact that dicing the data in this way can affect performance with large datasets. The OHDSI approach is to place information in the CDM that is ready to be analyzed and to consider analytical performance. This isn’t always explicitly stated with the OMOP CDM, so it is important that you clarified this. I think any change needs to be considered in the context of the breaking changes it would introduce.

Friends:

Turns out the same discussion is popping up everywhere. I am consolidating. Please let’s have it here.

Since it is by far the hottest THEMIS issue, I suggest we nail it at the next face-to-face on 8-9 March 2018, hosted by Amgen in Thousand Oaks.

Copy from the other conversation:

From email:

Responses:

Related to this thread . . .

@Ajit_Londhe just pointed out a scenario where we are getting death records outside of the OBSERVATION_PERIOD for a claims database. @clairblacketer, @Ajit_Londhe, and I were going back and forth on what to do. Personally, I think I agree with the approach from above. I guess it really isn’t history, but rather future. :slight_smile: If we know they are no longer enrolled but know they died in the future, we are missing a snippet of time that might explain what truly happened to the patient.

We actually know this about everybody. :frowning:

@Christian_Reich - other than being depressing on the matter, would you agree that it would be appropriate to delete a death that occurs let’s say 30+ days after enrollment ends?

If we move forward with the option that I outlined, which Peter advocated
for, then no data that falls outside of the observation period would need
to be deleted (in which case to your scenario, the death record would be
preserved, even though it falls outside of the observation period). A
given source would need to consider at ETL-time how you want to think about
an observation period in this context, but I don’t think it would require
death deletion in any circumstance.

Agreed.

Dear all

We are converting EMR data to CDM.
The Observation_period table is produced by the following method.

“Periods of continuous enrollment is calculated by combining monthly records and
recording the observation_period_startdate for the first period as the enrollment start date
and observation_period_enddate for the last period as the enrollment end date. If the
time between the end of one enrollment period and the start of the next is 30
days or less, treat this as continuous enrollment.”
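
The quoted rule can be sketched as follows. This is only an illustration under stated assumptions: the function name `collapse_periods`, the `max_gap_days` parameter, and the sample dates are all hypothetical, and the gap is measured as the number of days from one period's end date to the next period's start date.

```python
from datetime import date


def collapse_periods(periods, max_gap_days=30):
    """Merge (start, end) date pairs whose gap is max_gap_days or less."""
    merged = []
    for start, end in sorted(periods):
        if merged and (start - merged[-1][1]).days <= max_gap_days:
            # Close enough to the previous period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Gap too long (or first record): start a new period.
            merged.append((start, end))
    return merged


# Hypothetical monthly enrollment records for one person.
monthly = [
    (date(2017, 1, 1), date(2017, 1, 31)),
    (date(2017, 2, 1), date(2017, 2, 28)),
    (date(2017, 3, 20), date(2017, 4, 30)),  # 20-day gap: still continuous
    (date(2017, 9, 1), date(2017, 9, 30)),   # 124-day gap: new period
]
periods = collapse_periods(monthly)
for start, end in periods:
    print(start, end)
```

With a 30-day window this person gets two observation periods. Raising `max_gap_days` to roughly 6 months would merge them into one, which is exactly the trade-off asked about below.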

After the conversion, the Achilles Heel report flags “events outside observation periods”.

We have checked the data to resolve this error.

In Korea, there are cases where a doctor prescribes medication without a visit registration,
or where a patient comes back to the hospital a few days after a treatment or examination was ordered, and the treatment or examination is then carried out without a visit registration.

An event that comes without a visit registration is recorded as falling before or after a registered visit.

That is, the patient really did visit the hospital for this event,
but because the visit registration was not recorded, an Achilles Heel error occurs.

In order to include these data in the observation period,
we first try to generate observation_period using the drug, examination, treatment, and enrollment tables in the EMR.
Then, if the time between the end of one enrollment period and the start of the next is 6 months or less, we treat this as continuous enrollment.

First question: should we use other tables (drug, examination, treatment, etc.) as well as the enrollment tables to create observation_period?
Second question: should we use a window of 6 months? What length would be appropriate?
We checked what others have done: the claims database has used a window of 1 year based on insurance registration, and other databases have used different windows: the CCAE database 32 days, and the GE database 12 months.

We would like to hear from OHDSI members whether this method is suitable for solving the problem of invalid observation periods in Korea.

@Dahye_Shin:

I think you are trying to use the rules for claims data, instead of those for EHR data. In EHR data, unless your healthcare system has a specific mechanism for “enrolling” patients with an institution, you don’t know if you “have” a patient in active treatment or not during the time when nothing happens. Those Axiom B situations could mean the patient is healthy and happy like a fish in water, or gone with the wind. There is nothing you can do. Most EHR systems define the observation period as running from the first occurrence of something (Condition, Drug, Procedure etc.) to the last occurrence. Everything in-between is part of the observation period.
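
As a rough sketch of that first-to-last rule (the function name `ehr_observation_period` and the event dates are made up for illustration; a real ETL would pull the dates from the source tables):

```python
from datetime import date


def ehr_observation_period(event_dates):
    """Return (first, last) event date, or None if there are no events."""
    if not event_dates:
        return None
    return (min(event_dates), max(event_dates))


# Hypothetical event dates for one patient, grouped by domain.
events = {
    "condition": [date(2015, 3, 2), date(2016, 7, 19)],
    "drug": [date(2015, 3, 2), date(2017, 1, 5)],
    "procedure": [date(2014, 11, 30)],
}

# Pool all domains: any record counts as evidence of observation.
all_dates = [d for dates in events.values() for d in dates]
period = ehr_observation_period(all_dates)
print(period)
```

This yields a single observation period per patient, spanning the gaps between visits.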

@Christian_Reich,

Thank you for your suggestion.
However, we have a “missing observation period” problem.

The term “enrollment” doesn’t mean insurance enrollment in this case. Dahye and we used “visit” information as the enrolled period because we can know for certain whether a patient was observed only while he/she is visiting the hospital.

Especially in Korea, patients can freely go to any hospital they want.
For me, there are 4 hospitals I visit independently: usually A for a cold, B for dental care, C for surgery or emergencies, D for physical examinations. This changes very frequently, and hospital A will never know that I had surgery last year. Also, C won’t know that I had a light fever right after surgery/discharge if I decide to go to the nearest hospital, A, for the fever.

This “blank period”, during which a patient is not visiting a certain hospital, could last one month or 10 years.

That’s why we’re not using the normal EHR rules.

  • “first occurrence of something to last occurrence of something”

Here are additional questions you may be able to answer:

  • Do any other countries’ EHR systems have similar problems with blank observation periods?
  • If they do, how do the “EHR rules” affect research results?
  • Which method is better for building the observation period table for “blank” data?

@Sooyeon_Cho:

This is a very good question. Actually two:

  1. How short Observation Periods can be (longitudinal or horizontal separation of data)
  2. What do I do if I know I capture only a fraction of the data (vertical separation of data)

For 1): The idea with EHRs is that if you were sick you would return to the same hospital you went to last time, and which probably is the closest to you. I know this is not a given. But the likelihood is there, so people make this approximation. They have nothing else to hang on to. Which means, if there are no data then you are healthy enough not to be in the hospital (even though you might be in a different one). For use cases that look at events inside a visit this is probably fine (people rarely get referred from one hospital to another just like that). For use cases that cover longer times (like studies with long-term follow up) this might create “Axiom B” errors: you underestimate the rate of events because you would wrongly interpret the whitespace as a time without events.

The idea of making mini-Observation Periods around each visit is dangerous, because you need the whitespace in-between for the correct prevalence assessment. Otherwise it looks like a patient is “always” in the hospital, or “always” has an asthma exacerbation, because during an Observation Period they do, and outside you are not supposed to look. You can also never define a washout period. In the extreme case, you have one-day Observation Periods: Everything has a prevalence of 100%. So, don’t throw away the whitespace.
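
A small sketch of the extreme case, with invented visit dates and a hypothetical helper `share_of_observed_days_with_event`. It compares one-day Observation Periods around each visit with a single first-to-last period for the same patient:

```python
from datetime import date


def share_of_observed_days_with_event(event_days, periods):
    """Fraction of observed days on which an event was recorded."""
    observed = sum((end - start).days + 1 for start, end in periods)
    with_event = sum(
        any(start <= day <= end for start, end in periods)
        for day in set(event_days)
    )
    return with_event / observed


# Hypothetical patient: three visits, each for an asthma exacerbation.
visit_days = [date(2017, 1, 10), date(2017, 6, 1), date(2017, 11, 6)]

# One-day mini-periods around each visit.
mini_periods = [(d, d) for d in visit_days]
mini_share = share_of_observed_days_with_event(visit_days, mini_periods)

# One period from first to last visit, keeping the whitespace.
whole_period = [(min(visit_days), max(visit_days))]
whole_share = share_of_observed_days_with_event(visit_days, whole_period)

print(mini_share, whole_share)
```

With mini-periods the patient appears to have an exacerbation on every observed day. With the whitespace kept, it is 3 days out of 301.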

That’s fine. I haven’t been in a hospital for 10 years thank God. But if something happened I would have gone.

For 2) This is really a problem for the Metadata and Annotation Workgroup. They need to solve the problem of how to capture the fact that there is only partial information available.

@Christian_Reich

According to your explanation, the observation period runs from the first occurrence of something (Condition, Drug, Procedure etc.) to the last occurrence, so there will be one observation_period per patient.

Thank you for your detailed explanation.

A proposal for a new period_type_concept_id was considered in this discussion. What was the result: will new concepts be added, or was this idea rejected?

Thank you in advance

Hello,

@Christian_Reich @mvanzandt
Are there any updates regarding this topic? There are still only non-overlapping periods in OBSERVATION_PERIOD. To not lose patients during the mapping into OMOP CDM, claims databases tend to include any type of enrollment (so we need to use PAYER_PLAN_PERIOD to know whether a given enrollment period has only medical coverage, only pharmacy coverage or both kinds). If we want to analyze such a database, then we would need to add our own solution in-between to restrict to only patients with medical+pharmacy coverage during the analysed period. But it means that we would not be able to use ATLAS (not sure how R packages will handle such an input in-between - I haven’t checked yet).
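
To illustrate the kind of in-between step I mean, here is a minimal sketch. It assumes the medical and pharmacy coverage spans have already been extracted into two separate lists of (start, end) dates; the function name `intersect_periods` and the sample dates are hypothetical:

```python
from datetime import date


def intersect_periods(a, b):
    """Return the overlaps between two lists of (start, end) date pairs."""
    overlaps = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start <= end:
                # The two spans share at least one day.
                overlaps.append((start, end))
    return sorted(overlaps)


# Hypothetical coverage spans for one person.
medical = [(date(2016, 1, 1), date(2016, 12, 31))]
pharmacy = [(date(2016, 4, 1), date(2017, 3, 31))]

both = intersect_periods(medical, pharmacy)
print(both)
```

Only the overlap would then count as time with both kinds of coverage.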

Best regards,
Ewa

Hello @Ewas,

You can use the payer_plan_period in Atlas as part of your cohort definition. You are able to add many attributes to the inclusion criteria. (The original post included a screenshot here.)