Observation Period Flavors (First THEMIS-Focus Group 2, now discussed everywhere)

Mark_Danese · January 13, 2018, 5:07pm

Data presence should be reliable, and data absence should be reliable

Thank you @Christian_Reich for stating this so clearly. This is my issue with the desire to include “old” information from before the observation period. In particular, the absence of data is hard to explain. People often want to include a few extra bits of information to get a few more people with an exposure of interest, forgetting that a lot of similar people are missing.

As an analogy, the issue is very much the same as with outcomes. Just as missing data in the outcomes period can be informative, data outside the observation period can be informative. This can bias outcomes associated with the exposure if the exposure definition period extends outside a contiguous observation period for some people.

Note that this is a different issue from forcing everyone to have the same duration of observation period before an index date (for example, forcing everyone to have a 12 month look-back period when some have valid, longer ones available). In this case, for people with different length observation periods, it is better to use all the information. See Gilbertson, et al. (I am ignoring the situation when applying a definition that requires a specific duration of observation period in which case you should use that.)

Observation periods can be constructed dynamically using the payer_plan_period, or through other modifications to the observation period table. We support that in our application and our data model, and we prefer that flexibility for our purposes. But I can also attest to the fact that dicing the data in this way can affect performance with large datasets. The OHDSI approach is to place information in the CDM that is ready to be analyzed and to consider analytical performance. This isn’t always explicitly stated with the OMOP CDM, so it is important that you clarified this. I think any change needs to be considered in the context of the breaking changes it would introduce.

Christian_Reich · January 31, 2018, 6:28pm

Friends:

Turns out the same discussion is popping up everywhere. I am consolidating. Please let’s have it here.

Since it is by far the hottest THEMIS issue, I suggest we nail it down at the next face-to-face on 8-9 March 2018, hosted by Amgen in Thousand Oaks.

Christian_Reich · January 31, 2018, 6:31pm

Copy from the other conversation:

Low Quality Records, Start Date before Observation Period, and End Date - same day Discussion - (THEMIS - Group #3)

Hi All,

We would like to request review and discussion on the below three topics that the THEMIS group #3 (Generic, Drug, Condition, Era) is working on.

Topics #1 and #3:

#1 - Low quality records, e.g. CPRD “up to standard” flag.

Some databases have flags for lower quality records (e.g. CPRD). In some cases researchers are interested in using the information contained in these records. THEMIS needs to decide what the standard should be for dealing with these records during ETL.

#3 - Start Date before Observation Period.

Proposals

Option 1: Continue to kick out records

Option 2: Store in “history of”

Option 3: THEMIS Focus Group 2 has proposed an idea of multiple overlapping observation period records, noted here Observation Period Flavors (First THEMIS-Focus Group 2, now discussed everywhere)

In summary, there would be a different observation period for each standard type of observation.

Examples:

Claims data: medical coverage observation period, prescription drug coverage observation period, both medical and prescription drug coverage period.
EHR data: data collected during patient visit at data cut points (referred to as “network” in the forum post) and all relevant data for patients seen at data cut points (referred to as “out of network”). We can wordsmith this.
A patient could have multiple types of observation periods, with different time windows. Analysts will choose which observation period to use at the time of doing the study (whichever one meets their analysis plan).

Use cases for low quality records

Some chronic diseases may only be recorded once in databases like CPRD. If this was not after the “up to standard” date then current ETLs would kick out these records. Researchers should have the option to look at all of the data if they so choose.
Other use cases?

Topic #4: End Date – same day issue.

What do you do for start times when all you have is day data?
What do you do for end times when all you have is day data?

Proposals

Option 1: Special times (0.00 and 23.59)/Blank
Other Options?

Low Quality Records, Start Date before Observation Period, and End Date - same day Discussion - (THEMIS - Group #3)

For Topic #1 and #3, I’d like to offer an Option 4 (which I am not
necessarily endorsing but want to put it on the table, as I have reviewed
with @schuemie, @Rijnbeek, @mdewilde in discussions about the IPCI
database):

continue to preserve ‘observation period’ under its current definition of
the spans of time that a data source is expected to capture data about an
individual, such that the logic continues to hold that presence of a record
indicates its occurrence, and absence of a record can be inferred to
indicate that it did not occur (subject to misclassification error).
change the existing convention that data that has timestamps falling
outside of an observation period should be removed from the CDM,
effectively allowing any data source to make their own decision about
whether or not to maintain data that occurs before or after the observation
period start and end dates.

This would represent a non-breaking change to any of the current OHDSI
tools, since all respect the observation period time when conducting
analyses. But it opens up the opportunity to expand options for applying
inclusion criteria and building covariates by looking back in the time
outside of observation period, under the revised logic that presence of a
record may indicate its occurrence (subject to misclassification) but
absence of a record tells you nothing about the presence or absence of its
occurrence in the time outside of the observation period. Under this
logic, data outside of an observation period should continue to not be
allowed to define initial events to define cohorts, and all
incidence/prevalence estimates should continue to be limited to observation
period time.
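
The revised logic above can be sketched in a few lines of Python. This is only an illustration, not how any OHDSI tool implements it: the function names `lookback_status` and `can_be_index_event` and the three status labels are hypothetical, and all dates are passed in directly rather than read from CDM tables.

```python
from datetime import date


def lookback_status(record_dates, window_start, window_end, obs_start, obs_end):
    """Interpret a look-back window as 'present', 'absent', or 'unknown'.

    A record anywhere in the window counts as presence, even outside the
    observation period. Absence is only trusted when the whole window lies
    inside the observation period.
    """
    if any(window_start <= d <= window_end for d in record_dates):
        return "present"
    if obs_start <= window_start and window_end <= obs_end:
        return "absent"
    # Part of the window is unobserved, so missing records tell us nothing.
    return "unknown"


def can_be_index_event(event_date, obs_start, obs_end):
    """Initial (index) events must still fall inside the observation period."""
    return obs_start <= event_date <= obs_end


# Invented example: observation period covers 2015-2016.
obs = (date(2015, 1, 1), date(2016, 12, 31))
old_record = [date(2013, 6, 1)]

status_with_history = lookback_status(
    old_record, date(2013, 1, 1), date(2015, 5, 31), *obs)
status_no_records = lookback_status(
    [], date(2013, 1, 1), date(2015, 5, 31), *obs)
```

Here the pre-period record can satisfy an inclusion criterion, while the same window with no records is reported as unknown rather than as evidence that nothing occurred.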

This proposal would mean we wouldn’t need new observation period types or
to create breaking changes with overlapping periods (option 3). It also
would mean ETLs would be easier because data outside period would no
longer require being transformed into ‘history of’ observation records
(option1/2).

Christian_Reich · January 31, 2018, 6:39pm

From email:

THEMIS-Focus Group 2

Current Problem:
Observation periods have been defined differently due to different data sets containing data collected at different periods of time. It is unknown if future analytic use cases will require this “extra data”. Due to OHDSI’s rule of analyzing data within an observation period only, this “extra data” presents a problem that affects the observation period record start and end dates. If users are going to use the observation period to determine when a patient is observable, how can we include/exclude this extra data when needed?

Use Cases to Consider:

1. CPRD using Up To Standard (UTS) variable. This field determines the start of a patient’s observation period in current ETL standards. However, data is collected prior to the UTS variable and analysts will want to review that data for evidence of baseline conditions. Current ETL standards suggest this data prior to UTS will never be considered for analysis and should be omitted from the CDM. This prevents analysts from ever reviewing data prior to UTS.

2. U.S. Claims data with different insurance enrollment periods, most notably medical and prescription drug coverage. Different analyses will require claims data be considered when a patient has medical coverage, prescription coverage, or both. This will vary based on the analysis question. For example, if the analysis only requires medical coverage (no prescription drug coverage is required because all of the clinical events studied are covered under medical coverage), then the claims that exist outside of pharmacy coverage should be included. ETL guidelines change based on claims dataset. If the dataset includes both medical and pharmacy data, then the observation period reflects when both coverages are active. If the dataset specifically mentions medical coverage only, the observation period will only include medical coverage. Based on this loose ETL guideline, analysts are limited to the type of analyses they can perform on CDMs due to the type of observation period created.

3. Some European databases (Peter – please reference your dataset) will include data outside of the “data cut”. For example, if the dataset is going to provide patients seen between 2011-2016, it is possible to get extra information for patients seen prior to 2011, described as “patient history”. Since this data is outside of the “data cut” (2011-2016) in which the observation period is created, then where should the “patient history” data go since it is not technically within the observation period? This patient history data is useful for some analyses that may look for evidence of certain conditions.

Proposal
Data ETL’d into OMOP CDM can have multiple overlapping observation periods, differentiated by a standard concept ID indicated in the observation_period.period_type_concept_id. Period_type_concept_ids will include the following:

1. Medical coverage

2. Prescription coverage

3. Medical and prescription coverage

4. Pre-qualified coverage

5. Qualified coverage

6. In practice network

7. Includes out of practice data

Items 1, 2, and 3 are specific for US claims data. Items 3 and 4 are specific to European EHR data. Items 6 and 7 are specific to NHS (UK) data – specifically CPRD and HES data.

Proposal Applied to Use Cases

1. Two overlapping observation periods will be created for CPRD data. One large observation period that includes all data. And one observation period that uses the UTS date as the start of the observation period. Users will select the large observation period to include all data in the analysis. Users will select the UTS observation period to use the UTS data as a “qualified coverage” when considering just data after a practice’s UTS date.

2. Three overlapping observation periods will be created for US claims databases (as applicable). One for medical coverage, one for prescription coverage, and one where medical and prescription coverage overlap. (There is a possibility of creating an overarching observation period where the patient can have medical and/or prescription coverage – is this applicable for anyone?)

3. Similar to #1 for CPRD, two overlapping observation periods will be created. One that includes pre-qualified data and another that only includes “qualified” data (or data within the visit date range 2011-2016 per the example above).

Example ETL of US Claims data:

Raw Claims Data for Patient 123

Observation Period Table
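
As a rough sketch of how the three US claims observation period types in the proposal could be derived, the Python below builds them from medical and pharmacy enrollment spans. The dates are invented for illustration and are not the Patient 123 example; the helper names are hypothetical, and the text labels stand in for period_type_concept_id values that the proposal had not yet defined.

```python
from datetime import date


def intersect(a, b):
    """Return the overlap of two closed (start, end) spans, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def coverage_periods(medical, pharmacy):
    """Build (period_type_label, start, end) rows for one person."""
    rows = [("Medical coverage", s, e) for s, e in medical]
    rows += [("Prescription coverage", s, e) for s, e in pharmacy]
    # The combined type is the overlap of every medical/pharmacy pair.
    for m in medical:
        for p in pharmacy:
            both = intersect(m, p)
            if both:
                rows.append(("Medical and prescription coverage", *both))
    return rows


# Invented enrollment spans for a single hypothetical patient.
medical = [(date(2015, 1, 1), date(2016, 12, 31))]
pharmacy = [(date(2015, 7, 1), date(2017, 6, 30))]
rows = coverage_periods(medical, pharmacy)
```

For these inputs the combined period runs from the later of the two start dates to the earlier of the two end dates, and the person ends up with three overlapping rows, one per type.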

Considerations:
OHDSI tools are built to only consider one observation period per database. OHDSI tools will need to be amended to allow users to select an observation period type of their choice. In the meantime (i.e. while that is getting fixed), either

Have OHDSI tools use the min/max of all observation periods for analyses

Implement a standard that if there are multiple observation period types, then the ETL’er will have to create one large overlapping observation period that OHDSI tools will work against.

Christian_Reich · January 31, 2018, 6:52pm

Responses:

jenniferduryea:

I am open to wordsmithing these period_type_concept_ids. The current proposal:

1. Medical coverage

2. Prescription coverage

3. Medical and prescription coverage

4. Pre-qualified coverage

5. Qualified coverage

6. In practice network

7. Includes out of practice data

I am open to changing #6 and #7 to what Corina suggests as #6 = data quality standards met. The idea seems similar to #3 and #4. Maybe we can just use #3 and #4 and create an all inclusive observation period for “all available data”. Then CPRD and EHR systems will use #4, #5, and #6, where #6 will be the over-arching observation period that will include all data. Please see updated list.

1. Medical coverage

2. Prescription coverage

3. Medical and prescription coverage

4. Pre-qualified coverage

5. Qualified coverage

6. All available data

And I like Aaron’s suggestion of adding more information to use case #3 to include US EHR datasets. Aaron, would you mind referencing that dataset in the wording for use case #3? In response to your questions, I believe we were looking at the use case of data that lives outside of the “data cut”. And we can just categorize that data as “pre-qualified coverage” data. I don’t think we need to do extra categorization on this type of data – especially when it’s questionable even to include this data in current ETL standards.

Christian_Reich:

I really hate to spoil the nice conversation, particularly because the proposal very eloquently summarizes the problem. And I talked too much about this subject anyway. But the flagging business is not going to fly. Because it only serves a single use case: inclusion criteria of pre-existing conditions/drugs/procedures. But it would kill every cohort we built so far, which means thousands.

Also, I think you are mixing two issues:
     Metadata about what aspect of healthcare a database (i.e. all patients in there) captures (e.g. in-practice network)
     Metadata about what domains of a specific patient are captured at a certain time (whether the camera is on or not)
The former is important, but it is not patient-specific. The latter – 90% of all analytics require the Drug, Condition and Procedure Domains. That’s what should be captured during the observation period. All other partial coverages should go into a separate place (e.g. a new HISTORY table), where you can satisfy the pre-existing criteria need.

Please, please, please.

Meps etl

Hi Christian,

Thank you for your comments. To clarify, I do not think we are mixing the two issues. I believe we are addressing when the “camera is on” in every use case presented. The problem is, the camera can be different for every analysis because the analysis requires a different camera.

I know that I mentioned a number of claims and EHR use cases in the proposal below, but I am most familiar with U.S. claims databases, so I’m going to focus on that in my response. For U.S. claims data, where patients can have two types of health coverage (medical and prescription), the analysis may only require a patient be covered under medical insurance because the drugs and/or clinical events studied are all covered under medical insurance. However, if the observation period is limited to when a patient has both medical and prescription drug coverage, you are going to miss all drugs/clinical events that may have happened when the user did not have prescription coverage. (for people who are not familiar with claims data, certain drugs are covered under a patient’s medical insurance, such as chemotherapy and other physician-administered drugs. So only requiring medical coverage may be sufficient for analysis).

I do sympathize that this change will not be backwards compatible to previously-completed cohorts and, per your forum comments Observation Period Flavors (First THEMIS-Focus Group 2, now discussed everywhere), OHDSI tool performance may decline.

But, let us say we take the path that you mention in your forum post - one observation type per person and no additional flags in the Payer Plan Period table. Now, let us apply this to a U.S. claims database. If users want to do analyses that require all three observation period types (medical coverage only, prescription coverage only, medical and prescription coverage only), the users will be forced to have three versions of the same U.S. claims database - one for each coverage type. So researchers will now need to house three versions of a multi-terabyte Marketscan dataset to run all of their analyses. This is not an impossible solution (and quite a financial boon to data providers). Focus Group 2 did discuss this as an alternative. But, I believe everyone in our group decided that it was neither feasible nor practical to ask a researcher to do this. However, I am open to getting the OHDSI community’s feedback on housing multiple versions of a dataset.

As a group, we also discussed EHR use cases having different observation period types and it also seemed to solve the use cases as well. So this change will be applicable to all datasets that OHDSI strives to cover.

So there are two proposals I see:

Allow different observation period types (THEMIS Focus Group 2 sponsored)

Restrict datasets to one observation period type and create a different dataset with a different observation period type, if needed.

Please let me know if I misunderstood anything.

ericaVoss · February 9, 2018, 8:20pm

Related to this thread . . .

@Ajit_Londhe just pointed out a scenario where we are getting death records outside of the OBSERVATION_PERIOD for a claims database. @clairblacketer, @Ajit_Londhe, and I were going back and forth on what to do. Personally I think I agree with the approach from above, I guess it really isn’t history, but rather future. If we know they are no longer enrolled but know they died in the future, we are missing a snippet of time that might explain what truly happened to the patient.

Christian_Reich · February 24, 2018, 3:52am

We actually know this about everybody.

ericaVoss · February 26, 2018, 4:55pm

@Christian_Reich - other than being depressing on the matter, would you agree that it would be appropriate to delete a death that occurs let’s say 30+ days after enrollment ends?

Patrick_Ryan · February 26, 2018, 6:21pm

If we move forward with the option that I outlined, which Peter advocated
for, then no data that falls outside of the observation period would need
to be deleted (in which case to your scenario, the death record would be
preserved, even though it falls outside of the observation period). A
given source would need to consider at ETL-time how you want to think about
an observation period in this context, but I don’t think it would require
death deletion in any circumstance.

Christian_Reich · February 26, 2018, 6:31pm

Agreed.

Dahye_Shin · April 18, 2018, 9:03am

Dear all

We are converting EMR data to CDM.
The Observation_period table is produced by the following method.

“Periods of continuous enrollment is calculated by combining monthly records and
recording the observation_period_startdate for the first period as the enrollment start date
and observation_period_enddate for the last period as the enrollment end date. If the
time between the end of one enrollment period and the start of the next is 30
days or less, treat this as continuous enrollment.”
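
A minimal sketch of the gap-bridging rule quoted above, in Python. It assumes enrollment is available as (start, end) date pairs and that the gap is counted as the number of days between one period's end date and the next period's start date; the function name `merge_enrollment` and the `max_gap_days` parameter are hypothetical.

```python
from datetime import date


def merge_enrollment(spans, max_gap_days=30):
    """Merge (start, end) spans whose gap is max_gap_days or less."""
    merged = []
    for start, end in sorted(spans):
        if merged and (start - merged[-1][1]).days <= max_gap_days:
            # Close enough to the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Invented monthly records: a 25-day gap, then a 62-day gap.
spans = [
    (date(2017, 1, 1), date(2017, 1, 31)),
    (date(2017, 2, 1), date(2017, 2, 28)),
    (date(2017, 3, 25), date(2017, 4, 30)),
    (date(2017, 7, 1), date(2017, 7, 31)),
]
periods_30 = merge_enrollment(spans)
periods_180 = merge_enrollment(spans, max_gap_days=180)
```

With the 30-day rule the 25-day gap is bridged and the 62-day gap starts a new observation period; widening the window to roughly six months merges everything into one period, which shows how sensitive the result is to the chosen window.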

After the conversion, the Achilles_heel report flags “events outside observation periods”.

We have checked the data to resolve this error.

In Korea, there are cases where a doctor prescribes medication without a visit enrollment,
or where a patient comes back to the hospital a few days after a treatment or examination has been ordered, and the treatment or examination is then carried out without a visit enrollment.

If the patient comes without a visit enrollment, the event will be recorded as occurring before or after a visit enrollment.

That is, the event reflects a genuine hospital visit by the patient,
but the visit registration is not recorded, so an Achilles_heel error occurs.

In order to include these data in the observation period,
we first try to generate observation_period using the drug, examination, treatment, and enrollment tables in the EMR.
Then, if the time between the end of one enrollment period and the start of the next is 6 months or less, we treat this as continuous enrollment.

First, should we use other tables (drug, examination, treatment etc.) as well as the enrollment tables to create observation_period?
Second, should we use a window of 6 months? What length would be appropriate?
From what we found, claims databases have used a window of 1 year based on insurance registration, and among other databases, CCAE has used 32 days and GE 12 months.

We would like to hear from OHDSI members whether this method is suitable for solving the invalid observation_period problem in Korea.

Christian_Reich · April 18, 2018, 10:02am

@Dahye_Shin:

I think you are trying to use the rules for claims data, instead of those for EHR data. In EHR data, unless your healthcare system has a specific mechanism for “enrolling” patients with an institution, you don’t know if you “have” a patient in active treatment or not during the time when nothing happens. Those axiom 2 situations could mean the patient is healthy and happy like a fish in water, or gone with the wind. There is nothing you can do. Most EHR systems define the observation period as between first occurrence of something (Condition, Drug, Procedure etc.) and the last occurrence. Everything in-between is part of the observation period.
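
The first-to-last-occurrence rule can be sketched as follows. This assumes the event dates from all domains (Condition, Drug, Procedure etc.) have already been pooled into (person_id, event_date) pairs; the function name `ehr_observation_periods` and the sample data are made up for illustration.

```python
from datetime import date


def ehr_observation_periods(events):
    """Return {person_id: (first_event_date, last_event_date)}."""
    bounds = {}
    for person_id, event_date in events:
        first, last = bounds.get(person_id, (event_date, event_date))
        bounds[person_id] = (min(first, event_date), max(last, event_date))
    return bounds


# Invented events pooled across domains for two hypothetical patients.
events = [
    (1, date(2014, 3, 2)),   # condition
    (1, date(2016, 9, 15)),  # drug
    (1, date(2015, 1, 20)),  # procedure
    (2, date(2017, 5, 5)),   # single visit only
]
periods = ehr_observation_periods(events)
```

Each patient gets exactly one observation period, and any time between events stays inside it.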

Sooyeon_Cho · April 19, 2018, 8:05am

@Christian_Reich,

Thank you for your suggestion.
However, we have “missing observation period” problem.

The term “enrollment” doesn’t mean insurance enrollment in this case. Dahye and we used “visit” information as the enrolled period because we can know exactly whether a patient was observed only while he/she is visiting the hospital.

Especially in Korea, patients can freely go to any hospital they want.
For me, there are 4 hospitals I visit independently: usually A for a cold, B for dental care, C for surgery or emergencies, D for physical examinations. It changes very frequently, and hospital A will never know that I had surgery last year. Also, C won’t know that I had a light fever right after surgery/discharge if I decide to go to the nearest hospital, A, for the fever.

This “blank period” during which a patient is not visiting a certain hospital could last one month, or 10 years.

That’s why we’re not using the normal EHR rules.

“first occurrence of something to last occurrence of something”

Here are additional questions you may be able to answer:

Do any other countries’ EHR systems have similar problems with blank observation periods?
If they do, how do the “EHR rules” affect research results?
Which is the better method to build the observation period table for “blank” data?

Christian_Reich · April 19, 2018, 10:48am

@Sooyeon_Cho:

This is a very good question. Actually two:

1. How short can Observation Periods be (longitudinal or horizontal separation of data)?
2. What do I do if I know I capture only a fraction of the data (vertical separation of data)?

For 1): The idea with EHRs is that if you were sick you would return to the same hospital you went to last time, and which probably is the closest to you. I know this is not a given. But the likelihood is there, so people make this approximation. They have nothing else to hang on to. Which means, if there are no data then you are healthy enough not to be in the hospital (even though you might be in a different one). For use cases that look at events inside a visit this is probably fine (people rarely get referred from one hospital to another just like that). For use cases that cover longer times (like studies with long-term follow up) this might create “Axiom 2” errors = You underestimate the rate of events because you would wrongly interpret the whitespace as a time without events.

The idea of making mini-Observation Periods around each visit is dangerous. Because you need the whitespace in-between for the correct prevalence assessment. Otherwise it looks like a patient is “always” in the hospital, or “always” has an asthma exacerbation, because during an Observation Period they do, and outside you are not supposed to look. You can also never define a washout period. In the extreme case, you have one-day Observation Periods: Everything has a prevalence of 100%. So, don’t throw away the whitespace.
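
A small worked example of the extreme case, in Python. It uses a deliberately simplified measure, the share of observed days on which the event is recorded, rather than any official OHDSI prevalence calculation; the function names and the three visit dates are invented.

```python
from datetime import date, timedelta


def observed_days(periods):
    """Expand closed (start, end) periods into a set of calendar days."""
    days = set()
    for start, end in periods:
        for offset in range((end - start).days + 1):
            days.add(start + timedelta(days=offset))
    return days


def share_of_observed_days(event_days, periods):
    """Fraction of observed days on which the event is recorded."""
    observed = observed_days(periods)
    return len(observed & set(event_days)) / len(observed)


# Three invented hospital days, each with an asthma exacerbation.
visits = [date(2017, 1, 10), date(2017, 6, 10), date(2017, 12, 10)]

mini_periods = [(d, d) for d in visits]    # one-day period per visit
single_period = [(visits[0], visits[-1])]  # first to last occurrence

share_mini = share_of_observed_days(visits, mini_periods)
share_single = share_of_observed_days(visits, single_period)
```

With one-day periods every observed day is an exacerbation day, so the share is 1.0. Keeping the whitespace between the first and last visit brings it down to 3 event days out of 335 observed days.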

That’s fine. I haven’t been in a hospital for 10 years thank God. But if something happened I would have gone.

For 2) This is really a problem for the Metadata and Annotation Workgroup. They need to solve the problem of how to capture the fact that there is only partial information available.

Dahye_Shin · April 24, 2018, 5:54am

@Christian_Reich

According to your explanation, the observation period runs from the first occurrence of something (Condition, Drug, Procedure etc.) to the last occurrence.

So there will be one observation_period per patient.

Thank you for your detailed explanation.

Olga_Osintseva · June 17, 2019, 1:20pm

A proposal for new period_type_concept_id values was considered in this discussion. What was the result: will new concepts be added, or was this idea rejected?

Thank you in advance

Ewas · September 1, 2022, 7:43pm

Hello,

@Christian_Reich @mvanzandt
Are there any updates regarding this topic? There are still only non-overlapping periods in OBSERVATION_PERIOD. To not lose patients during the mapping into OMOP CDM, claims databases tend to include any type of enrollment (so we need to use PAYER_PLAN_PERIOD to know whether a given enrollment period has only medical coverage, only pharmacy coverage or both kinds). If we want to analyze such a database, then we would need to add our own solution in-between to restrict to only patients with medical+pharmacy coverage during the analysed period. But it means that we would not be able to use ATLAS (not sure how R packages will handle such an input in-between - I haven’t checked yet).

Best regards,
Ewa

MPhilofsky · September 22, 2022, 3:57pm

Hello @Ewas,

You can use the payer_plan_period in Atlas as part of your cohort definition. You are able to add many attributes to the inclusion criteria. See this screenshot:

Ewas · October 12, 2022, 4:28pm

Hi @MPhilofsky,

Thank you for your answer! I am not an experienced user of ATLAS, so I might be wrong, but I cannot find a way to require e.g. 365 days of continuous medical and pharmacy enrollment prior to index date.
We would need to combine overlapping records in PAYER_PLAN_PERIOD into a continuous period of medical coverage, which I do not think is possible in this section of Atlas.
Am I missing something?
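
To make the requirement concrete, here is a rough Python sketch of the combining step done outside ATLAS. It is not ATLAS functionality. It assumes the relevant PAYER_PLAN_PERIOD rows for one person (for example those representing medical+pharmacy coverage, however that is identified in a given ETL) have already been selected as (payer_plan_period_start_date, payer_plan_period_end_date) pairs; the function names are hypothetical.

```python
from datetime import date, timedelta


def merge_overlapping(spans):
    """Collapse overlapping or directly adjacent (start, end) spans."""
    merged = []
    for start, end in sorted(spans):
        # A span starting the day after the previous end is still continuous.
        if merged and start <= merged[-1][1] + timedelta(days=1):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def has_continuous_coverage(spans, index_date, days_required=365):
    """True if one merged span covers the days_required before index_date."""
    window_start = index_date - timedelta(days=days_required)
    return any(start <= window_start and index_date <= end
               for start, end in merge_overlapping(spans))


# Invented plan records: overlapping, then adjacent.
continuous = [
    (date(2020, 1, 1), date(2020, 8, 31)),
    (date(2020, 7, 1), date(2021, 3, 31)),
    (date(2021, 4, 1), date(2021, 12, 31)),
]
# Invented plan records with a one-month gap in September 2020.
gapped = [
    (date(2020, 1, 1), date(2020, 8, 31)),
    (date(2020, 10, 1), date(2021, 12, 31)),
]
index_date = date(2021, 6, 1)
ok_continuous = has_continuous_coverage(continuous, index_date)
ok_gapped = has_continuous_coverage(gapped, index_date)
```

The first person qualifies because the three records merge into one span covering the full look-back; the second does not because of the gap.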

Thank you in advance!
Ewa

Chris_Knoll · October 13, 2022, 3:33am

I hope that we can avoid overloading observation period with different flavors of OP and instead leverage something like payer plan period to represent periods of time of special coverage.