
Low Quality Records, Start Date before Observation Period, and End Date - same day Discussion - (THEMIS - Group #3)

Hi All,

We would like to request review and discussion on the below three topics that the THEMIS group #3 (Generic, Drug, Condition, Era) is working on.

Topics #1 and #3:

#1 - Low quality records, e.g. CPRD “up to standard” flag.

  • Some databases have flags for lower quality records (e.g. CPRD). In some cases researchers are interested in using the information contained in these records. THEMIS needs to decide what the standard should be for dealing with these records during ETL.

#3 - Start Date before Observation Period.

Proposals

Option 1: Continue to kick out records

Option 2: Store in “history of”

Option 3: THEMIS Focus Group 2 has proposed an idea of multiple overlapping observation period records, noted here Observation Period Flavors (First THEMIS-Focus Group 2, now discussed everywhere)

  • In summary, there would be a different observation period for each standard type of observation.

Examples:

  • Claims data: medical coverage observation period, prescription drug coverage observation period, both medical and prescription drug coverage period.

  • EHR data: data collected during patient visits at data cut points (referred to as “network” in the forum post) and all relevant data for patients seen at data cut points (referred to as “out of network”). We can wordsmith this.

  • A patient could have multiple types of observation periods, with different time windows. Analysts will choose which observation period to use when doing the study (whichever one meets their analysis plan).
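
To make Option 3 concrete, here is a minimal Python sketch of typed, overlapping observation periods. The class name, the field names, and the type labels ("medical", "drug") are hypothetical and only for illustration; they are not part of the CDM or of any THEMIS proposal.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ObservationPeriod:
    person_id: int
    start: date
    end: date
    period_type: str  # hypothetical label, e.g. "medical" or "drug"


def is_observed(periods: List[ObservationPeriod], person_id: int,
                period_type: str, day: date) -> bool:
    """True if `day` falls inside any period of the chosen type for this person."""
    return any(
        p.person_id == person_id
        and p.period_type == period_type
        and p.start <= day <= p.end
        for p in periods
    )


# One patient with two overlapping periods of different types.
periods = [
    ObservationPeriod(1, date(2010, 1, 1), date(2012, 12, 31), "medical"),
    ObservationPeriod(1, date(2011, 1, 1), date(2011, 12, 31), "drug"),
]
```

The analyst picks the period type that fits the study, so the same day can count as observed for a medical-coverage analysis and unobserved for a drug-coverage one.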

Use cases for low quality records

  • Some chronic diseases may only be recorded once in databases like CPRD. If that record falls before the “up to standard” date, current ETLs kick it out. Researchers should have the option to look at all of the data if they so choose.

  • Other use cases?

Topic #4:
End Date – same day issue.

  • What do you do for start times when all you have is day data?

  • What do you do for end times when all you have is day data?

Proposals

  • Option 1: Special times (00:00 and 23:59)/Blank
  • Other Options?
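
As a rough illustration of the "special times" half of Option 1, assuming the source has day-level dates only, the sketch below assigns 00:00 to the start and 23:59 to the end. The function name is made up for this example; the "Blank" alternative would simply leave the datetime fields empty.

```python
from datetime import date, datetime, time
from typing import Tuple


def day_bounds(start_day: date, end_day: date) -> Tuple[datetime, datetime]:
    """Assign 00:00 to the start day and 23:59 to the end day (Option 1 sketch)."""
    start_dt = datetime.combine(start_day, time(0, 0))
    end_dt = datetime.combine(end_day, time(23, 59))
    return start_dt, end_dt


# A record that starts and ends on the same day still gets end > start.
start_dt, end_dt = day_bounds(date(2018, 3, 5), date(2018, 3, 5))
```

With this convention a same-day record has a non-zero duration, which is the point of the "same day" issue.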

For Topics #1 and #3, I’d like to offer an Option 4 (which I am not
necessarily endorsing but want to put on the table, as I have reviewed it
with @schuemie, @Rijnbeek, @mdewilde in discussions about the IPCI
database):

  • continue to preserve ‘observation period’ under its current definition of
    the spans of time that a data source is expected to capture data about an
    individual, such that the logic continues to hold that presence of a record
    indicates its occurrence, and absence of a record can be inferred to
    indicate that it did not occur (subject to misclassification error).

  • change the existing convention that data that has timestamps falling
    outside of an observation period should be removed from the CDM,
    effectively allowing any data source to make their own decision about
    whether or not to maintain data that occurs before or after the observation
    period start and end dates.

This would represent a non-breaking change to any of the current OHDSI
tools, since all respect the observation period time when conducting
analyses. But it opens up the opportunity to expand options for applying
inclusion criteria and building covariates by looking back in the time
outside of observation period, under the revised logic that presence of a
record may indicate its occurrence (subject to misclassification) but
absence of a record tells you nothing about the presence or absence of its
occurrence in the time outside of the observation period. Under this
logic, data outside of an observation period should still not be
allowed to define the initial events of a cohort, and all
incidence/prevalence estimates should continue to be limited to observation
period time.
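
A minimal Python sketch of that logic, for a single person and with hypothetical names throughout (this is not how any OHDSI tool implements it): index events must fall inside the observation period, while a covariate lookback may use retained records from before it. A hit means "present"; a miss only means "absent" when the whole lookback window lies inside observed time, and is otherwise unknown.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    concept: str
    record_date: date


def eligible_index_events(records: List[Record], obs_start: date,
                          obs_end: date) -> List[Record]:
    """Only records inside the observation period may start a cohort."""
    return [r for r in records if obs_start <= r.record_date <= obs_end]


def history_in_window(records: List[Record], concept: str, window_start: date,
                      index_date: date, obs_start: date) -> Optional[bool]:
    """True if found; False if absent and fully observed; None if unknown."""
    found = any(
        r.concept == concept and window_start <= r.record_date < index_date
        for r in records
    )
    if found:
        return True
    # Absence is only informative when the window does not reach before obs_start.
    return False if window_start >= obs_start else None


# Hypothetical data: one record before the observation period, one inside it.
records = [Record("diabetes", date(2005, 1, 1)), Record("mi", date(2012, 6, 1))]
obs_start, obs_end = date(2010, 1, 1), date(2015, 12, 31)
```

Here the 2005 diabetes record can serve as a covariate for the 2012 event, but it cannot itself be an index event.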

This proposal would mean we wouldn’t need new observation period types or
to create breaking changes with overlapping periods (option 3). It also
would mean ETLs would be easier because data outside the period would no
longer need to be dropped or transformed into ‘history of’ observation
records (option 1/2).

Thanks Patrick for your post.

I think this would be the ideal solution for us, and I think for many European databases. It is very hard to explain why they should throw away a very high percentage of their data.
The history table or ‘history of’ observations are clearly suboptimal and would introduce more complicated logic than necessary. The proposed solution is the best of both worlds and, for me, the best and actually the only real solution to the problem.

So 10 points from The Netherlands for this solution :relaxed:

Friends: Let’s all discuss it at the same place. Plus, it looks like this will be the #1 Themis discussion at the face-2-face.

This is partially related to this thread
