
Observational database evaluation for 'fitness of use': any thoughts around best practices?

A common occurrence in my walk of life is the need to evaluate an observational database to determine if it is ‘fit for purpose’ to support a particular analytical use case. More broadly, my team and I have been tasked with evaluating a new database to determine its anticipated value across multiple analytical use cases (e.g. clinical characterization, population-level estimation, patient-level prediction) across an array of exposures of interest (e.g. drugs within various therapeutic areas) and outcomes of interest (e.g. safety and/or effectiveness measures). We may use this ‘value assessment’ as a means of determining which organizations we want to engage in strategic partnerships, or which databases we want to invest in through enterprise licensing agreements.

Currently, to my knowledge, there is no consensus approach for performing such a database evaluation, nor is it clear what reasonable requirements we should impose on a data holder to facilitate an evaluation, or what metrics would be sufficient to enable the valuation. Over the course of the last few years, I’ve participated in various evaluations, which have run the gamut from one extreme, where our evaluation was largely based on PowerPoint slides of self-reported accolades from a data vendor, to the other extreme, where a data holder offered us an evaluation period with an instance of their de-identified source data along with source documentation; during that time we ETLed the data into the OMOP CDM, executed ACHILLES to assess data quality, and examined the prevalence of conditions/drugs/procedures of interest.
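
For concreteness, the hands-on end of that spectrum looked roughly like the sketch below in tooling terms. This is a minimal illustration, assuming the trial instance had already been ETL’d into an OMOP CDM v5 schema on Postgres; the connection details, schema names, and source name are placeholders rather than the actual setup we used.

```r
# Minimal sketch: run Achilles against an evaluation copy of the data that has
# already been ETL'd into the OMOP CDM. All connection details and schema names
# below are placeholders for whatever the data holder provides during the trial.
library(DatabaseConnector)
library(Achilles)

connectionDetails <- createConnectionDetails(
  dbms     = "postgresql",
  server   = "eval-server/eval_db",
  user     = "eval_user",
  password = Sys.getenv("EVAL_DB_PASSWORD")
)

# Generate the Achilles pre-computations (person counts, data density,
# prevalence by concept, and related data-quality summaries)
achilles(
  connectionDetails     = connectionDetails,
  cdmDatabaseSchema     = "cdm",
  resultsDatabaseSchema = "results",
  sourceName            = "candidate_database_eval"
)
```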

From my perspective, there are a few different dimensions to think about when considering how to conduct a database evaluation for ‘fitness for use’:

  1. What are the analytic use case(s)? Is the data to be used for one drug/one outcome or across a portfolio of exposures/diseases/outcomes? Is the data to be used for cross-sectional descriptive assessments or longitudinal evaluations? Is the primary motivator clinical characterization (observation), patient-level prediction (inference), or population-level effect estimation (causal inference)? How concerned am I with the validity of the evidence I’m looking for (is a rough ballpark OK, or am I looking for properly calibrated and highly precise estimates)?

  2. What are the data required to define the entities of interest (populations/exposures/outcomes/covariates) in my analytic use cases?
    a. Types of data: conditions, drugs, procedures, measurements, observations, visits
    b. Extent of longitudinal history required in each patient’s record to observe prior conditions/drugs/procedures
    c. Types of patients: all vs. young vs. old; healthy population vs. diseased; general population vs. specialty disease

  3. How much time do I have to conduct the evaluation? Do I have a day, a week, a month, a year?

  4. How much certainty must I obtain before I can make a recommendation?

  5. How much transparency will we have into the database under evaluation?
    Will I be given the full dataset? Given a random subset? Given the WhiteRabbit ScanReport? Given schema/user documentation?

Independent from these dimensions, there may be objective measures or subjective assessments that I’d like to provide as part of my evaluation:

  1. Number of patients
  2. Duration of follow-up for each person (distribution of length of observation period; see the sketch after this list for this and a few other measures)
  3. Density of data within each domain
  4. Period of time covered by database
  5. Prevalence of selected drugs/conditions/procedures/measurements of interest
  6. Meta-data about originating population and data capture process
  7. Source schema and source value frequency to estimate feasibility and resource burden for ETLing to OMOP CDM
  8. Plausibility assessments: do patients with disease get appropriate treatments? Do patients with treatments have indications? Do age/gender-specific observations occur in different age/gender strata (e.g. prostate cancer screening in young women)?
  9. Concordance with external references
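
To make a few of these measures concrete (numbers 1, 2, 5, and 8 above), here is the kind of summary query I have in mind, written as a rough sketch against a CDM v5 instance on Postgres. The schema name and example concept_ids are placeholders and should be verified against the vocabulary before use.

```r
# Sketch of a few of the objective measures above, run against an OMOP CDM v5
# instance on Postgres. The schema name and example concept_ids are placeholders;
# verify concept_ids against the vocabulary before relying on them.
library(DatabaseConnector)

connection <- connect(connectionDetails)  # connectionDetails as in the earlier sketch
cdm <- "cdm"

# 1. Number of patients
personCount <- querySql(connection, sprintf(
  "SELECT COUNT(*) AS person_count FROM %s.person;", cdm))

# 2. Duration of follow-up: distribution of observation period length in days
followUp <- querySql(connection, sprintf(
  "SELECT MIN(op_days) AS min_days, AVG(op_days) AS mean_days, MAX(op_days) AS max_days
   FROM (SELECT observation_period_end_date - observation_period_start_date AS op_days
         FROM %s.observation_period) t;", cdm))

# 5. Prevalence of a selected condition
#    (201820 is intended as the SNOMED 'Diabetes mellitus' concept; swap in
#     whatever exposures/outcomes matter for the use cases at hand)
prevalence <- querySql(connection, sprintf(
  "SELECT COUNT(DISTINCT person_id) AS persons_with_condition
   FROM %s.condition_occurrence
   WHERE condition_concept_id = 201820;", cdm))

# 8. Plausibility spot check: gender-specific conditions recorded in the
#    'wrong' gender stratum. prostateConceptIds is a placeholder; look up the
#    real prostate-related concept_ids in the vocabulary first.
prostateConceptIds <- "4314337"  # placeholder, not verified
implausible <- querySql(connection, sprintf(
  "SELECT COUNT(*) AS suspect_records
   FROM %s.condition_occurrence co
   JOIN %s.person p ON p.person_id = co.person_id
   WHERE co.condition_concept_id IN (%s)
     AND p.gender_concept_id = 8532;",
  cdm, cdm, prostateConceptIds))  # 8532 = FEMALE in the standard vocabulary

disconnect(connection)
```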

Parts of a database evaluation could logically follow the nice frameworks that @mgkahn et al. have set up for more global approaches to data quality assessment (see https://www.ncbi.nlm.nih.gov/pubmed/27713905 and https://www.ncbi.nlm.nih.gov/pubmed/25992385), but these frameworks largely take the perspective of a data holder, or an associated affiliate with full and unconstrained access to the source data. In my circumstance, it’s generally the case that both time and level of access may be constrained, and I’m forward-looking into potential use of the data rather than present-thinking about an immediate need.

So, all of that is a long ramble to ask the community: what do you all think makes for an appropriate database evaluation? What process do you follow? What information are you looking for? What tools do you use to perform your evaluation? And what could the OHDSI community do to improve the quality, efficiency, and transparency of the data evaluation process?

Please reply to the thread here, and if you are interested in the discussion, join our OHDSI community call next week (see here: OHDSI Community Call 1Nov2016)

Great topic. Coincidentally, this is the work that Ning (Sunny) Shang did for the NIH’s BD2K Data Discovery Index project. The framework is ready, although the paper is still being written. (Let me cc Sunny for the call.)

George

Thanks @hripcsa, that’s great to hear. I look forward to learning more. In addition to conceptual frameworks, I’d also really appreciate it if folks who have worked examples of evaluations they’ve performed could share them, so we can see what works and what can be improved in practice.

This is a great topic to discuss.

One deliverable of the DataQuality study will be the ability to run the package at a later time to “benchmark” your database against some empiric reference values (even months after the DQ study ends). It will also support exporting a zip file with a vastly smaller subset of Achilles pre-computations that is also “safer to share” (all measures are relative, and small cell counts are removed).
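
I don’t know the final shape of that export yet, but conceptually I’d picture something along the lines of the sketch below, which only covers the small-cell-count removal and zip packaging (the achilles_results column names follow the standard results schema; the minimum cell count and output folder are placeholders):

```r
# Rough sketch of the kind of 'safer to share' export described above: pull the
# Achilles pre-computations, drop small cell counts, and bundle the rest into a
# zip file. Column names follow the standard achilles_results table; the
# minimum cell count and output folder are placeholders.
library(DatabaseConnector)

exportAchillesSubset <- function(connectionDetails,
                                 resultsDatabaseSchema,
                                 minCellCount = 11,
                                 exportFolder = "achilles_export") {
  connection <- connect(connectionDetails)
  on.exit(disconnect(connection))

  results <- querySql(connection, sprintf(
    "SELECT analysis_id, stratum_1, stratum_2, stratum_3, stratum_4, stratum_5,
            count_value
     FROM %s.achilles_results;", resultsDatabaseSchema))
  names(results) <- toupper(names(results))

  # Remove small cell counts before anything leaves the data holder's environment
  results <- results[results$COUNT_VALUE >= minCellCount, ]

  dir.create(exportFolder, showWarnings = FALSE)
  csvFile <- file.path(exportFolder, "achilles_results_subset.csv")
  write.csv(results, csvFile, row.names = FALSE)
  utils::zip(zipfile = file.path(exportFolder, "achilles_results_subset.zip"),
             files = csvFile)
  invisible(csvFile)
}
```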

I like the call to hear how other sites are doing it. See also the discussion here: Empiric threshold methodology for DQ Heel rules · Issue #148 · OHDSI/Achilles · GitHub

(Forum syntax note: I see that when a link is on a separate line, it gets expanded into a preview; if it is within text rather than on its own line, it stays a simple link :slight_smile: )

@patrick_ryan Really enjoyed today’s discussion, hope the conversation continues. Would love to see this turn into something substantive.

Raising my :hand: for interest in future conversations, whether it be on the forum or in a working group context.
