
First step: Defining the broad research approach

Hi @hripcsa,

Good point. I think we should just always report both coverage and predictive accuracy. Requiring that the confidence interval always excludes 1 might be tricky because we may simply not have enough power, but there’s nothing stopping us from including negative controls next to our RCT-derived positives to estimate AUC.
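For concreteness, here is a minimal sketch (made-up estimates in Python, not OHDSI code) of reporting both: coverage of the null on the negative controls, and AUC using the point estimates as scores, with negative controls labeled 0 and RCT-derived positives labeled 1.

```python
# Minimal sketch (made-up estimates, not OHDSI code): report both coverage of
# the null on negative controls and AUC against RCT-derived positive controls.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical method output: log RR point estimates with a common standard error.
neg_log_rr = rng.normal(0.0, 0.2, size=100)          # negative controls, true RR = 1
pos_log_rr = rng.normal(np.log(2.0), 0.2, size=30)   # RCT-derived positives, e.g. RR ~ 2
se = 0.2

# Coverage: fraction of negative-control 95% CIs that include RR = 1 (log RR = 0).
lower, upper = neg_log_rr - 1.96 * se, neg_log_rr + 1.96 * se
coverage = np.mean((lower <= 0.0) & (0.0 <= upper))

# Predictive accuracy: AUC using the point estimate as the score,
# with negative controls labeled 0 and positive controls labeled 1.
labels = np.concatenate([np.zeros(len(neg_log_rr)), np.ones(len(pos_log_rr))])
scores = np.concatenate([neg_log_rr, pos_log_rr])
auc = roc_auc_score(labels, scores)

print(f"coverage on negative controls: {coverage:.2f}")
print(f"AUC (negatives vs. positives): {auc:.2f}")
```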

Cheers,
Martijn

Summarizing our discussion at the OHDSI face-to-face:

Martijn gave a short introduction to the workgroup, and started the discussion on method evaluation. He pointed out that real positive controls are problematic because the effects are already known, and doctors will try to mitigate the effect of the drug.

  • It was remarked that the current evaluation appears to focus only on detecting effects present during exposure. The notion of effects that require accumulation of exposure seems missing. Even though those effects would still fall under effects during exposure, we are indeed not focusing on such effects. They are not out of scope for OHDSI, but are probably out of scope for this evaluation since we have to focus on something, and effects during exposure are an important topic.

  • Whilst our negative controls can have unmeasured confounding, our synthetic positive controls are not able to preserve unmeasured confounding. One way to address this shortcoming of our methodology is to mimic missing confounding by removing data available for adjustment (a sketch of this follows below this list).

  • Evaluating against Randomized Controlled Trials (RCTs) is important for political reasons, but is problematic because

    • RCTs themselves are likely biased due to non-random acts after the moment of randomization
    • RCTs typically have limited sample size
    • Even though we would like to have RCTs with observational data preceding the moment the effect was known, the effect was probably already known long before the trial
  • Adler Perotte may help in identifying RCTs to include in our evaluation. He has been working on codifying the inclusion and exclusion criteria for trials so they can be implemented in the CDM.

  • Alejandro Schuler has developed an advanced approach for simulating effects. It may be possible to use this to inject signals on top of negative controls (a sketch of this idea also follows below this list).

  • Many more people have indicated they want to be involved in the task force. Martijn recommended they post their intentions on the forums, so they can be added.
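To make the unmeasured-confounding bullet above concrete, here is a minimal sketch (in Python, with a hypothetical covariate matrix; not an actual implementation) of mimicking missing confounding by removing data available for adjustment:

```python
# Minimal sketch (hypothetical covariate matrix, not an actual implementation):
# mimic unmeasured confounding by hiding a random subset of measured covariates
# before they are made available for adjustment.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# One row per subject, one column per (binary) covariate.
covariates = pd.DataFrame(
    rng.integers(0, 2, size=(1000, 20)),
    columns=[f"covariate_{i}" for i in range(20)],
)

def hide_covariates(covariates, fraction_hidden, rng):
    """Drop a random fraction of columns to simulate confounders that were never measured."""
    n_hidden = int(round(fraction_hidden * covariates.shape[1]))
    hidden = rng.choice(covariates.columns.to_numpy(), size=n_hidden, replace=False)
    return covariates.drop(columns=list(hidden))

# The method under evaluation would then adjust (e.g. via propensity scores) using
# only the visible covariates, while the full set still drives the synthesized signal.
visible = hide_covariates(covariates, fraction_hidden=0.25, rng=rng)
print(f"hidden {covariates.shape[1] - visible.shape[1]} of {covariates.shape[1]} covariates")
```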
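And here is a minimal sketch of the signal-injection idea mentioned above (hypothetical data layout; not @aschuler’s actual framework or any existing OHDSI code): start from a negative control, treat its observed outcomes as the background rate, and add simulated outcomes during exposed time until the expected rate is multiplied by a target relative risk.

```python
# Minimal sketch (hypothetical data layout, not @aschuler's framework or existing
# OHDSI code): turn a negative control into a synthetic positive control by
# injecting extra outcomes during exposed time so that the expected outcome rate
# is multiplied by a target relative risk.
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)

# Hypothetical per-subject data for one negative control outcome:
# observed outcome count and days at risk while exposed.
exposed = pd.DataFrame({
    "person_id": range(1000),
    "outcome_count": rng.poisson(0.05, size=1000),
    "days_at_risk": rng.integers(30, 365, size=1000),
})

def inject_signal(exposed, target_rr, rng):
    """Add Poisson-distributed extra outcomes so the expected rate becomes target_rr times the background rate."""
    background_rate = exposed["outcome_count"].sum() / exposed["days_at_risk"].sum()
    extra_rate = (target_rr - 1.0) * background_rate                       # per day at risk
    extra = rng.poisson(extra_rate * exposed["days_at_risk"].to_numpy())   # per subject
    out = exposed.copy()
    out["outcome_count"] = out["outcome_count"] + extra
    return out

for rr in (1.5, 2.0, 4.0):  # illustrative target relative risks
    synthetic = inject_signal(exposed, target_rr=rr, rng=rng)
    count_ratio = synthetic["outcome_count"].sum() / exposed["outcome_count"].sum()
    print(f"target RR {rr}: observed count ratio {count_ratio:.2f}")
```

The target relative risks used here are only illustrative choices, not decisions we have made.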

Let me try and wrap up this discussion.

The broad research approach we’ll take is to evaluate methods using:

  • Negative controls (n > 100)
  • Synthetic positive controls derived from these negative controls (n > 100)
  • Some set of RCTs (1 < n < 100)

The evaluation will focus on estimation of relative risk during exposure (as opposed to estimation of risk due to cumulative exposure).

Methods will be evaluated on a set of observational databases (to be determined, but will include the databases at JnJ).

I propose we write at least two papers:

  1. Description of the Standardized Method Evaluation Framework (Benchmark?), demonstrated on one or two vanilla methods.
  2. Application of the Standardized Method Evaluation Framework to a large set of methods currently being used in observational research (including the new-user cohort method using propensity scores, self-controlled case series, case-control, and case-crossover), including a large set of possible analysis choices within each method.

Let me know if you agree (or not)!

Sounds great. Are we tracking the RCTs? (E.g., Adler, Nigam?) George

Sorry for being dense. What exactly do you mean by ‘tracking the RCTs’?

Sorry, wrong word. Creating a collection of RCTs that meet a reasonable set of criteria (size, can be implemented in OHDSI, etc.), and collecting the effect sizes and variances.

George

If we all agree on the overall approach, I think the next step is to identify the tasks that need to be completed. I came up with this list:

  • Identify exposures of interest and negative controls
  • Refine approach to positive control synthesis
  • Evaluate effect of unmeasured confounding in positive control synthesis
  • Identify RCTs and implement inclusion criteria*
  • Implement case-crossover / case-time-control
  • Define universe of methods to evaluate
  • Identify list of databases to run on
  • Develop evaluation metrics
  • Implement and execute evaluation
  • Write papers

The task marked with an asterisk (*) is the one George mentioned: creating a collection of RCTs, for which I will create a separate topic.

I would like to ask for volunteers for these tasks! Please start topics for any of these if you like. You all joined the task force, so I’m expecting you must be eager to roll up your sleeves and get to work!


I’ll be synthesizing both positive and negative controls.

I will also be able to contribute to the identification of methods to evaluate on, but mainly within the realm of new-user cohort methods.

Also, a quick question: what do you mean by evaluating the effect of unmeasured confounding in data synthesis?

By “evaluating the effect of unmeasured confounding in data synthesis” I meant that no matter how we decide to generate positive controls, we will always need to make some sort of assumption on the nature and magnitude of unmeasured confounding. I was thinking we could do some empirical evaluation of those assumptions, although I’m not yet sure what that would look like.

Yeah, by the nature of the question it’s impossible to evaluate it directly in real data.

Hi, @schuemie
You know I’m still a novice in this field, so I’m not sure I can be helpful, and my English is poor. But I just wanted to write what I thought about this question.

  • Database
    I want to provide data from the Korean national health insurance system (NHIS). NHIS covers more than 98% of Koreans. The NHIS sample cohort database contains 2% of the total population (1M) and also contains results from health examinations. The conversion process is almost done. I’m planning to publish the ETL queries and write a paper about this process.

  • Emulating RCT:
    Since I am basically a clinician (a cardiologist), I’m really eager to emulate RCTs using observational studies.
    To me, the two most striking RCTs in 2016 were ‘HOPE-3’ and ‘SPRINT’. These two RCTs met the criteria discussed earlier in this thread: they used existing drugs in patients selected by novel criteria. The study results were positive (which surprised me and others).
    HOPE3 : http://www.nejm.org/doi/full/10.1056/NEJMoa1600176#t=article
    SPRINT : http://www.nejm.org/doi/full/10.1056/NEJMoa1511939#t=article
    Also, many major RCTs targeting HF-PEF (heart failure with preserved ejection fraction) had negative results. We could replicate these RCTs as negative controls.
    I really agree with @hripcsa that we need to emulate RCTs, and I’ll try to. But I’m not sure emulated RCTs can be a ‘gold standard’ for verifying the validity of a method, because I don’t think we can replicate the exact inclusion criteria and study protocols. Still, replicating RCTs is very important and meaningful in itself.
    (Actually, I don’t think the results of replicating the SPRINT or HOPE-3 trial in Korea would be positive, because cardiovascular risk is much lower in Asians. It would be very hard to prove a beneficial effect of these drugs in an intermediate-risk population.)

  • Positive / Negative controls
    I agree with the idea of negative controls. But we cannot pick truly negative controls, because there can be many unobserved confounding factors, as @schuemie said, since the databases we have don’t reflect patients’ complete medical histories. So I’m not sure the negative controls we pick can serve as a ‘gold standard’.
    For example, ‘ingrowing nail’: I do believe that anti-hypertensive drugs are not associated with ingrowing nail, but access to the health care system, socioeconomic status, or worrying about one’s health can all be related to whether a medical claim or diagnosis code for ingrowing nail is recorded.
    So I agree with @aschuler’s idea of synthesizing both positive and negative controls.

Furthermore, I think that the team for ‘positive / negative controls’ and the team for ‘method development’ should be separate. If the ‘method development’ team knows how the positive and negative controls are made, they may tailor their methods to that specific logic.

I will help as much as I can.

Welcome to the task force @SCYou!

I would argue that the fact that real negative controls likely have strong unmeasured confounding makes them ideal for evaluating methods! We want to evaluate how well methods perform in the real world, not in a simulated ideal world (note that @aschuler’s approach also introduces unmeasured confounding).

One very important thing I realize we haven’t discussed: should we evaluate methods that try to quantify the risk attributable to an exposure, or methods for comparative effectiveness? In other words, methods tend to answer one of these questions:

  1. What is the change in risk of outcome X due to exposure to A?
  2. What is the change in risk of outcome X due to exposure to A compared to exposure to B?

Question 1 can often be answered by reformulating it as question 2 by picking a comparator believed to have no effect on the risk. For example, in our Keppra and angioedema study we picked phenytoin as a comparator because we were certain it did not cause angioedema, allowing us to estimate the effect of Keppra.

I must confess I’m mostly interested in question 1, since comparative effectiveness methods can be viewed as answering question 1 by picking a ‘null comparator’ as argued above. But we could create two gold standards, one for question 1 methods and one for question 2 methods.

@aschuler, there is at least one thing we can do to evaluate unmeasured confounding: we can compare an evaluation using true negative controls to an evaluation using your simulation framework where the relative risk is 1 (no effect). If the simulation procedure is realistic enough, those two evaluations should generate the same results.
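To make that concrete, here is a minimal sketch (with made-up numbers in Python) of the comparison I have in mind: apply the same method to real negative controls and to simulated data with true RR = 1, and compare the two distributions of estimates.

```python
# Minimal sketch (made-up numbers): compare an evaluation on real negative
# controls with one on simulated data where the true relative risk is 1.
# If the simulation captures (unmeasured) confounding well, the two
# distributions of estimates should look alike.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical log RR estimates from the same method applied to both sources.
est_real_negatives = rng.normal(0.1, 0.25, size=100)   # real negative controls
est_simulated_null = rng.normal(0.0, 0.20, size=100)   # simulated, true RR = 1

print("mean (real, simulated):", est_real_negatives.mean(), est_simulated_null.mean())
print("sd   (real, simulated):", est_real_negatives.std(ddof=1), est_simulated_null.std(ddof=1))

# Test whether the two distributions of estimates differ.
ks = stats.ks_2samp(est_real_negatives, est_simulated_null)
print(f"Kolmogorov-Smirnov p-value: {ks.pvalue:.3f}")
```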


FYI: I’ve put my notes and slides of yesterday’s meeting on the Wiki.

In summary, I think we decided to:

  1. Focus on creating a ‘benchmark’ for population-level estimation methods that shows how well methods work in general
  2. Go with synthesizing positive controls by injecting outcomes on top of negative controls (at least for now)

Based on @saradempster’s suggestion I’ve created a template protocol for establishing the benchmark. I hope everyone will join in filling in this protocol!

You can find the link to the protocol template in this topic.

Thanks Martijn! I will take a look ASAP. @schuemie - how do you want to receive comments, i.e., posted here or in the document itself?

Just thinking further about metrics for assessing CIs.

If we are really interested in effect estimation, then we want confidence intervals w.r.t. the true value:

  • coverage

  • mean CI width

  • variance of CI width

  • bias (point estimate or CI midpoint versus true value)

  • see Kang and Schmeiser CI scatterplots [1] (e.g., CI half width versus midpoint)
    (they are much like Martijn’s scatter plots)

If we want to discover associations, then we want confidence intervals w.r.t. no effect (1), and the true value is irrelevant other than its direction:

  • this is really just a hypothesis test (p-value)

  • specificity is set at .95 (95% coverage of negative controls after calibration)

  • sensitivity is the proportion of positive controls for which no effect (1) is excluded
    can derive a relation of sensitivity to the CI: (CI width / 2) < effect size - 1
    (assuming the CI midpoint sits near the true effect size)

  • ROC area calculated based on point estimate of specificity and sensitivity
    (or perhaps could generate curve by altering alpha .2, .1, .05, .03, .01)
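To tie the metrics in both lists together, here is a minimal sketch (made-up estimates on the log RR scale; not a finished implementation) computing coverage, mean and variance of CI width, bias, and specificity/sensitivity at alpha = .05:

```python
# Minimal sketch (made-up estimates on the log RR scale): the CI metrics above.
import numpy as np

rng = np.random.default_rng(2017)

def ci_metrics(log_rr, se, true_log_rr, z=1.96):
    lower, upper = log_rr - z * se, log_rr + z * se
    return {
        "coverage": np.mean((lower <= true_log_rr) & (true_log_rr <= upper)),
        "mean_ci_width": np.mean(upper - lower),
        "var_ci_width": np.var(upper - lower, ddof=1),
        "bias": np.mean(log_rr - true_log_rr),
        # 'Discovery' view: does the CI exclude no effect (RR = 1, log RR = 0)?
        "excludes_null": np.mean((lower > 0.0) | (upper < 0.0)),
    }

# Hypothetical estimates: negative controls (true RR = 1), positive controls (true RR = 2).
neg = ci_metrics(rng.normal(0.0, 0.2, 100), se=0.2, true_log_rr=0.0)
pos = ci_metrics(rng.normal(np.log(2.0), 0.2, 100), se=0.2, true_log_rr=np.log(2.0))

specificity = 1.0 - neg["excludes_null"]   # should be ~.95 after calibration
sensitivity = pos["excludes_null"]         # power on the positive controls
print("negative controls:", neg)
print("positive controls:", pos)
print(f"specificity: {specificity:.2f}, sensitivity: {sensitivity:.2f}")
```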

Just noticing that when we do p-value calibration and report coverage, we really should also report power on positive controls.

  1. Keebom Kang, Bruce Schmeiser (1990). Graphical Methods for Evaluating and Comparing Confidence-Interval Procedures. Operations Research 38(3):546-553. http://dx.doi.org/10.1287/opre.38.3.546

George

@hripcsa: moved this discussion here

Hi @saradempster! Just add comments to the document itself.
