
[Patient-Level Prediction Workgroup] TC 27-01-2016

Team,

Tomorrow, in the second TC of the Patient-Level Prediction Workgroup, Narges Razavian will present a nice paper entitled “Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors”.

link to paper:
http://online.liebertpub.com/doi/full/10.1089/big.2015.0020

See the wiki page for the connection details:
http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:patient-level_prediction

We will record this session.

Talk to you all tomorrow,

Peter

Thanks @razavian for the nice presentation today in the PLP meeting!

The recording and slides are available on the wiki page.

Some additional questions and remarks from my side on the T2DM prediction paper, which were perhaps too detailed for the TC:

  1. In the title of the paper you call this population-level risk prediction rather than patient-level prediction, while the aim is clearly to predict individual risk. Is this a matter of terminology or not?
  2. As we discussed, it would be very interesting to gain insight into the transportability of these models by rerunning the model development steps on other types of databases. Once the PLP pipeline has progressed further, this would be worth investigating.
  3. In your paper you mention that you compared the performance of multiple clinically relevant definitions of T2DM. Clearly, a proper outcome definition is crucial here. How exactly did you do this?
  4. Looking at other ways of representing temporality in the model, and gaining insight into their additive value, is very interesting I think. I look forward to learning more about the convolution method paper you are preparing.
  5. In the parsimonious model you used binary variables for the measurements. Why was this done, and would you expect improvement if you kept them continuous?
  6. You had to use surrogates for some features because they were not available in the data. Was this only the BMI/obesity surrogate, or were there others as well?
  7. I like the idea of incorporating trend information on the measurements; this relates to the disease-trajectories research area. How did you define fluctuating, increasing, and decreasing in your study? Is increasing relative to the previous measurement only? Does fluctuating mean varying around a constant value?
  8. You used log-likelihood reweighting to overcome class imbalance. Does this mean you value a FP as much as a FN? How much impact did this have, and on which measures (other than AUC)?
  9. I find it interesting that the further into the future you predict, the fewer features end up in the final model after regularisation. This is probably exactly what you would expect, since only the stronger features survive.

As I mentioned, I will work in New York for two months (March/April probably), and @jennareps will come to NY as well at some point. It would be great to visit you and David to see how we can merge our efforts.

Thanks

Thanks @Rijnbeek for the questions! Here are some of my thoughts.

  1. This is a matter of terminology, partly because we analysed the population-level prediction AUC but haven’t yet studied how the model stratifies different sub-populations (e.g. people with 6 months of history vs. people with 4 years of history). With those additional analyses I’d be more comfortable calling it patient-level.

  2. Indeed, agreed. That’s why we need the implementation discussion ASAP, so that we can share the models and evaluate them on our local databases.

  3. Yes, we spent a lot of time finding the correct outcome definition, given that we didn’t have chart-review data for these 4.1 million people! In summary, here’s what we did: from people whom we knew surely had diabetes (based on multiple confirmed A1c measurements) and people who surely didn’t (based on low A1c and other measures), we constructed a ‘gold standard’ set on which we could evaluate different outcome definition criteria. We had to use special techniques to make this ‘gold standard’ set statistically close to our overall population, so our analysis wouldn’t carry additional bias. Full details are in Part A of the paper’s supplementary material: http://online.liebertpub.com/doi/suppl/10.1089/big.2015.0020/suppl_file/Supp_Data.doc

  4. Agreed, we are investigating the additive effect. Even the simple trend features (increasing, decreasing, fluctuating in the past 6/24/48 months) are informative in establishing whether temporal trends add any value at all. Once that is established, we can move on to learning more detailed trends. I’d be happy to talk about the convolution model later!

  5. We followed the citations of related work, which had mostly also used binary variables (with thresholds). Their reasoning was that the final score is easier to compute with pen and paper in the office using binary values. Do I think we would gain from continuous measurements? Yes, if we let the numbers enter the model non-linearly; I’m not sure if we simply attach a single linear weight to them. (A small sketch of this contrast follows the list.)

  6. The surrogates were for the baseline (parsimonious) model. I believe we used surrogates only for BMI and blood pressure, since we didn’t have these in the claims dataset. I’ll double-check which other variables we might have used surrogates for.

  7. Increasing and decreasing were simply based on the first and last measurement in the time interval (i.e. 6/24 months, or ever). If there was no more than one measurement, these features remained 0. Fluctuating refers to at least two observed ‘changes of direction’ of the value within the interval, regardless of the initial, average, or final value… (See the trend-feature sketch after the list.)

  8. Indeed, we currently don’t study different evaluation criteria, and we do value FN and FP equally. Without this reweighting the AUC suffers greatly, but we haven’t yet studied other measures like PPV and NPV as a function of label balance. (A reweighting sketch follows the list.)

  9. True, we noticed that too. Predicting further into the future is a harder task, which (usually) leads to lower prediction quality and less predictive power for the features… We have thought about multi-task learning, where we train the different gap models within the same optimization framework and thereby share predictive features, but we haven’t done that yet. (A toy illustration of this shrinkage effect is below.)
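
Re point 5: here is a minimal sketch of the binary-vs-continuous contrast, assuming an A1c-style lab value. The 5.7/6.5 cut-offs and the hinge basis are illustrative choices of mine, not the thresholds or model from the paper.

```python
import numpy as np

# Toy lab values standing in for a continuous measurement (e.g. A1c).
a1c = np.array([5.2, 5.9, 6.3, 7.1])

# Binary thresholded feature, as in the pen-and-paper risk scores from the
# related work (5.7 is a hypothetical cut-off, not the paper's):
a1c_elevated = (a1c >= 5.7).astype(int)          # -> [0 1 1 1]

# Continuous alternative with a simple non-linear (hinge) basis: a linear
# model over these columns can bend the risk curve instead of taking a
# single step at one threshold.
knots = (5.7, 6.5)                               # illustrative knot placement
hinge_basis = np.stack([np.maximum(a1c - k, 0.0) for k in knots], axis=1)
```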
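
Re point 7: my reading of the trend definitions, as a small Python sketch. The function name and the direction-change rule are my interpretation of the answer above, not the authors’ actual code.

```python
def trend_features(values):
    """values: one patient's measurements in a time window, chronological.
    Returns (increasing, decreasing, fluctuating) as 0/1 flags."""
    if len(values) < 2:
        # No more than one measurement: all trend features stay 0.
        return 0, 0, 0

    # Increasing / decreasing compare only the first and last value in the
    # window, per the answer to point 7.
    increasing = int(values[-1] > values[0])
    decreasing = int(values[-1] < values[0])

    # Fluctuating: at least two changes of direction within the window,
    # regardless of the initial, average, or final value.
    deltas = [b - a for a, b in zip(values, values[1:]) if b != a]
    changes = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if (d1 > 0) != (d2 > 0))
    fluctuating = int(changes >= 2)

    return increasing, decreasing, fluctuating


print(trend_features([5.9, 6.4, 6.1, 6.6]))  # -> (1, 0, 1)
```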
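
Re point 8: one common way to implement this kind of log-likelihood reweighting is scikit-learn’s class_weight='balanced', used here as a stand-in for the authors’ own implementation, on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, heavily imbalanced data (~3% positives) standing in for the
# claims features; not the paper's data.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.97],
                           random_state=0)

# class_weight='balanced' rescales each class's log-likelihood contribution
# by the inverse of its frequency, so in the training loss a false negative
# costs as much as a false positive.
model = LogisticRegression(class_weight='balanced', max_iter=1000).fit(X, y)
```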
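
Re point 9: a toy illustration of that shrinkage effect, where increasing label noise stands in for a longer prediction gap. Entirely synthetic and of my own construction; the exact counts will vary, but the number of surviving features typically drops as the signal weakens.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 50))
true_w = np.concatenate([np.ones(10), np.zeros(40)])  # 10 truly useful features

# Higher noise stands in for a longer prediction gap, i.e. a weaker link
# between today's features and the future outcome.
for noise in (1.0, 3.0, 6.0):
    y = ((X @ true_w + noise * rng.normal(size=4000)) > 0).astype(int)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
    print(noise, int(np.count_nonzero(clf.coef_)))  # typically fewer nonzero coefs
```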

Would be great to get together with you and @jennareps and merge the efforts soon!
