Weekly OHDSI Digest - 1May2017

MauraBeaton · May 1, 2017, 7:43pm

OHDSI MEETINGS THIS WEEK

CDM and vocabulary development working group - Tuesday at 1pm ET
https://imshealth.webex.com/imshealth
Join by phone
US: 610-244-3377
US toll-free: 855-633-8467
UK: +44-203-075-5950
DE: +49-69-6604-4065
Conference ID: 14916110
Host PIN: 6110

Patient-level prediction (Western hemisphere) workgroup meeting - Wednesday at 12pm ET
https://global.gotomeeting.com/join/972917661

Population-level estimation (Eastern hemisphere) workgroup - Wednesday at 3pm Hong Kong time
https://meetings.webex.com/collabs/#/meetings/detail?uuid=M6WE9AOKFETH2VEFPVCZWWBIT0-D1JL&rnd=479368.76362

Architecture workgroup - Thursday at 1pm ET
https://jjconferencing.webex.com/jjconferencing/j.php?MTID=mb7e839a762fbdaab0608f27500679223

ANNOUNCEMENTS

2017 OHDSI Symposium - The date and location for the 2017 OHDSI Symposium have been confirmed! This year’s symposium will take place on Wednesday, October 18th at the Bethesda North Marriott. Full-day tutorial sessions will take place on October 19th and 20th. If you’re interested in attending, please save the date.

Symposium Registration - Registration for the symposium will open shortly. Check out the symposium website for updates: https://www.ohdsi.org/events/2017-ohdsi-symposium/

OHDSI F2F - Materials from the OHDSI F2F are now available here: https://www.ohdsi.org/past-events/

COMMUNITY PUBLICATIONS

Inter-labeler and intra-labeler variability of condition severity classification models using active and passive learning methods.

N Nissim, Y Shahar, Y Elovici, G Hripcsak and R Moskovitch, Artificial intelligence in medicine, Sep 2017

Labeling instances by domain experts for classification is often time consuming and expensive. To reduce such labeling efforts, we had proposed the application of active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and shown its significant reduction of labeling efforts. The use of any of three AL methods (one well known [SVM-Margin], and two that we introduced [Exploitation and Combination_XA]) significantly reduced (by 48% to 64%) condition labeling efforts, compared to standard passive (random instance-selection) SVM learning. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, because labelers have varying levels of expertise, a major issue associated with learning methods, and AL methods in particular, is how best to use the labeling provided by a committee of labelers. First, we wanted to know, based on the labelers' learning curves, whether using AL methods (versus standard passive learning methods) has an effect on the intra-labeler variability (within the learning curve of each labeler) and inter-labeler variability (among the learning curves of different labelers). Then, we wanted to examine the effect of learning (either passively or actively) from the labels created by the majority consensus of a group of labelers.

We used our CAESAR-ALE framework for classifying the severity of clinical conditions, the three AL methods and the passive learning method, as mentioned above, to induce the classification models. We used a dataset of 516 clinical conditions and their severity labeling, represented by features aggregated from the medical records of 1.9 million patients treated at Columbia University Medical Center. We analyzed the variance of the classification performance within (intra-labeler), and especially among (inter-labeler), the classification models that were induced by using the labels provided by seven labelers. We also compared the performance of the passive and active learning models when using the consensus label.

The AL methods produced, for the models induced from each labeler, smoother intra-labeler learning curves during the training phase, compared to the models produced when using the passive learning method. The mean standard deviation of the learning curves of the three AL methods over all labelers (mean: 0.0379; range: [0.0182 to 0.0496]) was significantly lower (p=0.049) than the intra-labeler standard deviation when using the passive learning method (mean: 0.0484; range: [0.0275 to 0.0724]). Using the AL methods resulted in a lower mean inter-labeler AUC standard deviation among the AUC values of the labelers' different models during the training phase, compared to the variance of the induced models' AUC values when using passive learning. The inter-labeler AUC standard deviation using the passive learning method (0.039) was almost twice as high as the inter-labeler standard deviation using our two new AL methods (0.02 and 0.019, respectively). The SVM-Margin AL method resulted in an inter-labeler standard deviation (0.029) that was higher by almost 50% than that of our two AL methods. The difference in the inter-labeler standard deviation between the passive learning method and the SVM-Margin learning method was significant (p=0.042). The difference between the SVM-Margin and Exploitation methods was not significant (p=0.29), nor was the difference between the Combination_XA and Exploitation methods (p=0.67). Finally, using the consensus label led to a learning curve that had a higher mean intra-labeler variance, but resulted eventually in an AUC that was at least as high as the AUC achieved using the gold standard label and that was always higher than the expected mean AUC of a randomly selected labeler, regardless of the choice of learning method (including a passive learning method). Using a paired t-test, the difference between the intra-labeler AUC standard deviation when using the consensus label, versus that value when using the other two labeling strategies, was significant only when using the passive learning method (p=0.014), but not when using any of the three AL methods.

The use of AL methods (a) reduces intra-labeler variability in the performance of the induced models during the training phase, and thus reduces the risk of halting the process at a local minimum that is significantly different in performance from the rest of the learned models; and (b) reduces inter-labeler performance variance, and thus reduces the dependence on the use of a particular labeler. In addition, the use of a consensus label, agreed upon by a rather uneven group of labelers, might be at least as good as using the gold standard labeler, who might not be available, and certainly better than randomly selecting one of the group's individual labelers. Finally, using the AL methods with the consensus label reduced the intra-labeler AUC variance during the learning phase, compared to using passive learning.

Patient’s satisfaction and incentive programs for physicians.

Personalized glucose forecasting for type 2 diabetes using data assimilation.

DJ Albers, M Levine, B Gluckman, H Ginsberg, G Hripcsak and L Mamykina, PLoS computational biology, Apr 2017

Type 2 diabetes leads to premature death and reduced quality of life for 8% of Americans. Nutrition management is critical to maintaining glycemic control, yet it is difficult to achieve due to the high individual differences in glycemic response to nutrition. Anticipating the glycemic impact of different meals can be challenging not only for individuals with diabetes, but also for expert diabetes educators. Personalized computational models that can accurately forecast the impact of a given meal on an individual's blood glucose levels can serve as the engine for a new generation of decision support tools for individuals with diabetes. However, to be useful in practice, these computational engines need to generate accurate forecasts based on limited datasets consistent with typical self-monitoring practices of individuals with type 2 diabetes.

This paper uses three forecasting machines: (i) data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts; (ii) model averaging of data assimilation output; and (iii) dynamical Gaussian process model regression. The proposed data assimilation machine, the primary focus of the paper, uses a modified dual unscented Kalman filter to estimate states and parameters, personalizing the mechanistic models. Model selection is used to choose a personalized model for the individual and their measurement characteristics. The data assimilation forecasts are empirically evaluated against actual postprandial glucose measurements captured by individuals with type 2 diabetes, and against predictions generated by experienced diabetes educators after reviewing a set of historical nutritional records and glucose measurements for the same individual.

The evaluation suggests that the data assimilation forecasts compare well with specific glucose measurements and match or exceed expert forecasts in accuracy. We conclude by examining ways to present predictions as forecast-derived range quantities and evaluate the comparative advantages of these ranges.

Adjuvant concurrent chemoradiotherapy with low-dose daily cisplatin for extrahepatic bile duct cancer.

SW Kim, OK Noh, JH Kim, M Chun, YT Oh, SY Kang, HW Lee, RW Park and D Yoon, Cancer chemotherapy and pharmacology, Jun 2017

We aimed to present the clinical outcomes of adjuvant concurrent chemoradiotherapy (CCRT) with a low-dose daily cisplatin regimen compared to the conventional 5-fluorouracil (5-FU)-based regimen for extrahepatic bile duct cancer (EHBDC).

From October 1994 to April 2013, 41 patients received adjuvant CCRT with a low-dose daily cisplatin regimen or 5-FU-based regimens. Nineteen patients received low-dose cisplatin just before every delivery of radiation therapy, and 21 patients received two cycles of a 5-FU-based regimen during radiotherapy. We compared the clinical outcomes between the two adjuvant CCRT regimens.

Adjuvant CCRT with low-dose daily cisplatin showed a toxicity profile comparable to that of a 5-FU-based regimen. The median follow-up time was 33 months (range, 5-205), and the 5-year overall survival (OS), locoregional recurrence-free survival (LRRFS), and distant metastasis-free survival (DMFS) were 34.2, 50.8, and 49.7%, respectively. Univariable analyses showed no significant differences in OS, LRRFS, and DMFS between the groups with the two regimens. In multivariable analyses, chemotherapeutic regimen was a significant prognostic factor for OS, favoring the low-dose daily cisplatin regimen (HR = 2.491, p = 0.036) over the 5-FU-based regimen, though not for LRRFS (p = 0.642) and DMFS (p = 0.756).

Adjuvant CCRT with a low-dose daily cisplatin regimen showed acceptable toxicities and survivals compared to those of the 5-FU-based regimen. Low-dose daily cisplatin can be one of the feasible regimens for adjuvant CCRT for EHBDC.

ECG-ViEW II, a freely accessible electrocardiogram database.

YG Kim, D Shin, MY Park, S Lee, MS Jeon, D Yoon and RW Park, PLoS One, 2017

The Electrocardiogram Vigilance with Electronic data Warehouse II (ECG-ViEW II) is a large, single-center database comprising numeric parameter data of the surface electrocardiograms of all patients who underwent testing from 1 June 1994 to 31 July 2013. The electrocardiographic data include the test date, clinical department, RR interval, PR interval, QRS duration, QT interval, QTc interval, P axis, QRS axis, and T axis. These data are connected with patient age, sex, ethnicity, comorbidities, age-adjusted Charlson comorbidity index, prescribed drugs, and electrolyte levels. This longitudinal observational database contains 979,273 electrocardiograms from 461,178 patients over a 19-year study period. This database can provide an opportunity to study electrocardiographic changes caused by medications, disease, or other demographic variables. ECG-ViEW II is freely available at http://www.ecgview.org.