OHDSI MEETINGS THIS WEEK
2019 OHDSI F2F - June 3-4th at Case Western Reserve University in Cleveland, OH
Event page: https://www.ohdsi.org/events/2019-ohdsi-face-to-face/
Gold Standard Phenotype Library WG meeting - Tuesday at 10am ET
Book of OHDSI WG meeting - No meeting this week
OHDSI Community Call - No meeting this week
Common Data Model & Vocabulary WG meeting - Meeting re-scheduled to next week (Tuesday, June 11th)
ATLAS workgroup meeting - Wednesday at 10am ET
ACHILLES 2.0 working group meeting - Wednesday at 2pm ET*
Population-Level Estimation WG (Western Hemisphere) - Thursday at 12pm ET*
GIS Working Group Meeting - Next Monday (June 10th) at 10am ET
Meeting Number: 735 317 239
You can find a full list of upcoming OHDSI meetings here:
*Meetings may be re-scheduled due to the OHDSI F2F; please confirm with the WG lead
Looking for presenters for upcoming OHDSI community calls - We are looking for collaborators to share their work on upcoming OHDSI calls. If you are interested in presenting on an upcoming OHDSI call, please email me at firstname.lastname@example.org
2019 OHDSI Symposium - REGISTER NOW! - It’s official! Registration is now open for the 2019 OHDSI Symposium. This year’s OHDSI Symposium will take place September 15-17th 2019 at the Bethesda North Marriott, with the main symposium on Monday, September 16th and tutorials on September 15th and 17th. You can register here: https://www.ohdsi.org/symposium-registration-3/
For more information on the symposium, check out the 2019 OHDSI Symposium event page: https://www.ohdsi.org/events/2019-ohdsi-symposium/
2019 OHDSI Symposium - CALL FOR PARTICIPATION - The 2019 OHDSI Symposium Planning Committee is now accepting abstracts for the collaborator showcase. We are accepting submissions for posters, oral presentations and software demonstrations. For more details about submission types and topics of interest, please check out our collaborator showcase page: https://www.ohdsi.org/collaborator-showcase-2/
The deadline for abstract submissions is 8pm ET on Monday, June 24th, 2019.
For more details about submission guidelines and to submit your abstract, please check out our submissions page: https://www.ohdsi.org/collaborator-showcase-submissions/
2019 OHDSI Symposium - ABSTRACT MENTORSHIP - If you would like extra support with your abstract submission for this year’s collaborator showcase, you can request a mentor here: https://www.ohdsi.org/collaborator-showcase-2/
The deadline to request a mentor is June 7th
2019 OHDSI Symposium - CREATIVE SUBMISSIONS - In addition to scientific submissions for the collaborator showcase, we’re also accepting creative submissions. We want to give collaborators a chance to showcase their special talents! This could include playing a musical instrument, singing, an interpretive dance, or an OHDSI-inspired painting. For more information about creative submissions, please check out our creative submissions page:
The deadline for creative submissions is 5pm ET on Monday, August 12th, 2019
2019 OHDSI Symposium - TUTORIALS - Registration is now open for tutorials at this year’s OHDSI Symposium. Tutorials are set to take place on September 15th and 17th. More details about the tutorials being offered are available here: https://www.ohdsi.org/tutorialworkshops2019/
Register for tutorials here: https://www.ohdsi.org/tutorialregistration2019/
Inside every cynical person, there is a disappointed idealist.
George Carlin
COMMUNITY PUBLICATIONS
Risks and clinical predictors of cirrhosis and hepatocellular carcinoma diagnoses in adults with diagnosed NAFLD: real-world study of 18 million patients in four European cohorts.
M Alexander, AK Loomis, J van der Lei, T Duarte-Salles, D Prieto-Alhambra, D Ansell, A Pasqua, F Lapi, P Rijnbeek, M Mosseveld, DM Waterworth, S Kendrick, N Sattar and W Alazawi,
BMC medicine, May 20, 2019
Non-alcoholic fatty liver disease (NAFLD) is a common condition that progresses in some patients to steatohepatitis (NASH), cirrhosis and hepatocellular carcinoma (HCC). Here we used healthcare records of 18 million adults to estimate risk of acquiring advanced liver disease diagnoses in patients with NAFLD or NASH compared to individually matched controls. Data were extracted from four European primary care databases representing the UK, Netherlands, Italy and Spain. Patients with a recorded diagnosis of NAFLD or NASH (NAFLD/NASH) were followed up for incident cirrhosis and HCC diagnoses. Each coded NAFLD/NASH patient was matched to up to 100 "non-NAFLD" patients by practice site, gender, age ± 5 years and visit recorded within ± 6 months. Hazard ratios (HR) were estimated using Cox models adjusted for age and smoking status and pooled across databases by random effects meta-analyses. Out of 18,782,281 adults, we identified 136,703 patients with coded NAFLD/NASH. Coded NAFLD/NASH patients were more likely to have diabetes, hypertension and obesity than matched controls. HR for cirrhosis in patients compared to controls was 4.73 (95% CI 2.43-9.19) and for HCC, 3.51 (95% CI 1.72-7.16). HR for either outcome was higher in patients with NASH and those with high-risk Fib-4 scores. The strongest independent predictor of a diagnosis of HCC or cirrhosis was baseline diagnosis of diabetes. Real-world population data show that recorded diagnosis of NAFLD/NASH increases risk of life-threatening liver outcomes. Diabetes is an independent predictor of advanced liver disease diagnosis, emphasising the need to identify specific groups of patients at highest risk.
Consent Based Access Policy Framework
Doc2Hpo: a web application for efficient and accurate HPO concept curation
A Fast and Scalable Implementation Method for Competing Risks Data with the R Package fastcmprsk
A Minimum Representation of Potential Drug-Drug Interaction Knowledge and Evidence - Technical and User-centered Foundation
The association of vascular disorders with incident dementia in different age groups.
N Legdeur, SJ van der Lee, M de Wilde, J van der Lei, M Muller, AB Maier and PJ Visser,
Alzheimer's research & therapy, May 17, 2019
There is increasing evidence that dementia risk associated with vascular disorders is age dependent. Large population-based studies of incident dementia are necessary to further elucidate this effect. Therefore, the aim of the present study was to determine the association of vascular disorders with incident dementia in different age groups in a large primary care database. We included 442,428 individuals without dementia aged ≥ 65 years from the longitudinal primary care Integrated Primary Care Information (IPCI) database. We determined in 6 age groups (from 65-70 to ≥ 90 years) the risk of hypertension, diabetes mellitus, dyslipidemia, stroke, myocardial infarction, heart failure, and atrial fibrillation for all-cause dementia using incidence rate ratios, Cox regression, and Fine and Gray regression models. The mean age at inclusion of the total study sample was 72.4 years, 45.7% of the participants were male, and median follow-up was 3.6 years. During 1.4 million person-years of follow-up, 13,511 individuals were diagnosed with dementia. The risk for dementia decreased with increasing age for all risk factors and was no longer significant in individuals aged ≥ 90 years. Adjusting for mortality as a competing risk did not change the results. We conclude that vascular disorders are no longer a risk factor for dementia at high age. Possible explanations include selective survival of individuals who are less susceptible to the negative consequences of vascular disorders and differences in follow-up time between individuals with and without a vascular disorder. Future research should focus on the identification of other risk factors than vascular disorders, for example, genetic or inflammatory processes, that can potentially explain the strong age-related increase in dementia risk.
Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes.
JH Son, G Xie, C Yuan, L Ena, Z Li, A Goldstein, L Huang, L Wang, F Shen, H Liu, K Mehl, EE Groopman, M Marasa, K Kiryluk, AG Gharavi, WK Chung, G Hripcsak, C Friedman, C Weng and K Wang,
American journal of human genetics, Jul 5, 2018
Integration of detailed phenotype information with genetic data is well established to facilitate accurate diagnosis of hereditary disorders. As a rich source of phenotype information, electronic health records (EHRs) promise to empower diagnostic variant interpretation. However, how to accurately and efficiently extract phenotypes from heterogeneous EHR narratives remains a challenge. Here, we present EHR-Phenolyzer, a high-throughput EHR framework for extracting and analyzing phenotypes. EHR-Phenolyzer extracts and normalizes Human Phenotype Ontology (HPO) concepts from EHR narratives and then prioritizes genes with causal variants on the basis of the HPO-coded phenotype manifestations. We assessed EHR-Phenolyzer on 28 pediatric individuals with confirmed diagnoses of monogenic diseases and found that the genes with causal variants were ranked among the top 100 genes selected by EHR-Phenolyzer for 16/28 individuals (p < 2.2 × 10^-16), supporting the value of phenotype-driven gene prioritization in diagnostic sequence interpretation. To assess the generalizability, we replicated this finding on an independent EHR dataset of ten individuals with a positive diagnosis from a different institution. We then assessed the broader utility by examining two additional EHR datasets, including 31 individuals who were suspected of having a Mendelian disease and underwent different types of genetic testing and 20 individuals with positive diagnoses of specific Mendelian etiologies of chronic kidney disease from exome sequencing. Finally, through several retrospective case studies, we demonstrated how combined analyses of genotype data and deep phenotype data from EHRs can expedite genetic diagnoses. In summary, EHR-Phenolyzer leverages EHR narratives to automate phenotype-driven analysis of clinical exomes or genomes, facilitating the broader implementation of genomic medicine.
Cancer recording in patients with and without type 2 diabetes in the Clinical Practice Research Datalink primary care data and linked hospital admission data: a cohort study.
R Williams, TP van Staa, AM Gallagher, T Hammad, HGM Leufkens and F de Vries,
BMJ open, May 26, 2018
Conflicting results from studies using electronic health records to evaluate the associations between type 2 diabetes and cancer fuel concerns regarding potential biases. This study aimed to describe completeness of cancer recording in UK primary care data linked to hospital admissions records. Patients aged 40+ years with insulin or oral antidiabetic prescriptions in Clinical Practice Research Datalink (CPRD) primary care without type 1 diabetes were matched by age, sex and general practitioner practice to non-diabetics. Those eligible for linkage to Hospital Episode Statistics Admitted Patient Care (HES APC), and with follow-up during April 1997-December 2006 were included. Cancer recording and date of first record of cancer were compared. Characteristics of patients with cancer most likely to have the diagnosis recorded only in a single data source were assessed. Relative rates of cancer estimated from the two datasets were compared. 53 585 patients with type 2 diabetes matched to 47 435 patients without diabetes were included. Of all cancers (excluding non-melanoma skin cancer) recorded in CPRD, 83% were recorded in HES APC. 94% of cases in HES APC were recorded in CPRD. Concordance was lower when restricted to same-site cancer records, and was negatively associated with increasing age. Relative rates for cancer were similar in both datasets. Good concordance in cancer recording was found between CPRD and HES APC among type 2 diabetics and matched controls. Linked data may reduce misclassification and increase case ascertainment when analysis focuses on site-specific cancers.
Massive parallelization boosts big Bayesian multidimensional scaling
A hybrid approach to automatic de-identification of psychiatric notes.
HJ Lee, Y Wu, Y Zhang, J Xu, H Xu and K Roberts,
Journal of biomedical informatics, Nov 2017
De-identification, or identifying and removing protected health information (PHI) from clinical data, is a critical step in making clinical data available for clinical applications and research. This paper presents a natural language processing system for automatic de-identification of psychiatric notes, which was designed to participate in the 2016 CEGS N-GRID shared task Track 1. The system has a hybrid structure that combines machine learning techniques and rule-based approaches. The rule-based components exploit the structure of the psychiatric notes as well as characteristic surface patterns of PHI mentions. The machine learning components utilize supervised learning with rich features. In addition, the system performance was boosted with integration of additional data to the training set through domain adaptation. The hybrid system showed overall micro-averaged F-score 90.74 on the test set, second-best among all the participants of the CEGS N-GRID task.
A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD).
Y Wu, JC Denny, S Trent Rosenbloom, RA Miller, DA Giuse, L Wang, C Blanquicett, E Soysal, J Xu and H Xu,
Journal of the American Medical Informatics Association : JAMIA, Apr 1, 2017
The goal of this study was to develop a practical framework for recognizing and disambiguating clinical abbreviations, thereby improving current clinical natural language processing (NLP) systems' capability to handle abbreviations in clinical narratives. We developed an open-source framework for clinical abbreviation recognition and disambiguation (CARD) that leverages our previously developed methods, including: (1) machine learning based approaches to recognize abbreviations from a clinical corpus, (2) clustering-based semiautomated methods to generate possible senses of abbreviations, and (3) profile-based word sense disambiguation methods for clinical abbreviations. We applied CARD to clinical corpora from Vanderbilt University Medical Center (VUMC) and generated 2 comprehensive sense inventories for abbreviations in discharge summaries and clinic visit notes. Furthermore, we developed a wrapper that integrates CARD with MetaMap, a widely used general clinical NLP system. CARD detected 27 317 and 107 303 distinct abbreviations from discharge summaries and clinic visit notes, respectively. Two sense inventories were constructed for the 1000 most frequent abbreviations in these 2 corpora. Using the sense inventories created from discharge summaries, CARD achieved an F1 score of 0.755 for identifying and disambiguating all abbreviations in a corpus from the VUMC discharge summaries, which is superior to MetaMap and Apache's clinical Text Analysis Knowledge Extraction System (cTAKES). Using additional external corpora, we also demonstrated that the MetaMap-CARD wrapper improved MetaMap's performance in recognizing disorder entities in clinical notes. The CARD framework, 2 sense inventories, and the wrapper for MetaMap are publicly available at https://sbmi.uth.edu/ccb/resources/abbreviation.htm . We believe the CARD framework can be a valuable resource for improving abbreviation identification in clinical NLP systems.
Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2.
A Stubbs, C Kotfila, H Xu and Ö Uzuner,
Journal of biomedical informatics, Dec 2015
The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track, and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems.