Hi all,
The CIPHER-OHDSI Integration Pilot meeting is cancelled today 8/8/24. Offline, we are working to upload the first pilot phenotypes to the CIPHER website (CIPHER - VA).
Best,
Jackie
Hi all, please see our call notes. We had a productive review of the field mappings between the libraries and are getting close to finalizing them before adding the initial pilot OHDSI phenotypes to the CIPHER library.
Best,
Jackie
GR wrote script to pull fields from OHDSI library and populate into CIPHER metadata collection sheet for API uploads
Review of CIPHER → OHDSI metadata mapping
OHDSIcohortid - not represented in CIPHER
Used to identify each phenotype in OHDSI library
ACTION: CIPHER consider how to incorporate OHDSIcohortid
Category - ACTION: GR update in sheet
General = non-lab or medication phenotypes
Lab = Measurement in OHDSI
Medication = Drug in OHDSI
Keywords
These are used to facilitate phenotype search
Currently OHDSI hashtags listed here - these hashtags have accepted meaning within OHDSI, for example “#accepted”, “#level2”
ACTION: CIPHER consider how to incorporate hashtags, may leave this section blank for now
Disease domain
This can be found by browsing the OHDSI hierarchy in ATLAS but there is not an easy mapping from OHDSI to CIPHER categories
Contact
OHDSI library uses ORCID iDs instead of managing emails
ACTION: GR create a placeholder email for now
ACTION: CIPHER consider integration of ORCID iDs
Data sources
OHDSI may not capture the name of the health system where the partner created the phenotype; list OMOP here
Phenotype description
ACTION: GR update to use cohortnamelong → transform so it’s not in markdown
Population description
Consider adding standard language here
Data used from/to
Will populate from OHDSI if available
Algorithm description
ACTION: GR update to use cohort entry event → transform so it’s not in markdown
ICD codes/algorithm components
ACTION: Leave blank; GR to provide spreadsheet with codes used across vocabularies, concept id, concept code, and standard/non standard indicator (OK if ICD lists are long)
Note CIPHER is updating algorithm components section on website to add OMOP concept ids
OHDSI forum posts
Posting is usually the first step in contribution and the post may contain information useful to the following fields:
Publication
Acknowledgment
Population description
ACTION: CIPHER consider keeping post URL in one section of metadata and listing standard language about forum post in other fields
Version
OHDSI phenotype library is version controlled; need to represent version on phenotype page
ACTION: CIPHER determine how to represent OHDSI library version
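The mapping decisions above can be sketched in a few lines of Python. This is only an illustration, not GR's script: the input keys (cohortId, domain, cohortNameLong, cohortEntryEvent, orcid), the output column names, and the placeholder email are all assumptions made for the example, and the markdown stripping is deliberately rough.

```python
import re

# Assumed category mapping, following the notes:
# Lab = Measurement in OHDSI, Medication = Drug in OHDSI, anything else = General.
OHDSI_DOMAIN_TO_CIPHER_CATEGORY = {"Measurement": "Lab", "Drug": "Medication"}

# Hypothetical placeholder until ORCID iDs are supported on the CIPHER side.
PLACEHOLDER_EMAIL = "placeholder@example.org"


def strip_markdown(text: str) -> str:
    """Rough markdown-to-plain-text conversion (illustrative only)."""
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"\*\*|__|\*|`", "", text)  # emphasis and inline-code markers
    return re.sub(r"[ \t]+", " ", text).strip()


def to_cipher_row(ohdsi: dict) -> dict:
    """Build one CIPHER metadata row from one OHDSI library record.

    All key names on both sides are hypothetical.
    """
    return {
        "ohdsi_cohort_id": ohdsi["cohortId"],
        "category": OHDSI_DOMAIN_TO_CIPHER_CATEGORY.get(ohdsi.get("domain", ""), "General"),
        "keywords": "",  # left blank for now, per the action item on hashtags
        "contact_email": PLACEHOLDER_EMAIL,
        "contact_orcid": ohdsi.get("orcid", ""),
        "data_sources": "OMOP",
        "phenotype_description": strip_markdown(ohdsi.get("cohortNameLong", "")),
        "algorithm_description": strip_markdown(ohdsi.get("cohortEntryEvent", "")),
        "icd_codes": "",  # left blank; codes are to be provided in a separate spreadsheet
    }
```

Keeping the OHDSI cohort id and ORCID iD in their own columns means nothing is lost while CIPHER decides how to represent them.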
Next steps
GR to share updated sheet with changes above with CIPHER
CIPHER create path for incorporating key fields from OHDSI that do not map to current standard
OHDSI/CIPHER agree on final mapping; CIPHER populates pilot phenotypes into library
Date: September 13, 2024 Time: 9:00 AM - 10:00 AM EST Attendees: Gowtham Rao, Azza Shoaibi, Jacqueline Honerlaw, Jamie Weaver, Joel Swerdel, Christopher Mecoli, Andrew Williams, Hayden Spence, Monika, and others.
Agenda
Planning for OHDSI Symposium 2024 Workgroup Meeting
Discuss Projected Attendee Numbers and Finalize Agenda Details
Scientific Discussion: Phenotype Stability
Diagnostics and Strategic Planning for the Upcoming Symposium
Update on the Dermatomyositis Network Study
Measurement Error
VA CIPHER and OHDSI Phenotype Library Collaboration
Key Discussions and Updates
1. Planning for OHDSI Symposium 2024 Workgroup Meeting
Method Update 1: Integrating Measurement Error into Study Estimates (20 min): James Weaver
Method Update 2: Probabilistic Phenotyping (20 min): Joel Swerdel
Expanding and Promoting the Library: Integrating OHDSI Library into CIPHER (20 min): Jacqueline Honerlaw
Clinical and Network Studies Update: Dermatomyositis Phenotype Development and Evaluation Network Study (20 min): Dr. Mecoli
Method Update 3: Objective Diagnostics (60 min): Gowtham and Azza will run an experiment
Next Year Planning: Group Exercise (60 min)
2. Discuss Projected Attendee Numbers and Finalize Agenda Details
Joel: Confirms on-site attendance, 20 minutes is good.
Chris: Won’t attend in person, but Will, Kelly, and Ben will be on-site. Focus on lessons learned from phenotype network studies vs. clinical insights.
Jackie: Confirms on-site attendance with Ann. Plans a quick demo of the site, followed by a review and a fun exercise comparing five phenotypes (OHDSI vs. non-OHDSI - treasure hunt).
Jamie: Confirms on-site attendance, 20 minutes is good (asked for 30 minutes).
3. Scientific Discussion: Phenotype Stability
Objective Diagnostics for Phenotypes:
Focus on two diagnostics: stability over time within a data source and consistency of incidence rates across data sources.
Use of statistical tests and visualizations to determine stability and identify significant deviations.
Discussion on the need for confidence intervals and the impact of small sample sizes on the results.
4. Diagnostics and Strategic Planning for the Upcoming Symposium
Objective Diagnostics Experiment:
Plan to run an experiment during the symposium to validate the stability of phenotypes using statistical methods.
Participants will review results and validate or disagree with the algorithm’s findings.
5. Update on the Dermatomyositis Network Study
Dr. Mecoli: Will provide updates on the study, focusing on the results and insights gained from the network.
6. Measurement Error
James Weaver: Will discuss integrating measurement error into study estimates, particularly incidence rates.
7. VA CIPHER and OHDSI Phenotype Library Collaboration
Jackie Honerlaw: Provided an update on the CIPHER integration and the finalization of an abstract for the AMIA Informatics Summit.
Plans to review and integrate OHDSI phenotypes into CIPHER, with a test involving around 25 phenotypes.
Future Announcements: Once the 25 phenotypes are tested and integrated, an announcement will be made in the OHDSI community call.
Future Planning and Discussions
LLM:
As a Focus of Work: Workgroup should consider focusing on LLMs (Large Language Models) for phenotyping.
Deep Phenotyping Using Multi-Modal Data: Discussion on using genomics, imaging, waveform, and text data.
Azza’s Work Outside the Workgroup: LLM (KEEPER, literature search) - seeking volunteers to educate the group.
Models Trained on EHR Data: Hayden brought up FEMR models and their potential for phenotyping.
Journal Club: Jackie suggested that the latest issue of JAMIA, which focuses on LLMs, could be a journal club topic. Monika offered to lead once the expectations are clear.
Conclusion
Call for Volunteers: Volunteers are needed to help create a system for collaborative evaluation of phenotype stability.
Jamie Weaver’s Upcoming Work: A teaser for Jamie’s work on incidence rate correction was presented, with more details to be shared in future meetings.
Summary of the Presentation on Phenotype Stability and Objective Diagnostics by Azza Shoaibi and Gowtham Rao
Overview
Azza Shoaibi and Gowtham Rao provided an in-depth discussion on the development and application of objective diagnostics for evaluating phenotype stability. The presentation focused on two main aspects: the stability of phenotypes over time within a single data source and the consistency of incidence rates across multiple data sources.
Key Points
Objective Diagnostics for Phenotypes
The goal is to develop diagnostics that can objectively assess the stability and reliability of phenotypes.
These diagnostics use statistical methods to evaluate the performance of phenotypes.
Phenotype Stability Over Time
Purpose: To determine if a phenotype remains stable over time within a single data source.
Method: Utilizes incidence rate diagnostics, plotting incidence rates over calendar years.
Visualization:
Black Line: Represents the observed incidence rate over time.
Dashed Line: Represents the expected trend, modeled using a Poisson spline model with three knots.
Statistical Test: A likelihood function compares the area under the observed and expected incidence rate curves. A deviation greater than 25% (ratio > 1.25) indicates instability.
Example: The phenotype for pure red cell aplasia showed significant instability in certain data sources, indicating it should not be used across all time periods without adjustments.
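The 25% threshold can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the diagnostic presented: the expected rates are taken as given (the Poisson spline fit is not reproduced), the areas are compared with a plain trapezoid rule instead of a likelihood function, and treating a ratio below 1/1.25 as unstable is an assumption. The function names are hypothetical.

```python
def area_under_curve(years, rates):
    """Trapezoidal area under a rate-by-calendar-year curve."""
    return sum(
        (years[i + 1] - years[i]) * (rates[i] + rates[i + 1]) / 2
        for i in range(len(years) - 1)
    )


def stability_check(years, observed, expected, threshold=1.25):
    """Return (is_unstable, ratio) for observed vs. expected incidence rates.

    `expected` stands in for the modeled trend; here it is simply an input.
    """
    ratio = area_under_curve(years, observed) / area_under_curve(years, expected)
    # Assumption: deviations in either direction count, so the larger of
    # ratio and 1/ratio is compared with the threshold.
    is_unstable = max(ratio, 1 / ratio) > threshold
    return is_unstable, ratio
```

For example, `stability_check([2015, 2016, 2017, 2018], [10, 10, 20, 20], [10, 10, 10, 10])` flags the phenotype as unstable, because the observed area is 1.5 times the expected area.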
Consistency Across Data Sources
Purpose: To assess if a phenotype produces consistent incidence rates across different data sources.
Method: Compares incidence rates across multiple data sources to identify significant deviations.
Future Work: Formalizing a statistical test to evaluate consistency across data sources.
Challenges and Considerations
Small Sample Sizes: Variability in incidence rates can be influenced by small sample sizes, necessitating the inclusion of confidence intervals in visualizations.
Natural Changes: Phenotypes may naturally change over time due to new guidelines, medications, or coding practices. The diagnostics aim to identify significant deviations that could impact study results.
Future Directions
Experiment at Symposium: Plan to run an experiment during the OHDSI Symposium to validate the stability diagnostics using human review.
Tool Development: Call for volunteers to help create a system for collaborative evaluation of phenotype stability.
Conclusion
The presentation highlighted the importance of developing robust diagnostics to ensure the reliability of phenotypes used in research. By identifying and addressing instability and inconsistency, researchers can improve the accuracy and validity of their studies. The upcoming experiment at the OHDSI Symposium will further refine these methods and engage the community in collaborative evaluation.
Jacqueline Honerlaw
Provided an update on the integration of OHDSI phenotypes into the CIPHER system.
Mentioned the team recently met to finalize their abstract for the AMIA Informatics Summit, due next week.
Currently reviewing the initial integration of OHDSI phenotypes into CIPHER and plan to test around 25 phenotypes on their test site.
Once the review is complete, they will push the phenotypes to production and share the links with the group.
Highlighted that after the 25 phenotypes are tested and integrated, an announcement will be made in the OHDSI community call.
Expressed willingness to attend the community call to make the announcement once the integration is complete.
Suggested using the latest issue of the Journal of the American Medical Informatics Association (JAMIA), which focuses on Large Language Models (LLMs), for a journal club session.
Proposed discussing a few articles from the issue if no one has the bandwidth to lead a session.
Emphasized the collaborative efforts and the importance of community engagement in these initiatives.
Joel Swerdel
Provided insights into probabilistic phenotyping and its progress.
Agreed that 20 minutes would be sufficient for his presentation but emphasized the need for discussion time.
Suggested that the discussion on phenotype stability might require more than an hour due to its complexity.
Inquired about the use of three knots in the Poisson spline model for expected trends.
Highlighted the importance of understanding the natural changes in incidence rates over time.
Emphasized that while the true incidence rate of a condition might change due to external factors like new guidelines or medications, the phenotype algorithm’s incidence rate should remain stable unless there is a significant deviation.
Suggested that the diagnostics should be able to split data into periods to identify when a phenotype is usable.
Underscored the need for thorough evaluation and discussion of phenotype stability to ensure reliable research outcomes.
Andrew Williams
Emphasized the importance of focusing on computable definitions for all study components, not just health states.
Pointed out that much of the current work in the community, as well as in other communities using EHR-based definitions, tends to focus more on characterizing health states rather than the components of care.
Argued that a meticulous articulation of all care components, including preconditions, follow-up, and management of identified conditions, is crucial for valid research.
Supported Hayden’s suggestion about the need for detailed care pathways and typical care elements.
Highlighted the potential of using Large Language Models (LLMs) for phenotyping and deep phenotyping using multi-modal data.
Suggested that these areas should be included in future planning.
Stressed the importance of understanding the implications of shifts in phenotype stability and the need for consensus on how to handle these shifts.
Kevin Haynes
Emphasized the necessity of including confidence intervals in visualizations to account for small sample sizes.
Pointed out that small sample sizes can significantly impact the variability of incidence rates.
Suggested that without confidence intervals, it is challenging to interpret the observed data accurately.
Highlighted that small sample sizes might lead to misleading conclusions about phenotype stability.
Recommended displaying confidence intervals to provide more context and help in understanding the true variability.
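To make the point about small samples concrete, here is a minimal Python sketch of a confidence interval for an incidence rate. It assumes Poisson-distributed event counts and uses a normal approximation on the log scale (standard error of the log rate is roughly 1/sqrt(events)); the group did not specify a method, and an exact Poisson interval would be preferable for very small counts. The function name and the per-1,000 scaling are choices made for the example.

```python
import math
from statistics import NormalDist


def incidence_rate_ci(events, person_years, per=1000, level=0.95):
    """Return (rate, lower, upper) per `per` person-years.

    Log-scale normal approximation; requires at least one event.
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    rate = events / person_years * per
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor
```

With the same underlying rate, 4 events give a far wider interval than 400 events, which is why a year-to-year jump in a sparse data source may not indicate real instability.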
Jamie Weaver
Measurement Error: Focused on integrating measurement error into study estimates, particularly incidence rates.
Time Allocation: Suggested extending the discussion from 20 minutes to 30 minutes for a more in-depth exploration.
Meta-Analytic Incidence Rates: Proposed creating a plot where each incidence rate on the fluctuating line is a meta-analytic incidence rate from multiple databases, fitting a smoothed curve across all data.
Phenotype Stability: Highlighted that this method could help identify real incidence rate variations and improve the understanding of phenotype stability.
Ongoing Work: Mentioned his ongoing work on incidence rate correction, which he plans to apply to phenotypes developed during Phenotype February.
Community Engagement: Offered to provide a teaser of this work to interested participants after the meeting, indicating his commitment to advancing the methodology and engaging with the community for feedback and collaboration.
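The meta-analytic incidence rate idea can be sketched as follows. This is one possible reading, not the proposed method: it assumes fixed-effect inverse-variance pooling of log incidence rates, where the variance of each log rate is approximated by 1/events, so each database is weighted by its event count. A random-effects model may be more appropriate when databases differ systematically, and the smoothing step is not shown. The function names and input layout are hypothetical.

```python
import math


def pooled_incidence_rate(counts, per=1000):
    """Pool (events, person_years) pairs from several databases into one rate.

    Fixed-effect inverse-variance pooling on the log scale. Databases with
    zero events are dropped here for simplicity, which can bias the result.
    """
    usable = [(d, py) for d, py in counts if d > 0 and py > 0]
    if not usable:
        raise ValueError("no database with positive events and person-years")
    total_weight = sum(d for d, _ in usable)
    pooled_log_rate = sum(d * math.log(d / py) for d, py in usable) / total_weight
    return math.exp(pooled_log_rate) * per


def pooled_rates_by_year(counts_by_year, per=1000):
    """Map each calendar year to its pooled rate across databases."""
    return {
        year: pooled_incidence_rate(counts, per)
        for year, counts in sorted(counts_by_year.items())
    }
```

Each point on the resulting line is then a pooled estimate for that year, and a smoothed curve could be fitted across those points.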
Hayden Spence
EHR Data Models: Discussed the potential of models trained on Electronic Health Records (EHR) data for phenotyping, specifically mentioning the FEMR models from Stanford’s Shah Lab.
Predictive Models: Highlighted that these models, although not necessarily Large Language Models (LLMs), are trained to interpret healthcare records as if EHR is the language, using OMOP as a framework.
Patterns in Patient Care: Emphasized the importance of understanding how patients move through the healthcare system, suggesting that certain events and patterns in patient care can be indicative of specific conditions.
Adaptation for Phenotyping: Pointed out that while these models are currently predictive, they could be adapted for phenotyping by identifying expected patterns of care for conditions like pneumonia or heart disease.
Stability Diagnostics: Suggested that the stability diagnostics could be applied not only at the population level but also within individual patients over time.
Consistency Within Patients: This approach could help identify whether a phenotype remains consistent within a person, which is crucial for conditions that may have fluctuating diagnoses.
Advanced Models and Care Pathways: His contributions underscored the potential of advanced models and detailed care pathways in improving phenotype stability and reliability.
Christopher Mecoli
Phenotype Instability: Acknowledged that there would likely be instability in phenotypes over time, which is not necessarily a negative outcome.
Expected Changes: Explained that such changes could be expected due to new guidelines, medications approved by the FDA, or other factors.
Data Interpretation: Emphasized that these changes are important to be aware of as they inform how data should be interpreted.
Understanding Instability: Highlighted that understanding the reasons behind phenotype instability is crucial for accurate data analysis and interpretation.
Research Validity: His contributions underscored the need to account for these variations when conducting studies to ensure the reliability and validity of the research findings.
Monika
LLM Expertise: Shared her expertise and experience with Large Language Models (LLMs) and artificial intelligence (AI).
Learning and Exploration: Mentioned that she has been learning about LLMs, AI, and related technologies, including creating embeddings and implementing pipelines.
Current Projects: Currently exploring how to create embeddings, implement retrieval-augmented generation (RAG) pipelines, and address questions using web scraping and specific file inputs to produce targeted outputs.
Moderation Offer: Expressed her willingness to help moderate discussions on LLMs and their application to phenotyping.
Scope and Expectations: Wanted to understand the scope and expectations before committing to moderating discussions.
Hi all, for today’s CIPHER-OHDSI Integration Pilot call at 2pm EST we will review the ~25 phenotypes remaining for upload to CIPHER and touch on our presentation at the OHDSI workgroup meeting.
Best,
Jackie
Recap: October 11th 2024: @XiaotongLi (PhD Student, Pharmaceutical Outcomes and Policy Research, University of Pittsburgh School of Pharmacy) gave a presentation on identifying potentially inappropriate medications (PIMs) in older adults, a significant issue because PIMs increase the risk of adverse events and are associated with higher hospitalization rates and healthcare costs. She explained that PIMs are drugs where risks often outweigh benefits, particularly when safer alternatives exist. Using the OMOP Common Data Model and RxNorm codes, her team implemented the Beers Criteria to identify 37 PIMs through computational methods. Their analysis of over 300,000 patients at the University of Pittsburgh Medical Center (2015–2018) revealed high incidences of inappropriate prescriptions, particularly benzodiazepines, first-generation antihistamines, and muscle relaxants.
Li emphasized that older adults’ unique physiological changes affect drug dynamics, making the identification of PIMs crucial. Her team crafted logic for defining PIM exposure, leveraging data standardization through the OHDSI platform. Their analysis showed a PIM incidence rate of 193.5 per 1,000 person-years. She highlighted that this work could improve pharmacovigilance and patient safety by incorporating PIM definitions into the OHDSI Phenotype Library to enable broader institutional comparability and further research.
She concluded by inviting guidance on integrating their work into the OHDSI (Observational Health Data Sciences and Informatics) ecosystem, hoping it would contribute to enhanced patient care and quality improvement.
VA CIPHER Integration: Jacqueline Honerlaw provided an update on integrating 25 phenotype definitions from the OHDSI Phenotype Library into VA CIPHER. These will soon be ready for review and integration into the VA system, particularly relevant for geriatrics work.
AI and Phenotyping: The team discussed generative AI’s role in phenotyping, with Shikha Kothari agreeing to review existing literature and lead a discussion on its use in phenotype definition. Shikha Kothari, Jacqueline Honerlaw, and Darya Zhukova volunteered to collaborate on the literature review and presentation on generative AI in phenotyping, with plans to share insights during the symposium.
Global Symposium Planning: Gowtham Rao led discussions about preparations for the upcoming OHDSI Global Symposium, including presentations on probabilistic phenotyping, measurement error in estimates, and the development of objective diagnostic rubrics for phenotype validation.