Meeting Summary & Analysis
OHDSI Phenotype Development and Evaluation Workgroup
Date: July 25, 2025
Participants
- Meeting Lead: Gowtham Rao
- Contributor: Azza Shoaibi
- Contributor: Christopher Mecoli
- Contributor: Joel Swerdel
- Contributor: Ben Hamlin
- Contributor: Juan M Banda
- Contributor: Jacqueline Honerlaw
- Contributor: Tatsiana Skuhareuskaya
Executive Summary
The workgroup reconvened to celebrate a significant publication, pivot toward new research avenues, and align on administrative changes. The session was marked by a vibrant discussion on leveraging Large Language Models (LLMs) for phenotype development, which sparked a debate on the appropriate evaluation methodology: how to balance AI-driven generation against human expertise and established community resources. The group agreed to formalize an experiment comparing human vs. LLM concept set creation. Key updates were also shared on the Book of OHDSI and the ongoing integration between the OHDSI and VA CIPHER phenotype libraries, reinforcing a push for greater interoperability and standardized metadata.
Topics Discussed
- Workgroup Administration & OKR Review
- Publication of Rheumatic Disease Phenotype Paper
- Proposed Experiment: LLM vs. Human-Generated Concept Sets
- Framework for Phenotype Evaluation
- Book of OHDSI & Phenotype Library Updates
- Cross-Library Collaboration & Future Directions
- Action Items & Next Steps
Detailed Topic Analysis
1. Workgroup Administration & OKR Review
Gowtham Rao opened the meeting, announcing a change in cadence to once a month, on the fourth Friday. He briefly reviewed the 2025 Objectives and Key Results (OKRs), noting progress on enhancing the science of phenotyping while acknowledging that some initiatives may need to be simplified or closed due to delivery challenges.
2. Publication of Rheumatic Disease Phenotype Paper
The group celebrated the recent publication of a paper on phenotyping a rare rheumatic disease, led by Dr. Christopher Mecoli. Dr. Mecoli described it as a "long process" that involved collaboration with multiple institutions and validated the output of the PheValuator tool through manual chart review. Gowtham Rao praised Dr. Mecoli's end-to-end leadership as a clinician, highlighting his journey from learning the OHDSI tools to leading a team and publishing the work. The plan is to expand this validated methodology to other rheumatic and autoimmune diseases.
3. Proposed Experiment: LLM vs. Human-Generated Concept Sets
Joel Swerdel introduced a plan for a formal experiment to evaluate the use of LLMs in generating concept sets. The proposed methodology involves having one team create a concept set manually while another uses an LLM. A clinical adjudicator would then compare the two outputs to determine their accuracy and calculate performance statistics. Joel called for volunteers from the community, including clinicians to act as adjudicators and others to help develop the concept sets, suggesting the work could be a focus at the upcoming OHDSI symposium.
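The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not the workgroup's protocol: it assumes the clinical adjudicator produces a single reference set of accepted concept IDs, and it uses precision, recall, and Jaccard overlap as example performance statistics, since the meeting did not specify which statistics would be calculated. The function names and the concept IDs are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptSetScore:
    """Counts of a candidate concept set judged against an adjudicated reference."""

    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Share of candidate concepts that the adjudicator accepted.
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    @property
    def recall(self) -> float:
        # Share of adjudicator-accepted concepts that the candidate found.
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0


def score_concept_set(candidate: set[int], adjudicated: set[int]) -> ConceptSetScore:
    """Score one concept set (human or LLM) against the adjudicated set."""
    return ConceptSetScore(
        true_positives=len(candidate & adjudicated),
        false_positives=len(candidate - adjudicated),
        false_negatives=len(adjudicated - candidate),
    )


def jaccard(a: set[int], b: set[int]) -> float:
    """Overlap between two concept sets, independent of any reference."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical placeholder concept IDs, not real OMOP concepts.
human_set = {101, 102, 103, 104}
llm_set = {102, 103, 104, 105, 106}
adjudicated_set = {101, 102, 103, 104, 105}

human_score = score_concept_set(human_set, adjudicated_set)
llm_score = score_concept_set(llm_set, adjudicated_set)

print("human:", human_score.precision, human_score.recall)
print("llm:  ", llm_score.precision, llm_score.recall)
print("human/LLM overlap:", jaccard(human_set, llm_set))
```

Scoring both sets against the same adjudicated reference keeps the comparison symmetric, while the Jaccard overlap shows how much the two approaches agree before any adjudication is applied.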
4. Framework for Phenotype Evaluation
Joel's proposal sparked a broader discussion on the foundational principles of phenotype evaluation. Ben Hamlin argued for establishing a more explicit, abstract model framework within the Book of OHDSI's Chapter 11. He felt that while the chapter contains excellent information, the core principles are "kind of buried." Ben stressed the need for a clear, scientifically validated "gold standard" to ensure referential integrity, especially as the community moves toward using AI and expanding the phenotype library.
This led to a debate on the nuances of evaluation. Juan M Banda questioned the premise of a "fair" comparison between a human and an LLM, noting the LLM's vast, embedded knowledge of literature versus a human's practical experience. He cautioned that LLMs can "come up with a bunch of garbage" and suggested that not using the existing Phenotype Library as a baseline could erode trust in it. Chris Mecoli proposed taking the evaluation a step further with chart reviews to see which concept set, AI or human, better identifies the true clinical concept, though he acknowledged the significant effort required. Azza Shoaibi synthesized the discussion, noting that Joel's LLM workflow is sophisticated, using OHDSI's vocabulary ontology and the Phoebe tool to assess concept prevalence, making it more than a simple query.
Key Takeaway: The group converged on the idea that a more structured evaluation process is needed. While clinical adjudication remains the gold standard for concept sets, the experiment provides an opportunity to refine and formalize a multi-faceted evaluation framework that could incorporate tools like CohortDiagnostics and PheValuator alongside chart review.
5. Book of OHDSI & Phenotype Library Updates
Azza Shoaibi reported that the first draft of Chapter 11 of the Book of OHDSI is complete and open for community review. The content is a curation of ideas discussed over the past several years. She specifically requested that Juan M Banda review and update the section on probabilistic phenotyping. She and Ben Hamlin agreed to collaborate on creating a clearer visual diagram for the evaluation framework to be included in the chapter. Gowtham Rao also committed to finalizing the OHDSI Phenotype Library paper with Juan.
6. Cross-Library Collaboration & Future Directions
Jacqueline Honerlaw provided an update on the integration between the VA's CIPHER Phenotype Library and the OHDSI library. The collaboration is expanding to include the PheKB and HDR UK phenotype libraries, which she believes will make for a more compelling joint publication. This work highlights the need for a common metadata standard across libraries.
Gowtham Rao affirmed that this aligns with the future direction for the OHDSI Phenotype Library, which needs a better user interface and clearer contribution pathways. He called for volunteers to form a subgroup to work on the next generation of the library.
7. Action Items & Next Steps
Azza Shoaibi summarized the key action items from the meeting:
- LLM Experiment: Joel and Azza will draft a formal protocol for the LLM vs. human concept set experiment and share it with the workgroup to form a dedicated team.
- Book of OHDSI:
  - Juan will update the probabilistic phenotyping section.
  - Azza and Ben will develop a new diagram for the evaluation framework.
  - All members are encouraged to review the chapter.
  - Sajjan will assist with formatting and citations.
- Phenotype Libraries:
  - Gowtham will clean up and release the next batch of OHDSI cohort definitions for integration into CIPHER within two weeks.
  - Jacqueline will share a draft of the multi-library integration paper in early August.
  - Azza will work on integrating Phenotype February definitions into the library.
- Meeting Schedule: The August 22nd meeting is canceled due to vacations. The group will explore meeting on September 5th or 12th.