OHDSI Phenotype Phebruary and workgroup updates

Gowtham_Rao · July 25, 2025, 4:58pm

Meeting Summary & Analysis

OHDSI Phenotype Development and Evaluation Workgroup
Date: July 25, 2025

Participants

Meeting Lead: Gowtham Rao
Contributor: Azza Shoaibi
Contributor: Christopher Mecoli
Contributor: Joel Swerdel
Contributor: Ben Hamlin
Contributor: Juan M Banda
Contributor: Jacqueline Honerlaw
Contributor: Tatsiana Skuhareuskaya

Executive Summary

The workgroup reconvened to celebrate a significant publication, pivot toward new research avenues, and align on administrative changes. The session featured an extended discussion on using Large Language Models (LLMs) for phenotype development, which led to a debate on the appropriate evaluation methodology and on how to balance AI-driven generation against human expertise and established community resources. The group agreed to formalize an experiment comparing human vs. LLM concept set creation. Key updates were also shared on the Book of OHDSI and the ongoing integration between the OHDSI and VA CIPHER phenotype libraries, reinforcing a push for greater interoperability and standardized metadata.

Topics Discussed

Workgroup Administration & OKR Review
Publication of Rheumatic Disease Phenotype Paper
Proposed Experiment: LLM vs. Human-Generated Concept Sets
Framework for Phenotype Evaluation
Book of OHDSI & Phenotype Library Updates
Cross-Library Collaboration & Future Directions
Action Items & Next Steps

Detailed Topic Analysis

1. Workgroup Administration & OKR Review

Gowtham Rao opened the meeting, announcing a change in cadence to once a month, on the fourth Friday. He briefly reviewed the 2025 Objectives and Key Results (OKRs), noting progress on enhancing the science of phenotyping while acknowledging that some initiatives may need to be simplified or closed due to delivery challenges.

2. Publication of Rheumatic Disease Phenotype Paper

The group celebrated the recent publication of a paper on phenotyping a rare rheumatic disease, led by Dr. Christopher Mecoli. Dr. Mecoli described it as a “long process” that involved collaboration with multiple institutions and validated the output of the PheValuator tool through manual chart review. Gowtham Rao praised Dr. Mecoli’s end-to-end leadership as a clinician, highlighting his journey from learning the OHDSI tools to leading a team and publishing the work. The plan is to expand this validated methodology to other rheumatic and autoimmune diseases.

3. Proposed Experiment: LLM vs. Human-Generated Concept Sets

Joel Swerdel introduced a plan for a formal experiment to evaluate the use of LLMs in generating concept sets. The proposed methodology involves having one team create a concept set manually while another uses an LLM. A clinical adjudicator would then compare the two outputs to determine their accuracy and calculate performance statistics. Joel called for volunteers from the community, including clinicians to act as adjudicators and others to help develop the concept sets, suggesting the work could be a focus at the upcoming OHDSI symposium.
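The meeting did not specify which performance statistics would be calculated. As a minimal sketch of one possibility, assume the adjudicator reviews the pooled concepts from both sets and marks each as relevant or not; precision, recall, and F1 for each concept set can then be computed against that adjudicated list. The function and class names, and the concept IDs, are made up for illustration and are not real OMOP concept IDs or part of any OHDSI tool. Recall here is relative to the pooled, adjudicated concepts only, so it says nothing about relevant concepts that neither team found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptSetScore:
    """Counts of one concept set against the adjudicated reference."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0

    @property
    def f1(self) -> float:
        denom = self.precision + self.recall
        return 2 * self.precision * self.recall / denom if denom else 0.0


def score_concept_set(candidate: set[int], adjudicated: set[int]) -> ConceptSetScore:
    """Compare a candidate concept set with the adjudicator-accepted concepts."""
    return ConceptSetScore(
        true_positives=len(candidate & adjudicated),
        false_positives=len(candidate - adjudicated),
        false_negatives=len(adjudicated - candidate),
    )


# Hypothetical concept IDs, for illustration only.
human_set = {101, 102, 103, 104}
llm_set = {102, 103, 104, 105, 106}
# Concepts the adjudicator accepted after reviewing the union of both sets.
adjudicated_set = {101, 102, 103, 105}

human_score = score_concept_set(human_set, adjudicated_set)
llm_score = score_concept_set(llm_set, adjudicated_set)

for label, score in (("human", human_score), ("llm", llm_score)):
    print(f"{label}: precision={score.precision:.2f} "
          f"recall={score.recall:.2f} f1={score.f1:.2f}")
```

Scoring at the concept level like this only measures agreement with the adjudicator; it does not show how either concept set performs at identifying patients, which is where chart review or tools such as PheValuator would come in.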

4. Framework for Phenotype Evaluation

Joel’s proposal sparked a broader discussion on the foundational principles of phenotype evaluation. Ben Hamlin argued for establishing a more explicit, abstract model framework within the Book of OHDSI’s Chapter 11. He felt that while the chapter contains excellent information, the core principles are “kind of buried.” Ben stressed the need for a clear, scientifically validated “gold standard” to ensure referential integrity, especially as the community moves toward using AI and expanding the phenotype library.

This led to a debate on the nuances of evaluation. Juan M Banda questioned the premise of a “fair” comparison between a human and an LLM, noting the LLM’s vast, embedded knowledge of literature versus a human’s practical experience. He cautioned that LLMs can “come up with a bunch of garbage” and suggested that not using the existing Phenotype Library as a baseline could erode trust in it. Chris Mecoli proposed taking the evaluation a step further with chart reviews to see which concept set (AI or human) better identifies the true clinical concept, though he acknowledged the significant effort required. Azza Shoaibi synthesized the discussion, noting that Joel’s LLM workflow is sophisticated: it uses OHDSI’s vocabulary ontology and the Phoebe tool to assess concept prevalence, which makes it more than a simple query.

Key Takeaway: The group converged on the idea that a more structured evaluation process is needed. While clinical adjudication remains the gold standard for concept sets, the experiment provides an opportunity to refine and formalize a multi-faceted evaluation framework that could incorporate tools like CohortDiagnostics and PheValuator alongside chart review.

5. Book of OHDSI & Phenotype Library Updates

Azza Shoaibi reported that the first draft of Chapter 11 of the Book of OHDSI is complete and open for community review. The content is a curation of ideas discussed over the past several years. She specifically requested that Juan M Banda review and update the section on probabilistic phenotyping. She and Ben Hamlin agreed to collaborate on creating a clearer visual diagram for the evaluation framework to be included in the chapter. Gowtham Rao also committed to finalizing the OHDSI Phenotype Library paper with Juan.

6. Cross-Library Collaboration & Future Directions

Jacqueline Honerlaw provided an update on the integration between the VA’s CIPHER phenotype library and the OHDSI library. The collaboration is expanding to include the PheKB and HDR UK phenotype libraries, which she believes will make for a more compelling joint publication. This work highlights the need for a common metadata standard across libraries.

Gowtham Rao affirmed that this aligns with the future direction for the OHDSI Phenotype Library, which needs a better user interface and clearer contribution pathways. He called for volunteers to form a subgroup to work on the next generation of the library.

7. Action Items & Next Steps

Azza Shoaibi summarized the key action items from the meeting:

LLM Experiment: Joel and Azza will draft a formal protocol for the LLM vs. human concept set experiment and share it with the workgroup to form a dedicated team.
Book of OHDSI:
- Juan will update the probabilistic phenotyping section.
- Azza and Ben will develop a new diagram for the evaluation framework.
- All members are encouraged to review the chapter.
- Sajjan will assist with formatting and citations.
Phenotype Libraries:
- Gowtham will clean up and release the next batch of OHDSI cohort definitions for integration into CIPHER within two weeks.
- Jacqueline will share a draft of the multi-library integration paper in early August.
- Azza will work on integrating Phenotype Phebruary definitions into the library.
Meeting Schedule: The August 22nd meeting is canceled due to vacations. The group will explore meeting on September 5th or 12th.