OHDSI Phenotype Phebruary and workgroup updates

Phenotype Phebruary 2025 Office Hours – February 7, 2025

Topics

1. Meeting Goals & Agenda

  • 1.1 Purpose of Office Hours
  • 1.2 Proposed Flow & Tools

2. Tracking Progress & Identifying Gaps

  • 2.1 Reviewing the 14 Studies
  • 2.2 Updating the Progress Tracker
  • 2.3 Ensuring Phenotypes & Clinical Descriptions Are Uploaded

3. Guidance & Lessons Learned on Clinical Descriptions

  • 3.1 ChatGPT Usage & Limitations
  • 3.2 Prompt Engineering Considerations
  • 3.3 Differentiating Domain Types (Drug vs Procedure vs Condition)

4. Literature Search & Existing Phenotype Repositories

  • 4.1 OHDSI Phenotype Library & Citing Prior Work
  • 4.2 PubMed & Other Structured Searches
  • 4.3 Validated Algorithms & Prior References

5. Building Cohort Definitions

  • 5.1 Using Atlas Demo
  • 5.2 Atlas Demo Record Counts (OHDSI Evidence Network)
  • 5.3 Collaboration & Sharing Definitions
  • 5.4 Logic and Strategy for Concept Set Development

6. Planning for Next Tuesday’s Meeting

  • 6.1 Volunteer Demonstrations (Psychosis, IBD)
  • 6.2 Agenda & Time Allocation
  • 6.3 Action Items & Timeline

Topic 1: Meeting Goals & Agenda

During the February 7, 2025 Phenotype Phebruary Office Hours, the meeting opened with clarifications on its purpose: to serve as a loosely structured session for troubleshooting participants’ questions and ensuring ongoing progress. The immediate goals included finalizing clinical descriptions for 14 studies, populating the progress tracker, and preparing for concept set building and cohort construction. Organizers emphasized the importance of accurate updates in the tracker and timely sharing of final documents. They also covered practical lessons learned from using ChatGPT to accelerate drafting tasks, highlighting prompt engineering as a critical step. The meeting concluded with a roadmap for next steps, including concept set building, Atlas demonstrations, and strategic guidance for the entire community.

Speaker Positions ([Advocacy]/[Inquiry])

  • Azza ([Inquiry]): Guided participants to update the tracker, confirm clinical description status, and raised questions on Atlas use.
  • Gowtham ([Advocacy]): Encouraged systematic use of GenAI prompts and underscored the need for standardized processes (e.g., Atlas Demo).
  • Kevin ([Inquiry/Advocacy]): Shared ChatGPT experiences, emphasizing pros/cons and advocating for improved prompt engineering.

Implicit Assumptions & Information Gaps

  • Assumes all leads will finalize clinical descriptions by Tuesday’s meeting.
  • Requires clarity on adapting ChatGPT prompts to non–drug-induced conditions.
  • Expects participants to learn Atlas or request additional support if needed.

Topic 2: Tracking Progress & Identifying Gaps

A central focus was ensuring that all 14 studies were captured in the shared progress tracker. Participants noted that only four had fully documented their phenotypes and clinical descriptions, leaving ten incomplete. The group highlighted the importance of updating the tracker and uploading relevant files in designated folders for accurate measurement and collective troubleshooting. Some leads confirmed readiness to submit materials, while others needed extra time or assistance. By Tuesday’s community call, each study lead should finalize naming conventions and clinical descriptions to facilitate concept set building and cohort definitions. The discussion also emphasized standardization, avoiding duplicate entries, and maintaining schedule alignment.

Speaker Positions ([Advocacy]/[Inquiry])

  • Azza ([Inquiry]): Checked off who had populated the tracker, prompting others to ensure timely updates.
  • Tatsiana ([Inquiry]): Asked about file duplication and verification links.
  • Vlad & Masha ([Inquiry]): Confirmed pending uploads/finalization of phenotypes.

Implicit Assumptions & Information Gaps

  • Assumes all leads understand the naming/upload format.
  • Unclear if everyone knows the tracker’s location or uses consistent naming standards.
  • Relies on manual entry rather than an automated system.

Topic 3: Guidance & Lessons Learned on Clinical Descriptions

Participants exchanged firsthand experiences of using ChatGPT to create clinical descriptions. Kevin noted limitations such as prompt length restrictions and overwritten text, while Gowtham emphasized prompt engineering, i.e., fine-tuning queries to avoid irrelevant references (e.g., "drug-induced" when describing general conditions). Recognizing that some phenotypes need minimal detail (e.g., drugs and procedures), the group stressed adapting descriptions to each cohort's context. Overall, they agreed that large language models are valuable for rapid drafting but require careful review by clinicians or epidemiologists. Specialized prompts for target cohorts, outcomes, or drug classes were seen as especially helpful.
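One way to operationalize the prompt-engineering advice above is a parameterized template that only injects domain-specific wording when relevant. The sketch below is hypothetical: the template text, domain labels, and function names are illustrative, not an agreed workgroup standard.

```python
# Hypothetical sketch: a prompt template for drafting clinical
# descriptions, parameterized by domain type so that phrases like
# "drug-induced" only appear when they actually apply.

CLINICAL_DESCRIPTION_PROMPT = (
    "Draft a clinical description for the phenotype '{phenotype}'.\n"
    "Domain: {domain}.\n"
    "Cover: overview, presentation, assessment, and treatment plan.\n"
    "{domain_note}"
)

# Per-domain guidance (illustrative wording only).
DOMAIN_NOTES = {
    "condition": "Do not restrict the description to drug-induced cases.",
    "drug": "Focus on the drug class, indications, and exposure window.",
    "procedure": "Keep the description brief; procedures need minimal clinical detail.",
}

def build_prompt(phenotype: str, domain: str) -> str:
    """Assemble a prompt tailored to the phenotype's domain type."""
    return CLINICAL_DESCRIPTION_PROMPT.format(
        phenotype=phenotype,
        domain=domain,
        domain_note=DOMAIN_NOTES[domain],
    )

print(build_prompt("first-episode psychosis", "condition"))
```

However the template is phrased, the output still needs the clinician or epidemiologist review the group called for.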

Speaker Positions ([Advocacy]/[Inquiry])

  • Kevin ([Inquiry/Advocacy]): Highlighted ChatGPT’s iterative editing issues and advocated for improved prompt design.
  • Azza & Gowtham ([Advocacy]): Encouraged using ChatGPT as an accelerator while reinforcing the importance of human validation.

Implicit Assumptions & Information Gaps

  1. Prompt Adaptation & Engineering: Assumes users can modify prompts for varied clinical needs.
  2. Validation Requirements: Final descriptions still need domain expert review.
  3. Workflow Variation: Different domain types (drug, procedure, condition) may call for unique strategies.

Topic 4: Literature Search & Existing Phenotype Repositories

Participants outlined a workflow for leveraging prior work when building new phenotype definitions: (1) consult the OHDSI Phenotype Library, (2) check PubMed for validated algorithms (often ICD-9-based), and (3) adapt or translate these to modern coding standards (ICD-10/ICD-10-CM). While older references may lack detail or rely on outdated codes, participants underscored the value of citing them to strengthen credibility. The group emphasized structured reviews to locate any existing validated algorithms and the importance of carefully transitioning older approaches into current vocabularies.
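Step (3) of this workflow, translating an older code list into current vocabularies, can be sketched as a lookup in the style of the OMOP vocabulary's "Maps to" relationships, with unmapped codes flagged for manual review. The mapping rows below are placeholders, not real vocabulary content.

```python
# Illustrative sketch: resolving source codes (e.g., from an older
# ICD-9-based algorithm) to standard concept_ids, in the style of
# the OMOP vocabulary's "Maps to" relationships.
# All codes and concept_ids here are placeholders.

# (source_vocabulary, source_code) -> standard concept_id
maps_to = {
    ("ICD9CM", "555.9"): 1001,    # placeholder standard concept
    ("ICD10CM", "K50.90"): 1001,  # both code eras map to one standard concept
}

def translate_codes(codes, vocabulary):
    """Resolve source codes to standard concept_ids, reporting gaps."""
    resolved, unmapped = set(), []
    for code in codes:
        concept_id = maps_to.get((vocabulary, code))
        if concept_id is None:
            unmapped.append(code)  # flag for manual review
        else:
            resolved.add(concept_id)
    return resolved, unmapped

concepts, gaps = translate_codes(["555.9", "556.0"], "ICD9CM")
```

The `gaps` list makes visible exactly where an older algorithm lacks enough detail to carry forward, which is the documentation problem participants flagged.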

Speaker Positions ([Advocacy]/[Inquiry])

  • Chris ([Inquiry]): Described a systematic approach (OHDSI Phenotype Library → PubMed → synthesis).
  • Kevin ([Advocacy]): Noted challenges tracing older ICD-9 validations.
  • Azza & Gowtham ([Advocacy]): Encouraged structured reviews and reminded participants about available OHDSI tools.

Implicit Assumptions & Information Gaps

  1. Continued Relevance of Old Algorithms: Some are poorly documented, hindering adaptation.
  2. Standardization Tools: Converting ICD-9 algorithms to ICD-10 presumes enough detail is available.
  3. Time Constraints: Deep systematic reviews may exceed the Phenotype Phebruary timeline.

Topic 5: Building Cohort Definitions

Moving from clinical descriptions to operational phenotypes, participants focused on Atlas, OHDSI’s platform for creating and sharing cohort definitions, and on the OHDSI Evidence Network for global code usage counts. These record counts help distinguish frequently used from rarely used concepts when deciding on inclusion and exclusion criteria. The conversation also highlighted how easily cohort definitions transfer between Atlas instances, which benefits organizations running internal Atlas deployments. Demonstrations covered using the “shopping cart” to assemble concept sets, refining logic (broad vs. narrow definitions), and exporting final cohorts. Overall, a cohesive approach in Atlas was seen as key to reproducibility and efficient collaboration.
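The broad-vs-narrow distinction can be made concrete with a concept set expression in roughly the shape Atlas exports (items carrying includeDescendants / isExcluded flags); the concept_ids below are placeholders, and this is a sketch rather than a definition anyone proposed on the call.

```python
# Sketch of concept set expressions, modeled on the shape Atlas
# exports (items with includeDescendants / isExcluded flags).
# Concept IDs are placeholders, not real vocabulary entries.

def concept_item(concept_id, descendants=True, excluded=False):
    """One entry in a concept set expression."""
    return {
        "concept": {"CONCEPT_ID": concept_id},
        "includeDescendants": descendants,
        "isExcluded": excluded,
        "includeMapped": False,
    }

# Broad definition: a parent concept plus all of its descendants.
broad = {"items": [concept_item(4000001)]}

# Narrow definition: same parent, but a specific descendant subtype
# excluded (e.g., splitting out one disease subtype, as in the
# Crohn's vs. ulcerative colitis discussion).
narrow = {
    "items": [
        concept_item(4000001),
        concept_item(4000002, excluded=True),
    ]
}
```

Keeping both versions side by side makes it easy to compare record counts for the broad and narrow variants before committing to one.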

Speaker Positions ([Advocacy]/[Inquiry])

  • Gowtham ([Advocacy]): Showed Atlas Demo features, emphasizing record counts.
  • Azza ([Inquiry/Advocacy]): Stressed uniform cohort-building approaches for leads and contributors.
  • Kevin ([Inquiry]): Sought guidance on splitting IBD definitions (Crohn’s vs. Ulcerative Colitis).

Implicit Assumptions & Information Gaps

  1. Familiarity with Atlas: Some leads may need extra training.
  2. Ownership & Permissions: Private instances must handle local security.
  3. Complex Logic: Certain phenotypes demand intricate logic (multiple concept sets, time windows).

Topic 6: Planning for Next Tuesday’s Meeting

To advance Phenotype Phebruary, participants finalized the agenda and action items for the next Tuesday community call. They will review which of the 14 studies have completed clinical descriptions and confirm that each phenotype is recorded in the tracker. Two volunteers (Tatsiana and Kevin) will present live demonstrations on building concept sets and defining cohorts (e.g., first-episode psychosis and IBD), each lasting about 10 minutes. Organizers urged any leads needing help with documents or the tracker to complete those tasks before Tuesday, emphasizing this call as a pivotal checkpoint for developing logic, identifying concept sets, and refining code lists. Ultimately, these demos will guide participants toward robust, evidence-based cohort definitions.

Speaker Positions ([Advocacy]/[Inquiry])

  • Azza ([Advocacy]): Urged leads to finalize documentation, coordinate demos, and update progress by Tuesday.
  • Kevin & Tatsiana ([Advocacy]): Agreed to demonstrate their phenotype-building process, highlighting real-world complexities.
  • Gowtham ([Inquiry/Advocacy]): Confirmed readiness to assist with demonstration logistics and Atlas usage.

Implicit Assumptions & Information Gaps

  1. Adherence to Deadlines: Success of the demos depends on all leads completing their parts.
  2. Technical Preparedness: Demos require stable setups and rehearsals.
  3. Community Involvement: Participants should engage with or ask questions about Atlas if unfamiliar.