OHDSI Phenotype Phebruary - (in aPHril) 2026

Phenotype Phebruary (this year in aPHril) represents our community’s collective effort to advance the field of phenotyping in observational studies, backed by our desire for continuous learning and the integration of cutting-edge AI technologies. This year, we are bridging the gap between traditional rule-based methods and agentic AI workflows.

The Phenotype Workgroup’s 2026 OKRs focus on advancing the science of AI-assisted systematic phenotyping (in collaboration with the AI Workgroup) while simultaneously standardizing metadata to ensure that phenotype definitions are findable, accessible, and reproducible across the global OHDSI network.

Phenotype Phebruary 2022 Homepage

Phenotype Phebruary 2023 Homepage

Phenotype Phebruary 2024 Homepage

Phenotype Phebruary 2025 Homepage

Why Do We Conduct Phenotype Phebruary (in aPHril)?

Community Engagement and Collaboration

  • Dedicated time for collaborative focus on the “Gold Standard” of phenotyping.
  • Fosters engagement between clinical investigators, data partners, and the Generative AI workgroup.
  • Inclusive environment for both traditional epidemiologists and AI developers.

Advancement in Phenotyping Science

  • Benchmarking iterative, empirically grounded, AI-assisted workflows across diverse RWD network sources.
  • Evaluating the performance of Large Language Model (LLM) adjudication against traditional chart review.
  • Populating the OHDSI Phenotype Library with high-fidelity clinical definitions.

Education and Practice

What We Aim to Achieve in Phenotype Phebruary (in aPHril) 2026

The Clinical Use Case: Acute Myocardial Infarction (AMI)

We will focus on the full continuum of algorithms for acute myocardial infarction, moving from simple single-code definitions to highly specific multi-criteria phenotypes that differentiate new episodes from chronic history.

The “One-Iteration” Challenge

To keep participation high-value and low-stress, we are asking for one iteration from the community (you may submit as many cohort definition variants as you want - but we will only iterate once). We will provide reference definitions and “seed” cohorts built live; your challenge is to use our new tools to improve upon them.

Scheduled Activities and Milestones

All activities will be synchronized with our Tuesday Community Calls and Phenotype Workgroup meetings:

  • March 31: The Kick-off. Introduction of the plan, clinical description of Acute Myocardial Infarction, and overview of the 2026 OKRs. [lead: Patrick Ryan]
  • April 7: The Seed Build. Live build session of the seed cohorts using ATLAS. [lead: Gowtham Rao]
  • April 14: KEEPER Evaluation & Submission Open. Interactive session using LLM-enabled KEEPER outputs for live polling and profile adjudication. Submission window officially opens. [lead: Martijn Schuemie]
  • April 21: Break. (OHDSI Europe Symposium).
  • April 24: Submission Deadline. All JSON cohort definitions due.
  • April 28: Final Rundown. Summary of learnings, performance benchmarks (PPV/Sensitivity), and showcase of the top-performing AI-assisted phenotypes. [lead: Azza Shoaibi and Gowtham Rao]

Ways to Contribute

There are many ways to get involved, regardless of your technical background or data access:

| Contributor Group | Opportunity | Timing |
| --- | --- | --- |
| Data Partners | Run CohortDiagnostics (CD) on predefined phenotypes and publish results to the shared Shiny app | April 14 |
| Data Partners | Run KEEPER, generate patient profiles, and perform manual adjudication | April 14 |
| Data Partners | Run LLM-enabled KEEPER and generate outputs for AI-based adjudication | April 14 |
| Data Partners | Run other shared evaluation tools (e.g., KEEPER variants, PheValuator) | April 14 |
| Phenotype Developers | Submit Acute Myocardial Infarction (AMI) definitions for evaluation | April 14–24 |
| Phenotype Evaluators | Use available tools to evaluate submitted cohort definitions | April 24 |
| Clinical Collaborators | Help adjudicate a sample of KEEPER patient profiles | April 14 |
| Everyone | Adjudicate 5–10 cases during open community calls | Scheduled Calls |
| Everyone | Attend Phenotype April community calls and learn the workflows | Throughout April |
| Everyone | Attend and reflect on learnings during the final wrap-up call | April 28 |

How to Participate and Submit

We want to make participation as frictionless as possible.

  1. Build: You can use atlas-demo.ohdsi.org to create your cohort, or use your own local ATLAS instance.
  2. Submit: Once you have refined your cohort for Acute Myocardial Infarction, export the JSON file and email it directly to rao@ohdsi.org.
  3. Collaborate: Join the Phenotype Development and Evaluation Workgroup calls for ad-hoc working sessions between community updates.
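For orientation before you export, the file ATLAS produces is a Circe-format JSON document. Below is a minimal, hedged sketch (in Python) of its general shape; the field names follow the commonly exported Circe structure but should be treated as illustrative, not an authoritative schema, so always submit the JSON exported directly from ATLAS.

```python
import json

# Illustrative skeleton of an ATLAS/Circe cohort-definition JSON.
# Field names mirror typical ATLAS exports; concept set items and
# inclusion rules are left empty here for brevity.
skeleton = {
    "ConceptSets": [
        {"id": 0, "name": "Acute myocardial infarction", "expression": {"items": []}}
    ],
    "PrimaryCriteria": {
        "CriteriaList": [{"ConditionOccurrence": {"CodesetId": 0}}],
        "ObservationWindow": {"PriorDays": 0, "PostDays": 0},
        "PrimaryCriteriaLimit": {"Type": "First"},
    },
    "InclusionRules": [],
}

# Round-trip through JSON to confirm the structure serializes cleanly.
exported = json.dumps(skeleton, indent=2)
```

If your exported file at least round-trips through a JSON parser and contains your concept sets, primary criteria, and inclusion rules, it is ready to email.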

Details of the 2026 Collaborative Study

Phenotype Representation & Agentic Support

We will explore the use of “Librarian Agents” that operate on top of the Phenotype Library using the Model Context Protocol (MCP). We aim to identify how AI can help researchers interpret clinical descriptions and translate them into robust R code (Capr) or ATLAS JSON.

Validation via KEEPER LLM

A primary objective is to assess the reliability of AI-extraction tools. Participants will see how LLM agents can summarize complex patient profiles and provide adjudication evidence, significantly reducing the burden of traditional manual chart review while maintaining high specificity.

Join the Journey!

Whether you are a clinical expert, a data scientist, or an AI enthusiast, Phenotype aPHril is your chance to help shape the future of evidence generation in OHDSI.

Supersedes: Phenotype Phebruary 2026 (see aPHril thread)

‘WIKI’ post

Links and resources: We need a place for the community to share files and write documents collaboratively. We previously used the OHDSI MS Teams space of the OHDSI Phenotype Workgroup, which used to be open to the internet but now appears to be restricted to workgroup members. Anyone can sign up using this link (Microsoft Forms). Should we switch to Google Drive instead?

Calendar

<place holder - Patrick’s kick off call>

(AI Assisted recap)

Week 1 Recap: Phenotype Aphril & The AI-Driven “Black Box”

Welcome to Phenotype Aphril! To kick off our fifth iteration of the community-driven Phenotype Phebruary initiative, Gowtham Rao led a fantastic session exploring a provocative question: Given a clinical intent, can a “black box” AI system predict tokens with such reliability and reproducibility that it can build cohort definitions better than humans?

For this month-long sprint, the community is uniting around a single, highly complex clinical idea: Acute Myocardial Infarction (AMI). Our lofty goal is to collaboratively generate and evaluate over 100 community-submitted cohort definitions for AMI.

Here is a detailed recap of Gowtham’s presentation, the theoretical frameworks introduced, and the live ATLAS demonstration.


The Vision: Moving Beyond “Then a Miracle Occurs”

Currently, the phenotype development process is highly subjective, labor-intensive, and relies heavily on human heuristics. Gowtham shared the classic cartoon of two scientists looking at a complex equation with “then a miracle occurs” written in the middle—a perfect metaphor for how we often manually pick concept codes today.

To build a reliable open-science system, we need to transition to a transparent, systematic pipeline. Gowtham introduced an AI-Assisted Proposer-Validator Framework:

  • The Proposer (Maximizing Sensitivity): A Large Language Model (LLM) acts as a creative agent, understanding the semantic meaning of clinical concepts to propose data-driven rule expansions and cast a wide net.
  • The Validator (Maximizing Specificity): A deterministic system (like ATLAS and our evaluation tools) verifies these proposals, checks marginal data, and ensures the correct clinical profile is met.

To systematically deconstruct clinical intent, the AI engine uses a 6-Dimensional (6D) Framework:

  1. Diagnosis
  2. Symptoms
  3. Diagnostic Procedures
  4. Therapeutic Interventions
  5. Complications
  6. Alternative Diagnoses (Mimics)
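The proposer-validator loop over the six dimensions can be sketched in Python. This is a hypothetical illustration of the framework as described above: `propose_expansion` stands in for an LLM call and `validate` for a deterministic check (e.g., ATLAS counts or profile review); neither is a real OHDSI API.

```python
from dataclasses import dataclass, field

# The six dimensions used to deconstruct clinical intent.
DIMENSIONS = [
    "diagnosis", "symptoms", "diagnostic_procedures",
    "therapeutic_interventions", "complications", "alternative_diagnoses",
]

@dataclass
class CandidateDefinition:
    # Accepted concepts per dimension; starts empty for each dimension.
    concepts: dict = field(default_factory=lambda: {d: set() for d in DIMENSIONS})

def propose_expansion(definition, dimension):
    """Proposer (sensitivity): LLM stand-in that widens the net."""
    # In practice an LLM would suggest semantically related concepts here.
    return {f"{dimension}_candidate"}

def validate(definition, proposed, dimension):
    """Validator (specificity): keep only proposals passing a deterministic check."""
    # Stand-in rule: accept everything except alternative diagnoses,
    # which feed the exclusion logic rather than the entry events.
    return {c for c in proposed if dimension != "alternative_diagnoses"}

cand = CandidateDefinition()
for dim in DIMENSIONS:
    accepted = validate(cand, propose_expansion(cand, dim), dim)
    cand.concepts[dim] |= accepted
```

The design point is the division of labor: the creative step may over-propose freely, because nothing enters the definition without passing the deterministic gate.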

The Clinical Challenge: Modeling Acute Myocardial Infarction

AMI is the perfect use case for this challenge because its clinical definition has evolved significantly. We are no longer looking for a purely syndromic diagnosis.

According to the 4th Universal Definition of MI, biochemical necrosis (e.g., an elevated high-sensitivity cardiac troponin) alone only indicates myocardial injury. To qualify as a true myocardial infarction, that injury must be driven by active ischemia. Therefore, a robust phenotype needs to look for Troponin PLUS corroborating evidence like ischemic EKG changes, crushing chest pain, or imaging evidence.

When searching for this in real-world data, we fall into two major traps:

  • False Positives: Alert fatigue leading to provisional “Rule-Out” ED codes being accepted, or “Copy-Forward” macros duplicating old AMI diagnoses into active problem lists.
  • False Negatives: “Click fatigue” during rushed visits where a provider uses a quick-sign code (like ankle sprain) and traps the complex AMI narrative in unstructured free-text notes, or siloed care where the AMI code is dropped at discharge by an attending surgeon.

The ATLAS Demo: Evolution of a Cohort Definition

Gowtham demonstrated how to map this 6D framework directly into ATLAS, showing the constant tug-of-war between sensitivity and specificity.

Phase 1: Expanding Sensitivity (The Wide Net)
In ATLAS, Cohort Entry Events operate on OR logic. Gowtham started with a simple baseline—requiring an AMI Diagnosis code. To catch more potential cases (increasing sensitivity), he expanded the entry events using the “Complications” dimension. A patient could now enter the cohort if they had an AMI diagnosis code OR a complication directly tied to AMI (e.g., cardiac rupture, hemopericardium, ventricular aneurysm).

Phase 2: Tightening Specificity (Filtering Errors)
Next, he used Inclusion Rules, which operate on strict AND logic, to weed out false positives. To ensure an outpatient code wasn’t just a routine follow-up or a provisional rule-out, he added a rule requiring the AMI event to overlap with an Emergency Room or Inpatient visit.

Phase 3: Modeling the Clinical Pathway
To truly capture the 4th Universal Definition and find patients who may not be labeled correctly because of “click fatigue” (where the billing code is missing but the disease is present), Gowtham built a highly complex entry criterion relying on clinical markers. A patient could enter the cohort if they had:

  1. Cardiac Troponin measurement
    AND
  2. Symptoms (chest pain/sweating) OR EKG changes OR Wall motion abnormalities
    AND
  3. Coronary diagnostics (angiography) OR Revascularization (stents) OR Thrombolytic therapy.
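The boolean structure of that pathway is easy to encode. Here is a small Python sketch; the record fields are hypothetical flags that a real implementation would derive from OMOP CDM tables, and only the AND/OR structure mirrors the demo.

```python
# Illustrative encoding of the Phase 3 clinical-pathway entry logic:
# troponin AND (symptoms OR EKG OR wall motion) AND (diagnostics OR
# revascularization OR thrombolytics). Field names are hypothetical.

def meets_clinical_pathway(rec: dict) -> bool:
    troponin = rec.get("troponin_measured", False)
    corroboration = (
        rec.get("ischemic_symptoms", False)      # chest pain / sweating
        or rec.get("ekg_changes", False)
        or rec.get("wall_motion_abnormality", False)
    )
    intervention = (
        rec.get("coronary_angiography", False)
        or rec.get("revascularization", False)   # e.g., stents
        or rec.get("thrombolytic_therapy", False)
    )
    return troponin and corroboration and intervention

print(meets_clinical_pathway({"troponin_measured": True,
                              "ekg_changes": True,
                              "revascularization": True}))  # prints: True
```

Note that troponin alone fails the check, exactly as the 4th Universal Definition intends: injury without corroborating ischemia is not an infarction.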

Phase 4: Filtering Mimics (The 6D Climax)
Finally, to optimize Positive Predictive Value (PPV), Gowtham introduced the Alternative Diagnoses dimension. He applied strict exclusion rules to kick out patients who had competing diagnoses on the same day, filtering out:

  • Cardiac/Vascular mimics (Myocarditis, Aortic Dissection)
  • Chest Wall/Mediastinal mimics
  • Pulmonary mimics (Pneumonia, Pulmonary Embolism)
  • Psychogenic mimics (Panic Attacks)
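The same-day exclusion logic of Phase 4 can be sketched as a simple anti-join. This is an illustration under assumptions: the mimic concept names and the tuple-based record layout are hypothetical stand-ins for OMOP concept sets and cohort tables.

```python
from datetime import date

# Sketch of the Phase 4 mimic filter: drop index events that share a date
# with a competing diagnosis. Concept groupings are illustrative.
MIMICS = {"myocarditis", "aortic_dissection", "costochondritis",
          "pneumonia", "pulmonary_embolism", "panic_attack"}

def exclude_mimics(index_events, diagnoses):
    """index_events: [(person_id, date)]; diagnoses: [(person_id, date, concept)].
    Returns index events with no same-day mimic diagnosis."""
    same_day_mimics = {(p, d) for p, d, c in diagnoses if c in MIMICS}
    return [(p, d) for p, d in index_events if (p, d) not in same_day_mimics]

events = [(1, date(2026, 4, 1)), (2, date(2026, 4, 2))]
dx = [(2, date(2026, 4, 2), "pneumonia")]
kept = exclude_mimics(events, dx)  # person 2 is excluded
```

In ATLAS this is expressed as inclusion rules with “exactly 0 occurrences” of each mimic concept set on the index date; the sketch above only shows the set-difference idea.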

Key Discussion Points

  • Cross-Database Harmonization: Building phenotypes that rely purely on structured billing codes will fail on EHR-heavy databases, and vice versa. By utilizing the 6D framework and creating complex, multi-criteria definitions (combining diagnosis codes with clinical marker pathways), we can build algorithms that function effectively across fundamentally diverse data sources.
  • Performance is Local: As Azza Shoaibi noted, a phenotype’s performance is a function of that specific definition on a specific data source. There is no single “magic bullet” definition, which is why network-wide evaluation is critical.
  • AI Integration: The community actively discussed the future of integrating semantic AI directly into ATLAS for concept set selection, building upon the foundations of tools like the Phoebe recommender system and OMOP Dash Graph.

Next Steps!

This is what Phenotype Aphril is all about! We want to see your creative proposals for AMI.

  • Take these concepts, build your own AMI cohort definitions in ATLAS, and share your approaches on the forums. Propose your best cohort definitions, submit them to rao@ohdsi.org
  • Join us next week as Patrick Ryan and Martijn Schuemie take over to discuss Phenotype Evaluation and showcase the upgraded KEEPER system—an “AI clinician” designed to automate case adjudication and validate Positive Predictive Value.

Let’s build some cohorts! You are the creative PROPOSER!

** [Phenotype Aphril 2026] Call for Collaboration: Catalog of Potential Publications & Research Topics**

Welcome to the ongoing catalog of research ideas stemming from our Phenotype Aphril 2026 initiative!

This will be a frequently updated post to track potential topics for discourse and community-driven scientific publications. Please post your own ideas in the thread below and I will capture them here. If you are interested in collaborating on, leading, or contributing to any of these papers, please reply to this thread or ping me directly at rao@ohdsi.org.


I. System & Architecture (The “Black Box” Methodology)

Focus: How the underlying informatics tools, LLMs, and neuro-symbolic systems are engineered to build and validate phenotypes.

  • 1. “Neuro-Symbolic Workflows in Observational Research: Constraining LLMs for Deterministic Phenotyping”
    • Core Idea: A methodological paper detailing the “proposer-validator” framework for phenotyping to optimize operating characteristics.
  • 2. “Automated Semantic Mapping: The Role of AI in Concept Set Selection for Observational Data”
    • Core Idea: Focuses on the transition from manual vocabulary searching and heuristic recommenders (like Phoebe) to LLM-driven semantic interpretation.

II. Qualitative Logic & Design (The Cohort Structures)

Focus: Analyzing the theoretical frameworks, clinical translations, and impact expression of combinatorial logic used by the community to model intent.

  • 3. “Deconstructing Clinical Intent: A Qualitative Analysis of the 6-Dimensional Phenotype Framework”
    • Core Idea: An analysis of the 100+ community-submitted cohort definitions. This paper categorizes the qualitative combinations researchers proposed using the 6D framework (Diagnosis, Symptoms, Diagnostic Procedures, Therapeutic Interventions, Complications, and Alternative Diagnoses). It will analyze dimensions for frequency and their overall contribution towards operating characteristics.
  • 4. “From Syndromic to Pathophysiological: Translating the 4th Universal Definition of Myocardial Infarction into RWD Computable Logic”
    • Core Idea: Explores the feasibility and difficulty of modeling complex, modern clinical guidelines in current tools. It qualitatively assesses how researchers successfully transitioned from relying on simple billing codes to building multi-criteria pathways that require evidence of both biochemical necrosis (troponin) and active ischemia (EKG changes, specific symptoms).
  • 5. “The Anatomy of a ‘Mimic’: Utilizing Exclusion Pathways to Maximize Specificity”
    • Core Idea: A deep dive into the 6th dimension (Alternative Diagnoses). This paper evaluates the qualitative logic behind cohort variants that actively exclude patients with concurrent pulmonary (pneumonia), psychogenic (panic attacks), or chest wall mimics. It asks: how often are these alternate diagnoses an indicator of false positive vs. true positive, and what is their impact on measurement error?

III. Quantitative Performance & Evaluation (Empirical Results)

Focus: The objective measurement, PPV, sensitivity trade-offs, and error analysis of the generated cohorts.

  • 6. “The KEEPER Effect: Quantifying AI-Assisted Case Adjudication in Phenotype Refinement”
    • Core Idea: A pre-and-post evaluation study. It analyzes the baseline PPV and Sensitivity of initial community cohorts, and quantifies the exact performance delta achieved after KEEPER (acting as an AI clinician) highlighted false positives/negatives, prompting researchers to add specific inclusion rules.
  • 7. “Cross-Database Phenotype Generalizability: Comparing Structured Claims vs. EHR Lab Trajectories”
    • Core Idea: A quantitative network study assessing how the same cohort logic performs across fundamentally different data sources. It compares the performance of pure diagnosis-driven definitions against complex clinical-marker definitions (Troponin + Ischemia) in claims databases versus EHR-heavy systems.
  • 8. “Quantifying ‘Garbage In’: An Empirical Analysis of RWD Measurement Errors in Cardiology”
    • Core Idea: A taxonomy and quantitative breakdown of the errors discovered during the KEEPER evaluation phase. It analyzes the frequency of false positives caused by “Rule-Out” provisional coding and “Copy-Forward” macros, versus false negatives caused by “Click Fatigue” and siloed surgical care.

IV. Community & Open Science Dynamics (The Process)

Focus: How the global network collaborates, crowdsources, and establishes standards.

  • 9. “Crowdsourcing the Gold Standard: Rapid Cohort Generation via the ‘Phenotype Aphril’ Challenge”
    • Core Idea: A workshop/process report detailing how a globally distributed network of epidemiologists, clinicians, and data scientists collaborated openly to iterate on a single disease state. It evaluates the velocity and quality of open-source crowdsourcing versus traditional, siloed phenotype development.
  • 10. “Human vs. Machine Ensembles: Benchmarking Cohort Viability in Observational Research”
    • Core Idea: Expanding on the “Minds Meet Machines” symposium experiment. This paper quantitatively compares the PPV/Sensitivity of definitions generated purely by the automated LLM “black box” against definitions meticulously hand-crafted by human clinical experts.

What ideas are we missing? Drop them in the thread below!