OHDSI Phenotype Phebruary and workgroup updates

Gowtham_Rao · November 22, 2024, 8:00pm

Meeting Recap: November 22, 2024
AI generated

Workshop Reflections

Feedback:
- The workshop received an average score of 4.9/5, with many praising the high engagement and quality of the VA team’s presentations.
- Jacqueline highlighted the workshop’s value for fostering community connections and expressed interest in scheduling a demo for deeper exploration of key topics.
Follow-up Discussions:
- Plans to schedule calls focusing on key topics raised during the workshop:
  - Objective Diagnostics: To be discussed in January.
  - VA Large-Scale Phenotyping: Methods-focused session targeted for March.
  - Probabilistic Phenotyping: Identified as a 2025 OKR priority.

Objective Diagnostics

Update:
- Azza reported progress on the DME paper, which focuses on objective diagnostics. The paper is expected to be ready for discussion in approximately three weeks.
- A dedicated session is planned in January to review and discuss the diagnostic methods for cohort evaluation.
Next Steps:
- Finalize the paper and prepare for a full Friday session in January.

Inappropriate Medication Definitions

Update:
- Dr. Richard Boyce is revising the definitions for inappropriate medication use to align with the 2023 Beers Criteria.
- These definitions will be integrated into the phenotype library and evaluated through cohort diagnostics.
Evaluation Plan:
- JSON definitions will be created for integration.
- Cohort diagnostics will be run on various data sources to validate the definitions and identify necessary adjustments.
- Friday meetings in January will focus on this evaluation process.
Timeline:
- Updated definitions to be finalized by December and evaluation discussions scheduled for January.

January Meeting Plans

Schedule:
- January meetings will be dedicated to:
  1. Reviewing inappropriate medication definitions.
  2. Discussing objective diagnostics.
Integration with Phenotype February:
- These discussions may inform Phenotype February activities, depending on progress and relevance.

Probabilistic Phenotyping

Discussion:
- Joel and Azza emphasized expanding probabilistic phenotyping use cases as a 2025 OKR.
- Collaboration opportunities with FinGen (OMOP and GWAS data integration) were highlighted. Joel plans to explore this further in early 2025.
Next Steps:
- Define specific objectives for probabilistic phenotyping as part of 2025 OKRs.

OKRs and Phenotype February Planning

Next Steps:
- December 13 meeting will focus on:
  - Setting 2025 OKRs.
  - Planning topics for Phenotype February.
- Ideas will be circulated beforehand to ensure a productive discussion.
- Community input will be sought through forum posts to encourage broader participation.

Atlas Deployment and Collaboration

Concerns:
- Hayden raised issues with the Atlas user interface, which hinders collaboration on phenotyping tasks.
- Current deployment faces challenges like spam and infrastructure limitations.
Proposed Solutions:
- Explore setting up a temporary or improved Atlas instance for Phenotype February.
- Azza and Gowtham to discuss feasibility with Patrick and Evan.

Action Items

Objective Diagnostic Paper:
- Finalize and prepare for a dedicated session in January. (Azza)
Inappropriate Medication Definitions:
- Update to align with 2023 Beers Criteria and prepare for phenotype library submission. (Richard)
January Meeting Schedule:
- Schedule discussions on inappropriate medication definitions and objective diagnostics. (Azza, Gowtham)
Phenotype February and 2025 OKRs:
- Draft and finalize plans during the December 13 meeting. (Azza, Gowtham)
Community Engagement:
- Post on the forum to solicit ideas and participation for Phenotype February. (Azza)
Atlas Instance:
- Discuss the possibility of setting up a temporary Atlas instance for Phenotype February. (Azza, Gowtham)

Next Meeting

Date: December 13, 2024
Focus:
- 2025 OKRs planning.
- Phenotype February topics and community engagement.

Gowtham_Rao · December 13, 2024, 12:20pm

We will cancel the meeting on 12/13/2024. We are looking forward to reconnecting in 2025!

Gowtham_Rao · February 1, 2025, 12:33pm

Meeting Recap: January 10, 2025
AI Generated

1. Project Scope and Phenotype Development in Autoimmune/Rheumatology

Focus on a single clinical area (autoimmune/rheumatology) to address specific clinical needs
Comparison of multiple phenotype development approaches (manual, probabilistic, high-throughput methods)
Integration of clinical questions with methodological evaluation
Linking phenotype development to academic deliverables (e.g., manuscripts)

2. Collaboration and Outreach Initiatives

Community engagement through presentations (e.g., VA CIPHER phenotype library on weekly calls)
Involvement of external groups (clinical partners, generative AI group) to enhance phenotype quality and evaluation
Open training sessions and educational outreach using tools such as Atlas
Cross-group collaboration to leverage expertise and data

3. Operational Support and Management

Discussion on dividing workload and the need for dedicated project management support
Exploring funding and resource allocation to sustain working group activities
Coordination of administrative tasks and follow-up communications

4. Atlas Phenotyping Tool and Technical Issues

Troubleshooting cohort definition logic and inclusion criteria (handling OR statements)
Strategies for generating diagnostic statistics from Atlas (attrition reports, code diagnostics)
Guidance on using demo data versus local data for phenotype evaluation
Sharing of best practices for debugging and validating phenotype definitions

5. Additional Phenotype Definitions and Library Updates

Update and testing of inappropriate medication phenotype definitions (transition from 2019 to 2020 versions)
Compatibility testing using JSON files and integrating with the phenotype library
Establishing protocols for sharing definitions (public Git access or email distribution)
Future plans for methodological work in safety/utility phenotypes

6. Next Steps and Action Items

Drafting and circulation of a proposal for the 2025 workgroup plan
Soliciting feedback and additional ideas via forum posts and email communications
Establishing timelines for further reviews and meetings (e.g., follow-up in two weeks)
Coordination for individual assistance (reviewing cohort definitions, troubleshooting Atlas issues)

Topic 1: Project Scope and Phenotype Development in Autoimmune/Rheumatology

The conversation opens with a strong commitment to focusing on a specific clinical area—in this case, autoimmune diseases with a special emphasis on rheumatology. Participants agreed that anchoring the project in a well-defined clinical domain not only addresses pressing clinical needs but also aligns with academic deliverables (e.g., generating publishable manuscripts). The discussion emphasized developing phenotypes using multiple approaches. For instance, one proposal suggested comparing traditional manual code selection with newer probabilistic and high-throughput methods, allowing an “apples-to-apples” evaluation of performance characteristics. This dual emphasis on both methodological innovation and clinical relevance is central to the group’s strategy.

Speakers such as Azza Shoaibi and Christopher Mecoli contributed proactively. Azza underscored the value of choosing a focused area (“focus on one clinical area… autoimmune and rheumatology”) to yield tangible outputs that benefit both individual members and the community. Christopher Mecoli stressed the importance of comparing current practices with novel approaches, advocating for rigorous performance evaluation. Participants also raised additional points, such as linking the development work to clinical questions (e.g., identifying gaps in clinical guidelines) and ensuring the work supports both methods and clinical research. This indicates a consensus that phenotyping should not be an end in itself but a means to answer significant clinical questions.

Azza Shoaibi [Advocacy]: Assertively proposed a structured approach—focusing on a single clinical area, leveraging multiple evaluation methods, and tying outcomes to academic outputs.
Christopher Mecoli [Advocacy]: Emphasized the need to generate new phenotypes in the autoimmune space and compare existing methods with novel probabilistic approaches, advocating for methodological rigor.
Other Contributors [Inquiry/Support]: Provided additional perspectives, such as the importance of clinical validation and manuscript linkage, thereby reinforcing the central theme.

The group implicitly assumes that concentrating on a single clinical domain will streamline the phenotype development process and facilitate deeper evaluation. There is a clear understanding that aligning clinical research questions with methodological testing will produce dual-value outcomes—both in clinical insight and methodological advancement. However, a potential information gap remains regarding the precise operationalization of the comparative evaluations (e.g., specific metrics, standardized procedures) which might need further clarification as the project advances.

The workgroup is considering prioritizing phenotype development in the autoimmune/rheumatology space by focusing on a single clinical area to address specific clinical questions. Multiple methods—ranging from manual coding to probabilistic phenotyping—will be evaluated side-by-side. This approach is designed to not only advance clinical understanding but also generate academic outputs, such as peer-reviewed manuscripts. Key contributors have stressed the importance of rigorous comparison and clinical validation, ensuring that the developed phenotypes are both scientifically robust and clinically relevant.

Topic 2: Collaboration and Outreach Initiatives

The discussion highlighted the importance of broad collaboration and active outreach within the workgroup. Participants emphasized leveraging internal and external resources to enrich phenotype development. A key proposal was to integrate community engagement by showcasing tools like the VA CIPHER phenotype library during weekly community calls, which would both inform and energize the broader community. Additionally, there was clear support for involving groups with expertise in emerging technologies (for example, collaboration with the generative AI group) to enhance phenotype quality and evaluation methods. The conversation also touched on using training sessions and open educational sessions (via tools like Atlas) to empower community members, thereby encouraging active participation and ensuring that methodologies are well understood and consistently applied.

Jamie Weaver [Advocacy]: Advocated for community engagement by suggesting that presentations on the VA CIPHER phenotype library be incorporated into regular community calls. This proposal was intended to stimulate interest and drive collaborative efforts.
Honerlaw, Jacqueline [Supportive/Inquiry]: Built on earlier points by linking method innovations (such as high-throughput phenotyping) with clinical expertise, underscoring the value of a cross-disciplinary approach.
Gowtham Rao [Advocacy]: Introduced the idea of involving the generative AI group to explore alternative approaches for both phenotype development and evaluation, emphasizing improvements in data quality and evaluation precision.
Other Contributors [Support/Inquiry]: Contributed ideas on coordinating training sessions and sharing best practices across the community to maximize resource utilization and collective expertise.

The collaborative approach reflects an understanding that enhancing phenotype development requires not only technical innovation but also broad community participation. The integration of training and outreach initiatives ensures that new methodologies are disseminated widely, thereby fostering consistency in approach and validation across different groups. Implicitly, the workgroup acknowledges that bridging technical development with community engagement can accelerate both the refinement of methodologies and their adoption. An identified gap is the need for clear, structured plans for these outreach efforts, including timelines and designated roles, which may need further refinement as the project progresses.

The workgroup is actively pursuing enhanced collaboration and outreach strategies to support phenotype development. Proposals include showcasing resources like the VA CIPHER phenotype library during community calls and involving the generative AI group to improve phenotype quality and evaluation. Additionally, the group plans to conduct training sessions to build community capacity around tools like Atlas. This integrated approach aims to harness diverse expertise, foster broader engagement, and ensure that methodological advances are effectively communicated and adopted across the community.

Topic 3: Operational Support and Management

The conversation underscored a growing need for structured operational support and management within the workgroup. Participants recognized that the increasing scope of work, particularly around phenotype development and evaluation, necessitates a more coordinated administrative effort. This includes dividing tasks among group members and considering the hiring of dedicated project management support. Discussions touched on exploring funding opportunities and leveraging grant resources to support these roles. In addition, the group acknowledged the importance of establishing clear communication channels and follow-up processes (e.g., drafting a proposal for 2025 and scheduling subsequent meetings) to ensure continuity and effective collaboration.

Gowtham Rao [Advocacy]: Actively raised concerns about the workload and the need for operational support, suggesting the possibility of obtaining grant-funded assistance for project management and development activities.
Other Contributors [Support/Inquiry]: Echoed the need for shared responsibility, with several members highlighting that the current workload distribution is unsustainable and that a structured support framework is critical for the group’s success.

The emphasis on operational support reflects an implicit understanding that the success of technical and methodological initiatives hinges on efficient management and resource allocation. By considering dedicated project management roles and pursuing external funding, the group aims to create a sustainable framework that can handle the complexity and volume of work ahead. A notable gap, however, is the absence of a detailed plan for how these roles will be integrated and funded—a point that will likely need to be addressed in follow-up discussions and formal proposals.

The workgroup is addressing the need for enhanced operational support and management to cope with its expanding workload. Members have discussed the potential for dedicated project management and the pursuit of grant funding to help distribute tasks and streamline administrative processes. This move is intended to create a more sustainable framework for ongoing and future phenotype development projects by ensuring clear communication, structured follow-up, and effective resource allocation.

Topic 4: Atlas Phenotyping Tool and Technical Issues

The discussion on the Atlas phenotyping tool centered around technical challenges and practical workarounds when defining and evaluating cohorts. Participants raised specific issues regarding the configuration of inclusion criteria—particularly the use of OR statements—and how Atlas bundles these criteria, which affects the diagnostic statistics (such as patient counts and attrition reports) that can be generated. Members shared practical solutions, such as creating multiple copies of a definition to isolate the effects of individual inclusion criteria. There was also a debate about the limitations of using demo data versus proprietary datasets; while the demo instance is useful for learning and initial testing, it may not fully capture the behavior of definitions when applied to richer, real-world data.

Andrew [Inquiry]: Raised concerns about why the inclusion criteria in his phenotype definition did not yield expected changes in patient counts, indicating a potential misalignment between the logic and the underlying data.
Azza Shoaibi [Advocacy]: Provided detailed guidance on troubleshooting Atlas issues. She suggested practical workarounds such as splitting complex OR statements into separate definitions and using Atlas’s native attrition reports to better understand the impact of each criterion.
Jamie Weaver [Inquiry/Advocacy]: Contributed by highlighting related issues, such as the importance of ensuring that concept sets (e.g., measurement-related conditions) are relevant and appropriate for the data source, given that demo data might lack sufficient records.

The technical discussion reveals a critical balance between leveraging existing tools (like Atlas) for efficient phenotype development and understanding their limitations when working with incomplete or synthetic data sets. The group implicitly assumes that, with adequate troubleshooting and methodological adjustments, Atlas can be effectively used to generate meaningful diagnostic outputs. However, there is an information gap regarding standardized protocols or best practices for these troubleshooting steps, suggesting an opportunity for developing documentation or training materials. This would not only improve consistency in phenotype evaluation but also support less experienced users in navigating these technical challenges.

The workgroup addressed several technical challenges with the Atlas phenotyping tool, particularly concerning the handling of inclusion criteria using OR statements and the limitations of using demo data. Members discussed practical workarounds, such as creating multiple definitions to isolate individual criteria effects and relying on Atlas’s attrition reports for diagnostic insights. This discussion highlights the need for clear protocols and potentially enhanced training materials to ensure robust and accurate phenotype development using Atlas.
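The workaround described above (one copy of the definition per branch of an OR) can be illustrated with a toy model. This is only a sketch of the idea, not Atlas's implementation: the `attrition` helper, the `has_dx`/`has_lab` flags, and the four-person population are all hypothetical.

```python
from typing import Callable, Dict, List

Person = Dict[str, bool]
Rule = Callable[[Person], bool]


def attrition(people: List[Person], rules: List[Rule]) -> List[int]:
    """Return the count of people remaining after each rule is applied in order.

    The first element is the starting count, so the list has len(rules) + 1 entries.
    """
    remaining = list(people)
    counts = [len(remaining)]
    for rule in rules:
        remaining = [p for p in remaining if rule(p)]
        counts.append(len(remaining))
    return counts


# Hypothetical entry-event population with two made-up flags.
people = [
    {"has_dx": True, "has_lab": False},
    {"has_dx": False, "has_lab": True},
    {"has_dx": True, "has_lab": True},
    {"has_dx": False, "has_lab": False},
]

# A single bundled rule (diagnosis OR lab): the report shows one row for both branches.
bundled = attrition(people, [lambda p: p["has_dx"] or p["has_lab"]])

# Workaround: one copy of the definition per branch, each reported separately.
dx_only = attrition(people, [lambda p: p["has_dx"]])
lab_only = attrition(people, [lambda p: p["has_lab"]])

print("bundled:", bundled)
print("dx only:", dx_only)
print("lab only:", lab_only)
```

In this toy case the bundled rule reports a single drop, while the two separate copies show how many people each branch keeps on its own.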

Topic 5: Additional Phenotype Definitions and Library Updates

This topic centers on updating and validating additional phenotype definitions, with a particular focus on inappropriate medication phenotypes. Participants discussed plans to update an existing definition—from a 2019 version to a 2020 version—emphasizing the need to test compatibility across different datasets. The conversation revealed that the updated phenotype definitions are maintained as JSON files and that there is an intention to integrate these definitions into the public phenotype library. This integration will enable users to run the definitions on various data sources, obtain diagnostic statistics (such as patient counts and attrition), and subsequently assess the robustness of these phenotypes. The discussion also touched on the importance of timely evaluation and collaboration, as well as establishing accessible repositories (e.g., public Git or direct email distribution) to streamline the update process.

Li, Xiaotong [Inquiry/Advocacy]: Initiated the discussion by raising the need to update the phenotype definition for potentially inappropriate medications and to test its compatibility using existing JSON files.
Boyce, Richard David [Support/Advocacy]: Referred to an earlier plan established in November, affirming the group’s commitment to this update and ensuring alignment with prior objectives.
Azza Shoaibi [Advocacy]: Outlined a clear process for incorporating the updated definitions into the phenotype library. She emphasized sending the JSON files to her for integration, so they could be executed on external data sources, thereby providing critical diagnostic feedback.

The discussion highlights the group’s strategic effort to keep phenotype definitions current and interoperable with various data environments. By standardizing updates and using JSON files, the workgroup is addressing potential issues of compatibility and reproducibility. However, there is an implicit assumption that once integrated, the evaluation metrics will provide sufficient feedback to validate the updates; yet, the discussion does not elaborate on specific evaluation benchmarks or protocols. This suggests an opportunity for the group to further formalize the evaluation process, ensuring that updates not only integrate smoothly but also meet quality standards.

The workgroup is updating its phenotype definitions for inappropriate medication from a 2019 to a 2020 version and plans to test these updates for compatibility across multiple datasets. The updated definitions, maintained as JSON files, will be integrated into the public phenotype library to facilitate automated diagnostic evaluations such as patient counts and attrition statistics. This initiative, reaffirmed by previous plans, aims to enhance both the reliability and usability of phenotype definitions, thereby supporting more robust research and clinical assessments.
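Before sending JSON files for integration, a quick structural check can catch obviously incomplete exports. The sketch below assumes the files are Atlas/Circe cohort expressions with top-level `ConceptSets`, `PrimaryCriteria`, and `InclusionRules` keys; that key list and the `missing_keys` helper are assumptions for illustration, not a check the workgroup described.

```python
import json
from typing import List

# Assumed top-level keys of an exported cohort expression (illustrative, not exhaustive).
EXPECTED_KEYS = ("ConceptSets", "PrimaryCriteria", "InclusionRules")


def missing_keys(definition_json: str) -> List[str]:
    """Return the expected top-level keys that are absent from a cohort definition."""
    expression = json.loads(definition_json)
    return [key for key in EXPECTED_KEYS if key not in expression]


# Hypothetical minimal definitions used only to exercise the check.
complete = json.dumps({"ConceptSets": [], "PrimaryCriteria": {}, "InclusionRules": []})
incomplete = json.dumps({"ConceptSets": []})

print(missing_keys(complete))
print(missing_keys(incomplete))
```

A check like this only confirms that a file parses and has the expected shape; whether the definition behaves as intended still depends on running it against real data sources.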

Topic 6: Next Steps and Action Items

This segment of the discussion focused on actionable steps to move the group’s initiatives forward. Key next steps include drafting and circulating a proposal for the 2025 workgroup plan, which will detail the focus areas (e.g., autoimmune/rheumatology and inappropriate medication phenotypes) and integrate methodological evaluation components. Participants agreed to leverage multiple communication channels—for instance, posting on the forum and sending targeted emails—to solicit feedback and share resources such as cohort definitions and updated JSON files. Specific assignments were outlined: Azza Shoaibi will draft the proposal and facilitate further integration of phenotype definitions into the library, while individuals like Andrew, William, and Li, Xiaotong are expected to share their definitions and data links for troubleshooting and compatibility testing. Additionally, the conversation acknowledged the need for ongoing coordination (e.g., follow-up meetings scheduled in approximately two weeks) and expressed a willingness to address technical issues and manage collaborative tasks through dedicated project management support.


Azza Shoaibi [Advocacy]: Took the lead by outlining the plan for drafting the proposal, assigning tasks for sending phenotype definition links, and scheduling follow-up communications.
Andrew & William [Inquiry/Support]: Sought clarifications regarding technical troubleshooting of phenotype definitions and affirmed their commitment to sharing necessary data and links.
Li, Xiaotong [Advocacy/Inquiry]: Confirmed readiness to contribute by updating and sending JSON files for the inappropriate medication phenotype definitions, ensuring their timely integration into the public library.

The dialogue clearly delineates actionable items with designated responsibilities and timelines, reflecting a well-organized transition from conceptual discussions to concrete tasks. The structure is designed to ensure accountability and prompt iterative feedback, particularly for technical issues such as cohort definition validation in Atlas and JSON file compatibility. One implicit assumption is that the proposed communication and project management strategies will adequately address the technical and operational complexities; however, further clarification on evaluation benchmarks and project timelines may be necessary as the project progresses.

The workgroup has outlined clear next steps, including drafting a detailed proposal for the 2025 workgroup plan focused on autoimmune/rheumatology phenotypes and updating inappropriate medication definitions. Key actions involve sharing cohort definitions and JSON files for troubleshooting, soliciting community feedback via forum posts and emails, and scheduling follow-up meetings (targeted in the next two weeks). This structured approach, supported by dedicated project management efforts and collaborative troubleshooting, is intended to ensure smooth progression and accountability across all workgroup initiatives.

Gowtham_Rao · February 1, 2025, 12:33pm

Meeting Recap: January 24, 2025
AI Generated

Agenda:

Guest Speaker Presentation: Albert Prats Uribe
- A detailed presentation by Albert Prats Uribe on the process for standardized and reproducible phenotyping in DARWIN EU. This included a walkthrough of the workflow, from phenotype proposals and clinical descriptions to the generation of concept sets, cohort creation, and the use of diagnostic tools.
Discussion of OKRs for 2025:
- A review and discussion of the workgroup’s Objectives and Key Results for 2025. The OKRs covered three main areas: enhancing the science of phenotyping and best practices, community engagement and educational outreach, and the maintenance and development of tools for the community.
Brief Mention of the Anterior Uveitis Phenotype:
- The agenda noted an intention to talk about the anterior uveitis phenotype “if possible,” although this topic did not feature prominently in the subsequent discussions.
Open Q&A and Follow-Up Discussions:
- A period for participants to ask technical questions and seek clarifications—covering topics like matching strategies in cohort evaluations, documentation of process logic, and further technical details.

Summary of Albert Prats Uribe’s presentation and the ensuing discussion:

1. Overview and Rationale

Purpose of the Presentation:
Albert Prats Uribe outlined how phenotyping is organized within DARWIN EU to improve transparency, reproducibility, and traceability in cohort development. He emphasized that a systematic process is critical for ensuring that every decision, from concept set creation to cohort generation, is clearly documented and reproducible.
Context and Motivation:
- Traceability: He stressed the need to “lock every decision” by assigning responsibility for each choice, which is especially important in complex phenotyping where decisions are often based on judgment rather than strictly correct or incorrect answers.
- Reusability: The process was designed so that once a phenotype is developed, the associated metadata and decision logic can be reused or reviewed in future studies.

2. Detailed Phenotyping Workflow

Albert provided a step-by-step overview of the workflow used in DARWIN EU:

Phenotype Proposal and Clinical Description:
- The process begins with a phenotype proposal, usually initiated by a Principal Investigator (PI) or an external request (for example, from EMA).
- A clinical description is established—with input from clinical phenotyping resources (like clinical repositories) and discussions with the requestor—to ensure the phenotype’s relevance and clarity.
Developing the Phenotyping Plan:
- Once the clinical description is clear, the team outlines a phenotyping plan. This plan includes decisions such as:
  - Whether one or multiple concept sets are needed.
  - The specific logic (for example, differentiating between prevalent and incident cases) to define the cohort.
- An important part of this planning stage is verifying if similar cohorts or concept sets already exist, which would allow reusing or refining previous work.
Concept Set Generation and Review:
- Albert described two methods for generating concept sets:
  - Using Athena: Retrieving concepts (and all descendants) through established tools.
  - Using a Code Generator: A tool that conducts a systematic keyword search (including exclusions) over vocabularies.
- Documentation:
  - Every step of the concept set generation is documented (e.g., the keywords used and the search strategy), ensuring that the process is reproducible.
- Review Process:
  - Two clinical reviewers independently evaluate the generated concept set. They indicate which codes to include or exclude and provide comments.
  - If there are discrepancies between reviewers, a third person (often the PI) resolves the differences.
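The keyword search and two-reviewer process described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the actual code generator: the vocabulary, its concept IDs, and the `keyword_search` and `reconcile` helpers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def keyword_search(vocabulary: Dict[int, str], include: List[str], exclude: List[str]) -> Set[int]:
    """Return IDs whose name contains any include keyword and no exclude keyword."""
    hits = set()
    for concept_id, name in vocabulary.items():
        lowered = name.lower()
        if any(k in lowered for k in include) and not any(k in lowered for k in exclude):
            hits.add(concept_id)
    return hits


def reconcile(candidates: Set[int], reviewer_a: Set[int], reviewer_b: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Split candidates into codes both reviewers include and codes they disagree on."""
    agreed = candidates & reviewer_a & reviewer_b
    disputed = candidates & (reviewer_a ^ reviewer_b)
    return agreed, disputed


# Hypothetical vocabulary; the IDs are made up and are not real concept IDs.
vocabulary = {
    1: "Anterior uveitis",
    2: "Posterior uveitis",
    3: "History of uveitis",
    4: "Rheumatoid arthritis",
}

# Record the search strategy alongside the result so the step is reproducible.
strategy = {"include": ["uveitis"], "exclude": ["history of"]}
candidates = keyword_search(vocabulary, **strategy)

reviewer_a = {1, 2}  # codes reviewer A marks for inclusion
reviewer_b = {1}     # codes reviewer B marks for inclusion
agreed, disputed = reconcile(candidates, reviewer_a, reviewer_b)

adjudicated = {2}    # a third person (often the PI) decides to keep concept 2
final_set = agreed | (disputed & adjudicated)
print(sorted(final_set))
```

Keeping the `strategy` dictionary with the output mirrors the documentation step: the keywords and exclusions are stored next to the resulting concept set.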
Cohort Generation and Diagnostics:
- The concept sets, once refined, are used to generate the actual cohort using tools like Atlas or Capr.
- Diagnostics and Iteration:
  - After the initial cohort generation, diagnostic forms are completed. These capture any discrepancies or issues (for example, identifying missing codes or misclassifications).
  - Based on these diagnostics, recommendations are made, further adjustments are performed, and the process is repeated until the cohort is finalized and signed off.
Matching Strategy:
- Albert also discussed the approach to creating a comparator or matched population:
  - Matching Criteria: Primarily based on demographic factors such as age and sex.
  - Observation Window: The matching is determined by ensuring that the comparator is “on observation” at the same time (i.e., they have overlapping observation periods with the cohort entry date).
  - He acknowledged that matching strategies might evolve (e.g., considering matching on visit dates) depending on the database characteristics and study needs.
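The matching rule described above (same demographics, and under observation on the case's cohort entry date) might look like the following sketch. Matching age by exact birth year is an assumption made here for simplicity, and the `Person` class and `eligible_comparators` helper are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Person:
    person_id: int
    birth_year: int
    sex: str
    obs_start: date
    obs_end: date


def eligible_comparators(case: Person, index_date: date, pool: List[Person]) -> List[int]:
    """Return IDs of people with the same sex and birth year as the case
    whose observation period covers the case's index date."""
    return [
        p.person_id
        for p in pool
        if p.person_id != case.person_id
        and p.sex == case.sex
        and p.birth_year == case.birth_year          # assumption: age matched on exact birth year
        and p.obs_start <= index_date <= p.obs_end   # "on observation" at cohort entry
    ]


# Hypothetical case and comparator pool.
case = Person(1, 1960, "F", date(2015, 1, 1), date(2022, 12, 31))
index_date = date(2020, 6, 1)
pool = [
    Person(2, 1960, "F", date(2018, 1, 1), date(2021, 1, 1)),  # eligible
    Person(3, 1960, "M", date(2018, 1, 1), date(2021, 1, 1)),  # different sex
    Person(4, 1960, "F", date(2021, 1, 1), date(2023, 1, 1)),  # not observed at index
    Person(5, 1975, "F", date(2010, 1, 1), date(2024, 1, 1)),  # different birth year
]

matches = eligible_comparators(case, index_date, pool)
print(matches)
```

A refinement such as matching on visit dates, which was raised as a possible evolution, would add one more condition to the filter.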

3. Discussion and Clarifications

During the Q&A, several clarifications were made regarding the presentation:

Definitions within Concept Sets:
- Joel Swerdel asked for clarification on the roles of “broad,” “narrow,” and “prevalent” codes.
- Albert’s Response: He clarified that in some chronic diseases, history codes (e.g., “history of” a condition) are simply added to the prevalent category, which is particularly relevant for primary care databases.
Documentation of Logic Beyond Concept Sets:
- Questions were raised about whether there is a formal review form to document the logic added to the cohort (beyond just the concept set generation).
- Albert’s Answer: He noted that the logic is pre-specified in the phenotyping plan and that while current processes focus on documenting the concept set, they are considering further refinement to capture detailed logic reviews.
Comparator Matching Details:
- Additional questions focused on how the matching for the comparator population is performed (e.g., ensuring that the comparator group has the same index date or visit conditions).
- Clarification Provided: Albert confirmed that, at present, matching is performed on the basis of demographics and overlapping observation periods, with some acknowledgment that improvements (such as matching on visit dates) could be beneficial.
Tool and Library Integration:
- Open Source Initiative:
  - Azza Shoaibi inquired about making the phenotype library public.
  - Albert’s Response: He mentioned that the intention is to make the tool open source by around July, facilitating broader collaboration and transparency.

Discussion on the OKRs for 2025:

Overview of OKRs:
- Gowtham Rao introduced the 2025 OKRs, which are organized around three main objectives:
  - Enhance the Science of Phenotyping and Best Practices:
    - Key results include publishing a paper on the RSA phenotype library by Q2 2025, prototyping objective diagnostics (with pass/fail criteria) by Q2/3, and piloting AI-driven methods for concept set generation by Q4.
  - Community Engagement and Educational Outreach:
    - Key initiatives include executing “Phenotype February 2025” (a month-long collaborative effort), integrating the OHDSI library into VA CIPHER with an accompanying paper (targeted for Q1/2), and evaluating core definitions for inappropriate medication use (targeted for Q1).
  - Maintenance and Development of Tools:
    - A major release of the RSA phenotype library is scheduled for Q4 2025.
Discussion and Participant Feedback:
- Collaborative Call-to-Action:
  - Azza Shoaibi emphasized the need for participants to sign up on the collaborative slide for various tasks, ensuring distributed workload and clear responsibilities.
- Ambition and Capacity Concerns:
  - Anna Ostropolets noted that while the objectives are ambitious, there is concern about overloading the workgroup with too many tasks.
- Technical Skepticism and Innovation:
  - Joel Swerdel expressed healthy skepticism regarding the AI-driven concept set generation. Although he is cautious about its validity, he also indicated willingness to contribute to this innovative area.
Implications for the Workgroup:
- The OKRs set clear, time-bound milestones that build on previous work while pushing the group toward more formalized, reproducible, and innovative approaches in phenotyping.
- The discussion reflects both enthusiasm for advancing the field and an awareness of potential resource and capacity challenges, highlighting the need for collaborative effort and careful planning.

Open Q&A and Follow-Up Discussions

Clarification on Matching Strategies:
- Participants, led by Joel Swerdel, questioned how the comparator (matched) population is defined.
- Albert Prats Uribe clarified that matching is currently based on demographics (age and sex) and the requirement that comparators are “on observation” at the same time as the cohort’s index date.
- There was discussion about possibly refining the matching criteria (e.g., incorporating visit dates) to improve the relevance of the comparison groups.
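The matching rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the tool's actual implementation: the `Person` record, the `find_comparators` function, and the exact birth-year tolerance are all hypothetical, since the recap says only that matching uses age, sex, and being on observation at the index date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    # Hypothetical minimal record; field names are illustrative only.
    person_id: int
    year_of_birth: int
    sex: str
    obs_start: date  # start of the observation period
    obs_end: date    # end of the observation period


def find_comparators(case, index_date, candidates, max_age_gap=0):
    """Return candidates of the same sex and similar birth year who are
    under observation on the case's index date."""
    matches = []
    for p in candidates:
        if p.person_id == case.person_id:
            continue  # a person cannot be their own comparator
        same_sex = p.sex == case.sex
        similar_age = abs(p.year_of_birth - case.year_of_birth) <= max_age_gap
        observed = p.obs_start <= index_date <= p.obs_end
        if same_sex and similar_age and observed:
            matches.append(p)
    return matches


# Made-up example data.
case = Person(1, 1960, "F", date(2015, 1, 1), date(2022, 12, 31))
pool = [
    Person(2, 1960, "F", date(2010, 1, 1), date(2021, 6, 30)),  # eligible
    Person(3, 1960, "M", date(2010, 1, 1), date(2021, 6, 30)),  # different sex
    Person(4, 1960, "F", date(2021, 1, 1), date(2023, 1, 1)),   # not yet observed
    Person(5, 1975, "F", date(2010, 1, 1), date(2021, 6, 30)),  # different age
]
matched = find_comparators(case, date(2020, 3, 15), pool)
```

A refinement such as matching on visit dates, as raised in the discussion, would add one more condition inside the loop; the visit data needed for that is not modelled here.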
Process Documentation and Logic Review:
- Questions were raised regarding the existence of a formal review form or documentation process for capturing the additional logic used in defining cohorts beyond the concept set generation.
- Albert explained that the phenotyping plan pre-specifies this logic, though he acknowledged that further refinements to document this process are being considered.
Extended Technical Inquiries:
- Additional technical clarifications emerged concerning nuances in cohort definitions and the application of criteria (for example, HIV criteria/Vitis) as well as discussions on ensuring robust documentation of decision-making steps.
- Andrew Kim and others contributed to these follow-up discussions, underscoring the complexity and need for continuous process improvement.
Administrative Follow-Up:
- The session concluded with reminders for participants to sign up on collaborative slides to indicate their interest in specific tasks.
- Gowtham Rao and Azza Shoaibi reiterated the importance of ongoing discussion and collaboration, even as the formal meeting wrap-up was announced.

Gowtham_Rao · February 1, 2025, 12:50pm

Kicking Off 2025 Phenotype Phebruary!

Dear OHDSI Community,

We are beyond excited to launch 2025 Phenotype Phebruary! For the past three years, February has been one of the most thrilling times for our community—a month where we come together to focus on the crucial science of phenotype development and evaluation.

Looking back, the past three Phebruaries have brought remarkable achievements:

Dozens of new phenotypes added to our library
Over 50 new collaborators joining our mission
Two peer-reviewed publications advancing the field
Key clinical insights into critical health conditions
Tens of educational sessions and engaging community calls
A deeper understanding of the challenges and opportunities in phenotype science

Now, as we step into February 2025, we’re fueled by our past successes, inspired by the magic of OHDSI, and more motivated than ever to push boundaries. We recognize the challenges ahead, but we also believe that the sky is the limit when this incredible community comes together!

Let’s make 2025 Phenotype Phebruary our best one yet!

Building on the momentum from Dry January, we are excited to continue our journey of tackling the 14 critical evidence gaps—questions where our observational data can provide valuable insights! Addressing these questions starts with building high-quality phenotypes for the exposures, indications (patient populations), and outcomes specified in each research question. Our primary goal this Phebruary is to develop these phenotypes, make them analysis-ready, and ensure they are available in the OHDSI Phenotype Library for future reuse.

How We’ll Achieve This

We’ll structure the month by aligning each week with a key step in the phenotype development and evaluation process:

Week 1 – Writing clinical descriptions & reviewing prior work
Week 2 – Developing cohort definitions
Week 3 – Evaluating cohorts using CohortDiagnostics
Week 4 – Iterating on cohort definitions & evaluating with additional OHDSI tools

Rather than discussing these steps in the abstract, we will work hands-on using real phenotypes needed for the 14 studies, engaging both study leads and the broader OHDSI community to collaborate at each stage.

Get Involved!

Weekly instructions will be posted in the forums
Discussions will take place in our community calls
Support will be available in Phenotype WG weekly meetings

Our collective mission is to track our progress and finalize as many phenotypes as possible for these studies, ensuring they are ready for analysis and available in the OHDSI Phenotype Library for broader community use.

Together, let’s make 2025 Phenotype Phebruary a milestone for phenotype development!

Your fellow Phebruarians
@Azza_Shoaibi @aostropolets @Gowtham_Rao

(Thank you to our artificial intelligence friend for the cheesy writing style)

Gowtham_Rao · February 2, 2025, 11:54am

Phenotype Phebruary Planning
January 31st 2025
Leads: @Azza_Shoaibi @aostropolets @Gowtham_Rao

1. Phenotype Phebruary Overview and Timeline
    a. Purpose of a dedicated phenotyping month and introduction of the 14 study proposals
    b. Overall timeline and weekly breakdown (Week 1: clinical description; Week 2: concept set and logic building; Weeks 3–4: evaluation and iteration)
    c. Roles of study leads versus community participants

2. Phenotyping Process and Methodology
    a. Clinical description step (including use of a Gen AI prompt)
    b. Creation of concept sets and phenotype logic development
    c. Evaluation of phenotypes using tools (e.g., cohort diagnostics)
    d. Reusing existing phenotype definitions and leveraging the phenotype library

3. Logistics, Coordination, and Tracking
    a. Folder and repository structure within Teams (study-level vs. phenotype-level organization)
    b. Creation and management of a master tracker (Excel sheet) for progress and cohort ID tracking
    c. Ensuring access for all study leads and working group members

4. Contributor Engagement and Task Allocation
    a. Soliciting contributors via sign-up forms and clarifying required skills/commitment
    b. Defining responsibilities between study leads and collaborators
    c. Allocating specific tasks (clinical description, cohort development, evaluation, etc.)

5. Meeting Scheduling and Communication Protocols
    a. Planning recurring meetings (Tuesday demo, Wednesday study lead coordination, Friday working group calls)
    b. Setting up and managing Teams invitations and shared calendars
    c. Coordinating forum posts, meeting announcements, and follow-up communications

6. Q&A, Clarifications, and Process Adjustments
    a. Addressing questions on phenotype numbers and definitions (e.g., major surgery, antibiotics)
    b. Clarifying process steps and timelines for iterative improvement
    c. Discussing adjustments based on feedback and evolving needs

Phenotype Phebruary Overview and Timeline

The conversation opens with an explanation of the purpose behind dedicating Phebruary to phenotype development. @aostropolets [Advocacy] outlines that while previous years lacked a specific focus, the current year is structured around 14 study proposals contributed by community members. The goal is to support study leads (e.g., @zhuk for an AKI study) and collaborators by providing a bounded time frame and a clear set of deliverables. The overall timeline is broken into a four-week plan:

Week 1 – Clinical Description:
A session is planned to discuss clinical descriptions. Participants are expected to use a Gen AI prompt (developed by @Gowtham_Rao ) that extracts necessary information to form a phenotype. This step also includes a literature search and an exploration of existing phenotype definitions (for example, checking for pre-existing definitions of AKI or obesity management).
Week 2 – Concept Set and Logic Building:
The second week focuses on creating the concept sets and building the logical framework of the phenotype. Many participants are already familiar with concept set building from tutorial work and are expected to contribute. The study leads are expected to be fully engaged.
Weeks 3–4 – Evaluation and Iteration:
The final two weeks are dedicated to evaluating the developed phenotypes using tools such as cohort diagnostics. Iterations and refinements will be made based on these diagnostics. There is also mention of showcasing additional tools and validation approaches during these weeks.
@aostropolets [Advocacy]: Clearly explains the overall process, emphasizing the structured timeline and how it aligns with the community’s prior experiences. Her narrative is instructive and guiding, setting the stage for both study leads and contributors.
@Gowtham_Rao [Inquiry/Advocacy]: Inquires about specifics (e.g., the number of unique phenotypes required) and supports the rationale by referencing templates from the existing phenotype library.
@Azza_Shoaibi [Advocacy]: Reinforces the process details and clarifies that the structured timeline is strictly for Phebruary, while also assuring that post-Phebruary modifications are possible if necessary.
@zhuk [Inquiry]: Raises questions regarding the timeline and potential flexibility, particularly concerning iterative updates or late-stage additions.

Implicit Assumptions and Information Gaps:

Assumption: All study leads and contributors are presumed to be familiar with the phenotyping process and the available resources (e.g., phenotype libraries and cohort diagnostics).
Information Gap: While the timeline is clear, there is an implicit assumption that data will be available (for instance, the J&J data is mentioned as a fallback for cohort diagnostics). However, the contingency for participants without immediate data access is not fully explored.
Assumption on Reusability: It is assumed that many of the phenotypes (such as those for MI, stroke, or common drug exposures) already exist and can be reused. The process for validating and adapting these pre-existing phenotypes is touched upon but not fully detailed.

The structured approach for Phenotype Phebruary is designed to create a sense of urgency and clarity among participants. The division into weekly milestones helps to focus efforts and provides measurable checkpoints (e.g., completion of clinical descriptions by the end of week one). There is a strong reliance on existing resources (such as the phenotype library), which streamlines work but may also limit innovation if not revisited critically. The dialogue reflects a collaborative dynamic, with participants openly discussing potential issues (like data availability and phenotype granularity) and affirming that iterative refinement is both expected and supported. The approach seems tailored to balance guided instruction (for less experienced members) with flexibility for seasoned participants.

In the recent Phenotype Phebruary planning session, the working group outlined a structured four-week process dedicated to phenotype development. The process begins with a clinical description phase using Gen AI prompts and literature searches, followed by a week dedicated to building concept sets and phenotype logic. The final two weeks focus on evaluating and refining these phenotypes with diagnostic tools. The session emphasized support for study leads and reusing existing phenotypes where possible. J&J data was mentioned as a fallback for running cohort diagnostics, although contingencies for participants without data access were not fully worked out. The overall goal is to have ready-to-use phenotypes by the end of Phebruary, providing a clear framework for the community to build upon.

Phenotyping Process and Methodology

The discussion on phenotyping methodology centers on the stepwise approach that underpins the entire Phenotype Phebruary initiative. Participants describe a three-step process: first, creating a clinical description; second, developing concept sets and the logical framework for the phenotype; and third, evaluating the phenotypes using diagnostic tools. The process starts with leveraging a Gen AI prompt to extract key clinical information, followed by targeted literature reviews and comparisons with existing phenotype definitions (e.g., for conditions like AKI or obesity management). In the subsequent phase, the group focuses on constructing concept sets—drawing on both pre-existing tutorials and community expertise—to form the logical constructs of the phenotypes. Finally, the evaluation phase involves running cohort diagnostics and iterating on the phenotype definitions, with additional tools demonstrated later in the month to further validate the results.

@aostropolets [Advocacy]: Establishes the overall process by detailing the weekly breakdown and emphasizing the importance of a structured clinical description. Her instructions set the foundation for understanding the subsequent steps in the phenotyping process.
@Gowtham_Rao [Inquiry/Advocacy]: Contributes by questioning and clarifying aspects of phenotype creation, such as the use of templates from the existing phenotype library and the differentiation between drug exposures and condition phenotypes. His input reinforces the notion that many phenotypes can be templated, though some require bespoke development.
@Azza_Shoaibi [Advocacy]: Reiterates the need for a clear, methodical approach—stressing that the clinical description should inform whether a phenotype needs to be built from scratch or can be adapted from existing resources. Azza also clarifies the process for concept set development and evaluation, ensuring that both experienced and new members understand the intended workflow.
@zhuk [Inquiry]: Raises questions regarding the flexibility of the process and timing, particularly around iterative updates and whether additional phenotypes can be incorporated later. His contributions highlight the need for clear boundaries and timelines while also acknowledging that some flexibility will be maintained.
Assumption: It is presumed that all participants have at least a basic familiarity with using concept set tutorials and the Gen AI tools that will support clinical description.
Information Gap: While the methodology emphasizes reusing existing phenotypes, there is limited discussion on the criteria for determining when a pre-existing phenotype is “good enough” versus when it needs to be rebuilt from scratch.
Assumption: There is an underlying expectation that the iterative evaluation phase (using cohort diagnostics) will be sufficient to identify and correct any deficiencies in the phenotype definitions.

The methodology is designed to streamline phenotype development by dividing the process into discrete, manageable steps. The structured approach not only facilitates collaboration among participants with varying levels of expertise but also encourages the reuse of validated resources, which can save time and reduce redundancy. However, the discussion suggests a tension between templated approaches and the need for customization, particularly in complex cases such as major surgery or specific drug formulations. The iterative evaluation phase is critical, as it provides a built-in mechanism for quality control, but success in this phase hinges on the availability of appropriate data and the effective use of diagnostic tools. Overall, the process is both rigorous and flexible—enabling rapid progress while allowing room for adjustments based on real-world findings.

The Phenotyping Process and Methodology session outlined a structured, three-phase approach for developing phenotypes. The process begins with creating detailed clinical descriptions using Gen AI prompts and literature reviews, followed by constructing concept sets and logical frameworks. The final phase focuses on evaluating these phenotypes through cohort diagnostics and iterative refinements. Participants emphasized reusing existing phenotype templates where applicable, while also discussing criteria for developing new definitions when necessary. The approach is designed to balance standardized procedures with the flexibility to address complex or unique cases.

Logistics, Coordination, and Tracking

The discussion on logistics, coordination, and tracking centers on the administrative and technical infrastructure needed to manage the Phenotype Phebruary initiative. Participants deliberate on how to structure shared folders and repositories within Microsoft Teams, debating whether organization should be study-centric or phenotype-centric. There is a consensus that a master tracker (an Excel sheet) should be created to consolidate the list of phenotypes from the initial proposals—transforming Patrick’s original data into a long-form, one-row-per-phenotype format. This tracker is intended to record progress across key steps (e.g., clinical description, cohort development, evaluation) and to track the eventual cohort IDs. Additionally, ensuring that all study leads and community contributors have the necessary access to working group channels is a priority. Discussions also cover how to manage the logistics of version control (e.g., public Atlas versus secured Atlas) and the subsequent migration of completed artifacts into the phenotype library.

@Azza_Shoaibi [Advocacy]: Leads the logistics discussion by emphasizing the immediate need for a structured tracker and clear folder organization. Azza provides detailed instructions on how to consolidate and monitor progress using the tracker, and outlines the requirements for data tracking (e.g., cohort IDs).
@Gowtham_Rao [Inquiry/Advocacy]: Supports the conversation by probing how to best organize the folders (study-level vs. phenotype-level) and clarifies the importance of having unified tracking for phenotypes, particularly regarding shared definitions and reusability.
@aostropolets [Advocacy]: Contributes by highlighting the need for study leads to verify and, if necessary, update phenotype labels in the master tracker. Anna also discusses the coordination of access permissions and stresses the eventual transfer of finalized cohorts into the phenotype library or GitHub repositories.
Lana Shubinsky [Inquiry]: Provides input on the technical aspects of folder organization and meeting logistics, seeking clarification on how best to set up recurring Teams meetings and ensure consistent access for all relevant members.
Assumption: It is assumed that transforming Patrick’s list into a detailed, long-form tracker will be straightforward and that all study leads have the familiarity to verify and annotate their phenotype labels accordingly.
Information Gap: The discussion hints at potential technical challenges—such as coordinating access for external study leads or managing the integration between public Atlas IDs and the secured Atlas—but these challenges are not fully resolved in the conversation.
Assumption: There is an underlying expectation that the chosen platform (Microsoft Teams and associated tools) is sufficiently robust to handle the tracking and collaborative tasks without major technical issues.

The logistics discussion is a crucial backbone for the Phenotype Phebruary initiative. The group is focused on establishing a clear, centralized system for tracking progress and managing documents, ensuring that every phenotype is accounted for and accessible to all collaborators. The decision to use a master tracker as a central repository of information highlights the need for transparency and real-time updates in a collaborative, multi-stakeholder environment. However, the conversation also reveals the complexities of integrating multiple platforms (Teams, Atlas, GitHub) and ensuring all participants, including external study leads, have the appropriate access and understanding. This coordination will be key to maintaining momentum throughout the initiative and ensuring that administrative challenges do not impede scientific progress.

In the logistics and coordination session, the group agreed to create a master tracker—an Excel-based tool—to consolidate and monitor progress for each phenotype. The tracker will record key milestones such as clinical descriptions, cohort development steps, and final cohort IDs, ensuring clear oversight across the initiative. Discussions also focused on establishing an effective folder structure within Teams, debating whether to organize by study or phenotype, and ensuring all study leads have access to the necessary resources. This robust tracking and coordination framework is intended to streamline progress and facilitate the eventual migration of completed phenotypes into the centralized library.
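The wide-to-long restructuring behind the master tracker can be sketched as follows. This is a minimal illustration under assumptions: the real spreadsheet's column names are not given in the discussion, so `study`, `exposure`, `indication`, `outcome`, the step names, and the sample rows are all hypothetical.

```python
# Hypothetical wide-format input: one row per study, one column per phenotype role.
wide_rows = [
    {"study": "Study A", "exposure": "Drug X", "indication": "AKI", "outcome": "MI"},
    {"study": "Study B", "exposure": "Drug Y", "indication": "Obesity", "outcome": "MI"},
]

ROLE_COLUMNS = ("exposure", "indication", "outcome")  # assumed column names
STEPS = ("clinical_description", "cohort_development", "evaluation")


def wide_to_long(rows):
    """Emit one tracker row per (study, phenotype) pair, with progress
    columns for each step and an empty slot for the eventual cohort ID."""
    long_rows = []
    for row in rows:
        for role in ROLE_COLUMNS:
            label = (row.get(role) or "").strip()
            if not label:
                continue  # skip roles a study does not specify
            entry = {"study": row["study"], "role": role,
                     "phenotype": label, "cohort_id": None}
            entry.update({step: "not started" for step in STEPS})
            long_rows.append(entry)
    return long_rows


def shared_phenotypes(long_rows):
    """Map each phenotype label used by more than one study to those studies."""
    studies_by_label = {}
    for r in long_rows:
        studies_by_label.setdefault(r["phenotype"], set()).add(r["study"])
    return {label: sorted(s) for label, s in studies_by_label.items() if len(s) > 1}


tracker = wide_to_long(wide_rows)
shared = shared_phenotypes(tracker)
```

Listing shared labels is one way to surface phenotypes that can be built once and reused across studies. It relies on labels being spelled consistently, which is why having study leads verify their labels matters.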

Contributor Engagement and Task Allocation

This segment of the conversation centers on strategies for actively engaging contributors and clearly assigning tasks within the Phenotype Phebruary initiative. The discussion outlines the need to solicit volunteer participation using structured methods such as a sign-up form or Google Form, where potential contributors can indicate their skill sets (e.g., clinical description, concept set building, evaluation) and time commitment. The aim is to build a comprehensive pool of participants—including both study leads and community members—to ensure that every aspect of the phenotype development process is covered. The conversation also clarifies that not every contributor is required to complete all steps; instead, individuals can focus on specific tasks aligned with their expertise. Furthermore, responsibilities are delineated such that study leads are tasked with overall phenotype oversight and validation, while collaborators assist with defined components, ensuring a distributed workload and collaborative synergy.

@Azza_Shoaibi [Advocacy]:
Azza leads the discussion by emphasizing the immediate need for a formal contributor sign-up process. She suggests that a form should capture critical details such as email address, time commitment, access to data, and specific skills relevant to the phenotyping process. Her guidance aims to streamline task allocation and ensure that each step in the process—from clinical description to evaluation—is adequately staffed.
@Gowtham_Rao [Inquiry/Advocacy]:
Gowtham contributes by discussing the importance of pooling both new volunteers and those already known to the group (e.g., from previous collaborations or responses to Patrick’s earlier call). He raises the idea of leveraging the existing pool of study leads who already have a vested interest in their respective phenotypes.
@aostropolets [Advocacy]:
Anna underscores the necessity for study leads to validate and update phenotype labels, which will later feed into the master tracker. She emphasizes that clear communication of roles and responsibilities—both for study leads and the additional contributors—is essential for the smooth progression of the work.
Implicit Positioning:
There is a shared recognition that while the initiative must harness collective expertise, clear delineation of tasks is needed to avoid duplication and ensure accountability. Contributors are expected to volunteer based on their strengths, thereby optimizing the collaborative effort.
Assumption: The approach assumes that contributors, once engaged through the sign-up process, will commit to specific tasks and that their skill levels will align with the needs of the project.
Information Gap: While the mechanism for collecting contributor details is well described, there is less clarity on how conflicts in task allocation (e.g., overlapping interests or skill mismatches) will be managed.
Assumption on Flexibility: It is implied that task assignments may evolve as the project unfolds, yet the process for revising or reallocating responsibilities if needed is not explicitly defined.

The contributor engagement and task allocation strategy is designed to harness the diverse skills of the working group while maintaining clear accountability. By utilizing a structured sign-up process, the initiative aims to capture essential information that will inform subsequent task assignments and ensure a balanced workload. This method fosters a participatory environment where study leads receive the necessary support while also opening the door for new contributors to bring fresh insights. However, success depends on clear communication channels and the effective management of the contributor pool, especially as task requirements evolve throughout the initiative.

In the Contributor Engagement and Task Allocation session, the working group outlined plans to streamline volunteer participation by launching a sign-up form where interested members can indicate their skills, time commitment, and access to data. This structured approach is intended to build a diverse pool of contributors who will support various stages of the phenotype development process—from clinical description to evaluation—while study leads maintain overall responsibility for their respective projects. The strategy emphasizes clear role delineation to prevent overlap and ensure effective collaboration across the initiative.

Meeting Scheduling and Communication Protocols

This segment focuses on how the team plans to coordinate their meetings and manage communication throughout the Phenotype Phebruary initiative. The conversation addresses potentially setting up recurring meetings—including Tuesday demos, Wednesday coordination calls for study leads, and Friday working group sessions—to ensure consistent progress. The group discusses the logistics of using Microsoft Teams for these meetings, including the creation of shared links, calendar invitations, and distribution lists. Key points include determining the optimal meeting times (with some debate over Wednesday’s call start times) and ensuring that all study leads and contributors are added to the relevant Teams channels. There is also consideration given to integrating these meeting schedules with the overarching project tasks, such as reviewing phenotype labels and discussing logistics updates.

@aostropolets [Advocacy]:
Anna emphasizes the need for clear communication, ensuring that study leads are informed and have access to the meetings. She suggests using the workgroup’s communication channels to disseminate meeting invitations and agenda details effectively.
@Azza_Shoaibi [Advocacy]:
Azza plays a leading role in outlining the specific meeting schedule and clarifying the objectives for each call (e.g., Tuesday demos, Wednesday study lead discussions, Friday open group sessions). She stresses that meeting invitations should include links to supporting documents (such as the forum post) so that participants can easily access relevant information.
@Gowtham_Rao [Inquiry/Advocacy]:
Gowtham raises questions about the integration of Teams features—like recurring links—and highlights the need to accommodate different schedules, ensuring that meetings are convenient for all involved. His contributions underline the importance of technical logistics in sustaining smooth communication.
Lana Shubinsky [Inquiry]:
Lana queries the practical aspects of scheduling, such as ensuring that all study leads can attend and confirming that the recurring meeting link functions as intended. She also clarifies the process for setting up the invitation, demonstrating attention to detail in meeting logistics.
Assumption: The team assumes that using a single recurring Teams link for all meetings (or for specific sets of meetings) will be both efficient and sufficient to reach all intended participants.
Information Gap: There is limited discussion on how last-minute scheduling conflicts or changes in availability will be managed, particularly for the externally invited study leads.
Assumption: It is presumed that all participants are familiar with Microsoft Teams and its scheduling features, and that the existing distribution list will dynamically update to include new members without manual intervention.

The meeting scheduling and communication protocols are designed to maintain momentum and ensure that every stakeholder remains informed and engaged. The conversation reflects a balance between structured planning (with predefined meeting times and clear objectives) and flexibility (allowing adjustments based on participant availability). Emphasis on linking meeting invitations to supporting materials indicates a comprehensive approach to information sharing. However, reliance on a single communication platform assumes uniform proficiency and may require additional contingency measures for addressing unforeseen scheduling conflicts.

The team established a detailed meeting schedule to coordinate the Phenotype Phebruary initiative, setting recurring sessions on Tuesday, Wednesday, and Friday. These meetings are intended to cover demos, study lead coordination, and broader working group discussions. Participants stressed the importance of using Microsoft Teams to create recurring invitations with shared links to essential documents, ensuring all study leads and contributors are informed. This structured approach aims to foster clear communication and sustained progress throughout the project.

Q&A, Clarifications, and Process Adjustments

This segment covers the session’s open discussion, during which participants fielded questions, clarified process uncertainties, and contemplated potential adjustments to the phenotyping workflow. The dialogue reveals concerns about the specificity and granularity of phenotype definitions—such as the varying definitions of “major surgery” or nuances in drug exposures—and whether pre-existing phenotypes are sufficient or require modifications. Questions about timeline flexibility, particularly if some phenotypes are not finalized by the end of Phebruary, are raised, highlighting the tension between adhering to deadlines and accommodating real-world constraints. The discussion also touches on the possibility of iterating beyond Phebruary, even though the official process is bounded to that month, emphasizing that iterative refinement is both expected and supported. Overall, this Q&A session serves to address ambiguities and ensure that all participants understand the process and expectations, while also allowing for minor adjustments based on emerging challenges.

@aostropolets [Advocacy]:
Anna is proactive in clarifying process steps and the importance of adhering to the timeline, while reassuring participants that iterative adjustments are permissible even after the formal deadline.
@zhuk [Inquiry]:
Oleg raises concerns regarding the timing and the possibility of modifying definitions—especially in complex cases such as “major surgery”—which signals a need for flexible boundaries within a structured framework.
@Azza_Shoaibi [Advocacy]:
Azza reinforces the timeline by emphasizing that while post-Phebruary modifications are possible, having a clear cutoff is essential for progressing to subsequent phases (e.g., data network studies). She also clarifies that the process is designed to accommodate iterative refinement, even if it means revisiting definitions or cohort specifications later.
@Gowtham_Rao [Inquiry/Advocacy]:
Gowtham contributes by underlining the importance of meeting deadlines for progression while acknowledging that the process is inherently iterative. He supports the idea that if phenotypes are not ready by the deadline, those studies might not advance to the next phase.
Assumption: The process assumes that the predefined deadlines (end of Phebruary) are sufficient to capture and correct most issues, even though participants acknowledge that complete alignment may be challenging.
Information Gap: There is some ambiguity around the criteria for “good enough” phenotypes and how exactly iterative improvements will be integrated post-deadline. While the team signals flexibility, the detailed process for post-deadline modifications is not fully delineated.
Assumption: It is presumed that clarifications provided during the meeting will be sufficient to resolve most participant queries, thereby maintaining progress without significant delays.

The Q&A and clarification phase is crucial as it addresses potential friction points in the process and reinforces the balance between structure and flexibility. The conversation reflects a well-calibrated effort to manage expectations: while the initiative sets firm deadlines to drive progress, it also recognizes the need for iterative refinement and adjustments based on real-world complexities. This dual approach is essential for managing a collaborative project with diverse participants and varying levels of expertise. The open exchange of questions and clarifications helps to build confidence and ensures that all stakeholders are aligned on process goals, timelines, and contingencies.

In the Q&A and Clarifications session, the team addressed concerns about phenotype specificity, timeline rigidity, and the process for making adjustments. Participants discussed the challenges of defining complex phenotypes—such as “major surgery”—and debated whether pre-existing templates are adequate. While the official deadline for phenotype completion is set at the end of Phebruary, the team acknowledged that iterative improvements can continue post-deadline. This balanced approach reinforces the need for structured progress while accommodating real-world complexities, ensuring that all contributors understand the expectations and can collaborate effectively.

Gowtham_Rao · February 4, 2025, 10:59am

Seeking Volunteers: Help Create and Manage Tracking Spreadsheet

Hey everyone,

We are seeking volunteers to assist in converting the OHDSI_2025_GuidelineDrivenEvidencePhenotypeNeeds_28Jan2025.xlsx dataset from its current wide format to a long format. This restructuring is crucial for our ongoing efforts in the OHDSI 2025 Guideline-driven Evidence Phenotype initiative.

Objective:

Transform the dataset so that each row represents a single phenotype, with specific attributes detailed in separate columns.

Target long-form structure (roughly):

Each row should include the following columns:

phenotypeName – The name of the phenotype.
phenotypeType – Classification such as indication, target, comparator, or outcome.
needPhenotypeDevEval – Indicates whether phenotype development and evaluation are needed (e.g., drug cohorts).
similarCohortsInOhdsiPl – References to any similar cohorts in the OHDSI Phenotype Library.
canReuseCohortsInOhdsiPl – Specifies if existing cohorts in the OHDSI Phenotype Library can be reused.
linkToLiteratureSearch – URLs or references to literature supporting the phenotype.
studyLeadsNeedPhenotype – Names of study leads who require this phenotype.
targetClinicalDescription – A detailed clinical description of the phenotype.
candidateCohortDefinitions – Any candidate cohort definitions; track contributions from the community. Use atlas-demo.ohdsi.org
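
As a starting point for volunteers, here is a minimal sketch of the wide-to-long reshaping in Python. The layout of the real spreadsheet is not described in this post, so the wide layout below is an assumption: one row per study, a hypothetical `studyLead` column, and one column per phenotype type holding semicolon-separated phenotype names. The sketch reads a CSV export rather than the .xlsx file, and fills only the name, type, and study-lead columns; the remaining target columns are left blank for manual curation.

```python
import csv
import io

# Target long-form columns, taken from the list in this post.
LONG_COLUMNS = [
    "phenotypeName", "phenotypeType", "needPhenotypeDevEval",
    "similarCohortsInOhdsiPl", "canReuseCohortsInOhdsiPl",
    "linkToLiteratureSearch", "studyLeadsNeedPhenotype",
    "targetClinicalDescription", "candidateCohortDefinitions",
]

# ASSUMPTION: hypothetical wide layout with one column per phenotype type.
TYPE_COLUMNS = ["indication", "target", "comparator", "outcome"]


def wide_to_long(wide_rows):
    """Return one dict per unique (phenotype name, phenotype type) pair."""
    long_rows = {}
    for row in wide_rows:
        lead = (row.get("studyLead") or "").strip()
        for phenotype_type in TYPE_COLUMNS:
            for name in (row.get(phenotype_type) or "").split(";"):
                name = name.strip()
                if not name:
                    continue
                key = (name.lower(), phenotype_type)
                if key not in long_rows:
                    entry = dict.fromkeys(LONG_COLUMNS, "")
                    entry["phenotypeName"] = name
                    entry["phenotypeType"] = phenotype_type
                    long_rows[key] = entry
                # Collect every study lead who needs this phenotype.
                current = long_rows[key]["studyLeadsNeedPhenotype"]
                leads = [item for item in current.split("; ") if item]
                if lead and lead not in leads:
                    leads.append(lead)
                long_rows[key]["studyLeadsNeedPhenotype"] = "; ".join(leads)
    return list(long_rows.values())


# Made-up example data, not taken from the real spreadsheet.
wide_csv = """studyLead,indication,target,comparator,outcome
Lead A,type 2 diabetes,drug X,drug Y,outcome 1; outcome 2
Lead B,type 2 diabetes,drug Z,,outcome 1
"""
long_rows = wide_to_long(csv.DictReader(io.StringIO(wide_csv)))
for r in long_rows:
    print(r["phenotypeType"], "|", r["phenotypeName"], "|", r["studyLeadsNeedPhenotype"])
```

Phenotypes requested by more than one study collapse into a single row, with all requesting leads listed in studyLeadsNeedPhenotype.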

Gowtham_Rao · February 7, 2025, 4:44pm

Phenotype Phebruary 2025 Office Hours – February 5, 2025

Topics discussed:

1. File Management & Team Coordination

Centralized Repository in Phenotype Workgroup Folder
Future Storage in GitHub
Use of Teams Channels & Folder Organization
Practical Tips for Document Sharing & Searching

2. Searching & Reusing Existing Phenotypes

OHDSI Phenotype Library & CSV Lookup
Shiny App for Cohort Diagnostics
VA’s Cipher Library
Other Resources: Darwin, Sentinel, Forum Posts

3. Phenotype Development & Validation Approaches

Pass/Fail Rubrics & Decision Making
Tools: Cohort Diagnostics, PheValuator, Keeper
Balancing Qualitative & Quantitative Measures
Importance of Iterative Reviews & Expert Input

4. Discussion on Pediatric Vision Screening Use Case

Non-Disease Phenotype Strategies
Role of Comparator Cohorts (Well-Child Visits, Vaccinations)
Procedure-Based Logic vs. Condition-Based Logic
Documenting Known Coding Gaps & Uncertainties

5. Discussion on Ulcerative Colitis

Kevin’s IBD Phenotypes
Overlaps & Synergy with UC/CD Definitions
“Enhanced” Treatment Pathways
Transition from J&J Library to OHDSI Library

Topic 1: File Management & Team Coordination

The group agreed to use the Teams Phenotype Workgroup folder as the centralized location for all Phenotype Phebruary materials, ensuring easy discovery and consistent organization. Ultimately, finalized study cohorts and code will move to GitHub for open collaboration. Each study has a dedicated subfolder, and leads should confirm access rights for all contributors. This setup aims to minimize confusion, promote transparency, and create a smooth hand-off to GitHub once the cohorts are finalized.

Centralized Repository in Phenotype Workgroup Folder
- Several participants [Inquiry] asked where to store new or updated study documents.
- Anna [Advocacy] explained that all phenotype-related content for “Phenotype Phebruary” should be placed in the Phenotype Workgroup Files (Teams → Phenotype Workgroup → Documents → General → Phenotype Phebruary 2025 subfolder).
- Rationale: Keeping everything in one place ensures the workgroup can easily review and coordinate, especially during the month-long phenotype push.
Future Storage in GitHub
- Anna [Advocacy] noted that while Teams will function as the short-term workspace, GitHub will be the eventual long-term repository for finalized cohorts and study packages.
- The group [Agreement] accepted that each study’s ultimate destination is a dedicated GitHub repository to ensure transparent version control and public access.
Use of Teams Channels & Folder Organization
- Chris [Inquiry] highlighted confusion about which Teams channel or folder to use for their specific study materials.
- Anna [Advocacy] clarified that all Phenotype Phebruary content should live in the Phenotype Workgroup’s general channel and be placed under the appropriate subfolder named by week (e.g., Week 1, Week 2).
- Action Item: Each study lead will direct collaborators to the correct Teams subfolder for uploading relevant files.
Practical Tips for Document Sharing & Searching
- Christopher M. [Inquiry] raised questions about how to efficiently find existing files and how to share them with the right collaborators.
- Anna [Advocacy] suggested using the “Files” tab in the channel and employing consistent naming conventions for clinical descriptions and spreadsheet logs.
- Information Gap: Some participants lack admin rights to add new members. Resolution: Anna [Advocacy] volunteered to add missing members upon request.

Implicit Assumptions & Information Gaps

Assumption: Everyone in the workgroup can access Teams’ general channel and subfolders; some participants might still need channel permissions.
Gap: No automated process to ensure new members are promptly added; relies on manual requests to the channel admins.

Topic 2: Searching & Reusing Existing Phenotypes
The group discussed various resources for locating existing phenotypes, highlighting the OHDSI Phenotype Library (with a supplemental CSV file), a Shiny diagnostics app, VA’s Cipher platform, and external sources like Darwin and Sentinel. While each provides a starting point for defining cohorts, users must verify that available definitions align with their precise research questions. Conversion to Atlas-compatible logic and the need for context-specific adjustments remain crucial steps.

OHDSI Phenotype Library & CSV Lookup
- Chris [Inquiry] expressed difficulty searching through the OHDSI Phenotype Library—particularly having to use the CSV file for definitions.
- Anna and Gowtham [Advocacy] acknowledged the suboptimal nature of the current CSV-based lookup.
- Action Item: Use the CSV along with references in the library’s GitHub repository to locate existing cohort definitions.
Shiny App for Cohort Diagnostics
- Anna [Advocacy] shared that a Shiny application at data.ohdsi.org can display patient characteristics and basic diagnostics for some phenotypes in the OHDSI Library.
- Gap: The Shiny app is not entirely up to date—some newer or modified cohorts may be missing or deprecated.
- Gowtham [Inquiry/Advocacy] explained that the Shiny app gives partial coverage but still helps in evaluating initial patient counts and characteristics.
VA’s Cipher Library
- Gowtham [Advocacy] recommended VA’s Cipher (Centralized Interactive Phenomics Resource) as a more user-friendly front-end to find validated or established OMOP-conformant definitions.
- Implicit Assumption: Cipher includes machine-learning-based phenotypes as well as “rubric-based” definitions (i.e., clear rule-based logic). Study leads must verify they are using rule-based cohorts for community-wide network studies.
Other Resources: Darwin, Sentinel, Forum Posts
- Anna [Advocacy] noted that beyond the official OHDSI Library, other large-scale initiatives (e.g., Darwin EU, Sentinel) publish or store phenotype definitions.
- Implicit Assumption: Accessing these can be challenging if they are not readily packaged for OMOP. Sometimes only code lists (no Atlas JSON) are available.
- Recommended Approach: Identify relevant code sets, then manually adapt them into Atlas cohorts or JSON specification for usage in OHDSI studies.

Implicit Assumptions & Information Gaps

Assumption: Contributors know how to adapt external code lists into Atlas format for use in the OHDSI ecosystem.
Gap: Many validated definitions exist in scattered resources, requiring manual curation or reformatting.

Topic 3: Phenotype Development & Validation Approaches
The workgroup discussed the importance of combining qualitative (expert judgment) and quantitative (cohort diagnostics) approaches to phenotype validation. While no single pass/fail threshold exists, guidelines and tools (Cohort Diagnostics, PheValuator, Keeper) can help users systematically refine definitions. Ultimately, iterative reviews that incorporate stakeholder input are critical to build robust phenotypes suited for multi-site network analyses.

Pass/Fail Rubrics & Decision Making
- Andrew [Inquiry] asked about a systematic way to judge “pass/fail” phenotypes or data quality.
- Gowtham [Advocacy] clarified that while there’s no single objective threshold, qualitative and quantitative criteria from cohort diagnostics guide “fitness for use.”
- Anna [Advocacy] and Gowtham [Advocacy] referenced prior research on creating a workflow rather than a strict numeric threshold, recommending iterative review and stakeholder consensus.
Tools: Cohort Diagnostics, PheValuator, Keeper
- Anna & Gowtham [Advocacy] described multiple validation tools already developed in the OHDSI ecosystem:
  - Cohort Diagnostics – Provides descriptive statistics and potentially flags data quality issues.
  - PheValuator – Evaluates cohorts using chart review logic or external references for “ground truth.”
  - Keeper – Offers (in some contexts) patient-level review with potential GenAI enhancements.
- Information Gap: Not all tools are equally mature; the group must tailor usage to each study’s data availability and needs.
Balancing Qualitative & Quantitative Measures
- Andrew [Advocacy] emphasized a structured, qualitative approach for consistent decision-making.
- Gowtham [Advocacy] explained that human judgment (clinical expertise) combines with data-driven insights (prevalence, incidence, code frequencies) for final determination.
- Action Item: Follow published frameworks (e.g., the “phenotype evaluation” paper) as a reference to unify qualitative impressions with objective metrics.
Importance of Iterative Reviews & Expert Input
- Anna [Advocacy] underscored that peer feedback—including clinicians and analysts—often improves definitions.
- Assumption: Each study lead will incorporate repeated reviews of cohort diagnostics output, revising inclusion/exclusion logic until confident in the final design.

Implicit Assumptions & Information Gaps

Assumption: Most participants have at least some familiarity with tools like Cohort Diagnostics, or can attend training sessions/office hours if needed.
Gap: Access to all advanced validation approaches (e.g., chart review data) varies by site.

Topic 4: Discussion on Pediatric Vision Screening Use Case
The group examined how to characterize pediatric vision screening within OMOP. Rather than a disease-centric approach, leaders proposed defining procedures, typical age criteria, and relevant codes. Comparators such as well-child visits or vaccination records can help gauge screening uptake. However, data capture inconsistencies—where only a fraction of screening procedures appear in OMOP—must be documented and addressed during analysis.

Non-Disease Phenotype Strategies
- Michelle [Inquiry] posed questions on how to handle a phenotype that is not strictly a disease—specifically, pediatric vision screening.
- Anna [Advocacy] acknowledged this use case differs from typical condition-based phenotypes, urging Michelle to think about procedures and clinical events rather than diagnoses.
- Implicit Assumption: The process for describing a “non-disease” phenomenon still requires stating clear, clinically relevant rules (e.g., age brackets, procedure codes) in a clinical description.
Role of Comparator Cohorts (Well-Child Visits, Vaccinations)
- Michelle [Inquiry] raised the idea of comparator cohorts—e.g., children who had at least one well-child visit or vaccination—to assess screening rates by comparison.
- Anna [Advocacy] supported including these comparator definitions, explaining they might serve as proxies for overall healthcare engagement.
- Action Item: Identify suitable codes for well-child visits or vaccinations and define inclusion/exclusion logic for a comparator cohort.
Procedure-Based Logic vs. Condition-Based Logic
- The discussion clarified that pediatric vision screening often appears in billing data as a procedure code rather than a diagnosis code.
- Anna [Advocacy] suggested systematically outlining the hallmarks of these screenings (e.g., device usage, frequency, typical patient age).
- Gap: Not all screening methods have standardized or reliably used procedure codes, so some local data may only be partially mapped to OMOP.
Documenting Known Coding Gaps & Uncertainties
- Michelle [Inquiry] noted a mismatch between local data (where screening codes appear inconsistently) and the corresponding OMOP procedures that capture only ~10% of expected events.
- Anna & Gowtham [Advocacy] recommended explicitly documenting these data limitations in the clinical description, clarifying possible underestimation in large-scale network analyses.
- Assumption: Iterative refinement and testing via cohort diagnostics will help gauge the magnitude of missing codes.

Implicit Assumptions & Information Gaps

Assumption: There are existing CPT4 or other billing codes for vision screening in pediatric populations; however, usage may vary by site.
Gap: Unclear how best to reconcile partial capture of procedure codes in the data, especially when local EHR data differ from standard claims databases.
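
To make the comparator idea concrete, here is a minimal sketch of how screening uptake could be computed against a comparator cohort. It assumes each cohort has already been reduced to a collection of person IDs; the function name and the example IDs are hypothetical, not from the meeting.

```python
def screening_uptake(screened_ids, comparator_ids):
    """Share of the comparator cohort with at least one screening record."""
    comparator = set(comparator_ids)
    if not comparator:
        raise ValueError("comparator cohort is empty")
    # Only count screened children who also belong to the comparator cohort.
    screened_in_comparator = comparator & set(screened_ids)
    return len(screened_in_comparator) / len(comparator)


# Made-up person IDs: eight children with a well-child visit, four with a
# screening record (one of whom, ID 99, has no well-child visit).
well_child_visit_ids = [1, 2, 3, 4, 5, 6, 7, 8]
vision_screening_ids = [2, 5, 8, 99]

uptake = screening_uptake(vision_screening_ids, well_child_visit_ids)
print(f"Screening uptake among comparator cohort: {uptake:.1%}")
```

Because screening procedures may be only partially captured, a figure like this is best read as a lower bound and reported alongside the documented coding gaps.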

Topic 5: Discussion on Ulcerative Colitis
Kevin’s Ulcerative Colitis and Crohn’s Disease phenotypes are nearly ready for the OHDSI Library, having originated in J&J’s internal library. The group underscored the importance of harmonizing existing definitions, ensuring they align with network standards. Kevin is refining an “enhanced treatment pathway” approach that captures drug switching and durations, moving beyond traditional summary visualizations.

Kevin’s IBD Phenotypes
- Kevin [Advocacy] reported that most of his Inflammatory Bowel Disease (IBD) phenotypes have been developed and are ready to be transferred from the J&J Library to the OHDSI Library.
- Action Item: Ensure a clean handoff of final cohort definitions, referencing IDs in the J&J Atlas environment.
Overlaps & Synergy with UC/CD Definitions
- Kevin [Advocacy] noted he has expanded from simply Ulcerative Colitis (UC) to include Crohn’s Disease (CD) definitions.
- The group [Agreement] acknowledged potential overlaps in treatments and code sets for UC and CD, emphasizing consistent definitions to support cross-comparisons.
- Information Gap: Some IBD phenotypes already exist in the OHDSI Library; Kevin plans to reconcile any redundant or conflicting definitions.
“Enhanced” Treatment Pathways
- Kevin [Advocacy] distinguished his approach from standard “pathway” methods. He is interested in capturing switching behaviors, durations, and sequence of therapies, beyond the simple donut plots in typical Atlas treatment pathways.
- Implicit Assumption: Atlas’s built-in treatment pathways may require customization or post-processing to fully capture medication timelines.
Transition from J&J Library to OHDSI Library
- Kevin [Advocacy] and Anna [Inquiry/Advocacy] discussed that while Kevin’s cohorts are mostly finalized in the J&J internal library, they still need official OHDSI Library IDs.
- Action Item: Kevin will list the relevant J&J definitions and submit them for inclusion in the OHDSI Library, or confirm they are already present.

Implicit Assumptions & Information Gaps

Assumption: Treatment duration and switch metrics can be accurately extracted from observational data. Actual usage patterns may vary by site.
Gap: Additional clarity is needed on how to handle combination therapies, dose changes, and overlapping treatments in the enhanced pathways.
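
The switching and duration metrics described above can be sketched as follows. This is an illustration only, not Kevin's method: it assumes one patient's drug exposures have already been collapsed into non-overlapping eras of (drug name, start date, end date), and it deliberately ignores combination therapy, dose changes, and overlaps, which remain the open gap noted above. All names and dates are made up.

```python
from datetime import date


def summarize_pathway(eras):
    """Order one patient's drug eras and derive durations and switches.

    eras: list of (drug_name, start_date, end_date) tuples, assumed
    non-overlapping.
    """
    ordered = sorted(eras, key=lambda era: era[1])
    steps = [
        # Duration counts both the start and the end day.
        {"drug": drug, "start": start, "days": (end - start).days + 1}
        for drug, start, end in ordered
    ]
    # A switch is any consecutive pair of eras with different drugs.
    switches = [
        (prev["drug"], curr["drug"], curr["start"])
        for prev, curr in zip(steps, steps[1:])
        if prev["drug"] != curr["drug"]
    ]
    return steps, switches


eras = [
    ("drug B", date(2021, 7, 1), date(2021, 12, 31)),
    ("drug A", date(2021, 1, 1), date(2021, 6, 30)),
    ("drug B", date(2022, 1, 15), date(2022, 3, 15)),
]
steps, switches = summarize_pathway(eras)
for step in steps:
    print(step["drug"], step["start"], step["days"], "days")
print("switches:", switches)
```

Restarting the same drug after a gap (the second drug B era) is kept as a separate step but is not counted as a switch; whether that is the right convention is a study-level decision.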

Gowtham_Rao · February 7, 2025, 5:54pm

Phenotype Phebruary 2025 Office Hours – February 7, 2025

Topics

1. Meeting Goals & Agenda

1.1 Purpose of Office Hours
1.2 Proposed Flow & Tools

2. Tracking Progress & Identifying Gaps

2.1 Reviewing the 14 Studies
2.2 Updating the Progress Tracker
2.3 Ensuring Phenotypes & Clinical Descriptions Are Uploaded

3. Guidance & Lessons Learned on Clinical Descriptions

3.1 ChatGPT Usage & Limitations
3.2 Prompt Engineering Considerations
3.3 Differentiating Domain Types (Drug vs Procedure vs Condition)

4. Literature Search & Existing Phenotype Repositories

4.1 OHDSI Phenotype Library & Citing Prior Work
4.2 PubMed & Other Structured Searches
4.3 Validated Algorithms & Prior References

5. Building Cohort Definitions

5.1 Using Atlas Demo
5.2 Atlas Demo Record Counts (OHDSI Evidence Network)
5.3 Collaboration & Sharing Definitions
5.4 Logic and Strategy for Concept Set Development

6. Planning for Next Tuesday’s Meeting

6.1 Volunteer Demonstrations (Psychosis, IBD)
6.2 Agenda & Time Allocation
6.3 Action Items & Timeline

Topic 1: Meeting Goals & Agenda

During the February 7, 2025 Phenotype Phebruary Office Hours, the meeting opened with clarifications on its purpose: to serve as a loosely structured session for troubleshooting participants’ questions and ensuring ongoing progress. The immediate goals included finalizing clinical descriptions for 14 studies, populating the progress tracker, and preparing for concept set building and cohort construction. Organizers emphasized the importance of accurate updates in the tracker and timely sharing of final documents. They also covered practical lessons learned from using ChatGPT to accelerate drafting tasks, highlighting prompt engineering as a critical step. The meeting concluded with a roadmap for next steps, including concept set building, Atlas demonstrations, and strategic guidance for the entire community.

Speaker Positions ([Advocacy]/[Inquiry])

Azza ([Inquiry]): Guided participants to update the tracker, confirm clinical description status, and raised questions on Atlas use.
Gowtham ([Advocacy]): Encouraged systematic use of GenAI prompts and underscored the need for standardized processes (e.g., Atlas Demo).
Kevin ([Inquiry/Advocacy]): Shared ChatGPT experiences, emphasizing pros/cons and advocating for improved prompt engineering.

Implicit Assumptions & Information Gaps

Assumes all leads will finalize clinical descriptions by Tuesday’s meeting.
Requires clarity on adapting ChatGPT prompts to non–drug-induced conditions.
Expects participants to learn Atlas or request additional support if needed.

Topic 2: Tracking Progress & Identifying Gaps

A central focus was ensuring that all 14 studies were captured in the shared progress tracker. Participants noted that only four had fully documented their phenotypes and clinical descriptions, leaving ten incomplete. The group highlighted the importance of updating the tracker and uploading relevant files in designated folders for accurate measurement and collective troubleshooting. Some leads confirmed readiness to submit materials, while others needed extra time or assistance. By Tuesday’s community call, each study lead should finalize naming conventions and clinical descriptions to facilitate concept set building and cohort definitions. The discussion also emphasized standardization, avoiding duplicate entries, and maintaining schedule alignment.

Speaker Positions ([Advocacy]/[Inquiry])

Azza ([Inquiry]): Checked off who had populated the tracker, prompting others to ensure timely updates.
Tatsiana ([Inquiry]): Asked about file duplication and verification links.
Vlad & Masha ([Inquiry]): Confirmed pending uploads/finalization of phenotypes.

Implicit Assumptions & Information Gaps

Assumes all leads understand the naming/upload format.
Unclear if everyone knows the tracker’s location or uses consistent naming standards.
Relies on manual entry rather than an automated system.

Topic 3: Guidance & Lessons Learned on Clinical Descriptions

Participants exchanged firsthand experiences for creating clinical descriptions, focusing on ChatGPT. Kevin noted limitations like prompt length restrictions and overwritten text, while Gowtham emphasized prompt engineering—fine-tuning queries to avoid irrelevant references (e.g., “drug-induced” when describing general conditions). Recognizing that some phenotypes need minimal details (e.g., drugs/procedures), the group stressed adapting descriptions to each cohort context. Overall, they agreed large language models are valuable for rapid drafting but require careful review by clinicians or epidemiologists. Specialized prompts for target cohorts, outcomes, or drug classes were seen as especially helpful.

Speaker Positions ([Advocacy]/[Inquiry])

Kevin ([Inquiry/Advocacy]): Highlighted ChatGPT’s iterative editing issues and advocated for improved prompt design.
Azza & Gowtham ([Advocacy]): Encouraged using ChatGPT as an accelerator while reinforcing the importance of human validation.

Implicit Assumptions & Information Gaps

Prompt Adaptation & Engineering: Assumes users can modify prompts for varied clinical needs.
Validation Requirements: Final descriptions still need domain expert review.
Workflow Variation: Different domain types (drug, procedure, condition) may call for unique strategies.

Topic 4: Literature Search & Existing Phenotype Repositories

Participants outlined a workflow for leveraging prior work when building new phenotype definitions: (1) consult the OHDSI Phenotype Library, (2) check PubMed for validated algorithms (often ICD-9-based), and (3) adapt or translate these to modern coding standards (ICD-10/ICD-10-CM). While older references may lack detail or rely on outdated codes, participants underscored the value of citing them to strengthen credibility. The group emphasized structured reviews to locate any existing validated algorithms and the importance of carefully transitioning older approaches into current vocabularies.

Speaker Positions ([Advocacy]/[Inquiry])

Chris ([Inquiry]): Described a systematic approach (OHDSI Library → PubMed → synthesis).
Kevin ([Advocacy]): Noted challenges tracing older ICD-9 validations.
Azza & Gowtham ([Advocacy]): Encouraged structured reviews and reminded participants about available OHDSI tools.

Implicit Assumptions & Information Gaps

Continued Relevance of Old Algorithms: Some are poorly documented, hindering adaptation.
Standardization Tools: Converting ICD-9 algorithms to ICD-10 presumes enough detail is available.
Time Constraints: Deep systematic reviews may exceed the Phenotype Phebruary timeline.

Topic 5: Building Cohort Definitions

Moving from clinical descriptions to operational phenotypes, participants stressed Atlas (OHDSI’s platform for creating and sharing cohort definitions) and the OHDSI Evidence Network for global code usage counts. These record counts help identify frequently used vs. rare concepts for inclusion/exclusion. The conversation also highlighted easy cohort transfer between different Atlas instances, beneficial for organizations with internal Atlases. Demonstrations covered using the “shopping cart” to assemble concept sets, refining logic (broad vs. narrow definitions), and exporting final cohorts. Overall, a cohesive approach in Atlas was seen as key for reproducibility and efficient collaboration.

Speaker Positions ([Advocacy]/[Inquiry])

Gowtham ([Advocacy]): Showed Atlas Demo features, emphasizing record counts.
Azza ([Inquiry/Advocacy]): Stressed uniform cohort-building approaches for leads and contributors.
Kevin ([Inquiry]): Sought guidance on splitting IBD definitions (Crohn’s vs. Ulcerative Colitis).

Implicit Assumptions & Information Gaps

Familiarity with Atlas: Some leads may need extra training.
Ownership & Permissions: Private instances must handle local security.
Complex Logic: Certain phenotypes demand intricate logic (multiple concept sets, time windows).
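
As a rough illustration of how record counts can inform concept set review, the sketch below ranks candidate concepts by record count and separates frequently used from rarely used ones. The concept names, counts, and the `min_records` cutoff are all made up for illustration; no cutoff was discussed in the meeting, and a rare concept may still be clinically important.

```python
def split_by_record_count(concept_counts, min_records):
    """Rank concepts by record count and split them at a chosen cutoff."""
    ranked = sorted(concept_counts.items(), key=lambda item: item[1], reverse=True)
    frequent = [(name, count) for name, count in ranked if count >= min_records]
    rare = [(name, count) for name, count in ranked if count < min_records]
    return frequent, rare


# Hypothetical candidate concepts with made-up record counts.
candidate_concepts = {
    "concept A": 1_250_000,
    "concept B": 48_000,
    "concept C": 120,
    "concept D": 0,
}

# ASSUMPTION: the cutoff is arbitrary and only serves the example.
frequent, rare = split_by_record_count(candidate_concepts, min_records=1_000)
print("review first:", frequent)
print("rarely used:", rare)
```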

Topic 6: Planning for Next Tuesday’s Meeting

To advance Phenotype Phebruary, participants finalized the agenda and action items for the next Tuesday community call. They will review which of the 14 studies have completed clinical descriptions and confirm that each phenotype is recorded in the tracker. Two volunteers (Tatsiana and Kevin) will present live demonstrations on building concept sets and defining cohorts (e.g., first-episode psychosis and IBD), each lasting about 10 minutes. Organizers urged any leads needing help with documents or the tracker to complete those tasks before Tuesday, emphasizing this call as a pivotal checkpoint for developing logic, identifying concept sets, and refining code lists. Ultimately, these demos will guide participants toward robust, evidence-based cohort definitions.

Speaker Positions ([Advocacy]/[Inquiry])

Azza ([Advocacy]): Urged leads to finalize documentation, coordinate demos, and update progress by Tuesday.
Kevin & Tatsiana ([Advocacy]): Agreed to demonstrate their phenotype-building process, highlighting real-world complexities.
Gowtham ([Inquiry/Advocacy]): Confirmed readiness to assist with demonstration logistics and Atlas usage.

Implicit Assumptions & Information Gaps

Adherence to Deadlines: Success of the demos depends on all leads completing their parts.
Technical Preparedness: Demos require stable setups and rehearsals.
Community Involvement: Participants should engage with or ask questions about Atlas if unfamiliar.

Gowtham_Rao · February 13, 2025, 12:38am

Phenotype Phebruary 2025 Office Hours – February 12, 2025

Topics

Type 2 Diabetes Phenotype Development
- Approaches to definition: diagnosis codes only vs. inclusion of lab values and medications
- Sensitivity versus specificity trade-offs
- Consideration of insulin use and alternate entry criteria
- Comparison of multiple cohort variants and their impact on incidence rates
Diabetic Retinopathy Screening Cohort Design
- Differentiating in‐office, telemedicine, and AI-based screening outcomes
- Analysis strategies: earliest event versus repeated events
- Implementation of washout periods
- Challenges with provider specialty mapping and reliance on specific CPT codes
- Custom SQL versus standard Atlas cohort diagnostics
Antipsychotic Treatment Cohort and Censoring Strategy
- Exclusion of patients on other antipsychotics prior to index
- Censoring rules for patients who switch medications post-index
- Impact of censoring on outcome incidence and potential biases
- Balancing strict (monotherapy) versus broader real-world cohorts
Technical Implementation in Atlas and Query Logic
- Indexing criteria: using visit start dates versus condition start dates
- Handling of visits with embedded diagnoses and setting proper time constraints
- SQL logic for defining event start/end dates and cohort entry/exit
Naming Conventions and Infrastructure for the Phenotype Library
- Establishing a standard naming scheme (e.g., prefixes like PP25 or F-25)
- Integration with OHDSI forums and GitHub for clinical descriptions
- Best practices for updating and maintaining phenotype definitions
Project Coordination, Communication, and Next Steps
- Progress tracking and proactive outreach to study leads
- Scheduling of future office hours and follow-up meetings
- Volunteer support and addressing any blockers in phenotype development

Topic 1 – Type 2 Diabetes Phenotype Development:

In this segment, the group deliberated on the optimal definition for a type 2 diabetes phenotype. Cindy Cai initiated the discussion by outlining the need for multiple phenotype variants for type 2 diabetes—one relying solely on diagnosis codes and others that incorporate lab values (e.g., high glucose measurements) and medication data (including or excluding insulin). This reflects a core tension: the trade‐off between sensitivity (capturing as many potential cases as possible) and specificity (avoiding misclassification by including only confirmed cases).

For type 2 diabetes phenotype development, the group discussed two primary approaches: one using diagnosis codes only and another that includes additional criteria such as lab measurements and medication exposures. While a sensitive definition is favored to ensure comprehensive patient capture for subsequent retinopathy screening, concerns were raised about potential specificity losses—especially regarding the inclusion of insulin. The consensus is to build and compare multiple phenotype variants and use cohort diagnostics to decide which definition best meets study requirements.

Anna Ostropolets advocated for a more sensitive approach, suggesting that when the primary focus is on downstream diabetic retinopathy screening, it is preferable to “capture everybody” by allowing alternate entry criteria such as lab values and medication records. Her [Advocacy] position emphasizes that maximizing sensitivity is critical for ensuring that the subsequent screening outcomes are not biased by an overly narrow diabetes definition. Conversely, Evan Minty raised questions regarding the inclusion of insulin—pointing out that, for many type 2 patients, the use of insulin may be transient or secondary to other treatments. His inquiry highlights a potential pitfall: including insulin might inadvertently lower specificity by capturing patients whose treatment patterns do not represent typical type 2 diabetes management.

Gowtham Rao further enriched the discussion by noting that, in real-world datasets, some patients may not have a diagnosis code even though lab and medication data indicate diabetes. This observation underlines the importance of using a dual approach to balance sensitivity and specificity. Implicit in these discussions is the assumption that the ideal phenotype should reflect clinical reality across diverse institutions, yet a gap remains regarding standardized thresholds (e.g., how many lab measurements qualify as “repetitive” enough to confirm diabetes).
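
To show how the cohort variants differ in practice, here is a minimal sketch that applies three entry rules to the same set of patients and compares cohort sizes. The patient records are made up, the field names are hypothetical, and the lab threshold (`MIN_HIGH_GLUCOSE_LABS = 2`) is an assumption; as noted above, the group did not settle on a standardized threshold.

```python
# ASSUMPTION: number of high glucose measurements needed to qualify.
MIN_HIGH_GLUCOSE_LABS = 2

# Made-up patient summaries; field names are hypothetical.
patients = [
    {"id": 1, "has_dx": True, "high_glucose_labs": 2, "non_insulin_drug": True, "insulin": False},
    {"id": 2, "has_dx": False, "high_glucose_labs": 3, "non_insulin_drug": False, "insulin": False},
    {"id": 3, "has_dx": False, "high_glucose_labs": 0, "non_insulin_drug": False, "insulin": True},
    {"id": 4, "has_dx": False, "high_glucose_labs": 1, "non_insulin_drug": False, "insulin": False},
    {"id": 5, "has_dx": True, "high_glucose_labs": 0, "non_insulin_drug": False, "insulin": False},
]


def qualifies_by_labs(patient):
    return patient["high_glucose_labs"] >= MIN_HIGH_GLUCOSE_LABS


# Each variant is an entry rule; later variants are progressively more sensitive.
VARIANTS = {
    "diagnosis_only": lambda p: p["has_dx"],
    "dx_labs_or_non_insulin_drug": lambda p: (
        p["has_dx"] or qualifies_by_labs(p) or p["non_insulin_drug"]
    ),
    "dx_labs_or_any_drug": lambda p: (
        p["has_dx"] or qualifies_by_labs(p) or p["non_insulin_drug"] or p["insulin"]
    ),
}

cohort_ids = {
    name: [p["id"] for p in patients if rule(p)] for name, rule in VARIANTS.items()
}
for name, ids in cohort_ids.items():
    print(f"{name}: n={len(ids)} ids={ids}")
```

Patient 3 enters only when insulin is allowed as an entry criterion, which is exactly the kind of case Evan's specificity concern is about; cohort diagnostics on real data would show how large that group is.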

Topic 2 – Diabetic Retinopathy Screening Cohort Design:

In this segment, the group focused on designing a cohort for diabetic retinopathy screening. Cindy Cai framed the discussion by emphasizing that retinopathy screening is inherently a repeated event rather than a one‐time occurrence. The intent is to capture all screening events over time rather than solely relying on the earliest event. This approach is important because the recommended clinical practice is that patients with diabetes should receive screening at least once a year.

For diabetic retinopathy screening, the group discussed designing a cohort that recognizes the recurring nature of screenings. The debate centered on whether to index solely on the earliest event or to capture all screening events, with a preference for the latter to align with annual screening recommendations. Key challenges include handling variable provider specialty mappings and potentially using custom SQL to integrate appropriate washout periods. The plan is to validate different cohort definitions using cohort diagnostics, ensuring that the final design accurately reflects clinical practice.

Key points discussed include:

Repeated Event vs. Earliest Event Approach:
Cindy highlighted the need to differentiate between the first screening event and subsequent screenings. She proposed analyzing the pattern of screenings over multiple years (e.g., once per year) to assess adherence to clinical guidelines. This naturally leads to a choice between indexing on the earliest event versus capturing all events during a defined time at risk.
Implementation Challenges:
The group noted that while the standard package supports basic cohort diagnostics, answering nuanced questions—like the impact of repeated screening events—may require custom SQL queries. These queries would allow for the integration of a washout period (for example, excluding a patient from being “at risk” for another screening within 365 days of the prior event).
Provider Specialty and CPT Codes:
A challenge emerged regarding provider specialty mapping. In many datasets, especially within the OMOP framework, the specialty of the provider (e.g., ophthalmologist, optometrist) may not be consistently mapped. Consequently, using specific CPT codes for in-office screenings might be insufficient for capturing all relevant events. This led to a discussion about combining office visit codes with condition codes for visual system disorders as a potentially more sensitive method.
Empirical Evaluation:
The speakers agreed that the optimal cohort design should be validated through cohort diagnostics. Running multiple versions of the cohort—each with slightly different criteria—will help determine which definition best captures the intended patient population and aligns with the clinical guidelines for annual screening.
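
The washout idea raised under Implementation Challenges can be illustrated outside of SQL. The following is a minimal Python sketch, not the group's query: the function name `apply_washout`, the choice to measure the washout from the last kept event, and the example dates are all assumptions made for illustration.

```python
from datetime import date

def apply_washout(event_dates, washout_days=365):
    """Keep a screening event only if it falls more than `washout_days`
    after the most recently kept event (assumption: the washout is
    measured from the last kept event, not from every prior event)."""
    kept = []
    for d in sorted(event_dates):
        if not kept or (d - kept[-1]).days > washout_days:
            kept.append(d)
    return kept

# Hypothetical screening dates for one patient.
screenings = [
    date(2020, 1, 10),
    date(2020, 6, 1),   # within 365 days of the first event, dropped
    date(2021, 1, 15),
    date(2021, 3, 1),   # within 365 days of the previous kept event, dropped
    date(2022, 2, 1),
]
qualifying = apply_washout(screenings)
```

Counting the kept events per calendar year would then give a rough view of adherence to annual screening; whether the boundary day itself should count is a design choice the recap does not settle.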

Topic 3 – Antipsychotic Treatment Cohort and Censoring Strategy:

For the antipsychotic treatment cohort, the workgroup discussed strategies to isolate patients on a single antipsychotic. The plan involves excluding patients who have taken other antipsychotics prior to the index date and censoring individuals at the time they switch or add another medication post-index. While this approach helps maintain a homogeneous cohort, concerns were raised about the risk of informative censoring—since treatment changes might be driven by clinical factors that relate to the outcomes. The group recognized this trade-off and underscored the need for careful analysis when interpreting results.

Key points include:

Exclusion Prior to Index Date:
Participants agreed that patients with exposure to antipsychotics other than the target treatment should be excluded before the index date. This step ensures that the study population begins as a monotherapy group, reducing confounding factors that could arise from prior polypharmacy.
Censoring Post-Index Date:
The conversation then shifted to handling patients who switch or add antipsychotic medications after the index date. Anna Ostropolets recommended implementing censoring at the point of switching to maintain the purity of the treatment cohort. In other words, if a patient starts another antipsychotic after initiating the target drug, the patient’s follow-up should end at that moment.
Trade-Offs and Informative Censoring:
Andrew Williams raised an important methodological consideration: censoring patients who switch treatments might introduce informative censoring. This means that the reasons for switching (e.g., side effects, lack of effectiveness) could be related to the outcomes of interest, potentially biasing the results. The discussion acknowledged this trade-off, with some group members noting that while stricter censoring maintains a cleaner cohort, it might limit generalizability if many patients switch medications in real-world settings.
Balancing Monotherapy with Real-World Practice:
The group recognized the tension between an ideal monotherapy cohort—which can closely resemble a clinical trial setting—and the variability inherent in real-world treatment patterns. The consensus was that while excluding patients with post-index changes might lead to a loss of data (and possibly informative censoring), it remains a common approach to ensure that observed outcomes are attributable to the treatment under study.
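
The exclusion and censoring rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the workgroup's implementation: the function name `monotherapy_follow_up`, the drug labels, and the rule that a different antipsychotic starting on the index date counts as a censoring event (not a prior exposure) are all hypothetical.

```python
from datetime import date

def monotherapy_follow_up(target, exposures, observation_end):
    """exposures: list of (drug_name, start_date) tuples for one patient.
    Returns (index_date, follow_up_end), or None if the patient is excluded."""
    target_starts = sorted(d for drug, d in exposures if drug == target)
    if not target_starts:
        return None  # never exposed to the target drug
    index_date = target_starts[0]
    other_starts = sorted(d for drug, d in exposures if drug != target)
    # Exclusion: any other antipsychotic strictly before the index date.
    if any(d < index_date for d in other_starts):
        return None
    # Censoring: follow-up ends at the first switch or add-on after index.
    switches = [d for d in other_starts if d >= index_date]
    follow_up_end = min([observation_end] + switches)
    return index_date, follow_up_end

# Hypothetical patient: starts drug_a, adds drug_b five months later.
exposures = [("drug_a", date(2021, 1, 1)), ("drug_b", date(2021, 6, 1))]
result = monotherapy_follow_up("drug_a", exposures, date(2021, 12, 31))
```

The sketch only shortens follow-up; it does nothing to address the informative censoring concern, which would still need to be handled in the analysis.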

Topic 4 – Technical Implementation in Atlas and Query Logic:

For technical implementation in Atlas, the team agreed to use the visit start date as the index, with conditions required to occur between the visit’s start and end dates. Given that condition end dates are often missing—especially for outpatient records—the query logic defaults to the condition start date. Although standard Atlas functionality covers most needs, the group acknowledged that custom SQL might be necessary to handle more complex cases and ensure the cohort definition accurately reflects clinical intent.

The discussion centered on several key aspects:

Indexing Criteria and Date Logic:
Participants debated whether the index date should be based on the visit start date or the condition start date. The consensus leaned toward using the visit start date as the anchor, ensuring that any associated condition occurrence falls within the boundaries of that visit. This means the query logic should enforce that the condition’s start date is on or after the visit’s start date and before the visit’s end date.
Handling Condition Occurrence Dates:
A significant technical point was the fact that the condition occurrence table in OMOP may not always populate the condition end date—especially in outpatient data. As a result, Atlas SQL typically defaults to using the condition start date when an end date is absent. The group highlighted that this approach ensures that the condition is appropriately linked to the visit period, even if the full temporal span isn’t available.
Query Construction in Atlas:
The conversation covered how to set up constraints within Atlas. For example, one can configure the inclusion criteria so that the condition occurrence must start within a specified interval relative to the visit occurrence. This may involve adding additional constraints (such as “0 days before and all days after” the visit start date) to capture the full intended window. There was also mention of “inverse logic” and concatenating condition dates where necessary, emphasizing that the SQL logic should ultimately reflect the clinical rationale behind the cohort.
Customization and Data Source Variability:
Several participants noted that while a standard query logic can be defined, adjustments may be necessary depending on the specifics of the data source—such as how visits and conditions are recorded. The group acknowledged that custom SQL might be required for certain questions, particularly when standard Atlas functionality doesn’t fully accommodate complex scenarios.
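
The date logic discussed in this topic can be written out as a small Python sketch. It is not the SQL that Atlas generates; the function names are hypothetical, and treating the visit end date as an inclusive bound (so that same-day outpatient visits can still match) is an assumption, since the recap only says "before the visit's end date".

```python
from datetime import date
from typing import Optional

def effective_condition_end(start: date, end: Optional[date]) -> date:
    """Fall back to the condition start date when the end date is missing."""
    return end if end is not None else start

def condition_within_visit(cond_start: date, visit_start: date, visit_end: date) -> bool:
    """True when the condition starts on or after the visit start and
    no later than the visit end (inclusive upper bound is an assumption)."""
    return visit_start <= cond_start <= visit_end

# Hypothetical outpatient visit that starts and ends on the same day.
visit_start = visit_end = date(2024, 3, 5)
same_day_match = condition_within_visit(date(2024, 3, 5), visit_start, visit_end)
day_before_match = condition_within_visit(date(2024, 3, 4), visit_start, visit_end)
```

With a strict upper bound, a same-day visit could never match any condition, which is one reason the boundary handling is worth checking against each data source.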

Topic 5 – Naming Conventions and Infrastructure for the Phenotype Library:

The workgroup discussed the need for standardized naming conventions and infrastructure enhancements for the phenotype library. Proposed conventions include using a prefix (such as “F-25” or “PP25”) to denote the phenotype development period followed by the study name. This system, combined with brief clinical descriptions, aims to improve consistency, discoverability, and ease of collaboration across both the OHDSI forums and the GitHub repository. Although strict enforcement may be challenging, adopting these guidelines is expected to enhance the overall management and utility of phenotype definitions.

Key points include:

Standardized Naming Conventions:
Participants proposed using a consistent prefix to denote the phenotype development initiative (e.g., “F-25” or “PP25” representing “Phenotype Phebruary 2025”) followed by an underscore and the study or cohort name. This convention would make it easier to identify, organize, and retrieve phenotype definitions across the network.
Integration with Existing Infrastructure:
The discussion emphasized linking these standardized names to both the OHDSI forums (where clinical descriptions and discussions reside) and the GitHub-hosted phenotype library. This dual-system approach leverages the public and searchable nature of the forums while maintaining version-controlled records on GitHub.
Collaboration and Documentation:
There was consensus on the importance of not only naming the phenotypes consistently but also including brief clinical descriptions. These descriptions clarify the clinical intent and study context without requiring extensive documentation. Such metadata will be essential for both study leads and volunteers to understand the nuances of each phenotype.
Flexibility and Enforcement:
While a recommended naming scheme was discussed, it was acknowledged that strict enforcement might be challenging given the volunteer-driven nature of the initiative. However, standard guidelines would enhance interoperability and make it easier for data partners and study leads to locate and run the relevant cohorts.
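
As an illustration of the prefix-underscore-name pattern, here is a minimal Python sketch. The helper names, the default prefix “PP25”, and the example study name are assumptions; the recap leaves the final choice of prefix open.

```python
def make_cohort_name(study_name: str, prefix: str = "PP25") -> str:
    """Build a name as <prefix>_<study name>, trimming outer whitespace."""
    return f"{prefix}_{study_name.strip()}"

def follows_convention(name: str, prefix: str = "PP25") -> bool:
    """Check that a name starts with '<prefix>_' and has a non-empty remainder."""
    lead = prefix + "_"
    return name.startswith(lead) and len(name) > len(lead)

# Hypothetical study name.
example = make_cohort_name("Osteoporosis")
```

A check like `follows_convention` could be run over a list of cohort names as a gentle reminder rather than a hard gate, in keeping with the volunteer-driven nature of the initiative.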

Topic 6 – Project Coordination, Communication, and Next Steps:

For project coordination and next steps, the workgroup agreed to maintain robust communication through scheduled office hours and proactive email follow-ups to track study progress. Volunteers are encouraged to assist study leads, and standard naming and documentation practices will be applied to keep the phenotype library consistent. These efforts aim to streamline collaboration and ensure that all studies progress efficiently toward their targets.

Key discussion points include:

Progress Tracking and Follow-Up:
Anna Ostropolets noted that while some study leads are advancing well with their phenotypes, there is a lack of visibility into the status of the other half of the group. The plan is to proactively reach out via email—especially on Thursday—to assess each study’s progress and provide assistance as needed.
Scheduled Meetings and Office Hours:
The team confirmed that regular office hours will continue (with the next session scheduled for Friday at 9:00 AM), ensuring ongoing support and timely updates. This recurring communication channel is critical for addressing blockers and sharing updates across studies.
Volunteer Engagement and Resource Sharing:
The group emphasized the importance of volunteer contributions and encouraged study leads to connect with volunteers who have relevant expertise. This collaborative spirit is designed to maximize the effective use of available skills and improve overall study outcomes.
Infrastructure and Standardization:
Alongside coordination efforts, there was an emphasis on aligning study outputs with the established infrastructure, such as ensuring that phenotype definitions are properly named, linked in the progress tracker, and available in both the OHDSI forums and GitHub. Clear documentation and standard naming conventions were highlighted as essential for facilitating both internal review and external dissemination.

Christian_Reich · February 13, 2025, 1:45am

Somebody needs to check on chatGPT. And the spineless thing will just say “I am so sorry. Of course you are right.” when it is pointed out.

Gowtham_Rao · February 13, 2025, 2:12am

https://athena.ohdsi.org/search-terms/terms/4092388

Gowtham_Rao · February 14, 2025, 11:05pm

Phenotype Phebruary 2025 Office Hours – February 14, 2025

1. Project Overview & Code Definition Process

Context & Deadline:
The team is under a strict February 14 deadline to finalize and extract core phenotype definitions. These definitions are being reviewed, renamed, and packaged as JSON files for deployment (e.g., in Strategus) and diagnostic testing by Johnson and Johnson. Outputs will eventually be shared on results.ohdsi.org for broader community access.
Key Steps:
- Review and properly name existing code definitions (likely developed within Atlas).
- Extract definitions and run diagnostic tests.
- Update the progress tracker and coordinate next actions.
Speaker Highlights:
- Gowtham Rao: Sets clear objectives to “grab all the core definitions.”
- Anna Ostropolets: Confirms the deadline and extraction process.
- Team Input: Supports and clarifies immediate next steps.
Assumptions & Gaps:
- Assumes familiarity with current definitions and processes.
- Lacks discussion on technical challenges during JSON extraction.

2. Rheumatic Disease Phenotyping

Overview:
The team is finalizing phenotype definitions for rheumatic diseases, focusing on patient identification and medication outcomes. While most definitions are near-final, some (especially those involving steroid use) require further review and expert input.
Key Actions:
- Refine definitions with feedback from clinical experts across infectious disease, ophthalmology, and rheumatology.
- Address complexities in cases involving steroids and cancer outcomes.
Speaker Highlights:
- Christopher Mecoli: Provides status updates and notes that only select cases need further input.
- Clinical Experts: Their input will be integrated to ensure accuracy and clinical relevance.
Assumptions & Gaps:
- Assumes that the majority of definitions are robust.
- Specific strategies for addressing complex cases remain to be detailed.

3. Vision Screening Phenotyping & Analysis

Overview:
For pediatric vision screening, the team is defining a cohort using well-child visits as a proxy, while considering stratification by age and calendar periods despite some data limitations (e.g., for children under one).
Key Actions:
- Determine whether to use well-child visits exclusively or a broader set of outpatient visits.
- Incorporate age restrictions and temporal stratification to facilitate trend analysis.
Speaker Highlights:
- Michelle Hribar: Raises questions on cohort definition and data limitations.
- Gowtham Rao & Anna Ostropolets: Provide guidance on structuring the phenotype and refining concept sets.
Assumptions & Gaps:
- Assumes well-child visits are a reliable proxy despite potential data capture challenges.
- Lacks a clear validation plan against actual vision screening records.

4. Osteoporosis Phenotyping & Criteria

Overview:
The osteoporosis phenotype is being refined using a “two out of three” logic based on diagnosis, medication, and fragility fractures—ensuring traumatic fractures (especially in patients under 50) are excluded.
Key Actions:
- Establish nested logic that includes any two of the three criteria.
- Align the clinical description with the operational model in Atlas.
Speaker Highlights:
- Chen Yanover: Questions the clarity of the “two out of three” approach.
- Gowtham Rao & Anna Ostropolets: Emphasize logical clarity and proper exclusion of traumatic fractures.
Assumptions & Gaps:
- Assumes current definitions and concept sets are robust.
- More detail is needed on handling borderline cases and potential data inconsistencies.
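
The “two out of three” rule can be stated compactly in code. The following Python sketch is only an illustration: the function names and the boolean `traumatic` flag are assumptions, and the age-specific handling of fractures in patients under 50 is deliberately left out because the recap does not spell it out.

```python
def has_fragility_fracture(fractures) -> bool:
    """fractures: list of dicts with a boolean 'traumatic' flag (hypothetical
    shape). Only non-traumatic fractures count toward the criterion."""
    return any(not f["traumatic"] for f in fractures)

def meets_two_of_three(has_diagnosis: bool, has_medication: bool, has_fracture: bool) -> bool:
    """True when at least two of the three criteria are satisfied."""
    return sum([has_diagnosis, has_medication, has_fracture]) >= 2

# Hypothetical patient: diagnosis present, no medication, one traumatic fracture.
fractures = [{"traumatic": True}]
qualifies = meets_two_of_three(True, False, has_fragility_fracture(fractures))
```

Here the traumatic fracture does not count, so the patient meets only one criterion and does not qualify.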

5. Medication Concept Set Challenges

Overview:
The team is refining medication concept sets for the osteoporosis phenotype by addressing classification challenges: favoring RxNorm ingredients over SPL for capturing oral risedronic acid formulations and ensuring clear differentiation between monotherapy and combination therapies.
Key Actions:
- Reassess the medication grouping logic to align with clinical intent.
- Validate that all descendant concepts (various formulations) are appropriately captured without over-specification.
Speaker Highlights:
- Anna Ostropolets: Recommends against using SPL and advocates for RxNorm-based classification.
- Chen Yanover & Gowtham Rao: Stress the need for streamlined logic that matches the intended clinical grouping.
Assumptions & Gaps:
- Assumes current mappings between classification systems are robust.
- More discussion is needed on reconciling data differences across sites.

6. Operational Next Steps & Diagnostics

Overview:
The team is transitioning to operational readiness by finalizing core phenotype definitions, updating the progress tracker, and standardizing cohort naming. Upcoming steps include running diagnostic tests internally and with Johnson and Johnson.
Key Actions:
- Extract definitions as JSON files and update the progress tracker.
- Standardize naming conventions (using square brackets with study abbreviations).
- Coordinate diagnostic runs and follow-up meetings to address any data inconsistencies.
Speaker Highlights:
- Gowtham Rao: Urges quick finalization and execution of diagnostic tests.
- Anna Ostropolets: Highlights the importance of consistent naming and data validation.
- Oleg Zhuk & Evan Minty: Ensure operational steps are practical given the current infrastructure.
Assumptions & Gaps:
- Assumes the progress tracker accurately reflects the status of each definition.
- Does not elaborate on contingency plans for unexpected diagnostic discrepancies.
