Phenotype Phebruary Planning
January 31st 2025
Leads: @Azza_Shoaibi @aostropolets @Gowtham_Rao
1. Phenotype Phebruary Overview and Timeline
a. Purpose of a dedicated phenotyping month and introduction of the 14 study proposals
b. Overall timeline and weekly breakdown (Week 1: clinical description; Week 2: concept set and logic building; Weeks 3–4: evaluation and iteration)
c. Roles of study leads versus community participants
2. Phenotyping Process and Methodology
a. Clinical description step (including use of a Gen AI prompt)
b. Creation of concept sets and phenotype logic development
c. Evaluation of phenotypes using tools (e.g., cohort diagnostics)
d. Reusing existing phenotype definitions and leveraging the phenotype library
3. Logistics, Coordination, and Tracking
a. Folder and repository structure within Teams (study-level vs. phenotype-level organization)
b. Creation and management of a master tracker (Excel sheet) for progress and cohort ID tracking
c. Ensuring access for all study leads and working group members
4. Contributor Engagement and Task Allocation
a. Soliciting contributors via sign-up forms and clarifying required skills/commitment
b. Defining responsibilities between study leads and collaborators
c. Allocating specific tasks (clinical description, cohort development, evaluation, etc.)
5. Meeting Scheduling and Communication Protocols
a. Planning recurring meetings (Tuesday demo, Wednesday study lead coordination, Friday working group calls)
b. Setting up and managing Teams invitations and shared calendars
c. Coordinating forum posts, meeting announcements, and follow-up communications
6. Q&A, Clarifications, and Process Adjustments
a. Addressing questions on phenotype numbers and definitions (e.g., major surgery, antibiotics)
b. Clarifying process steps and timelines for iterative improvement
c. Discussing adjustments based on feedback and evolving needs
Phenotype Phebruary Overview and Timeline.
The conversation opens with an explanation of the purpose behind dedicating Phebruary to phenotype development. @aostropolets [Advocacy] outlines that while previous years lacked a specific focus, the current year is structured around 14 study proposals contributed by community members. The goal is to support study leads (e.g., @zhuk for an AKI study) and collaborators by providing a bounded time frame and a clear set of deliverables. The overall timeline is broken into a four-week plan:
- Week 1 – Clinical Description: A session is planned to discuss clinical descriptions. Participants are expected to use a Gen AI prompt (developed by @Gowtham_Rao) that extracts the information needed to form a phenotype. This step also includes a literature search and an exploration of existing phenotype definitions (for example, checking for pre-existing definitions of AKI or obesity management).
- Week 2 – Concept Set and Logic Building: The second week focuses on creating the concept sets and building the logical framework of the phenotype. Many participants who are already familiar with the concept set building tutorials are expected to contribute, and the study leads are expected to be fully engaged.
- Weeks 3–4 – Evaluation and Iteration: The final two weeks are dedicated to evaluating the developed phenotypes using tools such as cohort diagnostics, with iterations and refinements made based on these diagnostics. Additional tools and validation approaches are also expected to be showcased during these weeks.
- @aostropolets [Advocacy]: Clearly explains the overall process, emphasizing the structured timeline and how it aligns with the community’s prior experiences. Her narrative is instructive and guiding, setting the stage for both study leads and contributors.
- @Gowtham_Rao [Inquiry/Advocacy]: Inquires about specifics (e.g., the number of unique phenotypes required) and supports the rationale by referencing templates from the existing phenotype library.
- @Azza_Shoaibi [Advocacy]: Reinforces the process details and clarifies that the structured timeline is strictly for Phebruary, while also assuring that post-Phebruary modifications are possible if necessary.
- @zhuk [Inquiry]: Raises questions regarding the timeline and potential flexibility, particularly concerning iterative updates or late-stage additions.
Implicit Assumptions and Information Gaps:
- Assumption: All study leads and contributors are presumed to be familiar with the phenotyping process and the available resources (e.g., phenotype libraries and cohort diagnostics).
- Information Gap: While the timeline is clear, there is an implicit assumption that data will be available (for instance, the J&J data is mentioned as a fallback for cohort diagnostics). However, the contingency for participants without immediate data access is not fully explored.
- Assumption on Reusability: It is assumed that many of the phenotypes (such as those for MI, stroke, or common drug exposures) already exist and can be reused. The process for validating and adapting these pre-existing phenotypes is touched upon but not fully detailed.
The structured approach for Phenotype Phebruary is designed to create a sense of urgency and clarity among participants. The division into weekly milestones helps to focus efforts and provides measurable checkpoints (e.g., completion of clinical descriptions by the end of week one). There is a strong reliance on existing resources (such as the phenotype library), which streamlines work but may also limit innovation if not revisited critically. The dialogue reflects a collaborative dynamic, with participants openly discussing potential issues (like data availability and phenotype granularity) and affirming that iterative refinement is both expected and supported. The approach seems tailored to balance guided instruction (for less experienced members) with flexibility for seasoned participants.
In the recent Phenotype Phebruary planning session, the working group outlined a structured four-week process dedicated to phenotype development. The process begins with a clinical description phase using Gen AI prompts and literature searches, followed by a week dedicated to building concept sets and phenotype logic. The final two weeks focus on evaluating and refining these phenotypes with diagnostic tools. The session emphasized support for study leads and reusing existing phenotypes where possible, while also acknowledging data limitations, for which the J&J data was mentioned as a fallback for cohort diagnostics. The overall goal is to have ready-to-use phenotypes by the end of Phebruary, providing a clear framework for the community to build upon.
Phenotyping Process and Methodology.
The discussion on phenotyping methodology centers on the stepwise approach that underpins the entire Phenotype Phebruary initiative. Participants describe a three-step process: first, creating a clinical description; second, developing concept sets and the logical framework for the phenotype; and third, evaluating the phenotypes using diagnostic tools. The process starts with leveraging a Gen AI prompt to extract key clinical information, followed by targeted literature reviews and comparisons with existing phenotype definitions (e.g., for conditions like AKI or obesity management). In the subsequent phase, the group focuses on constructing concept sets—drawing on both pre-existing tutorials and community expertise—to form the logical constructs of the phenotypes. Finally, the evaluation phase involves running cohort diagnostics and iterating on the phenotype definitions, with additional tools demonstrated later in the month to further validate the results.
- @aostropolets [Advocacy]: Establishes the overall process by detailing the weekly breakdown and emphasizing the importance of a structured clinical description. Her instructions set the foundation for understanding the subsequent steps in the phenotyping process.
- @Gowtham_Rao [Inquiry/Advocacy]: Contributes by questioning and clarifying aspects of phenotype creation, such as the use of templates from the existing phenotype library and the differentiation between drug exposures and condition phenotypes. His input reinforces the notion that many phenotypes can be templated, though some require bespoke development.
- @Azza_Shoaibi [Advocacy]: Reiterates the need for a clear, methodical approach, stressing that the clinical description should inform whether a phenotype needs to be built from scratch or can be adapted from existing resources. Azza also clarifies the process for concept set development and evaluation, ensuring that both experienced and new members understand the intended workflow.
- @zhuk [Inquiry]: Raises questions regarding the flexibility of the process and timing, particularly around iterative updates and whether additional phenotypes can be incorporated later. His contributions highlight the need for clear boundaries and timelines while also acknowledging that some flexibility will be maintained.
- Assumption: It is presumed that all participants have at least a basic familiarity with using concept set tutorials and the Gen AI tools that will support clinical description.
- Information Gap: While the methodology emphasizes reusing existing phenotypes, there is limited discussion on the criteria for determining when a pre-existing phenotype is “good enough” versus when it needs to be rebuilt from scratch.
- Assumption: There is an underlying expectation that the iterative evaluation phase (using cohort diagnostics) will be sufficient to identify and correct any deficiencies in the phenotype definitions.
The methodology is designed to streamline phenotype development by dividing the process into discrete, manageable steps. The structured approach not only facilitates collaboration among participants with varying levels of expertise but also encourages the reuse of validated resources, which can save time and reduce redundancy. However, the discussion suggests a tension between templated approaches and the need for customization, particularly in complex cases such as major surgery or specific drug formulations. The iterative evaluation phase is critical, as it provides a built-in mechanism for quality control, but success in this phase hinges on the availability of appropriate data and the effective use of diagnostic tools. Overall, the process is both rigorous and flexible—enabling rapid progress while allowing room for adjustments based on real-world findings.
The Phenotyping Process and Methodology session outlined a structured, three-phase approach for developing phenotypes. The process begins with creating detailed clinical descriptions using Gen AI prompts and literature reviews, followed by constructing concept sets and logical frameworks. The final phase focuses on evaluating these phenotypes through cohort diagnostics and iterative refinements. Participants emphasized reusing existing phenotype templates where applicable and discussed when new definitions must be developed, although explicit criteria for that decision were not settled. The approach is designed to balance standardized procedures with the flexibility to address complex or unique cases.
Logistics, Coordination, and Tracking.
The discussion on logistics, coordination, and tracking centers on the administrative and technical infrastructure needed to manage the Phenotype Phebruary initiative. Participants deliberate on how to structure shared folders and repositories within Microsoft Teams, debating whether organization should be study-centric or phenotype-centric. There is a consensus that a master tracker (an Excel sheet) should be created to consolidate the list of phenotypes from the initial proposals—transforming Patrick’s original data into a long-form, one-row-per-phenotype format. This tracker is intended to record progress across key steps (e.g., clinical description, cohort development, evaluation) and to track the eventual cohort IDs. Additionally, ensuring that all study leads and community contributors have the necessary access to working group channels is a priority. Discussions also cover how to manage the logistics of version control (e.g., public Atlas versus secured Atlas) and the subsequent migration of completed artifacts into the phenotype library.
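The long-form reshaping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the working group's actual tracker: the study names, leads, and phenotype labels are placeholders, the column names are hypothetical, and a row is assumed to mean one phenotype within one study.

```python
import csv
import io

# Hypothetical wide-format input: one entry per study, phenotypes listed together.
# Study names, leads, and phenotype labels are placeholders, not the real proposal list.
STUDIES = [
    {"study": "Study A", "lead": "lead_a", "phenotypes": ["acute kidney injury", "major surgery"]},
    {"study": "Study B", "lead": "lead_b", "phenotypes": ["major surgery", "antibiotics"]},
]

# Progress steps named in the notes; the column identifiers are assumptions.
STEPS = ["clinical_description", "cohort_development", "evaluation"]


def to_long_form(studies):
    """Return one tracker row per (study, phenotype) pair."""
    rows = []
    for s in studies:
        for label in s["phenotypes"]:
            row = {"study": s["study"], "lead": s["lead"], "phenotype": label}
            row.update({step: "not started" for step in STEPS})
            row["cohort_id"] = ""  # filled in once a cohort exists
            rows.append(row)
    return rows


def shared_phenotypes(rows):
    """Map each phenotype label used by more than one study to those studies."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row["phenotype"], []).append(row["study"])
    return {label: studies for label, studies in by_label.items() if len(studies) > 1}


def to_csv(rows):
    """Serialize the tracker rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = to_long_form(STUDIES)
print(to_csv(rows))
print(shared_phenotypes(rows))
```

Listing labels that appear under more than one study is one way to surface the shared definitions and reuse candidates raised in the discussion. Whether two identical labels really denote the same phenotype remains something the study leads would need to confirm.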
- @Azza_Shoaibi [Advocacy]: Leads the logistics discussion by emphasizing the immediate need for a structured tracker and clear folder organization. Azza provides detailed instructions on how to consolidate and monitor progress using the tracker, and outlines the requirements for data tracking (e.g., cohort IDs).
- @Gowtham_Rao [Inquiry/Advocacy]: Supports the conversation by probing how to best organize the folders (study-level vs. phenotype-level) and clarifies the importance of having unified tracking for phenotypes, particularly regarding shared definitions and reusability.
- @aostropolets [Advocacy]: Contributes by highlighting the need for study leads to verify and, if necessary, update phenotype labels in the master tracker. Anna also discusses the coordination of access permissions and stresses the eventual transfer of finalized cohorts into the phenotype library or GitHub repositories.
- Lana Shubinsky [Inquiry]: Provides input on the technical aspects of folder organization and meeting logistics, seeking clarification on how best to set up recurring Teams meetings and ensure consistent access for all relevant members.
- Assumption: It is assumed that transforming Patrick’s list into a detailed, long-form tracker will be straightforward and that all study leads have the familiarity to verify and annotate their phenotype labels accordingly.
- Information Gap: The discussion hints at potential technical challenges (such as coordinating access for external study leads or managing the integration between public Atlas IDs and the secured Atlas), but these challenges are not fully resolved in the conversation.
- Assumption: There is an underlying expectation that the chosen platform (Microsoft Teams and associated tools) is sufficiently robust to handle the tracking and collaborative tasks without major technical issues.
The logistics discussion is a crucial backbone for the Phenotype Phebruary initiative. The group is focused on establishing a clear, centralized system for tracking progress and managing documents, ensuring that every phenotype is accounted for and accessible to all collaborators. The decision to use a master tracker as a central repository of information highlights the need for transparency and real-time updates in a collaborative, multi-stakeholder environment. However, the conversation also reveals the complexities of integrating multiple platforms (Teams, Atlas, GitHub) and ensuring all participants, including external study leads, have the appropriate access and understanding. This coordination will be key to maintaining momentum throughout the initiative and ensuring that administrative challenges do not impede scientific progress.
In the logistics and coordination session, the group agreed to create a master tracker—an Excel-based tool—to consolidate and monitor progress for each phenotype. The tracker will record key milestones such as clinical descriptions, cohort development steps, and final cohort IDs, ensuring clear oversight across the initiative. Discussions also focused on establishing an effective folder structure within Teams, debating whether to organize by study or phenotype, and ensuring all study leads have access to the necessary resources. This robust tracking and coordination framework is intended to streamline progress and facilitate the eventual migration of completed phenotypes into the centralized library.
Contributor Engagement and Task Allocation.
This segment of the conversation centers on strategies for actively engaging contributors and clearly assigning tasks within the Phenotype Phebruary initiative. The discussion outlines the need to solicit volunteer participation using structured methods such as a sign-up form or Google Form, where potential contributors can indicate their skill sets (e.g., clinical description, concept set building, evaluation) and time commitment. The aim is to build a comprehensive pool of participants—including both study leads and community members—to ensure that every aspect of the phenotype development process is covered. The conversation also clarifies that not every contributor is required to complete all steps; instead, individuals can focus on specific tasks aligned with their expertise. Furthermore, responsibilities are delineated such that study leads are tasked with overall phenotype oversight and validation, while collaborators assist with defined components, ensuring a distributed workload and collaborative synergy.
- @Azza_Shoaibi [Advocacy]: Azza leads the discussion by emphasizing the immediate need for a formal contributor sign-up process. She suggests that a form should capture critical details such as email address, time commitment, access to data, and specific skills relevant to the phenotyping process. Her guidance aims to streamline task allocation and ensure that each step in the process, from clinical description to evaluation, is adequately staffed.
- @Gowtham_Rao [Inquiry/Advocacy]: Gowtham contributes by discussing the importance of pooling both new volunteers and those already known to the group (e.g., from previous collaborations or responses to Patrick’s earlier call). He raises the idea of leveraging the existing pool of study leads who already have a vested interest in their respective phenotypes.
- @aostropolets [Advocacy]: Anna underscores the necessity for study leads to validate and update phenotype labels, which will later feed into the master tracker. She emphasizes that clear communication of roles and responsibilities, both for study leads and the additional contributors, is essential for the smooth progression of the work.
- Implicit Positioning: There is a shared recognition that while the initiative must harness collective expertise, clear delineation of tasks is needed to avoid duplication and ensure accountability. Contributors are expected to volunteer based on their strengths, thereby optimizing the collaborative effort.
- Assumption: The approach assumes that contributors, once engaged through the sign-up process, will commit to specific tasks and that their skill levels will align with the needs of the project.
- Information Gap: While the mechanism for collecting contributor details is well described, there is less clarity on how conflicts in task allocation (e.g., overlapping interests or skill mismatches) will be managed.
- Assumption on Flexibility: It is implied that task assignments may evolve as the project unfolds, yet the process for revising or reallocating responsibilities if needed is not explicitly defined.
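The sign-up fields listed above (email address, time commitment, access to data, skills) map naturally onto a small record type, and a coverage check can show which steps still lack volunteers. The sketch below is only an illustration: the class and field names, the use of hours per week as the unit of time commitment, and the example entries are all assumptions rather than the form the group actually built.

```python
from dataclasses import dataclass, field

# Process steps named in the notes; the identifiers themselves are hypothetical.
STEPS = ("clinical_description", "concept_set_building", "evaluation")


@dataclass
class SignUp:
    email: str
    hours_per_week: float  # assumed unit for "time commitment"
    has_data_access: bool
    skills: set = field(default_factory=set)


def coverage(signups):
    """Map each step to the emails of contributors who listed that skill."""
    return {step: [s.email for s in signups if step in s.skills] for step in STEPS}


def unstaffed(signups):
    """Return the steps that nobody has volunteered for yet."""
    return [step for step, emails in coverage(signups).items() if not emails]


# Placeholder entries for illustration only.
signups = [
    SignUp("a@example.org", 2, False, {"clinical_description"}),
    SignUp("b@example.org", 4, True, {"concept_set_building", "clinical_description"}),
]
print(coverage(signups))
print(unstaffed(signups))
```

A check like this does not resolve the open question of how overlapping interests or skill mismatches are handled; it only makes gaps visible early.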
The contributor engagement and task allocation strategy is designed to harness the diverse skills of the working group while maintaining clear accountability. By utilizing a structured sign-up process, the initiative aims to capture essential information that will inform subsequent task assignments and ensure a balanced workload. This method fosters a participatory environment where study leads receive the necessary support while also opening the door for new contributors to bring fresh insights. However, success depends on clear communication channels and the effective management of the contributor pool, especially as task requirements evolve throughout the initiative.
In the Contributor Engagement and Task Allocation session, the working group outlined plans to streamline volunteer participation by launching a sign-up form where interested members can indicate their skills, time commitment, and access to data. This structured approach is intended to build a diverse pool of contributors who will support various stages of the phenotype development process—from clinical description to evaluation—while study leads maintain overall responsibility for their respective projects. The strategy emphasizes clear role delineation to prevent overlap and ensure effective collaboration across the initiative.
Meeting Scheduling and Communication Protocols.
This segment focuses on how the team plans to coordinate their meetings and manage communication throughout the Phenotype Phebruary initiative. The conversation addresses potentially setting up recurring meetings—including Tuesday demos, Wednesday coordination calls for study leads, and Friday working group sessions—to ensure consistent progress. The group discusses the logistics of using Microsoft Teams for these meetings, including the creation of shared links, calendar invitations, and distribution lists. Key points include determining the optimal meeting times (with some debate over Wednesday’s call start times) and ensuring that all study leads and contributors are added to the relevant Teams channels. There is also consideration given to integrating these meeting schedules with the overarching project tasks, such as reviewing phenotype labels and discussing logistics updates.
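As a rough aid for preparing the calendar invitations, the recurring slots can be enumerated programmatically. The sketch assumes that Phebruary means February 2025 (inferred from the date of these notes) and omits start times, since those were still under discussion. The function and variable names are illustrative.

```python
from datetime import date, timedelta

# Weekday numbers follow date.weekday(): Monday is 0.
# Meeting purposes come from the notes; start times are deliberately left out.
RECURRING = {1: "demo", 2: "study lead coordination", 4: "working group call"}


def meetings(year=2025, month=2):
    """List (date, purpose) for every recurring slot in the given month."""
    day = date(year, month, 1)
    out = []
    while day.month == month:
        purpose = RECURRING.get(day.weekday())
        if purpose:
            out.append((day, purpose))
        day += timedelta(days=1)
    return out


schedule = meetings()
for day, purpose in schedule:
    print(day.isoformat(), day.strftime("%a"), purpose)
```

Under these assumptions the month contains four occurrences of each meeting type, twelve slots in total.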
- @aostropolets [Advocacy]: Anna emphasizes the need for clear communication, ensuring that study leads are informed and have access to the meetings. She suggests using the workgroup’s communication channels to disseminate meeting invitations and agenda details effectively.
- @Azza_Shoaibi [Advocacy]: Azza plays a leading role in outlining the specific meeting schedule and clarifying the objectives for each call (e.g., Tuesday demos, Wednesday study lead discussions, Friday open group sessions). She stresses that meeting invitations should include links to supporting documents (such as the forum post) so that participants can easily access relevant information.
- @Gowtham_Rao [Inquiry/Advocacy]: Gowtham raises questions about the integration of Teams features, like recurring links, and highlights the need to accommodate different schedules, ensuring that meetings are convenient for all involved. His contributions underline the importance of technical logistics in sustaining smooth communication.
- Lana Shubinsky [Inquiry]: Lana queries the practical aspects of scheduling, such as ensuring that all study leads can attend and confirming that the recurring meeting link functions as intended. She also clarifies the process for setting up the invitation, demonstrating attention to detail in meeting logistics.
- Assumption: The team assumes that using a single recurring Teams link for all meetings (or for specific sets of meetings) will be both efficient and sufficient to reach all intended participants.
- Information Gap: There is limited discussion on how last-minute scheduling conflicts or changes in availability will be managed, particularly for the externally invited study leads.
- Assumption: It is presumed that all participants are familiar with Microsoft Teams and its scheduling features, and that the existing distribution list will dynamically update to include new members without manual intervention.
The meeting scheduling and communication protocols are designed to maintain momentum and ensure that every stakeholder remains informed and engaged. The conversation reflects a balance between structured planning (with predefined meeting times and clear objectives) and flexibility (allowing adjustments based on participant availability). Emphasis on linking meeting invitations to supporting materials indicates a comprehensive approach to information sharing. However, reliance on a single communication platform assumes uniform proficiency and may require additional contingency measures for addressing unforeseen scheduling conflicts.
The team established a detailed meeting schedule to coordinate the Phenotype Phebruary initiative, setting recurring sessions on Tuesday, Wednesday, and Friday. These meetings are intended to cover demos, study lead coordination, and broader working group discussions. Participants stressed the importance of using Microsoft Teams to create recurring invitations with shared links to essential documents, ensuring all study leads and contributors are informed. This structured approach aims to foster clear communication and sustained progress throughout the project.
Q&A, Clarifications, and Process Adjustments.
This segment covers the session’s open discussion, during which participants fielded questions, clarified process uncertainties, and contemplated potential adjustments to the phenotyping workflow. The dialogue reveals concerns about the specificity and granularity of phenotype definitions—such as the varying definitions of “major surgery” or nuances in drug exposures—and whether pre-existing phenotypes are sufficient or require modifications. Questions about timeline flexibility, particularly if some phenotypes are not finalized by the end of Phebruary, are raised, highlighting the tension between adhering to deadlines and accommodating real-world constraints. The discussion also touches on the possibility of iterating beyond Phebruary, even though the official process is bounded to that month, emphasizing that iterative refinement is both expected and supported. Overall, this Q&A session serves to address ambiguities and ensure that all participants understand the process and expectations, while also allowing for minor adjustments based on emerging challenges.
- @aostropolets [Advocacy]: Anna is proactive in clarifying process steps and the importance of adhering to the timeline, while reassuring participants that iterative adjustments are permissible even after the formal deadline.
- @zhuk [Inquiry]: Oleg raises concerns regarding the timing and the possibility of modifying definitions, especially in complex cases such as “major surgery”, which signals a need for flexible boundaries within a structured framework.
- @Azza_Shoaibi [Advocacy]: Azza reinforces the timeline by emphasizing that while post-Phebruary modifications are possible, having a clear cutoff is essential for progressing to subsequent phases (e.g., data network studies). She also clarifies that the process is designed to accommodate iterative refinement, even if it means revisiting definitions or cohort specifications later.
- @Gowtham_Rao [Inquiry/Advocacy]: Gowtham contributes by underlining the importance of meeting deadlines for progression while acknowledging that the process is inherently iterative. He supports the idea that if phenotypes are not ready by the deadline, those studies might not advance to the next phase.
- Assumption: The process assumes that the predefined deadlines (end of Phebruary) are sufficient to capture and correct most issues, even though participants acknowledge that complete alignment may be challenging.
- Information Gap: There is some ambiguity around the criteria for “good enough” phenotypes and how exactly iterative improvements will be integrated post-deadline. While the team signals flexibility, the detailed process for post-deadline modifications is not fully delineated.
- Assumption: It is presumed that clarifications provided during the meeting will be sufficient to resolve most participant queries, thereby maintaining progress without significant delays.
The Q&A and clarification phase is crucial as it addresses potential friction points in the process and reinforces the balance between structure and flexibility. The conversation reflects a well-calibrated effort to manage expectations: while the initiative sets firm deadlines to drive progress, it also recognizes the need for iterative refinement and adjustments based on real-world complexities. This dual approach is essential for managing a collaborative project with diverse participants and varying levels of expertise. The open exchange of questions and clarifications helps to build confidence and ensures that all stakeholders are aligned on process goals, timelines, and contingencies.
In the Q&A and Clarifications session, the team addressed concerns about phenotype specificity, timeline rigidity, and the process for making adjustments. Participants discussed the challenges of defining complex phenotypes—such as “major surgery”—and debated whether pre-existing templates are adequate. While the official deadline for phenotype completion is set at the end of Phebruary, the team acknowledged that iterative improvements can continue post-deadline. This balanced approach reinforces the need for structured progress while accommodating real-world complexities, ensuring that all contributors understand the expectations and can collaborate effectively.