OHDSI Phenotype Phebruary and workgroup updates

Phenotype Phebruary 2025 Office Hours – February 12, 2025

Topics

  1. Type 2 Diabetes Phenotype Development

    • Approaches to definition: diagnosis codes only vs. inclusion of lab values and medications
    • Sensitivity versus specificity trade-offs
    • Consideration of insulin use and alternate entry criteria
    • Comparison of multiple cohort variants and their impact on incidence rates
  2. Diabetic Retinopathy Screening Cohort Design

    • Differentiating in‐office, telemedicine, and AI-based screening outcomes
    • Analysis strategies: earliest event versus repeated events
    • Implementation of washout periods
    • Challenges with provider specialty mapping and reliance on specific CPT codes
    • Custom SQL versus standard Atlas cohort diagnostics
  3. Antipsychotic Treatment Cohort and Censoring Strategy

    • Exclusion of patients on other antipsychotics prior to index
    • Censoring rules for patients who switch medications post-index
    • Impact of censoring on outcome incidence and potential biases
    • Balancing strict (monotherapy) versus broader real-world cohorts
  4. Technical Implementation in Atlas and Query Logic

    • Indexing criteria: using visit start dates versus condition start dates
    • Handling of visits with embedded diagnoses and setting proper time constraints
    • SQL logic for defining event start/end dates and cohort entry/exit
  5. Naming Conventions and Infrastructure for the Phenotype Library

    • Establishing a standard naming scheme (e.g., prefixes like PP25 or F-25)
    • Integration with OHDSI forums and GitHub for clinical descriptions
    • Best practices for updating and maintaining phenotype definitions
  6. Project Coordination, Communication, and Next Steps

    • Progress tracking and proactive outreach to study leads
    • Scheduling of future office hours and follow-up meetings
    • Volunteer support and addressing any blockers in phenotype development

Topic 1 – Type 2 Diabetes Phenotype Development:

In this segment, the group deliberated on the optimal definition for a type 2 diabetes phenotype. Cindy Cai initiated the discussion by outlining the need for multiple phenotype variants for type 2 diabetes—one relying solely on diagnosis codes and others that incorporate lab values (e.g., high glucose measurements) and medication data (including or excluding insulin). This reflects a core tension: the trade‐off between sensitivity (capturing as many potential cases as possible) and specificity (avoiding misclassification by including only confirmed cases).

For type 2 diabetes phenotype development, the group discussed two primary approaches: one using diagnosis codes only and another that includes additional criteria such as lab measurements and medication exposures. While a sensitive definition is favored to ensure comprehensive patient capture for subsequent retinopathy screening, concerns were raised about potential specificity losses—especially regarding the inclusion of insulin. The consensus is to build and compare multiple phenotype variants and use cohort diagnostics to decide which definition best meets study requirements.

Anna Ostropolets advocated for a more sensitive approach, suggesting that when the primary focus is on downstream diabetic retinopathy screening, it is preferable to “capture everybody” by allowing alternate entry criteria such as lab values and medication records. Her position emphasizes that maximizing sensitivity is critical for ensuring that the subsequent screening outcomes are not biased by an overly narrow diabetes definition. Conversely, Evan Minty questioned the inclusion of insulin, pointing out that, for many type 2 patients, insulin use may be transient or secondary to other treatments. His question highlights a potential pitfall: including insulin might inadvertently lower specificity by capturing patients whose treatment patterns do not represent typical type 2 diabetes management.

Gowtham Rao further enriched the discussion by noting that, in real-world datasets, some patients may not have a diagnosis code even though lab and medication data indicate diabetes. This observation underlines the importance of using a dual approach to balance sensitivity and specificity. Implicit in these discussions is the assumption that the ideal phenotype should reflect clinical reality across diverse institutions, yet a gap remains regarding standardized thresholds (e.g., how many lab measurements qualify as “repetitive” enough to confirm diabetes).
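
The variants discussed above can be compared side by side on a toy example. The following is a minimal sketch, not the workgroup's actual cohort logic: the `PatientSummary` record, its field names, and the `min_labs=2` threshold are all hypothetical (the notes explicitly leave the lab-count threshold open), and a real implementation would be built against OMOP tables in Atlas.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary; not an OMOP CDM table."""
    has_t2d_diagnosis: bool
    high_glucose_count: int            # number of qualifying high glucose labs
    on_non_insulin_antidiabetic: bool
    on_insulin: bool


def enters_diagnosis_only(p: PatientSummary) -> bool:
    """Specific variant: a type 2 diabetes diagnosis code is required."""
    return p.has_t2d_diagnosis


def enters_sensitive(p: PatientSummary, min_labs: int = 2,
                     include_insulin: bool = False) -> bool:
    """Sensitive variant: diagnosis OR repeated high labs OR medication.

    min_labs is an assumed threshold; the notes do not settle on one.
    """
    has_labs = p.high_glucose_count >= min_labs
    has_meds = p.on_non_insulin_antidiabetic or (include_insulin and p.on_insulin)
    return p.has_t2d_diagnosis or has_labs or has_meds


patients = [
    PatientSummary(True, 0, False, False),   # coded diagnosis only
    PatientSummary(False, 3, True, False),   # no code, but labs and oral medication
    PatientSummary(False, 0, False, True),   # insulin only
    PatientSummary(False, 1, False, False),  # a single high lab value
]

variant_sizes = {
    "diagnosis_only": sum(enters_diagnosis_only(p) for p in patients),
    "sensitive_no_insulin": sum(enters_sensitive(p) for p in patients),
    "sensitive_with_insulin": sum(
        enters_sensitive(p, include_insulin=True) for p in patients
    ),
}
print(variant_sizes)
```

In this toy data the cohort grows from one to two to three patients as the entry criteria loosen, which is the kind of difference cohort diagnostics would surface when the variants are run on real data.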

Topic 2 – Diabetic Retinopathy Screening Cohort Design:

In this segment, the group focused on designing a cohort for diabetic retinopathy screening. Cindy Cai framed the discussion by emphasizing that retinopathy screening is inherently a repeated event rather than a one‐time occurrence. The intent is to capture all screening events over time rather than solely relying on the earliest event. This approach is important because the recommended clinical practice is that patients with diabetes should receive screening at least once a year.

For diabetic retinopathy screening, the group discussed designing a cohort that recognizes the recurring nature of screenings. The debate centered on whether to index solely on the earliest event or to capture all screening events, with a preference for the latter to align with annual screening recommendations. Key challenges include handling variable provider specialty mappings and potentially using custom SQL to integrate appropriate washout periods. The plan is to validate different cohort definitions using cohort diagnostics, ensuring that the final design accurately reflects clinical practice.

Key points discussed include:

  • Repeated Event vs. Earliest Event Approach:
    Cindy highlighted the need to differentiate between the first screening event and subsequent screenings. She proposed analyzing the pattern of screenings over multiple years (e.g., once per year) to assess adherence to clinical guidelines. This naturally leads to a choice between indexing on the earliest event versus capturing all events during a defined time at risk.
  • Implementation Challenges:
    The group noted that while the standard cohort diagnostics package covers basic questions, answering more nuanced ones, such as the impact of repeated screening events, may require custom SQL queries. These queries would allow for the integration of a washout period (for example, excluding a patient from being “at risk” for another screening within 365 days of the prior event).
  • Provider Specialty and CPT Codes:
    A challenge emerged regarding provider specialty mapping. In many datasets, especially within the OMOP framework, the specialty of the provider (e.g., ophthalmologist, optometrist) may not be consistently mapped. Consequently, using specific CPT codes for in-office screenings might be insufficient for capturing all relevant events. This led to a discussion about combining office visit codes with condition codes for visual system disorders as a potentially more sensitive method.
  • Empirical Evaluation:
    The speakers agreed that the optimal cohort design should be validated through cohort diagnostics. Running multiple versions of the cohort—each with slightly different criteria—will help determine which definition best captures the intended patient population and aligns with the clinical guidelines for annual screening.
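
The washout idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `apply_washout` is hypothetical, dates are per-patient screening dates, and an event is counted only if it falls at least `washout_days` after the previously counted event (whether the boundary day itself counts was not specified in the discussion).

```python
from datetime import date
from typing import Iterable, List


def apply_washout(event_dates: Iterable[date], washout_days: int = 365) -> List[date]:
    """Keep the earliest event, then each later event that falls at least
    washout_days after the most recently kept event."""
    kept: List[date] = []
    for d in sorted(event_dates):
        if not kept or (d - kept[-1]).days >= washout_days:
            kept.append(d)
    return kept


screenings = [
    date(2020, 1, 10),
    date(2020, 6, 1),    # within 365 days of the first event: dropped
    date(2021, 2, 15),
    date(2021, 3, 1),    # within 365 days of the previous kept event: dropped
    date(2022, 3, 1),
]

repeated_events = apply_washout(screenings)
earliest_event = repeated_events[0]
print(repeated_events)
print(earliest_event)
```

The "earliest event" design corresponds to taking only the first element, while the "repeated events" design keeps the whole list, so both options in the discussion can be derived from the same logic.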

Topic 3 – Antipsychotic Treatment Cohort and Censoring Strategy:

For the antipsychotic treatment cohort, the workgroup discussed strategies to isolate patients on a single antipsychotic. The plan involves excluding patients who have taken other antipsychotics prior to the index date and censoring individuals at the time they switch or add another medication post-index. While this approach helps maintain a homogeneous cohort, concerns were raised about the risk of informative censoring—since treatment changes might be driven by clinical factors that relate to the outcomes. The group recognized this trade-off and underscored the need for careful analysis when interpreting results.

Key points include:

  • Exclusion Prior to Index Date:
    Participants agreed that patients with exposure to antipsychotics other than the target treatment should be excluded before the index date. This step ensures that the study population begins as a monotherapy group, reducing confounding factors that could arise from prior polypharmacy.

  • Censoring Post-Index Date:
    The conversation then shifted to handling patients who switch or add antipsychotic medications after the index date. Anna Ostropolets recommended implementing censoring at the point of switching to maintain the purity of the treatment cohort. In other words, if a patient starts another antipsychotic after initiating the target drug, the patient’s follow-up should end at that moment.

  • Trade-Offs and Informative Censoring:
    Andrew Williams raised an important methodological consideration: censoring patients who switch treatments might introduce informative censoring. This means that the reasons for switching (e.g., side effects, lack of effectiveness) could be related to the outcomes of interest, potentially biasing the results. The discussion acknowledged this trade-off, with some group members noting that while stricter censoring maintains a cleaner cohort, it might limit generalizability if many patients switch medications in real-world settings.

  • Balancing Monotherapy with Real-World Practice:
    The group recognized the tension between an ideal monotherapy cohort—which can closely resemble a clinical trial setting—and the variability inherent in real-world treatment patterns. The consensus was that while excluding patients with post-index changes might lead to a loss of data (and possibly informative censoring), it remains a common approach to ensure that observed outcomes are attributable to the treatment under study.
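
The exclusion and censoring rules above can be expressed as a small function. This is a hedged sketch rather than the study's implementation: `follow_up_end` and its parameters are hypothetical names, and treating an exposure to another antipsychotic on the index date itself as a censoring event (rather than an exclusion) is an assumption not settled in the notes.

```python
from datetime import date
from typing import List, Optional


def follow_up_end(index_date: date,
                  observation_end: date,
                  other_antipsychotic_starts: List[date]) -> Optional[date]:
    """Return the date follow-up ends, or None if the patient is excluded.

    Excluded: any other antipsychotic started before the index date.
    Censored: follow-up ends at the first other antipsychotic started on or
    after the index date, or at observation end, whichever comes first.
    """
    if any(d < index_date for d in other_antipsychotic_starts):
        return None
    return min([observation_end, *other_antipsychotic_starts])


index = date(2024, 1, 1)
obs_end = date(2024, 12, 31)

prior_user = follow_up_end(index, obs_end, [date(2023, 6, 1)])
switcher = follow_up_end(index, obs_end, [date(2024, 5, 1), date(2024, 9, 1)])
monotherapy = follow_up_end(index, obs_end, [])
print(prior_user, switcher, monotherapy)
```

Counting how many patients end follow-up at a switch rather than at observation end gives a rough sense of how much the informative-censoring concern matters in a given data source.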

Topic 4 – Technical Implementation in Atlas and Query Logic:

For technical implementation in Atlas, the team agreed to use the visit start date as the index, with conditions required to occur between the visit’s start and end dates. Given that condition end dates are often missing—especially for outpatient records—the query logic defaults to the condition start date. Although standard Atlas functionality covers most needs, the group acknowledged that custom SQL might be necessary to handle more complex cases and ensure the cohort definition accurately reflects clinical intent.

The discussion centered on several key aspects:

  • Indexing Criteria and Date Logic:
    Participants debated whether the index date should be based on the visit start date or the condition start date. The consensus leaned toward using the visit start date as the anchor, ensuring that any associated condition occurrence falls within the boundaries of that visit. This means the query logic should enforce that the condition’s start date is on or after the visit’s start date and on or before the visit’s end date.

  • Handling Condition Occurrence Dates:
    A significant technical point was the fact that the condition occurrence table in OMOP may not always populate the condition end date—especially in outpatient data. As a result, Atlas SQL typically defaults to using the condition start date when an end date is absent. The group highlighted that this approach ensures that the condition is appropriately linked to the visit period, even if the full temporal span isn’t available.

  • Query Construction in Atlas:
    The conversation covered how to set up constraints within Atlas. For example, one can configure the inclusion criteria so that the condition occurrence must start within a specified interval relative to the visit occurrence. This may involve adding additional constraints (such as “0 days before and all days after” the visit start date) to capture the full intended window. There was also mention of “inverse logic” and concatenating condition dates where necessary, emphasizing that the SQL logic should ultimately reflect the clinical rationale behind the cohort.

  • Customization and Data Source Variability:
    Several participants noted that while a standard query logic can be defined, adjustments may be necessary depending on the specifics of the data source—such as how visits and conditions are recorded. The group acknowledged that custom SQL might be required for certain questions, particularly when standard Atlas functionality doesn’t fully accommodate complex scenarios.
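
The date logic described above can be illustrated outside of Atlas. The sketch below is not the SQL that Atlas generates; the function names are hypothetical, the end-date fallback follows the behavior as described in these notes, and the visit window is assumed to be inclusive on both ends so that same-day outpatient visits can match.

```python
from datetime import date
from typing import Optional


def effective_condition_end(condition_start: date,
                            condition_end: Optional[date]) -> date:
    """Fall back to the condition start date when the end date is missing,
    as described in the notes."""
    return condition_end if condition_end is not None else condition_start


def condition_within_visit(condition_start: date,
                           visit_start: date,
                           visit_end: date) -> bool:
    """True when the condition starts inside the visit window (inclusive)."""
    return visit_start <= condition_start <= visit_end


visit_start, visit_end = date(2024, 3, 1), date(2024, 3, 4)

inside = condition_within_visit(date(2024, 3, 2), visit_start, visit_end)
before = condition_within_visit(date(2024, 2, 28), visit_start, visit_end)
same_day = condition_within_visit(date(2024, 3, 1), date(2024, 3, 1), date(2024, 3, 1))
missing_end = effective_condition_end(date(2024, 3, 2), None)
print(inside, before, same_day, missing_end)
```

With the visit start date as the index, only condition records for which `condition_within_visit` is true would be attributed to that visit.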

Topic 5 – Naming Conventions and Infrastructure for the Phenotype Library:

The workgroup discussed the need for standardized naming conventions and infrastructure enhancements for the phenotype library. Proposed conventions include using a prefix (such as “F-25” or “PP25”) to denote the phenotype development period followed by the study name. This system, combined with brief clinical descriptions, aims to improve consistency, discoverability, and ease of collaboration across both the OHDSI forums and the GitHub repository. Although strict enforcement may be challenging, adopting these guidelines is expected to enhance the overall management and utility of phenotype definitions.

Key points include:

  • Standardized Naming Conventions:
    Participants proposed using a consistent prefix to denote the phenotype development initiative (e.g., “F-25” or “PP25” representing “Phenotype Phebruary 2025”) followed by an underscore and the study or cohort name. This convention would make it easier to identify, organize, and retrieve phenotype definitions across the network.

  • Integration with Existing Infrastructure:
    The discussion emphasized linking these standardized names to both the OHDSI forums (where clinical descriptions and discussions reside) and the GitHub-hosted phenotype library. This dual-system approach leverages the public and searchable nature of the forums while maintaining version-controlled records on GitHub.

  • Collaboration and Documentation:
    There was consensus on the importance of not only naming the phenotypes consistently but also including brief clinical descriptions. These descriptions clarify the clinical intent and study context without requiring extensive documentation. Such metadata will be essential for both study leads and volunteers to understand the nuances of each phenotype.

  • Flexibility and Enforcement:
    While a recommended naming scheme was discussed, it was acknowledged that strict enforcement might be challenging given the volunteer-driven nature of the initiative. However, standard guidelines would enhance interoperability and make it easier for data partners and study leads to locate and run the relevant cohorts.
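
The proposed prefix-plus-underscore scheme is simple enough to generate and check automatically. The helper names below are hypothetical, and replacing spaces and punctuation in the study name with underscores is an added assumption; the notes specify only the prefix, the underscore, and the study or cohort name.

```python
import re

# Prefixes mentioned in the discussion; the final choice was not settled.
KNOWN_PREFIXES = ("PP25", "F-25")


def phenotype_name(study_name: str, prefix: str = "PP25") -> str:
    """Build '<prefix>_<study name>' with non-alphanumeric runs collapsed to '_'."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", study_name.strip()).strip("_")
    if not slug:
        raise ValueError("study_name must contain at least one letter or digit")
    return f"{prefix}_{slug}"


def follows_convention(name: str) -> bool:
    """Check that a cohort name starts with a known prefix and an underscore."""
    return any(
        name.startswith(p + "_") and len(name) > len(p) + 1
        for p in KNOWN_PREFIXES
    )


t2d_name = phenotype_name("Type 2 Diabetes")
screening_name = phenotype_name("Diabetic retinopathy screening", prefix="F-25")
print(t2d_name)
print(screening_name)
print(follows_convention("Type 2 diabetes (draft)"))
```

A check like `follows_convention` could flag non-conforming names without blocking them, which fits the point that strict enforcement is difficult in a volunteer-driven effort.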

Topic 6 – Project Coordination, Communication, and Next Steps:

For project coordination and next steps, the workgroup agreed to maintain robust communication through scheduled office hours and proactive email follow-ups to track study progress. Volunteers are encouraged to assist study leads, and standard naming and documentation practices will be encouraged to keep the phenotype library consistent. These efforts aim to streamline collaboration and ensure that all studies progress efficiently toward their targets.

Key discussion points include:

  • Progress Tracking and Follow-Up:
    Anna Ostropolets noted that while some study leads are advancing well with their phenotypes, there is a lack of visibility into the status of the other half of the group. The plan is to proactively reach out via email—especially on Thursday—to assess each study’s progress and provide assistance as needed.

  • Scheduled Meetings and Office Hours:
    The team confirmed that regular office hours will continue (with the next session scheduled for Friday at 9:00 AM), ensuring ongoing support and timely updates. This recurring communication channel is critical for addressing blockers and sharing updates across studies.

  • Volunteer Engagement and Resource Sharing:
    The group emphasized the importance of volunteer contributions and encouraged study leads to connect with volunteers who have relevant expertise. This collaborative spirit is designed to maximize the effective use of available skills and improve overall study outcomes.

  • Infrastructure and Standardization:
    Alongside coordination efforts, there was an emphasis on aligning study outputs with the established infrastructure, such as ensuring that phenotype definitions are properly named, linked in the progress tracker, and available in both the OHDSI forums and GitHub. Clear documentation and standard naming conventions were highlighted as essential for facilitating both internal review and external dissemination.