Workgroup - Health Economics & Value Assessment (HEVA)

Workgroup Name:

Health Economics & Value Assessment (HEVA) Workgroup

OHDSI Teams Channel
General | Workgroup - Health Economics & Value Assessment (HEVA) | Microsoft Teams

YouTube: https://www.youtube.com/@OHDSIWorkGroupHEVA

Continuing from old thread

Vision:

A world where collaborative real-world evidence provides a comprehensive understanding of value, accelerating patient access to innovative therapies within a sustainable healthcare ecosystem.

Mission:

To empower the OHDSI community to improve health by collaboratively generating reliable evidence on comparative value and economic impact. This workgroup will develop and promote open-source tools and standardized methods for evidence synthesis that meet the evidentiary requirements of HTA bodies and payers, informing medical policy and enabling value-based reimbursement.

Core Objectives:

  • Advance Data Standards and Methods: Lead the use-case-driven evolution of the OMOP Common Data Model by proposing and standardizing extensions for cost, payer, and other health economic data. Develop and empirically evaluate robust methodologies for value assessment, incorporating risk adjustment and total cost of care analyses.
  • Develop Open-Source Analytics: Design, build, and maintain a suite of open-source analytics tools within the OHDSI ecosystem that empowers the community to generate evidence for HTA dossiers, formulary management, and clinical pathway optimization.
  • Generate High-Impact Clinical Evidence: Spearhead and support federated network studies that apply our standardized methods and tools to generate real-world evidence on value, informing payer coverage policy, hospital formulary decisions, and value-based contracting.
  • Foster Multi-Stakeholder Collaboration: Cultivate a trusted, neutral forum for OHDSI collaborators from life sciences, payers, and provider systems to co-develop common evidence standards for value assessment.

Sign up for the HEVA workgroup via Microsoft Forms

HEVA Workgroup OKRs (Q4 2025 - Q1 2026)

Mission: To empower the OHDSI community to improve health by collaboratively generating reliable evidence on comparative value and economic impact.

Objective

Revitalize and re-introduce foundational Health Economics and Value Assessment (HEVA) capabilities into the OHDSI ecosystem (OMOP CDM and open-source analytics) to enable standardized research on comparative value and economic impact.

Key Results

KR 1: Data Standards Advancement and Consensus

Deliver a comprehensive, implementation-ready proposal to the OHDSI CDM Workgroup for the reintegration of 100% of the previously community-approved cost and utilization specifications (modernized from CDM v6.0) into the “proposed CDM v5.5,” achieving documented consensus (defined as zero unresolved critical objections during the community review period) by the end of Q1 2026.

KR 2: Stakeholder Perception Shift and Engagement

Shift industry perception regarding OHDSI’s HEOR capabilities by demonstrating the HEVA roadmap and use cases, securing commitment from at least 3 distinct organizations (representing pharma, payer, and/or academia) to pilot the “proposed CDM v5.5” by the end of Q4 2025.

KR 3: ETL Convention Validation

Establish and publish THEMIS ETL conventions for the “proposed CDM v5.5,” validated by successful implementation (data loaded and verified according to conventions) by at least 1 organization by the end of Q1 2026.

KR 4: Open-Source Analytics Adoption

Launch the alpha version of the OHDSI Cost and Utilization software framework (e.g., CostUtilization package) and achieve successful execution and validation testing by at least 2 external research groups by the end of Q1 2026.

KR 5: Dissemination of Best Practices

Establish foundational community best practices by publishing the “Conducting Cost and Utilization Studies using OMOP” chapter in the Book of OHDSI, delivering the manuscript to the Book of OHDSI editors in time for the 2025 publication.

OKR 2025.docx

Upcoming Meetings of the HEVA workgroup (Sept 17, 24, Oct 1)

Hello community,

Thank you for the excellent engagement and enthusiasm surrounding the revitalization of the Health Economics and Value Assessment (HEVA) workgroup (see the previous thread: Workgroup - Health Economics & Value Assessment (HEVA)). To ensure OHDSI can support the requirements of HTA bodies and payers, we have finalized our Objectives and Key Results (OKRs) for Q4 2025 and Q1 2026 (see post above).

This roadmap defines our plan for re-introducing foundational cost and utilization capabilities into the OHDSI ecosystem and shifting the perception of OHDSI’s viability for HEOR research.

Our next three workgroup meetings will be dedicated deep dives into the backbone of economic research: the COST table.

We strongly encourage participation from all interested community members, particularly those with expertise in HEOR data standards, ETL implementation, and economic analysis. Your input is critical as we modernize previous specifications and build consensus.

Please join us for the following sessions:

  • September 17th (Today Wed 9/17/2025 9:00 AM - 10:00 AM ET): Data Standards (Supports KR 1)

    • Topic: Reviewing and finalizing the proposed modifications to the COST table in the “proposed CDM v5.5” (migrated back from previously community-approved v6.0 specifications).
      Join the meeting now
  • September 24th: ETL Conventions (Supports KR 3)

    • Topic: Establishing and reviewing the THEMIS conventions required for standardized ETL of cost data into the “proposed CDM v5.5.”
  • October 1st: Open-Source Analytics (Supports KR 4)

    • Topic: Reviewing the alpha version of the new R package(s) for the Cost and Utilization framework (e.g., CostUtilization package).

Please check the OHDSI MS Teams environment for meeting invitations and calendar details.

HEVA WG Meeting (Sept 17): Data Standards (KR1) – Proposal for the “Proposed CDM v5.5” COST and PAYER_PLAN_PERIOD Tables

Hello everyone,

Today’s meeting (September 17th) focuses on our first Key Result: KR 1: Data Standards Advancement and Consensus. We will review the specific modifications required to reintegrate foundational health economics capabilities into the OMOP CDM.

The central question facing the OHDSI community regarding health economics data is: how can we efficiently incorporate the normalization and structural enhancements approved in CDM 6.0, which enable HEOR research, into the widely adopted 5.x lineage while maintaining backward compatibility?

The Rationale: Reintegration, Not Reinvention

It is critical to emphasize that the proposals presented today are not novel introductions requiring extensive foundational debate. Instead, they represent the reintegration of work previously approved by the OHDSI community during the development of CDM 6.0.

The OHDSI community previously engaged in extensive discussions that resulted in significant, vetted improvements to the COST and PAYER_PLAN_PERIOD tables, addressing long-standing limitations in the CDM 5.x series. These improvements included:

  1. Normalization of Cost Data: Shifting from a rigid “wide” format (many specific cost columns) to a flexible “long” format capable of handling diverse cost types (e.g., adjustments, rebates).
  2. Improved Linkage (Person-Centricity): Explicitly linking costs to the PERSON table, aligning with the core OMOP design philosophy and improving query efficiency.
  3. Enhanced Temporality: Adding precise date fields to distinguish between incurred, billed, and paid timelines.
  4. Standardization of Payer Information: Utilizing standard concepts for payers, plans, and sponsors.

Although the full adoption of CDM 6.0 stalled (due to breaking changes that disrupted OHDSI software, most notably the mandatory datetime fields), the value of these specific economic enhancements remains undisputed.

The HEVA workgroup is resurfacing this prior community agreement. We propose their integration into a “proposed CDM v5.5.” This hybrid approach resolves the tension between modernization and stability. It integrates the key features from CDM 6.0 while maintaining backward compatibility with the existing 5.x analytical infrastructure by implementing these changes in a non-breaking manner (i.e., adding new fields and making legacy fields optional).

Below is a detailed breakdown of the proposed changes. (Please refer to the accompanying spreadsheet for a full field-by-field comparison between v5.4, v6.0, and the proposed v5.5.)


Detailed Changes to the COST Table (Proposed CDM 5.5)

The proposed CDM 5.5 COST table incorporates structural additions and normalization fields from 6.0 while modifying legacy fields to facilitate transition. Note: these changes were previously approved by the CDM workgroup.

A. Structural Additions and Linkages (Reintegrated from 6.0)

  • person_id (NEW, Required): Explicitly associates each cost record with a subject in the PERSON table (GitHub Issue #81).
  • cost_event_field_concept_id (NEW, Required): Identifies the source domain/table of the linked clinical event. This enables clearer and more flexible event linking than the legacy cost_domain_id. (GitHub Issue #367)
  • Data Type Updates: To ensure scalability, cost_id and payer_plan_period_id are updated from INTEGER to BIGINT (GitHub Issue #198).

B. Normalization Fields (The Shift to Long Format, from 6.0)

This change transitions the table from a wide format to a long format (one row per cost type), which is the most critical enhancement for HEOR flexibility.

  • cost (NEW, Required): Represents the numeric value of the cost record. This supports negative values (e.g., adjustments or rebates).
  • cost_concept_id (NEW, Required): A standardized concept defining the semantic meaning of the cost value (e.g., Allowed, Paid, Coinsurance, Deductible). The interpretation of these concepts will be rigorously defined in the accompanying THEMIS conventions (to be discussed Sept 24th). See different concept IDs here.
  • cost_source_concept_id / cost_source_value (NEW, Required): Capture the original source system’s code and value, ensuring data provenance and traceability.
  • cost_type_concept_id (NEW, Required): Captures the provenance of the cost record. See concept IDs here.
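To make the wide-to-long shift concrete, below is a minimal Python sketch of the transformation. The concept IDs and the exact field subset are illustrative assumptions for this post, not the approved vocabulary or the final table definition.

```python
import itertools

# Illustrative sketch of the wide-to-long COST normalization described
# above. The concept IDs are placeholders, NOT real OMOP vocabulary IDs.
WIDE_TO_CONCEPT = {
    "total_charge": 90001,    # hypothetical "Charge" concept
    "total_paid": 90002,      # hypothetical "Paid" concept
    "paid_patient_copay": 90003,
    "amount_allowed": 90004,  # hypothetical "Allowed" concept
}

def normalize_cost_row(wide_row, cost_id_seq):
    """Explode one legacy wide COST row into long-format rows:
    one row per populated legacy cost column."""
    long_rows = []
    for column, concept_id in WIDE_TO_CONCEPT.items():
        value = wide_row.get(column)
        if value is None:
            continue  # unpopulated legacy columns yield no long row
        long_rows.append({
            "cost_id": next(cost_id_seq),
            "person_id": wide_row["person_id"],   # direct person linkage
            "cost": value,                  # numeric value; negatives allowed
            "cost_concept_id": concept_id,  # semantic meaning of the value
            "cost_source_value": column,    # provenance: originating column
        })
    return long_rows

seq = itertools.count(1)
wide = {"person_id": 42, "total_charge": 250.0,
        "total_paid": 180.0, "amount_allowed": 200.0}
rows = normalize_cost_row(wide, seq)  # one long row per populated column
```

Because new cost types become new concept IDs rather than new columns, this structure extends without any schema change.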

C. Temporal Fields (Reintegrated from 6.0)

Precise tracking of financial activity over time is crucial for time-series modeling (GitHub Issue #81).

  • incurred_date (NEW): Captures when the service or product was received.
  • billed_date (NEW): Date when the bill was generated.
  • paid_date (NEW): Captures when payment was received.

D. Modifications to Legacy Fields (Transitional, Non-Breaking Change)

  • Legacy Cost Fields (Make Optional): The following fields, which were deleted outright in CDM 6.x, are instead made Optional here to avoid a breaking change, since their information content is now represented by the normalized structure (cost + cost_concept_id):

  • total_charge, total_cost, total_paid, paid_by_payer, paid_by_patient, paid_patient_copay, paid_patient_coinsurance, paid_patient_deductible, paid_by_primary, paid_ingredient_cost, paid_dispensing_fee, amount_allowed.

  • cost_domain_id (Make Optional): This field is superseded by the functionality of cost_event_field_concept_id (GitHub Issue #164).

  • revenue_code_concept_id/source_value and drg_concept_id/source_value (Optional): These elements represent clinical or billing classifications that appropriately belong in clinical domains (e.g., PROCEDURE_OCCURRENCE or VISIT_OCCURRENCE), not the COST table. In the follow-up THEMIS conventions, we will describe how to ETL data such that record-level linkage of cost to revenue_code and drg is possible.


Detailed Changes to the PAYER_PLAN_PERIOD Table (Proposed CDM 5.5)

The proposed CDM 5.5 PAYER_PLAN_PERIOD table incorporates standardized concepts and structural enhancements from 6.0 to better characterize insurance coverage and financing mechanisms (GitHub Issue #120).

A. Standardized Payer/Plan Identification (from 6.0)

  • Addition of payer_concept_id and payer_source_concept_id to enable standardized analysis of payer entities.
  • Addition of plan_concept_id and plan_source_concept_id to standardize the representation of health benefit plans (e.g., Bronze, Silver, Gold).

B. Sponsor and Financing Details (from 6.0)

  • Addition of sponsor_concept_id, sponsor_source_value, and sponsor_source_concept_id to capture the entity financing the health plan (e.g., self-insured, small group, large group).

C. Contract and Family Structure (from 6.0)

These changes address ambiguities in representing family insurance scenarios (GitHub Issue #107).

  • Addition of contract_person_id (identifying the primary subscriber/contract owner).
  • Addition of contract_source_value and contract_concept_id (standardized relationship representation, e.g., spouse, child).
  • family_source_value is made optional and deprecated, as its function is replaced by the more precise contract fields.

D. Termination Details (from 6.0)

  • Addition of stop_reason_concept_id, stop_reason_source_value, and stop_reason_source_concept_id to capture the standardized reason for the termination of the coverage period (e.g., employment termination, Medicare entitlement).

E. Structural Changes for Scalability

  • payer_plan_period_id is updated from INTEGER to BIGINT, mirroring the corresponding COST table change (GitHub Issue #198).


Today’s Objective

Our goal today is to review these reintegrated specifications, confirm their alignment with the previously approved CDM 6.0 structure, and address any critical implementation concerns. By building consensus here, we can deliver a comprehensive, implementation-ready proposal to the broader OHDSI CDM Workgroup.

We look forward to a productive discussion.

A. Structural Additions and Linkages (COST Table)

This section covers the discussions related to adding person_id, introducing cost_event_field_concept_id, and updating data types for scalability.

GitHub Issue #81: Cost table changes (add PERSON_ID, dates and normalize)

Focus: Addition of person_id.

Summary of Discussion:

This issue served as the primary proposal for modernizing the COST table. A key structural change advocated was the direct inclusion of the person_id.

The issue author argued that this addition was necessary to align the COST table with the fundamental design of the OMOP CDM.

  • Person-Centricity: He emphasized that the OMOP CDM is inherently person-centric, requiring every domain table to link directly to the PERSON table. The COST table in CDM 5.x violated this core principle.
  • Query Efficiency: He highlighted the inefficiency of the prior structure, which necessitated joining the COST table through a clinical event table (e.g., PROCEDURE_OCCURRENCE) just to identify the associated person. Adding person_id directly simplifies and accelerates person-level cost analysis.
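The query-efficiency point can be shown with a toy example in SQLite. The schema below is a trimmed, hypothetical subset (table and column names loosely follow OMOP naming but are not the full CDM definitions).

```python
import sqlite3

# Sketch of why a direct person_id on COST simplifies person-level queries.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE procedure_occurrence (procedure_occurrence_id INT, person_id INT);
CREATE TABLE cost (cost_id INT, cost_event_id INT, person_id INT, cost REAL);
INSERT INTO procedure_occurrence VALUES (1, 42), (2, 42), (3, 7);
INSERT INTO cost VALUES (10, 1, 42, 100.0), (11, 2, 42, 50.0), (12, 3, 7, 25.0);
""")

# CDM 5.x style: reach the person only through the clinical event table.
legacy = con.execute("""
    SELECT po.person_id, SUM(c.cost)
    FROM cost c JOIN procedure_occurrence po
      ON c.cost_event_id = po.procedure_occurrence_id
    GROUP BY po.person_id ORDER BY po.person_id
""").fetchall()

# Proposed 5.5 style: person_id lives on COST, so no join is needed.
direct = con.execute("""
    SELECT person_id, SUM(cost) FROM cost
    GROUP BY person_id ORDER BY person_id
""").fetchall()
# both paths return the same totals; the direct form avoids the event join
```

In a real CDM, the legacy path requires a different join per domain (procedure, drug, visit, …); the direct path is a single scan of COST.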

GitHub Issue #367 (Vocabulary): Create concepts for all tables and fields

Focus: Vocabulary support for cost_event_field_concept_id.

Summary of Discussion:

This vocabulary issue addressed the need for standardized concepts to populate the new cost_event_field_concept_id. This field identifies the source domain/table of the clinical event linked to the cost record, replacing the less precise cost_domain_id.

The issue author clarified the requirements and guided the necessary vocabulary development.

  • Field Purpose: He explained that cost_event_field_concept_id is essential for unambiguously identifying which field in the source table serves as the foreign key (the event ID) linking the clinical event to the cost.
  • Vocabulary Guidance: He requested the creation of standard concepts corresponding to the primary key fields of the relevant domain tables (e.g., concepts for procedure_occurrence_id, visit_occurrence_id).


GitHub Issue #198: Change all ID fields to BIGINT

Focus: Data Type Updates for Scalability (INTEGER to BIGINT).

Summary of Discussion:

This critical issue proposed changing all primary and foreign key ID fields across the CDM to address severe scalability limitations.

The issue author provided real-world evidence of the limitations of the INTEGER data type.

  • Scalability Limits: He highlighted that the 32-bit INTEGER type has a maximum limit of approximately 2.1 billion records.
  • Real-World Data Volumes: He pointed out that large administrative claims databases (e.g., MarketScan, Medicare) frequently exceed this limit in core tables.
  • ETL Complications and Workarounds: He described the significant problems that arise when these limits are hit, forcing organizations to implement non-standard workarounds (like using negative IDs) that violate CDM conventions and compromise data integrity.
  • Long-Term Viability: He argued that adopting BIGINT as a standard is essential for the long-term viability and scalability of the OMOP CDM.
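The arithmetic behind the concern is straightforward. The annual row volume below is an illustrative assumption, not a figure from the discussion.

```python
# The 32-bit signed INTEGER ceiling that motivates the BIGINT change.
INT32_MAX = 2**31 - 1   # 2,147,483,647 (~2.1 billion)
INT64_MAX = 2**63 - 1   # ~9.2 quintillion

# A large claims source emitting, say, 500 million cost rows per year
# (illustrative figure) exhausts 32-bit keys within a few years.
rows_per_year = 500_000_000
years_until_overflow_int32 = INT32_MAX // rows_per_year   # only 4 years
years_until_overflow_int64 = INT64_MAX // rows_per_year   # ~18 billion years
```

Since the proposed long-format COST table stores one row per cost type rather than one per claim line, its key space is consumed even faster than in v5.4, making BIGINT all the more necessary.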


B. Normalization Fields (The Shift to Long Format)

GitHub Issue #81: Cost table changes (add PERSON_ID, dates and normalize)

Focus: Normalization (Introduction of cost and cost_concept_id).

Summary of Discussion:

A major component of this issue was the transition of the COST table from a “wide” format to a “long” (normalized) format.

The issue author provided detailed justifications for this modernization.

  • Inflexibility of Wide Format: He argued that the “wide” format (which used many specific columns like total_paid, paid_by_patient) was rigid and inefficient. It failed to accommodate the diverse range of cost types found in real-world claims data (e.g., adjustments, rebates, various provider payments).
  • Advocacy for Normalized Structure: He championed the normalized structure using a single cost field (numeric value) and a cost_concept_id (semantic meaning). This “long” format allows for unlimited extensibility and standardization of cost types without altering the schema.

C. Temporal Fields

GitHub Issue #81: Cost table changes (add PERSON_ID, dates and normalize)

Focus: Addition of incurred_date, billed_date, paid_date.

Summary of Discussion:

The proposal included adding distinct date fields to accurately capture the timeline of financial transactions.

The issue author advocated for enhanced temporality.

  • Analytical Precision: He emphasized that distinguishing between when a service was incurred, billed, and paid is crucial for accurate financial modeling and time-series analysis in health economics. The previous model lacked this necessary precision.


D. Modifications to Legacy Fields

GitHub Issue #164: Cost table: cost_domain_id

Focus: Deprecation of cost_domain_id.

Summary of Discussion:

This issue scrutinized the cost_domain_id field, highlighting issues with its implementation and standardization, and proposing its replacement.

The issue author articulated the limitations of this field and advocated for a normalized approach.

  • Lack of Standardization and Ambiguity: He pointed out that cost_domain_id was a character field (string) lacking standardized vocabulary control. This led to inconsistent usage across different ETL implementations (e.g., using “Procedure” vs. “PROCEDURE_OCCURRENCE”).
  • Inefficiency: He argued that the field was inefficient for querying.
  • Replacement Strategy: He supported the deprecation of cost_domain_id in favor of the standardized, concept-based mechanism (cost_event_field_concept_id).


Payer Plan Linkage and Data Integrity

GitHub Issue #714: The PAYER_PLAN_PERIOD_ID column of the COST table…

Focus: Nullability and Foreign Key constraints for payer_plan_period_id.

Summary of Discussion:

This discussion focused on the linkage between the COST and PAYER_PLAN_PERIOD tables, specifically debating whether the payer_plan_period_id field should be a required (NOT NULL) field in the COST table.

One commenter argued strongly for keeping this field optional (nullable), emphasizing the realities of source data diversity.

  • Diverse Cost Scenarios: He provided examples where costs are not associated with an insurance plan, such as self-pay patients or data sources where insurance details are unavailable.
  • Data Quality Risks: He argued that making the field required would force ETL developers to create artificial or assumed links (e.g., linking to a default “unknown” plan), which compromises data accuracy.
  • Balancing Ideals and Reality: While acknowledging that linking costs to payer plans is ideal for health economics analysis, he stressed that the CDM must be flexible enough to accommodate real-world data limitations without forcing inaccuracies.


Meeting recap

Video uploaded to YouTube
This is a summary of the kickoff meeting for the OHDSI Health Economics & Value Assessment (HEVA) workgroup, held on September 17, 2025.

Meeting Overview

  • Workgroup: Health Economics & Value Assessment (HEVA) – a revitalization of the former Cost and Utilization workgroup.
  • Date: September 17, 2025
  • Leads: Gowtham Rao (Lead), Gaurav Dravida (Co-lead), Lana Shubinski (Organizer).
  • Attendance: 12 attendees (of 28 registered members), representing diverse global stakeholders from industry (e.g., Gilead, EPAM), academia, and public health institutions across the US, Europe (Croatia, Portugal), Asia (Korea), and Africa (Uganda).

Vision, Mission, and OKRs

Gowtham Rao introduced the workgroup’s direction, emphasizing the goal of delivering value (highest quality at the lowest cost) and generating evidence that meets the requirements of Health Technology Assessment (HTA) bodies and payers.

Mission: To empower the OHDSI community to improve health by collaboratively generating reliable evidence on comparative value and economic impact.

The Objectives and Key Results (OKRs) for Q4 2025 and Q1 2026 focus on reintroducing foundational capabilities:

  • KR 1 (Data Standards): Propose the reintegration of previously approved cost specifications into a “proposed CDM v5.5.”
  • KR 2 (Perception Shift): Change the perception that OHDSI cannot support HEOR research and secure organizational commitments to pilot the new standards.
  • KR 3 (ETL Conventions): Establish and validate THEMIS ETL conventions for the new cost structure.
  • KR 4 (Open-Source Analytics): Launch an alpha version of the OHDSI Cost Utilization software (R package).
  • KR 5 (Best Practices): Publish a chapter on conducting cost studies in the Book of OHDSI.

Core Proposal: Reintegration of Cost Standards (CDM 5.5)

The central topic was the plan to update the OMOP CDM to better handle cost data. Gowtham emphasized a strategy of “Reintegration, Not Reinvention.”

Historical Context: Significant improvements to the COST and PAYER_PLAN_PERIOD tables were developed and approved by the community years ago for CDM v6.0. However, CDM 6.0 was deprecated due to breaking changes (e.g., the introduction of datetime fields) that disrupted OHDSI software, causing these cost improvements to be lost when the community reverted to the CDM 5.x lineage.

The Strategy: The HEVA workgroup proposes integrating these previously approved CDM 6.0 enhancements into a “proposed CDM 5.5” in a way that maintains backward compatibility and avoids breaking changes.

Key Changes Proposed for CDM 5.5:

  1. Normalization (Shift to Long Format): The current CDM v5.4 COST table uses a “wide” format with rigid, US-centric columns (e.g., amount_allowed, paid_by_patient). This is inflexible for global use. The proposal shifts to a “long” format using generic fields: cost (the numeric value) and cost_concept_id (the standardized meaning, e.g., Paid, Rebate, Adjustment).
    • Backward Compatibility: Legacy wide-format fields will be kept but made optional.
  2. Person-Centricity: Adding person_id to the COST table. This field is missing in CDM 5.4, violating OMOP design principles and hindering query performance.
  3. Enhanced Temporality: Adding specific date fields to track financial timelines: incurred_date, billed_date, and paid_date.
  4. Standardized Linkages: Replacing the ambiguous free-text cost_domain_id with a standardized cost_event_field_concept_id to clearly link costs to specific clinical events.
  5. Payer Standardization: Enhancing the PAYER_PLAN_PERIOD table with standardized concepts for the Payer, the Plan (benefit design), and the Sponsor (financing entity).

Discussion Highlights and Feedback

Participants provided critical feedback on the OKRs and the CDM proposal:

  • International Scalability: Ram Varma questioned how the model would accommodate diverse global financing structures (e.g., US deductibles vs. European systems). Gowtham noted the shift to the long format enables this standardization, which will be the focus of the next meeting.
  • Validation: Ram Varma stressed the need to validate the new cost model against established external benchmarks (e.g., Milliman Care Index). Gowtham agreed to incorporate this feedback into the OKRs.
  • Use Case Scoping: Sebastiaan van Sandijk emphasized that data requirements differ significantly based on the use case (e.g., cost characterization vs. comparative effectiveness studies, which have a higher bar for baseline adjustment due to skewed cost distributions). He suggested defining specific use cases early.

Action Items and Next Steps

The workgroup has adopted an aggressive, weekly meeting cadence to achieve consensus before the OHDSI Symposium (October 9th).

  1. Review and Feedback: All members are requested to review the detailed CDM 5.5 proposal documents (including historical GitHub links) available on the OHDSI Forum thread and MS Teams channel and provide feedback.
  2. Subgroup Formation: A subgroup (including volunteers like Ram Varma and Sebastiaan van Sandijk) will be formed to address ETL conventions and international standardization in preparation for the next meeting.
  3. Next Meeting (September 24th): Deep dive into ETL Conventions and Standardization (KR 3), focusing on global data mapping.
  4. Following Meeting (October 1st): Review of Open-Source Analytics (KR 4) – the CostUtilization package.

Video uploaded to https://youtu.be/OzrMuJyuyA4
MS Teams recording (OHDSI tenant): 20250924_second_meeting

HEVA Workgroup Update: Defining THEMIS ETL Conventions for Actuarial-Grade Cost Data in OMOP CDM 5.5

To the OHDSI Community,

The Health Economics & Value Assessment (HEVA) workgroup convened on September 24, 2025, for a deep dive into a comprehensive proposal for new THEMIS ETL conventions specifically tailored for the upcoming OMOP CDM 5.5 COST table.

The objective of these conventions is ambitious and crucial for the advancement of health economics research within the OHDSI network: to establish strict, unambiguous, and rigorous rules ensuring that data standardized within the OMOP CDM Cost tables achieves actuarial-grade precision. As emphasized during the meeting, the goal is to ensure that any standardized analysis run on this data is reliable enough for actuarial certification and regulatory submission.

This post outlines the governing principles, the significant architectural shifts, the detailed mapping rules discussed, and the strategic decisions made regarding implementation. We invite the community to review and provide feedback on this foundational framework.

The Foundational Framework: Three Governing Principles

The proposed conventions are anchored by three core principles intended to resolve the ambiguity inherent in health economic data, where the meaning of financial terms is often tightly coupled with the source system that generated them.

1. Semantic Accuracy (The ‘What’)
This principle mandates the precise definition of the economic meaning of a financial value, captured in the cost_concept_id. The conventions enforce strict adherence to standardized definitions:

  • Charge: The gross amount a provider asks for (billed/submitted amount).
  • Allowed: The negotiated rate the provider contractually agrees to accept as payment in full. Crucially, this includes both the payer’s share and the patient’s responsibility (deductible, copay, coinsurance).
  • Paid: The amount actually received by the provider (revenue realization).
  • Cost: The provider’s internal economic expenditure required to render the service. This must never be confused with Charge, Allowed, or Paid amounts.

2. Terminal Provenance (The ‘Where’)
This principle defines the origin of the data record, captured in the cost_type_concept_id. The rule is strict: the provenance must reflect the immediate and final source system from which the ETL developer extracted the data—the direct “source of truth.” This enforces a separation of concerns, requiring the CDM to act as an “honest broker” without interpreting or inferring upstream processes.

3. Visit-Centric Architecture (The ‘Context’)
The most critical architectural mandate is that all financial data must be linked to a clinical context. The framework enforces a star-like schema where the VISIT_OCCURRENCE table is the central hub. All cost records must link to an event record (e.g., DRUG_EXPOSURE, PROCEDURE_OCCURRENCE) via the visit architecture.

Key Architectural Shifts and ETL Mandates

The adoption of these principles necessitates significant changes in how ETL processes are structured and executed.

Normalization and the Long-Form Model (CDM 5.5)
While CDM 5.5 is designed to be backward compatible with the legacy 5.4 structure (which used explicit named columns), the new THEMIS convention mandates that ETL processes prioritize the normalized, “long-form” structure (Entity-Attribute-Value approach) using cost_concept_id. The use of legacy fields is strongly discouraged.

The Granularity Imperative: Lossless ETL and No Aggregation
A critical convention introduced is the strict prohibition of aggregating or summarizing data during the ETL process. To maintain terminal provenance and prevent information loss (addressing the historical issue of OMOP being a “lossy ETL” for financial data), the ETL must faithfully represent the source data.

  • Component Mapping: If the source provides components (e.g., Payer Paid, Deductible, Copay), each must be mapped individually to its own cost_concept_id without being summed.
  • The ‘Calculated’ Exception: A narrow exception exists if a required aggregate value (like ‘Allowed’) is missing from the source, but all its components are present. In this specific case, the value may be derived during ETL, but it must be explicitly tagged using a specific ‘Calculated’ concept ID (e.g., “Allowed (calculated)”) to indicate it was derived, not observed.
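A minimal sketch of these two rules, using hypothetical concept labels in place of real concept IDs:

```python
# Sketch of the "no aggregation" rule and the narrow 'Calculated'
# exception. The concept labels are placeholders, not real concepts.
COMPONENTS = ("Paid by payer", "Deductible", "Copay", "Coinsurance")

def etl_cost_components(source):
    """Map each source component to its own row; derive 'Allowed' only
    when the source omits it but supplies every component, and tag the
    derived row explicitly as calculated."""
    rows = [{"cost_concept": c, "cost": source[c]}
            for c in COMPONENTS if c in source]          # never summed away
    if "Allowed" in source:
        rows.append({"cost_concept": "Allowed", "cost": source["Allowed"]})
    elif all(c in source for c in COMPONENTS):
        # the narrow exception: derived, not observed, and tagged as such
        rows.append({"cost_concept": "Allowed (calculated)",
                     "cost": sum(source[c] for c in COMPONENTS)})
    return rows

claim = {"Paid by payer": 150.0, "Deductible": 30.0,
         "Copay": 10.0, "Coinsurance": 10.0}   # 'Allowed' absent at source
rows = etl_cost_components(claim)
```

Note that the components themselves always survive as individual rows; the derived total is an addition, never a replacement.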

Micro-Costing and Structural Implications
The visit-centric architecture utilizes both VISIT_OCCURRENCE (for summary/claim-level costs) and VISIT_DETAIL (for detail/line-item costs) to capture granularity. This enables true micro-costing.

A direct consequence of this granular approach is that the COST table is expected to become the largest table in the OMOP CDM.

Domain-Specific Rule: Drug Records
The convention mandates that every record in the DRUG_EXPOSURE table—including dispensations like pharmacy pickups—must have a corresponding VISIT_OCCURRENCE record. Even if a clinical encounter did not occur, a conceptual visit (e.g., “pharmacy visit”) must be created. This ensures all drug costs are tied to a context, though it will significantly increase the size of the visit tables.

Precision in Mapping: Concepts and Provenance

The workgroup reviewed prescriptive guidance and detailed inclusion/exclusion criteria for mapping.

Mapping Economic Concepts (cost_concept_id)
Key nuances discussed include:

  • International Contexts: It was highlighted that “Charge” is often a US-centric concept. In non-US, tariff-based systems (e.g., Germany, UK, France), the national tariff functions as an allowed amount and should be mapped to ‘Allowed’, not ‘Charged’.
  • Defining ‘Cost’: The group reinforced that ‘Cost’ refers strictly to the provider’s expenditure. In the US, this is often derived using Cost-to-Charge Ratios (CCR). In other systems (e.g., UK National Cost Collection), it can be a direct measure from cost accounting systems. Payer-originated data labeled “cost” typically refers to the payer’s payment, not the provider’s expense.
  • Patient Responsibility: Clear distinctions were made between ‘Patient Liability’ (the contractually defined patient share for covered services) and ‘Balance Bill’ (an additional charge above the allowed amount).
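The nuances above can be condensed into a small decision sketch (the string labels and `context` values are illustrative shorthand, not actual concept IDs or THEMIS vocabulary):

```python
# Sketch of cost_concept_id mapping nuances. Real ETLs map to concept IDs;
# readable labels are used here for clarity.
def map_cost_concept(source_label, context):
    label = source_label.lower()
    if label == "tariff" and context == "tariff_based":
        return "Allowed"         # national tariffs function as allowed amounts
    if label == "charge":
        return "Charged"         # the provider's asking price (largely US-centric)
    if label == "cost" and context == "payer":
        return "Paid by payer"   # payer 'cost' is the payer's payment, not provider expense
    if label == "cost" and context == "provider_accounting":
        return "Cost"            # the provider's true expenditure
    raise ValueError(f"no unambiguous mapping for {source_label!r} in {context!r}")
```

The deliberate `ValueError` reflects the semantic-accuracy principle: an ambiguous source field should fail loudly during ETL design rather than be mapped by guesswork.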

Mapping Data Provenance (cost_type_concept_id)
Adherence to the Terminal Provenance principle requires careful mapping of the data source:

  • Adjudicated Claim: Data from a payer’s finalized claims system or remittance advice. Characterized by the coexistence of submitted charges and final payer determinations (Allowed, Paid).
  • Provider Charge Master: Reference data (a price list) containing only gross charges, devoid of payer-determined amounts.
  • Payer/Govt. Fee Schedule (CRITICAL RULE): This concept must only be used when the ETL process is loading the fee schedule itself as reference data. It must not be used for a cost record derived from an adjudicated claim, even if the payment on that claim was determined by a fee schedule; in that case, the correct provenance is ‘Claim’.
  • Provider Cost Accounting: True internal expense data (e.g., from ERP systems or national cost collections).
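The critical rule above is easy to get wrong in ETL code, so here is a hedged sketch of how the provenance decision might be encoded (the `kind` values and return labels are illustrative, not concept IDs):

```python
# Sketch of the Terminal Provenance rule for cost_type_concept_id.
def map_cost_type(source):
    # CRITICAL RULE: a payment determined by a fee schedule but observed on an
    # adjudicated claim still carries 'Claim' provenance; the fee-schedule
    # concept is reserved for loading the fee schedule itself as reference data.
    kinds = {
        "adjudicated_claim": "Adjudicated claim",
        "fee_schedule_reference": "Payer/Govt. fee schedule",
        "charge_master": "Provider charge master",
        "cost_accounting": "Provider cost accounting",
    }
    return kinds[source["kind"]]

# A claim paid according to a fee schedule is still claim provenance:
claim_provenance = map_cost_type({"kind": "adjudicated_claim"})
```

Provenance here describes *where the record was observed*, not *how the amount was determined*, which is the distinction the critical rule enforces.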

Discussion and Strategic Decisions

The discussion clarified the boundaries of the ETL process and resulted in a crucial strategic decision regarding implementation.

Clarifying the Boundary: ETL vs. Analytics
The workgroup addressed the challenge of tracking patient journeys across multiple VISIT_OCCURRENCE records. It was affirmed that the ETL layer must remain a faithful, non-aggregated representation of the source data. The logic for creating longitudinal “episodes of care” by grouping related visits belongs strictly in the analytics layer, not the ETL layer.

Strategic Implementation Pivot
Feedback was provided regarding the implementation roadmap. While the framework is designed to support multiple data sources (claims, EHR, patient-reported), achieving the high-stakes goal of actuarial certification requires a focused approach. Attempting to support all sources simultaneously was deemed too complex for the initial phase.

Decision: The workgroup agreed to adopt the full framework conceptually but to phase the implementation. The initial focus for implementation, testing, and validation will be exclusively on claims-based use cases.

Focusing first on claims data—the most common and structured source for health economics research—allows the workgroup to establish a robust, actuarially sound foundation before addressing the greater complexity of EHR and other data sources.

Next Steps

We are presenting this detailed framework to the OHDSI community for discussion and feedback. The full THEMIS convention documentation will follow. We ask the community to review these conventions, keeping in mind the stated goal of actuarial-grade data quality and the initial prioritization on claims data implementation.

We look forward to your input.

Gowtham A Rao MD, PhD

I will post the detailed THEMIS conventions here. For now, a preliminary version is here @gdravida09


Hi Gowtham, your contributions here are truly impressive. I’m not an informatician but a health economist, so I’m approaching this from the perspective of data use in health economics research.

In Korea’s national single-payer system:

  • Reimbursed (급여) items: prices are fixed by the government, so they seem to align well with Allowed Amount (Insurer Payment + Patient Liability).
  • Non-reimbursed (비급여) items (e.g., some cosmetic procedures, happy drugs, etc.): prices vary by provider, so they could map to Charged (typically, only Patient Liability charged by providers).

FYI, if a claim includes both reimbursed and non-reimbursed items, the patient liability may show up under both Allowed and Charged at the same time.

That said, there may also be a case for treating Korea as an exception and mapping these non-reimbursed items to Allowed instead, since the patient’s liability is exactly what has been charged.

I’m not suggesting any change to the OMOP conventions. Just noting how the Korean situation might fit and wondering if this interpretation makes sense. Feedback would be very welcome.

Hello Kyungseon ( @kyseonchoi ),

It’s wonderful to connect with a fellow health economist, especially one bringing such a crucial international perspective to the HEVA workgroup! Welcome.

Your insights into the Korean National Health Insurance (NHIS) system are exactly what we need to ensure these conventions are truly global. I also have a background in the payer space—working closely with actuaries on benefit design and payment models in the US—but I lack direct experience with the Korean system. These discussions often become too US-centric, which I am committed to avoiding. Your expertise here is invaluable, and I am eager to collaborate.

You raised excellent questions about how the new THEMIS conventions can accurately capture the nuances of the NHIS, specifically the distinction between Geupyeo (급여, Reimbursed) and Bi-geupyeo (비급여, Non-reimbursed). I reviewed your interpretation against the latest THEMIS guidance. I believe your understanding aligns perfectly with the core principle of Semantic Accuracy (Section 2.1), and the proposed mapping strategy robustly fits the Korean system within the existing framework.

Here is a detailed breakdown of how the Korean system aligns with the conventions:

1. Reimbursed Items (급여 - Geupyeo)

For Geupyeo items, your observation is spot on:

Reimbursed (급여) items: prices are fixed by the government, so they seem to align well with Allowed Amount (Insurer Payment + Patient Liability).

This is precisely correct. The government-fixed price (the Suga or 수가) functions exactly as an “International Tariff” as defined in the THEMIS guidance on International Context (Section 4.1).

The convention explicitly states: “CRITICAL RULE: International Tariffs are semantically equivalent to the US ‘Allowed Amount,’ not the ‘Charged’ amount.”

Mapping Confirmation: The government-set tariff must be mapped to the cost_concept_id for Allowed (Detail: 32000). This accurately captures the total recognized economic value of the service, comprising both the NHIS payment (공단부담금) and the patient’s statutory cost-sharing (본인부담금).

2. Non-Reimbursed Items (비급여 - Bi-geupyeo)

This scenario requires more nuance, as Bi-geupyeo items function as “self-pay” scenarios where the provider sets the price and the patient is 100% responsible. You hypothesized:

That said, there may also be a case for treating Korea as an exception and mapping these non-reimbursed items to Allowed instead, since the patient’s liability is exactly what has been charged.

Your intuition here is correct. Importantly, this is not an exception but rather a precise application of the OHDSI definitions (Section 2.1):

  • Charge: The amount a provider asks for.
  • Allowed: The amount the provider contractually agrees to accept as payment in full.

In the case of 비급여 items, there is no third-party negotiation or government tariff. The “contract” is directly between the provider and the patient. Therefore, the concepts converge: the amount the provider asks for (Charge) is the amount they agree to accept as payment in full (Allowed).

If we only mapped this to Charged, we would fail to capture the finalized economic value of the service, which is essential for health economic analysis.

Recommended ETL Implementation for 비급여

When concepts converge (Charge = Allowed = Patient Liability), the THEMIS convention on Long-Form Cost Representation (Convention 2, Section 3.2) provides the ideal solution. To maintain maximum semantic clarity, we must populate all applicable concepts in the COST table, even if the values are identical.

For a 비급여 item priced at ₩100,000, the ETL should generate the following rows:

| cost_event_id | cost_concept_id (Name) | cost (Value) | Rationale |
| --- | --- | --- | --- |
| 1001 | 31995 (Charged, Detail) | 100,000 | The provider’s asking price. |
| 1001 | 32000 (Allowed, Detail) | 100,000 | The amount accepted as payment in full. |
| 1001 | 32421 (Patient liability, Detail) | 100,000 | The patient is 100% responsible. |

This approach ensures full transparency and analytical readiness.
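As a sketch of the ETL step that would emit those rows (the concept IDs come from the table above; the helper function and dictionary layout are illustrative, not the actual ETL):

```python
# Long-form cost representation for a self-pay (비급여) item, where
# Charge = Allowed = Patient liability because the provider-patient
# agreement is the only "contract": one row per applicable concept.
def long_form_self_pay(cost_event_id, price):
    concepts = [
        (31995, "Charged, Detail"),
        (32000, "Allowed, Detail"),
        (32421, "Patient liability, Detail"),
    ]
    return [
        {"cost_event_id": cost_event_id, "cost_concept_id": cid,
         "concept_name": name, "cost": price}
        for cid, name in concepts
    ]

rows = long_form_self_pay(1001, 100_000)  # a ₩100,000 item yields three rows
```

Emitting all three rows, even with identical values, is what preserves semantic clarity downstream: an analyst querying "Allowed" or "Patient liability" gets a complete answer without special-casing self-pay items.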

3. Mixed Claims (급여 and 비급여 together)

You correctly identified the complexity of mixed claims:

FYI, if a claim includes both reimbursed and non-reimbursed items, the patient liability may show up under both Allowed and Charged at the same time.

This highlights why adhering to the principle of Granularity (Section 3.7) and utilizing VISIT_DETAIL (Section 3.1.1) is critical. Costs must be mapped at the Detail level (line item) rather than the Summary level (claim header).

This allows analysts to clearly distinguish the financial mechanics of the 급여 components (where Allowed is government-set and liability is shared) from the 비급여 components (where Charge=Allowed and liability is entirely the patient’s) within the same encounter.

Summary and Collaboration

In summary, the proposed approach robustly integrates the Korean NHIS structure into the OMOP conventions without requiring special handling or exceptions for Korea:

  1. Map government-set tariffs (수가) for 급여 items to Allowed.
  2. For 비급여 items, recognize the convergence of concepts and map the price to Charged, Allowed, AND Patient Liability using the long-form representation.

This approach ensures the actuarial-grade precision required for robust health economic research across the OHDSI network.

Kyungseon, does this interpretation align with your understanding of the system’s mechanics?

Thank you again for bringing this forward. Contributions like yours are essential for establishing robust international standards. I am very excited to work together on this.

Best regards,

Gowtham

(P.S. I utilized some LLM assistance to help structure these thoughts based on the THEMIS document, blending those ideas with my own review. Please correct any misunderstandings or point out anything that needs further clarification!)


Hi Gowtham,

This looks perfect to me. Recording both Charged and Allowed for non-reimbursement items makes sense, and I agree with the long-form representation approach. I also reviewed the other points you outlined and they all look great for a national reimbursement system in Korea.

Really appreciate your thoughtful summary and the way you framed this. It is very clear and constructive. I’m excited to continue working together on this.

Best regards,
Kyungseon


I propose working on a paper on how we can model cost data from across the world in one common format. Anyone interested in collaborating?

If there are specific areas where my input would be useful, I’d be glad to help.


Hello everyone,

The Health Economics and Value Assessment (HEVA) workgroup is excited to announce that our meeting today will focus on a major new development for the OHDSI community.

As many of you know from our main workgroup thread (Workgroup: Health Economics Value Assessment (HEVA)), we have made significant progress on the foundational standards for economic analysis. The HEVA workgroup has approved and shared the proposed CDM v5.5 with the OMOP CDM working group, and the corresponding ETL/Themis conventions are now being vetted by the OHDSI Themis workgroup.

With these standards taking shape, we are excited to turn our focus to the next piece of the puzzle: OHDSI’s first standardized software for cost and utilization analysis.

This new R package, developed by Gowtham A Rao, MD, PhD, and Alexander Alexeyuk, MD, is designed to provide our community with a robust, reproducible, and scalable tool for health economic research.

In today’s meeting, our agenda will be:

  • Gaurav Dravida will provide opening remarks and context for this initiative.
  • Alexander Alexeyuk will then lead a detailed presentation of the software package, its architecture, and its intended integration with the broader HADES ecosystem.

The alpha version of this work is now available for community review, and we are eager for your feedback and collaboration. You can access the repository here: OHDSI/CostUtilization (alpha_release branch)

We look forward to a productive and collaborative discussion with all of you. Please join us for the meeting today.

Thank you,

Gowtham A Rao & Gaurav Dravida HEVA Workgroup Leads

Meeting link

Video uploaded to https://youtu.be/9Ui-ivff640
MS teams link OHDSI tenant 20251001_third_meeting

Introducing the CostUtilization Package.pdf (239.1 KB)

The OHDSI Health Economics & Value Assessment (HEVA) workgroup introduced the alpha release of the CostUtilization R package, a tool designed to standardize health economic analysis across the network. The presentation detailed the package’s vision to resolve methodological opacity and integrate economic outcomes into standard HADES pipelines. Lead developer Alexander Alexeyuk @alexanderalexeyuk demonstrated its visit-centric architecture, HADES-conformant structure, and advanced capabilities like micro-costing. The primary blockers to a full release are the pending community adoption of the OMOP CDM v5.5 and finalization of Themis data conventions. The workgroup made an urgent call for community partners to validate the package with real-world data.

Topics

  1. Workgroup Relaunch and Objectives
  2. Challenges and Vision for Standardized Analysis
  3. CostUtilization Package: Architecture and Design
  4. Core Functionality and Analytical Workflow
  5. Advanced Use Cases: Micro-Costing and Unit Cost Analysis
  6. Testing Framework, Dependencies, and Path Forward
  7. Conclusion and Call for Collaboration

Detailed Topic Analysis

1. Workgroup Relaunch and Objectives

Gowtham Rao @Gowtham_Rao opened the meeting by reintroducing the Health Economics & Value Assessment (HEVA) Workgroup, which has been restarted after a five-year hiatus. He explained that the group aims to better support the Health Economics and Outcomes Research (HEOR) community, which has historically been underserved within OHDSI. The workgroup has consolidated efforts from 2017-2018 related to data standardization, ETL conventions, and vocabulary to tackle the third pillar of the OHDSI mission: standardized software. @Gowtham_Rao positioned the day’s topic, the new CostUtilization R package, as a key milestone in achieving this goal. The meeting also served as a welcome for new members, including Grace Lomax, who recently discovered OHDSI; Saba Sohail, a PhD student at Johnson & Johnson interested in health economics; and Shishir Shakya, an assistant professor of economics. @Gowtham_Rao emphasized the collaborative nature of the community, stating, "you’re right in front and center. No corners in the OHDSI. Everybody is equal."

  • Risks / Implications: The success of the relaunched workgroup depends on sustained community engagement and the ability to convert past groundwork into tangible, adopted tools.

2. Challenges and Vision for Standardized Analysis

@alexanderalexeyuk began his presentation by outlining the primary challenges that necessitate a standardized tool for health economic analysis in OHDSI. He pointed to three core issues: methodological opacity, where studies rely on ad-hoc SQL that hinders reproducibility; semantic ambiguity from inconsistent definitions of terms like “cost,” “charge,” and “payment”; and an integration gap that prevents economic outcomes from being used in standard OHDSI packages like PatientLevelPrediction and CohortMethod. The vision for the CostUtilization package is to address these weaknesses by enabling “actuarial-quality” economic analysis at scale. The goal is to provide a HADES-conformant tool that transforms health economics “from a siloed activity into a standard component of OHDSI evidence generation.”

  • Risks / Implications: The package’s vision is ambitious and relies on the community agreeing to and implementing new data standards, which has historically been a slow process.

3. CostUtilization Package: Architecture and Design

@alexanderalexeyuk described the technical architecture of the CostUtilization package, emphasizing its adherence to HADES standards. The package is structured with user-facing R functions and a core analytical engine driven by a large, parameterized SQL script located in the inst/sql/ directory. A key design decision was to develop it as a standalone package rather than integrating it directly into FeatureExtraction. @alexanderalexeyuk explained this was necessary because FeatureExtraction uses “hyper-parameterized” SQL for many simple clinical features, whereas the new package requires a more “monolithic, ‘opinionated’ SQL optimized for the complexities of the normalized COST table.” To ensure seamless integration with the broader OHDSI ecosystem, the package is designed so its output can be formatted as a standard covariateData object, making it compatible with downstream analytical tools like CohortMethod and PatientLevelPrediction.

  • Risks / Implications: The standalone nature of the package, while technically necessary, may create an additional step for analysts. The success of its integration strategy hinges on the ability to perfectly mimic the covariateData object structure.

4. Core Functionality and Analytical Workflow

The package’s analytical framework is strictly visit-centric, meaning all costs must be anchored to a clinical encounter (visit_occurrence_id) to be considered interpretable. The workflow begins with defining a cohort, after which the package calculates person-time at risk (the denominator) by carefully censoring time based on the OBSERVATION_PERIOD table. Alexander highlighted this as a crucial improvement over FeatureExtraction, which he noted “never care about observation period,” potentially creating a “huge bias” if a patient’s observation window is shorter than the analysis window. The package then identifies relevant encounters and links them to the COST table to calculate the numerator, allowing for standardized rates like cost per-member-per-month (PMPM). In response to a question from @ramvarma about enrollment, @Gowtham_Rao clarified that the denominator is calculated dynamically by constraining the analysis window (e.g., one year from cohort start date) to each patient’s observation period, ensuring accuracy.

  • Risks / Implications: The strict visit-centric approach means that any cost data not properly linked to a visit in the source CDM will be excluded, potentially undercounting total costs if ETL quality is poor.
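The denominator logic described above can be sketched as follows (dates are simplified to day offsets, and the 30.44 days-per-month constant is a common actuarial convention, not a value taken from the package):

```python
# Sketch of person-time censoring: the analysis window is constrained to the
# patient's OBSERVATION_PERIOD before converting days to member-months.
def person_months_at_risk(cohort_start, window_days, obs_start, obs_end):
    window_end = cohort_start + window_days
    start = max(cohort_start, obs_start)   # cannot start before observation begins
    end = min(window_end, obs_end)         # cannot extend past observation end
    days = max(0, end - start)
    return days / 30.44                    # average days per month

def pmpm(total_cost, member_months):
    return total_cost / member_months if member_months else float("nan")

# A patient observed for only 180 of a 365-day window contributes a
# censored denominator, avoiding the bias of assuming full-year coverage.
mm = person_months_at_risk(cohort_start=0, window_days=365, obs_start=0, obs_end=180)
```

Skipping this censoring step, as the presentation noted, would divide costs by time the patient was never observed and systematically understate PMPM.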

5. Advanced Use Cases: Micro-Costing and Unit Cost Analysis

Alexander detailed the package’s capability for advanced analyses, specifically “micro-costing,” which aims to isolate the line-item cost of a specific intervention (e.g., a chemotherapy drug) rather than the cost of the entire visit. This functionality is essential for detailed economic evaluations like Health Technology Assessments (HTA). However, this feature comes with significant prerequisites: the source data must have the VISIT_DETAIL table populated, and costs must be attributed at the line level using visit_detail_id. The analytical logic shifts from VISIT_OCCURRENCE to VISIT_DETAIL, requiring a more granular data structure than many CDMs currently possess. Following up on this, @ramvarma inquired about calculating unit cost versus utilization to understand cost drivers. @Gowtham_Rao confirmed this is part of the micro-costing roadmap, explaining that by generating both utilization counts and PMPM, analysts could infer whether cost increases are driven by price or volume.

  • Risks / Implications: The micro-costing feature is entirely dependent on the availability of high-quality, granular VISIT_DETAIL data, which is not yet common across the OHDSI network. Without this data, a key use case for HTA cannot be realized.
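The price-versus-volume reasoning Gowtham described decomposes arithmetically, which a short sketch makes concrete (the numbers and helper function are illustrative, not package output):

```python
# PMPM factors into unit cost x utilization rate, so trending both metrics
# reveals whether spending growth is price-driven or volume-driven.
def decompose_pmpm(total_cost, units, member_months):
    unit_cost = total_cost / units            # price per service
    util_rate = units / member_months         # services per member-month
    return unit_cost, util_rate, unit_cost * util_rate  # product equals PMPM

# 240 services costing $12,000 over 600 member-months:
unit_cost, util_rate, pmpm_value = decompose_pmpm(12_000.0, 240, 600.0)
```

If utilization holds steady while PMPM rises between periods, the unit-cost factor must account for the increase, which is exactly the inference described in the answer above.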

6. Testing Framework, Dependencies, and Path Forward

A significant challenge for the package is the lack of a standardized OHDSI test dataset that includes cost data or conforms to the proposed CDM v5.5. To overcome this, the team developed an innovative testing framework that “dynamically generates a compliant test environment.” The test suite programmatically creates the necessary COST table structure within the Eunomia dataset, synthesizes and injects linked cost data, and then executes unit tests. Despite this progress, the package’s adoption is blocked by two critical external dependencies: the formal approval and adoption of the OMOP CDM v5.5 normalized COST table structure, and the finalization of Themis conventions for standardizing the semantic meaning of cost data. A question from @shishirshakya about installation difficulties highlighted its alpha status, with Gowtham explaining it must be installed from a specific development branch.

  • Risks / Implications: The package’s utility is currently theoretical. Without community-wide adoption of CDM v5.5 and Themis, the tool cannot be validated on real-world data or promoted for general use. This makes finding early adopter partners critical.

7. Conclusion and Call for Collaboration

@Gowtham_Rao concluded the meeting by summarizing the workgroup’s rapid progress in re-establishing itself and producing the alpha release of the CostUtilization package, calling it a “very high bar” that the team successfully met before the upcoming OHDSI Global Symposium. He congratulated @gdravida09 and @alexanderalexeyuk for their work. The final message was a strong call to action for the community. @ramvarma offered to explore using real-world data from sources like HealthVerity and Optum, which @Gowtham_Rao welcomed enthusiastically, noting that the CDM Workgroup needs to see real use cases to accelerate the standardization process. He urged attendees of the symposium to meet and discuss the project further, aiming to build partnerships for testing and validation.

  • Risks / Implications: The momentum generated by this alpha release could be lost if the workgroup cannot secure partners for real-world data validation in the near future. The OHDSI symposium represents a key opportunity to build these crucial collaborations.

Q&A Session

Question (@ramvarma ): Are you also looking at unit cost metrics? For instance, when PMPM costs increase, it can be driven by either higher utilization of a service or an increase in the unit cost of that service. Is the package able to distinguish between those drivers?

Answer (@Gowtham_Rao ): Yes, that capability is part of the micro-costing roadmap. The package is designed to calculate both utilization counts and PMPM costs. By trending both metrics, an analyst could infer the driver; for example, if utilization counts remain stable while the PMPM goes up, it would suggest an increase in the unit cost. However, this advanced functionality is not fully implemented because it requires granular, line-level data in the VISIT_DETAIL table, which is not yet available in our testing environments.


Question (@shishirshakya ): I’m having trouble installing the package from GitHub; it’s giving me a “Not Found” error. Is it available yet?

Answer (@Gowtham_Rao ): The package is in an alpha release stage, so you have to install it from a specific development branch, not the main branch. To install it correctly, you need to add the ref argument to the installation command, specifying the alpha_release branch.


Question: Is the primary goal of this package to generate summary data at the person or visit level?

Answer (@alexanderalexeyuk & @Gowtham_Rao ): The package can do both. It has an aggregate setting that controls the output. If you set aggregate = TRUE, the package will return population-level summary statistics, including mean, median, and percentiles. If you set aggregate = FALSE, it will output clean, person-level data. This person-level data can then be used for more complex modeling or characterization studies, allowing analysts to apply any method that uses a continuous variable as either a feature or an outcome.