
Proposal for a Unified Cost Table

Now that CPT/HCPCS codes can end up representing not only procedures but also measurements, specimens, etc., it seems that we still need a good way to track the costs for these CPT/HCPCS codes even when they don’t end up in the procedure_occurrence table.

Instead of generating additional cost tables, each tied to a specific domain, perhaps we create a single, unified cost table to capture costs for all domains. This isn’t actually too hard to pull off.

visit_cost and device_cost share identical columns for tracking costs. Yay.

procedure_cost has the same columns as visit/device_cost plus columns for:

  • revenue_code_concept_id
  • revenue_code_source_value.

drug_cost has the same columns as visit/device_cost plus columns for:

  • ingredient_cost
  • dispensing_fee
  • average_wholesale_price.

In light of that, I propose we group all the cost columns into a single table called “cost”. It looks like this:

To highlight a few columns:

  • cost_event_id (feel free to change the name) stores the ID of the thing the cost is associated with (e.g. procedure_occurrence_id from procedure_occurrence or measurement_id from measurement).
  • domain_concept_id stores the domain with which this cost is associated. It essentially tells us which table is associated with the cost (e.g. concept_id 21 represents the Measurement domain and means this cost record is associated with the measurement table).
  • charge - we added this field because some claims information contains the charge amount. Some datasets only provide charges. It also helps with generating costs using cost to charge ratios.
  • We did NOT bring over the revenue_code* columns from procedure_cost. We assert that Revenue Codes should be represented as procedures.
  • We do bring over the extra columns for drug_cost, which is admittedly a bit awkward.

So, if I want to look up costs for my EKG measurements, I just run this query:
```sql
select *
from measurement m
join cost c
  on m.measurement_id = c.cost_event_id
 and c.domain_concept_id = 21 /* 21 - Domain Concept ID for Measurement */
where m.measurement_concept_id = 2617471
/* concept ID for "Electrocardiogram, routine ecg with 12 leads; tracing only, without interpretation and report, performed as a screening for the initial preventive physical examination" */
```
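To make the lookup concrete, here is a minimal, runnable sketch of the same join against an in-memory SQLite database. It assumes a cut-down schema: only cost_event_id, domain_concept_id, charge, and paid_by_payer (the last borrowed from the CDM v5 cost tables), plus made-up sample rows. It is an illustration, not the full proposed DDL.

```python
import sqlite3

MEASUREMENT_DOMAIN = 21  # domain_concept_id for Measurement, as in the query above
EKG_CONCEPT = 2617471    # measurement_concept_id from the query above
PROCEDURE_DOMAIN = 10    # domain_concept_id for Procedure (assumed for this sketch)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down tables with only the columns this sketch needs (assumption).
    CREATE TABLE measurement (
        measurement_id         INTEGER PRIMARY KEY,
        measurement_concept_id INTEGER
    );
    CREATE TABLE cost (
        cost_id           INTEGER PRIMARY KEY,
        cost_event_id     INTEGER,  -- ID of the row this cost belongs to
        domain_concept_id INTEGER,  -- which domain/table cost_event_id points into
        charge            REAL,
        paid_by_payer     REAL
    );
""")
conn.executemany("INSERT INTO measurement VALUES (?, ?)", [
    (1, EKG_CONCEPT),
    (2, 0),  # placeholder concept: some other measurement
])
conn.executemany("INSERT INTO cost VALUES (?, ?, ?, ?, ?)", [
    (1, 1, MEASUREMENT_DOMAIN, 30.0, 20.0),  # cost of measurement_id 1
    (2, 1, PROCEDURE_DOMAIN, 500.0, 400.0),  # procedure_occurrence_id 1: same number, other table
    (3, 2, MEASUREMENT_DOMAIN, 15.0, 10.0),  # cost of the non-EKG measurement
])

ekg_costs = conn.execute("""
    SELECT m.measurement_id, c.charge, c.paid_by_payer
    FROM measurement m
    JOIN cost c
      ON m.measurement_id = c.cost_event_id
     AND c.domain_concept_id = ?
    WHERE m.measurement_concept_id = ?
""", (MEASUREMENT_DOMAIN, EKG_CONCEPT)).fetchall()
print(ekg_costs)  # [(1, 30.0, 20.0)]
```

Dropping the domain_concept_id condition would also pull in the $500 procedure row, because cost_event_id values are only unique within each domain's own table. That is why the join needs both columns.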

While it is a bit awkward to have some cost columns that are specific to certain domains (e.g. drug_cost columns), I think it is even more awkward to end up with a cost table for practically every domain, given that most of those tables would have virtually identical schemas.

But either of those solutions at least allows us to store costs for measurements and specimens. Receiving bills and suffering the costs of healthcare are very much a part of the patient experience (at least for those of us in the good ol’ US of A).



Hi Ryan: That’s a really interesting thought. I was having the exact same thought after our call earlier today about the apparent conflicts with measurements that could have costs. I’m not a health economics expert, so I’d prefer to defer to you and others who are doing more cost-effectiveness analyses to determine whether the same cost fields can be consistently applied across all domains. (You have identified the field-level differences in this thread, but can we safely assume the conventions for populating the tables should be the same as well?) If so, the proposal seems like a reasonable approach to consider.



I love the idea as well. However, it would be a CDM revision, which is a much bigger deal than assigning domains to concepts in the Standardized Vocabularies. I’m not sure whether we should do the latter with the expectation that we can place the cost somewhere, or whether we should assume we are stuck with V5.0 for a bit and Measurements cannot cost anything.

I would put it on the list of CDM v6 suggestions.

In fact, in this forum, we could have a thread for possible things to consider for v6. Once we reach a critical number, we package it as v6. Genomic data is another domain for consideration.

I think a new CDM version every 1.5-3 years may be an optimum evolution speed.

We might have to do V5.1, because we currently have no place to put costs for Measurements and Observations.

One approach that might work for 5.1 is to use a hybrid of @aguynamedryan’s approach and use the procedure cost table to store observation and measurement costs. To make this work we might have to add another column to indicate the domain from which the procedure cost was generated (e.g., procedure, measurement, or observation). Then we have one “procedure” cost table that supports all 3 domains.

@jenniferduryea and I have finalized our proposal for the Unified Cost Table:

Changes since last time:

  • Added the revenue_code_concept_id and revenue_code_source_value columns back in. Revenue codes seem to best fit in the cost table because:
    • All services rendered in an outpatient facility have a revenue code associated with them. Revenue codes provide additional information about the service rendered.
      • So when an outpatient facility claim is ETL’d, its associated revenue code will be placed in a row in the cost table and that row will refer back to the procedure_occurrence/measurement/etc
    • Additionally, revenue codes can be reported without being associated to a particular service.
      • These revenue codes will also be put into a row in the cost table, but the row will refer back to the visit_occurrence generated for that outpatient facility claim
  • Renamed “average_wholesale_price” to “cost”
    • There are some datasets that provide cost information for services rendered, etc, and it would be nice to capture these costs when available
      • E.g. If a hospital provides cost-to-charge ratios, the product of the charge and cost-to-charge ratio can be stored in the cost column
    • Average Wholesale Price can still be stored in the “cost” column when a cost row is associated with a drug_exposure
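A small sketch of how an ETL might turn outpatient facility claim lines into rows of the proposed table: a revenue code billed with a service points at that procedure_occurrence, a revenue code billed on its own points at the claim's visit_occurrence, and the cost column is filled from a cost-to-charge ratio when one is available. The input dict layout, the function name `facility_line_to_cost`, and the domain concept IDs are assumptions for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional

# Domain concept IDs are assumptions for this sketch (10 = Procedure, 8 = Visit).
PROCEDURE_DOMAIN = 10
VISIT_DOMAIN = 8

@dataclass
class CostRow:
    cost_event_id: int
    domain_concept_id: int
    revenue_code_source_value: Optional[str]
    charge: Optional[float]
    cost: Optional[float]

def facility_line_to_cost(line: dict, visit_occurrence_id: int,
                          cost_to_charge_ratio: Optional[float]) -> CostRow:
    """Turn one outpatient facility claim line (hypothetical dict layout) into a cost row."""
    if line.get("procedure_occurrence_id") is not None:
        # Revenue code billed with a service: point at that procedure_occurrence.
        event_id, domain = line["procedure_occurrence_id"], PROCEDURE_DOMAIN
    else:
        # Revenue code billed on its own: point at the claim's visit_occurrence.
        event_id, domain = visit_occurrence_id, VISIT_DOMAIN
    charge = line.get("charge")
    cost = None
    if charge is not None and cost_to_charge_ratio is not None:
        # cost = charge x cost-to-charge ratio, rounded to cents
        cost = round(charge * cost_to_charge_ratio, 2)
    return CostRow(event_id, domain, line.get("revenue_code"), charge, cost)

# Sample revenue codes and amounts are made up.
lines = [
    {"revenue_code": "0450", "procedure_occurrence_id": 501, "charge": 1200.0},
    {"revenue_code": "0250", "charge": 80.0},  # no associated service
]
rows = [facility_line_to_cost(l, visit_occurrence_id=900, cost_to_charge_ratio=0.4)
        for l in lines]
```

Leaving cost as None when no ratio is available keeps "unknown" distinct from a real zero cost.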

@Patrick_Ryan, the columns present in the original cost tables seem to closely mirror the information that is reported in the ANSI 835 specification, which is, of course, behind a paywall, but if you look at loop 2110 of the PDFs linked on this page, you can see most of the fields associated with payments for services.

All major payers in the US use the ANSI 835 specification to transmit electronic Explanation of Benefits (EOB) information to providers and hospitals. Accordingly, we’re reasonably confident that the cost table we’re proposing will be able to capture cost and payment information from claims data and EHRs.

Cost information, when it is provided, is normally reported from the payer perspective. Accordingly, for most claims-based datasets, any given procedure_occurrence, drug_exposure, visit_occurrence, etc., will have only a single associated row in the cost table. If your source dataset has payment information for more than one payer, each service would have a row in the cost table for each source of payment (e.g. payer or patient). E.g. if a patient has primary and secondary insurance and you have two payment records for a service (one record for each payer), you’d put two rows into the cost table for that service.
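With one row per source of payment, the total paid for a service is just a sum over its cost rows. The sketch below assumes made-up rows and a hypothetical "payer" field standing in for however the source identifies who paid; neither the field nor the helper name is part of the proposal.

```python
from collections import defaultdict

# Hypothetical cost rows for two procedure_occurrences (domain_concept_id 10 assumed = Procedure).
cost_rows = [
    {"cost_event_id": 501, "domain_concept_id": 10, "payer": "primary insurer",   "paid": 120.00},
    {"cost_event_id": 501, "domain_concept_id": 10, "payer": "secondary insurer", "paid": 30.00},
    {"cost_event_id": 501, "domain_concept_id": 10, "payer": "patient",           "paid": 10.00},
    {"cost_event_id": 502, "domain_concept_id": 10, "payer": "primary insurer",   "paid": 75.00},
]

def total_paid_per_event(rows):
    """Sum payments across all payer rows for each (cost_event_id, domain_concept_id)."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["cost_event_id"], row["domain_concept_id"])
        totals[key] += row["paid"]
    return dict(totals)

print(total_paid_per_event(cost_rows))  # {(501, 10): 160.0, (502, 10): 75.0}
```

Grouping on the pair rather than on cost_event_id alone keeps, say, procedure 501 and a measurement that happens to have ID 501 from being added together.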

Lastly, @Christian_Reich, @Vojtech_Huser, @Mark_Danese, we definitely need to implement some sort of fix to the CDMv5 in order to accurately capture cost information for our upcoming ETLs of the CMS data. We intend to implement this unified cost table in a sandbox version of the CDM for those upcoming ETLs.

I like this proposal a lot and fully support implementing it as a sandbox proof of concept. My main concern is that this approach really comes from a claims perspective, and I’m not sure whether it will work when the costs come from an EHR or another source. I’m not an expert in this space, so I would want to make sure others in the community who have a vested interest in using the CDM for health economic analyses have evaluated the proposal. Would you mind presenting this recommendation at an upcoming Tuesday community call?


We would love feedback from the community because it would be good to get this right. It is a differentiating feature of our CDM compared to others. @jenniferduryea and I have nearly two decades of combined experience in healthcare IT. We’ve seen how practice management and EHR systems store cost information internally and are intimately familiar with the standards (ANSI 835 and ANSI 837) that payers and providers use to transmit claims and payment information. Also, the company we currently work for, Outcomes Insights, has a lot of experience in health economics (in addition to more traditional epidemiology), so hopefully we have anticipated some relevant use cases that work with the proposed table structure. However, we would love to find other members of the community with similar “boots on the ground” experiences for relevant input into this table.

If you think the community would be interested in covering this proposal during an All Call, we’d be happy to put together a presentation on the proposed table with a deeper dive into costs/medical billing. Jen recently prepared a presentation which covers how payers receive, process, and respond to claims from providers and how that process is reflected in how claims data are aggregated into claims datasets. Our in-house researchers found it very informative and it might give the OHDSI community some context on why the proposed cost table is structured the way it is.

Thank you for the team’s work on the cost table. I think the unified cost table is an excellent approach.

In the current approach, i.e., V5.01, we recommend that the costs for each event from the various domains be represented in separate columns, each column representing a cost type. These columns are the fields representing charge, allowed amount, paid_coinsurance, paid_copay, drug_ingredient, etc. Obviously, there may be many types of costs, many existing and many upcoming. As our payment and billing models evolve, there will be more types of costs in the future.

Would it be feasible to generalize the table (and at the same time simplify the representation) by adding another field called cost_type_id and keeping only one column for the value of the cost? I don’t think there is a standard way to reliably represent cost types, which may make this recommendation difficult. Please see the new tab in the new version of the [Original spreadsheet][1].
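The suggested cost_type_id design is essentially an unpivot of the wide cost columns into (event, type, value) rows. A minimal sketch of both directions follows; the column names echo the examples above, but "allowed_amount" and the use of plain strings instead of cost_type_id concepts are hypothetical simplifications.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Wide columns to unpivot; in the suggested design each would map to a
# cost_type_id concept (plain strings are a hypothetical stand-in here).
WIDE_COST_COLUMNS = ("charge", "allowed_amount", "paid_copay", "paid_coinsurance")

LongRow = Tuple[int, str, float]  # (cost_event_id, cost_type, value)

def wide_to_long(wide_row: dict) -> List[LongRow]:
    """One wide cost row -> one long row per non-NULL cost column."""
    return [
        (wide_row["cost_event_id"], column, wide_row[column])
        for column in WIDE_COST_COLUMNS
        if wide_row.get(column) is not None
    ]

def long_to_wide(long_rows: List[LongRow]) -> Dict[int, Dict[str, float]]:
    """Pivot long rows back into one dict of cost types per cost_event_id."""
    wide: Dict[int, Dict[str, float]] = defaultdict(dict)
    for event_id, cost_type, value in long_rows:
        wide[event_id][cost_type] = value
    return dict(wide)

example_row = {"cost_event_id": 7, "charge": 200.0, "allowed_amount": 150.0,
               "paid_copay": 20.0, "paid_coinsurance": None}
print(wide_to_long(example_row))
# [(7, 'charge', 200.0), (7, 'allowed_amount', 150.0), (7, 'paid_copay', 20.0)]
```

The long form absorbs new cost types without a schema change, but every analysis then needs the `long_to_wide` pivot before it can compare, say, charge against allowed amount. That is the trade-off raised in the reply below about EAV models.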

Also, could we discuss the reasoning behind retaining the revenue code in the table? Revenue codes would not be useful for drug costs, for example, or even for professional claims.

Thank you so much


Short answer to all of your questions: We haven’t had anybody from the provider community to help us figure out the best solution. So, welcome. We are all coming from the analytical end, having to eat whatever these claims databases put on the table. :smile:

Sounds like a good idea, but generally we stay away from these EAV (entity-attribute-value) models, where the data tell you what’s in the data. Instead, we normalize as much as we can, so that each record represents one and only one “thing” and you don’t have to cobble anything together. Cobbling kills the performance of the analytics and makes counting things very hard. On the other hand, the number of columns in that table is pretty burdensome, too. So, any good simplification would help.

Also, remember we don’t want to replicate claims data. We want to understand what the patient paid, and what the payer paid. That’s all. So, one way to simplify is to reduce everything to these two numbers.
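Reducing a cost row to "what the patient paid" and "what the payer paid" could look like the sketch below. The input column names are borrowed from the CDM v5 cost tables (paid_copay, paid_coinsurance, paid_toward_deductible, paid_by_payer, paid_by_coordination_benefits); which columns feed each total is an assumption for illustration, and the helper name is hypothetical.

```python
from typing import Tuple

def patient_and_payer_paid(row: dict) -> Tuple[float, float]:
    """Collapse a wide cost row into (patient_paid, payer_paid)."""
    def amount(name: str) -> float:
        # Treat a missing or NULL column as zero (a simplifying assumption).
        value = row.get(name)
        return 0.0 if value is None else float(value)

    patient = (amount("paid_copay") + amount("paid_coinsurance")
               + amount("paid_toward_deductible"))
    payer = amount("paid_by_payer") + amount("paid_by_coordination_benefits")
    return patient, payer

example = {"paid_copay": 20.0, "paid_coinsurance": 15.0, "paid_toward_deductible": 50.0,
           "paid_by_payer": 300.0, "paid_by_coordination_benefits": None}
print(patient_and_payer_paid(example))  # (85.0, 300.0)
```

Treating NULL as zero conflates "not reported" with "nothing paid"; a real ETL would likely want to keep those cases apart.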

Revenue codes: Please kick them out. :smile: But where?

@Christian_Reich! Leave my revenue codes alone! :joy: @Gowtham_Rao revenue codes are retained because they do provide some clinical information about the patient’s visit that submitted ICD-9/ICD-10 codes on facility claims do not provide. Inpatient facility claims are notorious for not providing much detail about the patient’s visit (mostly because they do not submit HCPCS/CPT codes, and ICD-9/ICD-10 procedure coding is minimal/scarce at best). But, based on the revenue codes submitted on the inpatient facility claim, the analyst will be able to see if the patient received dialysis (revenue code 0304 or 080x), received EPO (revenue code 0634 or 0635), or went to the emergency room (revenue code 045x).

Should revenue codes be stored somewhere else in the CDM? Maybe. They provide clinical information, which does not seem to align with the purpose of a “cost” table. But, as I’m sure @Gowtham_Rao is aware, revenue codes are assigned to the cost of the claim and sometimes to the procedure code as well (for outpatient facility claims and for some inpatient facility claims using revenue codes that require a procedure code). And it is important to retain the linkage between the procedure code and the revenue code (if there is one). So right now, revenue codes stay in the Cost Table because that is how the source data represent revenue codes. But I’m open to suggestions on relocating them if they meet use-case requirements.

Here is a use case for revenue codes (I’m just putting this here in case I have to reference this again): a study wants to determine the cost of emergency room services in the U.S. The analyst will need to look at the amount charged for all physician claims billed with a place of service of “emergency room” (or CMS value 23) and all of the charged amounts with revenue codes 045x. The revenue codes will pick up charges from all patients who visited an ER, regardless of whether they were admitted as an inpatient or not. Basically the revenue codes will look at specific ER charges from inpatient and outpatient facility claims.
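The ER use case above can be sketched as a filter over claim lines: physician charges with place of service 23, plus facility charges under revenue codes 045x. The `ClaimLine` structure, its "physician"/"facility" labels, and the sample amounts are hypothetical; the zero-padding step assumes some sources drop the revenue code's leading zero.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ER_PLACE_OF_SERVICE = "23"  # CMS place-of-service value for emergency room

@dataclass
class ClaimLine:
    claim_type: str                  # "physician" or "facility" (hypothetical labels)
    place_of_service: Optional[str]  # populated on physician claims
    revenue_code: Optional[str]      # populated on facility claims
    charge: float

def is_er_revenue_code(code: Optional[str]) -> bool:
    """True for revenue codes 045x; pads codes whose leading zero was dropped."""
    if code is None:
        return False
    code = code.strip().zfill(4)
    return len(code) == 4 and code.startswith("045")

def er_charges(lines: Iterable[ClaimLine]) -> float:
    """Sum ER charges from physician (POS 23) and facility (revenue code 045x) lines."""
    total = 0.0
    for line in lines:
        if line.claim_type == "physician" and line.place_of_service == ER_PLACE_OF_SERVICE:
            total += line.charge
        elif line.claim_type == "facility" and is_er_revenue_code(line.revenue_code):
            total += line.charge
    return total

sample = [
    ClaimLine("physician", "23", None, 250.0),   # ER physician visit: counted
    ClaimLine("physician", "11", None, 100.0),   # office visit: not counted
    ClaimLine("facility", None, "0450", 1200.0), # ER revenue code: counted
    ClaimLine("facility", None, "0456", 300.0),  # ER revenue code: counted
    ClaimLine("facility", None, "0304", 90.0),   # dialysis revenue code: not counted
]
print(er_charges(sample))  # 1750.0
```

Because the facility side filters on revenue code rather than on admission type, ER charges are picked up whether or not the patient was later admitted, which is the point of the use case.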

Hope that helps @Gowtham_Rao!


Thank you. I completely agree with you that revenue codes provide both clinical information and economic/cost information. If we continue to relate this to a billed claim, then revenue codes are claim-line detail information, and not many folks would have access to this level of information.

If they don’t have line-level financial information, and in the source data the line-level revenue codes have been transposed to an array of revenue codes at the claim summary level, then the relationship between revenue code and cost is broken. The revenue code is then only a source of clinical information, and storing it in the cost table is no longer valuable, because we cannot pivot the summary back into lines, right?

If they do have line-level financial information, then yes, the revenue code will be a source of both clinical and financial information, but then the cost_id would represent claim line-level detail information.

This is causing, at least in my head, confusion around claim header/summary versus claim line/detail. My fear is that the line vs. summary issue in the cost table will make it very difficult to handle health economics KPIs such as services/1000, claims/1000, and PMPM per inpatient claim.
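To see why the line-versus-header distinction matters for these KPIs, here is a simplified, non-annualized sketch with made-up rows and a hypothetical `utilization_kpis` helper. It counts claims as distinct claim headers and services as lines (one service per line is an assumption); if the lines had already been rolled up to one row per claim, services/1000 would collapse into claims/1000.

```python
from typing import Dict, List, Tuple

Line = Tuple[str, int, float]  # hypothetical (claim_id, line_number, paid_amount)

def utilization_kpis(lines: List[Line], member_count: int,
                     member_months: int) -> Dict[str, float]:
    """Simplified, non-annualized KPIs computed from line-level rows."""
    claim_ids = {claim_id for claim_id, _line_no, _paid in lines}
    total_paid = sum(paid for _claim_id, _line_no, paid in lines)
    return {
        "services_per_1000": 1000.0 * len(lines) / member_count,     # one service per line
        "claims_per_1000": 1000.0 * len(claim_ids) / member_count,   # one claim per header
        "pmpm": total_paid / member_months,                          # per member per month
    }

lines = [("C1", 1, 100.0), ("C1", 2, 50.0), ("C2", 1, 300.0)]
print(utilization_kpis(lines, member_count=10, member_months=120))
# {'services_per_1000': 300.0, 'claims_per_1000': 200.0, 'pmpm': 3.75}
```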

@jenniferduryea, @Gowtham_Rao, friends:

This is perfect. We need to figure this out:

  • The OMOP CDM is patient-based information, not payer-based information.
  • However, we want to support the use cases Jen mentioned.
  • We want to do it such a way that you needn’t have a PhD in claims data analysis to do those use cases
  • We need to preserve the clinical and cost information.

Now, with the two of you, we have sufficient knowledge on the table to do that. Should we add it to the list of things in the CDM WG? Would be wonderful…

Second it


@Gowtham_Rao I have never seen a dataset where the revenue codes are summarized. They are always reported at the line-level. Do you know of a dataset that researchers use where revenue codes are reported in a summarized manner? I would love to know!

You have pointed to a bigger problem in the research community: analysts not understanding the difference between header-level and line-level information. Not just in regard to revenue codes, but in claims data in general. I’m a huge proponent of separating this information to ensure health economic research can still take place. So when we (@aguynamedryan and I) created the Cost Table, we tried to address this line vs. summary issue in some ways, though some people may not know it. For example, you can assign the cost to a procedure_occurrence record (aka line-level information) or to the visit_occurrence record (aka header-level information). For outpatient facility claims where the charge is at the line level but the payment is at the summary level, you can create separate cost records for all of the charges/revenue codes and assign them to the procedure_occurrence records, while the payment cost record is assigned to the visit_occurrence record referencing all of those procedures. Or, if you have revenue codes without procedure_occurrence records (as in inpatient facility claims), you can create a cost record for every charge/revenue code and then reference them to the one visit_occurrence record representing the inpatient admission.
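The outpatient facility pattern just described (line-level charges on procedure_occurrence rows, the summary payment on the visit_occurrence row) can be totaled per visit without double counting. In this sketch the domain concept IDs (10 = Procedure, 8 = Visit), the tuple layout, the sample revenue codes, and the `visit_summary` helper are all assumptions for illustration; the procedure-to-visit mapping stands in for procedure_occurrence.visit_occurrence_id.

```python
PROCEDURE_DOMAIN = 10  # assumed domain_concept_id values
VISIT_DOMAIN = 8

# procedure_occurrence_id -> visit_occurrence_id
procedure_visit = {501: 900, 502: 900}

cost_rows = [
    # (cost_event_id, domain_concept_id, revenue_code, charge, paid_by_payer)
    (501, PROCEDURE_DOMAIN, "0450", 1200.0, None),  # line-level charge
    (502, PROCEDURE_DOMAIN, "0634", 400.0, None),   # line-level charge
    (900, VISIT_DOMAIN, None, None, 950.0),         # header-level payment
]

def visit_summary(visit_id, rows, proc_to_visit):
    """Charges from line rows plus payment from the header row, each counted once."""
    charge = 0.0
    paid = 0.0
    for event_id, domain, _revenue_code, row_charge, row_paid in rows:
        if domain == PROCEDURE_DOMAIN:
            owning_visit = proc_to_visit.get(event_id)
        elif domain == VISIT_DOMAIN:
            owning_visit = event_id
        else:
            continue  # other domains are out of scope for this sketch
        if owning_visit != visit_id:
            continue
        charge += row_charge or 0.0
        paid += row_paid or 0.0
    return {"charge": charge, "paid": paid}

print(visit_summary(900, cost_rows, procedure_visit))  # {'charge': 1600.0, 'paid': 950.0}
```

Charges come only from the line rows and the payment only from the header row, so neither amount is counted twice even though both sets of rows roll up to the same visit.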

I am very passionate about teaching the research community about this difference between line and header information, as I have not personally found any other researcher (besides @Gowtham_Rao) who seems to grasp this concept without a tutorial from me. I’m not sure if this is the right place to teach people about the intricacies of claims data (maybe in an ETL group?). Suffice to say, @Gowtham_Rao, I am very happy to have met you, and if you want to talk claims data and claims analysis, I’m here all night! :smiley:

@Christian_Reich I’m not sure there really needs to be any discussion on this, unless there is a dataset that reports the revenue codes in a summary fashion.

You are right - but there is not much of a distinction between payer-based information vs patient-based information.

The lowest unit of analysis per OHDSI mission is to support patient-level predictions using data that is collected centered around a persons care journey over time; the next level is population-level estimation where person-centric data is used at cohort/population level. Payers, providers, policy makers, researchers - all have use cases that are in line with this model and OHDSI’s mission.

Payer-based patient information, i.e. claims data, is centered around billable service units of care provided to a person. Provider-based patient information, i.e. EHR data, is centered around service units of care (billable or not) provided to a person. So the only differences are whether a unit is billable, the granularity of the service units, and the vocabulary used to document them.

These service units, their vocabulary, and their granularity are addressed by the OHDSI common data model and vocabularies. So really, when we look from a concept point of view, we can’t say there is patient-based information vs. payer-based information: they are all clinical concepts centered around a person’s care journey!
