Proposal for Cost Table Adjustment

Yes, if we sum up across cost-concepts, then we will be double counting.

Total cost may be argued to be a classification concept covering its component concepts. Total cost may also be argued to be a value available in the source data, which makes it an ‘S’ standard concept_id, not a ‘C’ classification concept_id. Concepts can be used in a normalized cost table.

Defining total-cost as a classification concept is next to impossible, because the components that make up total cost are highly variable: they depend on the level of granularity in the source data, the definitions used in the source data, or the definitions used by the researcher. E.g., is total cost ingredient+dispensing fee, or ingredient+dispensing+discount?

The proposal uses cost_source_concept_id and cost_concept_id, which allows for some form of standardization. At analysis time, it is up to the investigator to define which set of concept_ids should be rolled up to total-cost. If we formalize a classification concept and standard cost concepts, then it will force standardization.

Just to add a small comment or two on this. The cost table was primarily designed to store the information available in most commercially available claims data, particularly Medicare data. The wide layout makes it easier to output results for analytics, and is particularly easy for humans to use. One thing we have seen in our data is that the total cost provided in the data does not always equal the sum of the costs in the row. In some global sense, it always adds up, but the view that exists in claims data is not complete. Payers have much more granularity than researchers. In their data, it has to add up (I hope).

The wide vs. tall debate is one that we have had several times here at Outcomes Insights. We ended up keeping both, and separating them into two cost tables (payer reimbursement table and cost table). There are many, many costs that organizations have, and the tall cost table, with type information, is much more flexible. The wide payer reimbursement table seemed more amenable to standard analytics. Others may disagree, of course.

Also, the use cases in research are quite varied. The simplest economic use case is simply to average costs by resource type (e.g., the average cost of office visits in 2013). Many published economic studies look at the total (cumulative) reimbursed amount as a function of patient characteristics and/or time. There is a need to have the date of service in order to be able to inflate the cost to the desired year of the analysis. More sophisticated analyses also need location to incorporate geographic inflation factors. In short, costs almost require their own “CDM”.

I can’t comment on what payers want to do, but from what I gather from reading some of the posts on this topic, they are generally quite different. Most likely this has to do with payers having much more detail than most researchers. And also, payers look at things internally without regard to publishing the results.

I guess this is no longer a short note, but we are fully supportive of improving the tables for more use cases. @jenniferduryea is not available for the next week or so, but she will chime in if she has other things to add.

@Gowtham_Rao - Random question, would there be a TYPE_CONCEPT_ID for “total cost” for sources that provide that?

There is a cost_type_concept_id already in the cost table. Is this what you are referring to Erica?

I know there is a COST_TYPE_CONCEPT_ID, but don’t we need one for “total cost” for that TYPE field? Looking at your list I don’t see something for this “total” idea. Something like “total charged by the provider” and “total charged to the patient” would be valuable for PREMIER. Maybe also a “total visit cost”; I think we need that for Truven and possibly Optum. I would need a cost person to help me out here. I feel like we are missing items in the list below.

  • Copayment amount
  • Coinsurance amount
  • Charged by the provider
  • Recovered by the provider
  • Allowed by the primary payer
  • Paid by the primary payer
  • Allowed by the secondary payer
  • Paid by the secondary payer
  • Allowed by all payers
  • Paid by all payers
  • Charged to the patient
  • Paid by the patient total out of pocket
  • Paid by the patient towards co-insurance
  • Paid by the patient towards copay
  • Paid by the patient towards deductible
  • Fee for pharmacy dispensing
  • Cost of pharmacy ingredient
  • Average Wholesale Price amount

@jenniferduryea, @jweave17 or @clairblacketer help me out here?

I agree with @ericaVoss . A “total cost”, “total reimbursed”, “total charged” would be useful. I don’t think we need to specify “by visit” since you can assign the cost to a visit record using the cost_event_id and domain variables. I think we need to duplicate these types to include line item costs and then “total charged” costs. So if the data has line item costs as well as the total cost for the entire claim, you can store both sets of these values (there are use cases that show the total cost for the entire claim is not merely the sum of the line item costs per service on the claim). That might help the analyst detangle what costs go with what service/day.

@ericaVoss @jenniferduryea

@ericaVoss - following up on our conversation at the symposium on the topic of adding ‘total-cost’. Almost all cost type elements above may be either a ‘total’ or a ‘component’. E.g. the source data may have several ‘component’ charges to the same patient for the same visit, and there may be a ‘total’. They are all ‘charged to the patient’. Instead of creating concepts with confusing long names like

‘Component charged to the patient’
‘Total charged to the patient’

We talked about using the concept_class_id to handle this:
‘charged to the patient’ with two concept_class_ids - ‘Component’ and ‘Total’.

So for every new cost concept_id maintained by OHDSI - we will have a component and a total concept_class_id.

This, we hope, will reduce the risk of someone adding up everything ‘charged to the patient’ and double counting both the ‘component’ and the ‘total’. It will also allow for standardized analytics and keep it clean.
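A minimal sketch of why the class matters, assuming cost rows have already been joined to a concept lookup. The concept_ids, amounts and helper name here are made up for illustration and are not part of the proposal.

```python
from collections import defaultdict

# Hypothetical concept lookup: concept_id -> (cost concept name, concept_class_id).
# The ids are invented for this example only.
CONCEPTS = {
    9001: ("Charged to the patient", "Component"),
    9002: ("Charged to the patient", "Total"),
}

# Hypothetical cost rows for one visit: (cost_concept_id, amount).
COST_ROWS = [
    (9001, 40.0),   # component charge
    (9001, 60.0),   # component charge
    (9002, 100.0),  # total as reported by the source
]


def sum_by_class(rows, concepts, concept_class):
    """Sum amounts per cost concept name, using rows of one concept class only."""
    totals = defaultdict(float)
    for concept_id, amount in rows:
        name, cls = concepts[concept_id]
        if cls == concept_class:
            totals[name] += amount
    return dict(totals)


components = sum_by_class(COST_ROWS, CONCEPTS, "Component")
totals = sum_by_class(COST_ROWS, CONCEPTS, "Total")
# Summing every row regardless of class mixes components with the total.
naive = sum(amount for _, amount in COST_ROWS)
```

Filtering on one class gives 100.0 either way, while the naive sum over all rows gives 200.0, which is the double counting the concept_class is meant to prevent.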

Also @ericaVoss - the proposal is to use new fields ‘cost_concept_id’ and ‘cost_source_concept_id’ to hold the concept of the cost, not cost_type_concept_id. Cost_type_concept_id is a pre-existing field in the cost table that holds the ‘provenance or the source of the COST data: Calculated from insurance claim information, provider revenue, calculated from cost-to-charge ratio, reported from accounting database, etc.’ @clairblacketer has to change the original post #1. @clairblacketer - maybe we could delete some content from that post to avoid confusion.

One approach (probably should be the standard way to do it) to differentiate between line-item and the summary-item is to use the new visit_detail table. In claims world, visit_detail is the detail line item of the claim, and visit_occurrence is the summary-header claim. In use, Visit_detail cannot exist without a corresponding record in the visit_occurrence, and visit_occurrence is the required parent of the visit_detail. The record in the visit_detail can then be linked to the cost-table to get the detail/component/line cost - while the header/visit_occurrence would contain the ‘total’ cost.

@Gowtham_Rao I hear what you’re saying, and the visit_occurrence and cost tables are related, to a certain extent.

For line-level costs, the cost table allows you to assign costs to any record in the CDM. So you can assign a line-level cost directly to a procedure_occurrence record. No controversy. Easy for physician claims. And the visit_detail information is not needed.

However, you run into issues with claim-level or aggregated costs. If visit_occurrence records are assigned at the claim-level (as you propose above), then the aggregated or “total” cost could be assigned to the visit_occurrence record. Easy. And the visit_detail table is not needed.

However, since there is some controversy as to how to create visit_occurrence records in different datasets (one visit_occurrence_id could contain information from multiple medical claims as in one hospitalization; or, one visit_occurrence_id could contain only a part of a claim, as in creating ER and Inpatient visits from one inpatient facility claim), a visit_occurrence_id may not represent the aggregated cost. So at this point, the visit_detail table might be useful, in that the visit_occurrence_id represents multiple claims (i.e. a patient’s hospitalization) and then the visit_detail table represents the claims header - where aggregate costs can now be assigned. Then assign line-item costs to those line items in the procedure_occurrence, device, drug_exposure, etc. tables.

In reviewing this “fix” in incorporating the visit_detail table, I’m unsure how analysts will understand when to include line-item costs or aggregate costs or both. In health economics research, analysts are most interested in how much the payer paid for services to understand burden. It seems they would need to know to look at the line-item paid amounts for physician (HCFA) claims, but then the aggregated or “total paid” amounts for facility (UB04) claims. Since, in general, researchers are not familiar with the differences between claim types, I start to get concerned about how easily the cost table can be understood and used.

Waiting for @Gowtham_Rao to answer what @jenniferduryea said (which I agree 100% with).

@Christian_Reich the bottom line is, to do health-economic analysis, you really need to know what you are doing. There are so many use-cases and so many variations. We think the cost-table as proposed can address a lot of use-cases, but not all. Also, if you don’t know what you are doing - you can mess up.

Over time - we can put in guard rails (using Themis, constrained vocabulary etc). To put in the guard rails, we have to first understand and burn our fingers a few times. As we learn, we will make these tables available as part of standard OHDSI tools like Atlas, and the R-packages.

Current state – these tables are rarely used and most people don’t know how to use them. This proposal is just the beginning of making them available in the standard toolkit - in future we will need a workgroup just for health-economic analysis.

So, we need a specific use case (or use cases) to use to make these decisions. Perhaps that is the best way to start. Then we can evaluate proposals against those use cases.

I don’t know about the controversy - but if the source-data has both claim header and claim-line-detail, then I would say – ETL the claim header to visit_occurrence, and ETL the claim detail to visit_detail. (ensure data lineage/provenance to the source data).

If the source data has costs at the claim header, then the cost.cost_event_table_concept_id would be the concept_id of the visit_occurrence table, and the cost.cost_event_id will be the visit_occurrence.visit_occurrence_id. If your source data has cost at the line-item-detail record, you could map that cost.cost_event_id to visit_detail.visit_detail_id or procedure_occurrence.procedure_occurrence_id or both - depending on how your source data is. Visit_detail is optional - you don’t have to use it.

To understand if a cost is line-item or summary/total: If the cost_event_id points to a visit_detail or procedure_occurrence then it is a component of the cost. If the cost_event_id points to visit_occurrence or drug_exposure it is a total/header cost. Plus - we will use the appropriate concept_ids – that will clearly distinguish between component cost and total cost.
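To make the linkage concrete, here is a rough sketch of how an ETL might populate cost_event_table_concept_id and cost_event_id, and how the level could then be inferred from the table a row points to. The numeric table concept_ids, the `cost` key for the amount, and the helper names are placeholders I made up; only visit_occurrence is treated as a header table here, and that set can be extended.

```python
# Hypothetical stand-ins for the concept_ids that identify CDM tables in
# cost_event_table_concept_id; the real ids come from the OMOP vocabulary.
TABLE_CONCEPT = {
    "visit_occurrence": 1001,
    "visit_detail": 1002,
    "procedure_occurrence": 1003,
}
CONCEPT_TABLE = {cid: name for name, cid in TABLE_CONCEPT.items()}

# Tables whose costs are treated as header/total costs in this sketch.
HEADER_TABLES = {"visit_occurrence"}


def make_cost_row(table, event_id, amount):
    """Build a minimal cost record that points at one CDM record."""
    return {
        "cost_event_table_concept_id": TABLE_CONCEPT[table],
        "cost_event_id": event_id,
        "cost": amount,  # assumed name for the amount field
    }


def cost_level(row):
    """Infer 'total' vs 'component' from the table the cost row points to."""
    table = CONCEPT_TABLE[row["cost_event_table_concept_id"]]
    return "total" if table in HEADER_TABLES else "component"


# Claim header cost -> visit_occurrence; claim line cost -> visit_detail.
header_cost = make_cost_row("visit_occurrence", 501, 1200.0)
line_cost = make_cost_row("visit_detail", 7001, 400.0)
```

With these rows, `cost_level(header_cost)` returns "total" and `cost_level(line_cost)` returns "component".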

See the concepts proposed as part of the proposal
Proposed concept_id
Proposal on github

Note the use of concept_class to distinguish between costs that may be total or component.

The proposed concepts use the concept_class to unambiguously differentiate between header cost and component cost. @ericaVoss and I talked about this approach at the symposium - because this would put in a lot of guardrails, preventing the confusion of summing over the total/header costs and component/line-detail costs…

I think the controversy you are referring to is that the standard concepts available for use with visit_concept_id are not granular enough. That is out of scope for this proposal, but we need to address creating better concept_ids to support claims data. This however needs to be done separately.

Is that the general consensus? VISIT_OCCURRENCE=header and VISIT_DETAIL=detail? You get only one header for an entire hospital visit? Not two or more? Never? And VISIT_DETAIL is not aggregated the way the provider happened to be organized commercially and split up the claims?

I feel like you are alluding to - me! :smile:

Do you have them listed somewhere? That would help.

Just a few simple use cases. I am using reimbursed amount to be cost, but that might vary by dataset.

Calculate the annual average cost for patients in a cohort
Calculate the cumulative cost for patients in a cohort
Estimate the effects of covariates on the cumulative cost in a cohort
Calculate the annualized cost (cost / observation period duration during year)
Calculate the cost of specific utilization per person: drugs, hospitalizations, office visits, emergency room visits, etc

Some of the above might be by subgroup (e.g., diabetes, cancer, osteoporosis, etc)

@Mark_Danese:

Sounds all like cumulative stuff, which means, you sum up everything that comes after an index date. Correct? If so, why all the hassle with the header and detail?

If we focus this discussion to claims data I would say this should be the general consensus (among payers) - definitely a US medical claims weighted opinion. I have no clue about Germany. We discussed this when the CDM WG arbitrated this very topic, when I had proposed the visit_detail table.

Not really. This was something @ericaVoss and I discussed at the 2017 OHDSI symposium. Then at the CDM WG meetings in October and November, we discussed this same topic, and we all agreed that ‘if you don’t know what you are doing - you can mess up’. Remember @Patrick_Ryan saying something similar with respect to using data about conditions or procedures - if you don’t have clinical knowledge, you can’t do meaningful analysis; if you don’t have health economics knowledge, you can’t do meaningful analysis.

Let’s focus on a data representation ETL use-case and later an analytic use case. The below would be a common and generalizable scenario. A person has a 3 night inpatient stay in a University hospital for appendicitis. The sequence of events is

  1. Ambulance ride to emergency room of the hospital. Day 1. Seen by ER physician.
  2. The emergency room admission, stabilization and transfer/admission to hospital. Day 1. Seen by admitting physician.
  3. Hospital takes him to operating room – appendectomy. Seen by Surgeon. Day 1.
  4. Transfer to Intensive care unit. Seen by Surgeon, ICU doctor, admitting physician, Infectious Disease Physician. Day 1 and Day 2.
  5. Transfer to step-down unit Day 2. Seen by Rehab team including physical therapist, admitting physician, Infectious Disease Doctor.
  6. Transfer to Floor Day 3. Admitting physician, Infectious Disease Doctor.
  7. Discharge to home Day 4 morning. Discharged by admitting physician.

How is this billed? Let’s keep this simple, and tolerate some lack of precision to ensure simplicity. There will be two types of claims and there will be many billed claims. In the US, the two main types of claims are

  1. Professional claims 5010 X222 or CMS 1500 - Billed by a human like the physician or nurse practitioner or physical therapist.
  2. Institutional Facility claims 5010 X223 or CMS 1450 (see page 9 to 14, and 18). Billed by the organization - generally the hospital.

The number of bills depends on the number of legal billable entities the person encountered during the 3 night stay. If the ER and Hospital are two different legal entities, then there will be two institutional claims. Generally, in the case above – there will be one Institutional Facility claim because most University hospitals will have their own ERs. Each clinician who saw the person will file their own claim. In fact there may be a different claim for each day (sometimes each encounter) the clinician had with the patient during the stay. So, the surgeon may have two claims - one for the surgery and a second for the ICU visit. The ER physician would have only one claim. The Admitting physician probably 4 claims. The dates for each of these claims may be different, e.g. the Infectious Disease physician’s first date of service is actually day 2. The ambulance would be a professional claim by itself.

The two claim types above will each have their own header (summary) and detail (line) in a parent-child hierarchy. The line is the detail. For an Institutional claim, details like the ER, OR, ICU, Rehab, Floor etc. are identified by Revenue code. The detail of the professional claims would have E&M CPT codes like http://www.ohdsi.org/web/atlas/#/concept/2514424

I am sorry - can you elaborate?

How are the costs linked to the claims? Charges are filed by the providers. Allowed, Paid, Coinsurance, Copay, etc. are determined by the payer. So the charge has no relation to the contract, while allowed/paid/coinsurance/copay are dependent on the contractual relationship between payer and provider. Charges are ‘free for all’ – charges may come at header level or line level. E.g. a hospital may charge for the entire stay, or may charge for each of the ER, ICU, OR, Rehab, Floor etc. The contractually determined costs are attached to the claim by the payer through the process of adjudication. Pre-adjudicated claims may not have this cost information.

Adjudication depends on what the contract says. If the contract is a pure traditional fee-for-service, every line item will be adjudicated based on contracted rates and the sum goes to the summary/header. If there is no contract, then the payer may use some form of average rates or usual-and-customary rates. Newer payment methods are changing it. DRG is one form of global payment, where the entire care may be covered by one DRG like http://www.ohdsi.org/web/atlas/#/concept/38001158 . In this case, adjudication may happen at the header-level and not the line level.
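A toy illustration of the two adjudication patterns described above, just to show where the paid amount ends up. The service labels, rates, DRG payment and function names are all invented for this sketch, and real adjudication is far more involved.

```python
# Hypothetical contracted rates per service line (illustrative values only).
CONTRACTED_RATES = {"ER": 800.0, "OR": 2000.0, "ICU": 1500.0}

# Hypothetical institutional claim lines: (service label, units).
CLAIM_LINES = [("ER", 1), ("OR", 1), ("ICU", 2)]


def adjudicate_fee_for_service(lines, rates):
    """Pay each line at its contracted rate; the header is the sum of the lines."""
    line_paid = [rates[service] * units for service, units in lines]
    return {"line_paid": line_paid, "header_paid": sum(line_paid)}


def adjudicate_global_payment(lines, global_payment):
    """Global (e.g. DRG-style) payment: only the header carries a paid amount."""
    return {"line_paid": [None] * len(lines), "header_paid": global_payment}


ffs = adjudicate_fee_for_service(CLAIM_LINES, CONTRACTED_RATES)
drg = adjudicate_global_payment(CLAIM_LINES, 5000.0)
```

In the fee-for-service case the lines add up to the header; in the global-payment case there is nothing at line level to add up, so only the header amount is meaningful.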

So, yes to @Mark_Danese’s comment below. It adds up

– but you need to know how the adjudication of the claim happened. The adjudication depends on benefit design and contractual relationship – OMOP CDM does not capture those pieces of information, and these types of information are not available. I have tried to address some of those in the recently accepted payer_plan_period. If we really need to balance line records to header records, you have to know the benefit design of the health plan. The plan_concept_id, plan_source_value and plan_source_concept_id are our first step towards that.

These are classic payer business intelligence use cases. They are analytic use cases, not data modeling/ETL/data representation use cases as above. They are easy to do – with the below steps.

  1. Build a cohort.
  2. Through temporal association - find the records in the cost table for the persons in the cohort.
  3. Limit the records in the cost table based on the cost_concept_id – do we want allowed, paid, charged, coinsurance? They are all concept_ids in the proposed concept list and are used to populate cost_concept_id.
  4. If we want to aggregate by total cost – then use concept_class = ‘Total’, and if we want to aggregate by component cost – then use concept_class = ‘Component’ from the same concept_id.
  5. Decide if you want to aggregate at person level or cohort level. Do you want to aggregate temporally? All of these are nicely supported by Feature Extraction 2 of @schuemie.
  6. If you want to subset the cost table by some condition, procedure or other – use the cost_event_id and cost_event_table_concept_id as proposed here. Use the concept hierarchy.
  7. Use temporal aggregation for cumulative costs.
  8. To estimate the effects of covariates – use Feature Extraction 2 and the PLP package.
  9. To calculate annualized cost – calculate the time in cohort and the costs using Feature Extraction 2 – and then compute (total cost/total days)*30 to get per person per month cost, or *365 to get per person per year cost!
  10. Cost of specific utilization – easy – same approach – subset cost records by linking them to drug, visit etc.
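The steps above can be sketched in plain Python, without Feature Extraction, just to show the arithmetic. This assumes cost rows already joined to their concept_class and carrying a date; the row layout, concept_id 9101, dates and amounts are all hypothetical.

```python
from datetime import date

# Hypothetical pre-joined cost rows:
# (person_id, cost_concept_id, concept_class, incurred_date, amount)
ROWS = [
    (1, 9101, "Total", date(2017, 1, 10), 500.0),
    (1, 9101, "Component", date(2017, 1, 10), 300.0),
    (1, 9101, "Component", date(2017, 1, 10), 200.0),
    (1, 9101, "Total", date(2017, 3, 5), 230.0),
    (2, 9101, "Total", date(2017, 2, 1), 100.0),  # person 2 is not in the cohort
]

# Hypothetical cohort: person_id -> (cohort start, cohort end), inclusive.
COHORT = {1: (date(2017, 1, 1), date(2017, 12, 31))}


def cohort_cost(rows, cohort, cost_concept_id, concept_class):
    """Sum costs of one concept and one class for cohort members, within their window."""
    total = 0.0
    for person_id, concept_id, cls, when, amount in rows:
        window = cohort.get(person_id)
        if window is None:
            continue  # not in the cohort
        start, end = window
        if concept_id == cost_concept_id and cls == concept_class and start <= when <= end:
            total += amount
    return total


def days_in_cohort(cohort):
    """Total observed days across cohort members (inclusive of both end dates)."""
    return sum((end - start).days + 1 for start, end in cohort.values())


def per_period(total_cost, total_days, days_per_period):
    """(total cost / total days) * 30 for monthly, * 365 for yearly."""
    return total_cost / total_days * days_per_period


total_paid = cohort_cost(ROWS, COHORT, 9101, "Total")
total_days = days_in_cohort(COHORT)
per_person_per_month = per_period(total_paid, total_days, 30)
per_person_per_year = per_period(total_paid, total_days, 365)
```

Only the ‘Total’ rows of person 1 are summed here (730.0 over 365 days); choosing ‘Component’ instead would sum the line-level rows, and the two are never mixed in one aggregate.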

Because they are parent and child records. You don’t double count them.

Also – sometimes, depending on the data quality of the source, the sum of the children may not always add up to the parent. In fact, if you are a secondary data user – you may not know what is more accurate - detail or header. That’s a source data problem, not an OMOP CDM problem. So you use both, and don’t mix them up.

I am not against it, I just wasn’t clear that this was the consensus (and neither, it seemed, was @jenniferduryea). So, again, not arguing, but asking: If we do that and tell folks “if you have claims put the header into OCCURRENCE and the detail into DETAIL” will this work well? Will @MPhilofsky’s use cases work, where she wants to study patients going from ER to ICU to Rehab and the merits of the various variants?

The way I understand you is that the institutional claims would feed the two VISIT tables, and the individual claims would be placed inside there, correct?

We may all benefit from a little Tutorial here, @Gowtham_Rao. But this is already a good summary.

How do you use both? Right now, in the OMOP CDM, there is no way to indicate “don’t double count this cost record”. Or is there?

We should try to for sure. I said, there is general consensus among payers, not OHDSI community.

That’s the point - visit_detail is trying to unify the claims data world and the EHR data world. Take the above scenario we discussed about appendicitis.
Claims data – We will get many US claims, with each claim having many details. Each US claim should usually have only one place of service – inpatient hospital, ER, etc. Each of those claims (and its details) may have different dates, different providers, different conditions and different procedures – but they are all part of the same episode of care that started at the ambulance and ended with discharge. The episode needs to be inferred during analytic time using temporal association of person level records. Visit_occurrence should have provenance/lineage to claim summary/header, while visit_detail should have provenance/lineage to claim detail.

EHR data – usually one episode of care has one medical record number for the patient from beginning to end. Everything else is a detail - with its own place of service, condition, procedure, dates etc. There is no need to infer the episode using temporal association, because the source data links the episode using the same medical record number. The medical record number ties together all the details starting from the ambulance ride to discharge. If there is no medical record number, then we have a problem (similar to claims data), and in those cases visit_occurrence may need to be derived using an algorithm during ETL (these are tough choices, and hopefully rare choices), while visit_detail will have provenance to the source data.

Yes - institutional claims header would go to visit_occurrence, institutional claim detail to visit_detail.

Details go to visit_detail, header goes to visit_occurrence.

Claims: claims summary/header goes to visit_occurrence, claims line/detail to visit_detail.

EHR: medical record number level summary information to visit_occurrence, details to visit_detail. Some of the visit_occurrence in EHR may have to be ‘derived during ETL’.

Because, from visit_occurrence table description

At any one day, there could be more than one visit.
One visit may involve multiple providers, in which case the ETL must specify how a single provider id is selected or leave the provider_id field null.
One visit may involve multiple Care Sites, in which case the ETL must specify how a single care_site id is selected or leave the care_site_id field null.

Visit_detail will capture what visit_occurrence cannot. E.g. you don’t have to select a single provider or care_site in visit_detail.

At the end of the day, OHDSI standard tools like Atlas rely on building eras based on temporal association and use a collapse strategy like era-fy to build cohorts @chris_knoll. So, we can use circe-be to build cohorts using a combination of visit_occurrence and visit_detail.

Check out the concept_ids I am proposing here as part of the cost table proposal - concepts. The use of concept_class unambiguously differentiates between total/summary cost and component cost. Summary cost concepts should be linked only to summary tables like drug_exposure and visit_occurrence; Component/Detail costs should only be linked to detail tables, e.g. visit_detail, procedure_occurrence, condition_occurrence. I would also argue that with visit_detail - we should not relate the cost table to procedure_occurrence anymore. @jenniferduryea - I think this will be a point of contention. I would argue that we join detail cost from cost.cost_event_id to visit_detail.visit_detail_id, and then to procedure_occurrence.visit_detail_id. The only exception would be when we have costs without visit information (rare and non standard).

The proposed concepts will put in the guard rail. The rule that we should only link the detail concepts to detail tables, and summary concepts to summary tables, will add guard rails.
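One way such a rule could be checked after ETL is sketched below. The table sets, row layout and function name are my own assumptions for illustration, not something the proposal defines; which tables count as summary or detail would need to be agreed.

```python
# Assumed grouping of CDM tables for this sketch; adjust to the agreed convention.
SUMMARY_TABLES = {"visit_occurrence"}
DETAIL_TABLES = {"visit_detail", "procedure_occurrence", "condition_occurrence"}


def guard_rail_violations(cost_rows):
    """Return cost rows whose concept class does not match the linked table level.

    Each row is a dict with hypothetical keys:
    'cost_id', 'concept_class' ('Total' or 'Component'), 'event_table'.
    """
    violations = []
    for row in cost_rows:
        cls, table = row["concept_class"], row["event_table"]
        total_ok = cls == "Total" and table in SUMMARY_TABLES
        component_ok = cls == "Component" and table in DETAIL_TABLES
        if not (total_ok or component_ok):
            violations.append(row)
    return violations


rows = [
    {"cost_id": 1, "concept_class": "Total", "event_table": "visit_occurrence"},
    {"cost_id": 2, "concept_class": "Component", "event_table": "visit_detail"},
    {"cost_id": 3, "concept_class": "Total", "event_table": "visit_detail"},
]
bad = guard_rail_violations(rows)
```

Here only the third row is flagged, because a ‘Total’ cost is linked to a detail table.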
