It doesn’t have to be the financial date; it can instead be the same date that is used in the record identified by cost_event_id.
The reason is the same as for patient_id. The design principle for all data tables suggests a person_id and a date. This makes searching by iterating over records more efficient.
Ok - we could add service_date to represent the start date of a visit, procedure, condition, specimen or other domain represented in cost table. I don’t know how it will
However, we recently changed from representing dates as date to representing them as datetime. Should we use datetime or date? It is clean for billed_datetime and paid_datetime, since they are new and have no legacy implications. We don’t need _date and can start with _datetime for these two. But what about service_date vs service_datetime vs both?
See proposal
Sorting the events by date within a patient makes finding events more efficient.
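As a rough illustration of why a person_id plus a date helps, here is a minimal sketch in Python. The rows, the column order, and the `events_between` helper are all hypothetical; the point is only that once records are sorted by person and then date, a binary search finds one person’s events in a date range without scanning everything.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical cost rows as (person_id, service_date, cost).
# Column names and values are illustrative only.
rows = [
    (2, date(2013, 5, 1), 40.0),
    (1, date(2013, 3, 10), 120.0),
    (1, date(2013, 1, 5), 75.0),
    (2, date(2013, 2, 14), 300.0),
    (1, date(2014, 7, 22), 60.0),
]

# Sort once by person, then by date within person.
rows.sort(key=lambda r: (r[0], r[1]))
keys = [(r[0], r[1]) for r in rows]

def events_between(person_id, start, end):
    """Return rows for one person with start <= service_date <= end."""
    lo = bisect_left(keys, (person_id, start))
    hi = bisect_right(keys, (person_id, end))
    return rows[lo:hi]

in_2013 = events_between(1, date(2013, 1, 1), date(2013, 12, 31))
```

In a database the same effect would typically come from an index on the person and date columns rather than from application code.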
Personally, I prefer to use the same structure as in the other tables: required date and optional datetime.
@Gowtham_Rao - great discussion today! I’m hoping we can continue the conversation about pivoting the cost data. I like the pivot idea but I’d like to think more about storing both the summary cost data and its parts in the same table. I get that the TYPE will help you split it up, and regardless of what we decide we just need to be clear. My initial concern is that people aren’t careful about reviewing types, and I’m nervous about what people might blindly do when summing up costs. But I’m open to hearing what other people think.
Also, I would like @jenniferduryea to weigh in on this.
Yes, if we sum up across cost-concepts, then we will be double counting.
Total cost may be argued to be a classification concept for its component concepts. Total cost may also be argued to be a value available in the source data, in which case it is an ‘S’ standard concept_id, not a ‘C’ classification concept_id. Concepts can be used in a normalized cost table.
Defining total-cost as a classification concept is next to impossible, because which components lead to total cost is highly variable and depends on the level of granularity in the source data, the definitions used in the source data, or the definitions used by the researcher. E.g., is total cost ingredient+dispensing fee, or ingredient+dispensing+discount?
The proposal uses cost_source_concept_id and cost_concept_id, which allows for some form of standardization. At analysis time, it is up to the investigator to define which set of concept_ids should be rolled up to total-cost. If we formalize a classification concept and standard cost concepts, then it will force standardization.
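To make the “investigator defines the roll-up” idea concrete, here is a small Python sketch. The concept labels, amounts, and the two roll-up definitions are made up for illustration; real analyses would use concept_ids from the vocabulary rather than these strings.

```python
# Hypothetical component costs for one drug claim, keyed by a cost concept label.
claim_costs = {
    "ingredient": 80.0,
    "dispensing_fee": 5.0,
    "discount": -10.0,
}

# Two investigator-defined roll-ups of the same components (assumed definitions).
ROLLUPS = {
    "total_without_discount": {"ingredient", "dispensing_fee"},
    "total_with_discount": {"ingredient", "dispensing_fee", "discount"},
}

def roll_up(costs, concepts):
    """Sum only the components that belong to the chosen roll-up."""
    return sum(amount for concept, amount in costs.items() if concept in concepts)

totals = {name: roll_up(claim_costs, concepts) for name, concepts in ROLLUPS.items()}
```

The same claim yields a different “total cost” under each definition, which is why a single fixed classification concept is hard to agree on.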
Just to add a small comment or two on this. The cost table was primarily designed to store the information available in most commercially available claims data, particularly Medicare data. The wide layout makes it easier to output results for analytics, and is particularly easy for humans to use. One thing we have seen in our data is that the total cost provided in the data does not always equal the sum of the costs in the row. In some global sense, it always adds up, but the view that exists in claims data is not complete. Payers have much more granularity than researchers. In their data, it has to add up (I hope).
The wide vs. tall debate is one that we have had several times here at Outcomes Insights. We ended up keeping both, and separating them into two cost tables (payer reimbursement table and cost table). There are many, many costs that organizations have, and the tall cost table, with type information, is much more flexible. The wide payer reimbursement table seemed more amenable to standard analytics. Others may disagree, of course.
Also, the use cases in research are quite varied. The simplest economic use case is to average costs by resource type (e.g., the average cost of office visits in 2013). Many published economic studies look at the total (cumulative) reimbursed amount as a function of patient characteristics and/or time. There is a need to have the date of service in order to be able to inflate the cost to the desired year of the analysis. More sophisticated analyses also need location to incorporate geographic inflation factors. In short, costs almost require their own “CDM”.
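As a sketch of why the service date matters for inflation, something like the following Python could be used. The index values are invented placeholders, not real price index figures, and `inflate` is a hypothetical helper; a real analysis would plug in a published index and possibly a geographic factor.

```python
# Hypothetical price index by year (illustrative values, not real data).
PRICE_INDEX = {2012: 100.0, 2013: 102.0, 2014: 105.0}

def inflate(cost, service_year, target_year, index=PRICE_INDEX):
    """Express a cost incurred in service_year in target_year currency."""
    return cost * index[target_year] / index[service_year]

# A 200.00 cost with a 2012 service date, expressed in 2014 terms.
adjusted = inflate(200.0, service_year=2012, target_year=2014)
```

Without a date of service on the cost record, `service_year` would have to be looked up from the linked event, which is the extra join the date column avoids.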
I can’t comment on what payers want to do, but from what I gather from reading some of the posts on this topic, they are generally quite different. Most likely this has to do with payers having much more detail than most researchers. And also, payers look at things internally without regard to publishing the results.
I guess this is no longer a short note, but we are fully supportive of improving the tables for more use cases. @jenniferduryea is not available for the next week or so, but she will chime in if she has other things to add.
@Gowtham_Rao - Random question, would there be a TYPE_CONCEPT_ID for “total cost” for sources that provide that?
There is a cost_type_concept_id already in the cost table. Is this what you are referring to Erica?
I know there is a COST_TYPE_CONCEPT_ID, but don’t we need one for “total cost” for that TYPE field? Looking at your list, I don’t see something for this “total” idea. Something like “total charged by the provider” and “total charged to the patient” would be valuable for PREMIER. Maybe a “total visit cost”; I think we need that for Truven and possibly Optum. I would need a cost person to help me out here. I feel like we are missing items in the list below.
- Copayment amount
- Coinsurance amount
- Charged by the provider
- Recovered by the provider
- Allowed by the primary payer
- Paid by the primary payer
- Allowed by the secondary payer
- Paid by the secondary payer
- Allowed by all payers
- Paid by all payers
- Charged to the patient
- Paid by the patient total out of pocket
- Paid by the patient towards co-insurance
- Paid by the patient towards copay
- Paid by the patient towards deductible
- Fee for pharmacy dispensing
- Cost of pharmacy ingredient
- Average Wholesale Price amount
@jenniferduryea, @jweave17 or @clairblacketer help me out here?
I agree with @ericaVoss . A “total cost”, “total reimbursed”, “total charged” would be useful. I don’t think we need to specify “by visit” since you can assign the cost to a visit record using the cost_event_id and domain variables. I think we need to duplicate these types to include line item costs and then “total charged” costs. So if the data has line item costs as well as the total cost for the entire claim, you can store both sets of these values (there are use cases that show the total cost for the entire claim is not merely the sum of the line item costs per service on the claim). That might help the analyst detangle what costs go with what service/day.
@ericaVoss - following up on our conversation at the symposium on the topic of adding ‘total-cost’. Almost all cost type elements above may be either a ‘total’ or a ‘component’. E.g., the source data may have several ‘component’ charges to the same patient for the same visit, and there may be a ‘total’. They are all ‘charged to the patient’. Instead of creating concepts with confusing long names like
‘Component charged to the patient’
‘Total charged to the patient’
We talked about using the concept_class_id to handle this
‘charged to the patient’ with two concept_class_id’s - ‘Component’ and ‘Total’.
So for every new cost concept_id maintained by OHDSI, we will have a ‘Component’ and a ‘Total’ concept_class_id.
We hope this will reduce the risk of someone adding up all the charges to the patient and double counting both the ‘component’ and the ‘total’. It should also allow for standardized analytics and keep things clean.
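A toy Python example of the double-counting risk and how a class filter avoids it. The rows, field names, and amounts are hypothetical and only mimic the ‘Component’ / ‘Total’ split described above.

```python
# Hypothetical cost rows for one visit; field names are illustrative.
rows = [
    {"visit_occurrence_id": 1, "cost_concept": "Charged to the patient",
     "concept_class": "Component", "cost": 30.0},
    {"visit_occurrence_id": 1, "cost_concept": "Charged to the patient",
     "concept_class": "Component", "cost": 20.0},
    {"visit_occurrence_id": 1, "cost_concept": "Charged to the patient",
     "concept_class": "Total", "cost": 50.0},
]

def charged_to_patient(rows, concept_class):
    """Sum 'Charged to the patient' rows of a single concept class."""
    return sum(
        r["cost"]
        for r in rows
        if r["cost_concept"] == "Charged to the patient"
        and r["concept_class"] == concept_class
    )

naive = sum(r["cost"] for r in rows)                  # mixes both classes
from_components = charged_to_patient(rows, "Component")
from_total = charged_to_patient(rows, "Total")
```

Here the naive sum is twice the real charge, while filtering on either class alone gives the expected amount. In real data the two filtered sums will not always agree, as noted earlier in the thread.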
Also @ericaVoss - the proposal is to use new fields ‘cost_concept_id’ and ‘cost_source_concept_id’ to hold the concept of the cost, not cost_type_concept_id. Cost_type_concept_id is a pre-existing field in the cost table that holds the ‘provenance or the source of the COST data: Calculated from insurance claim information, provider revenue, calculated from cost-to-charge ratio, reported from accounting database, etc.’ @clairblacketer has to change the original post #1. @clairblacketer - maybe we could delete some content from that post to avoid confusion.
One approach (which probably should be the standard way to do it) to differentiate between the line-item and the summary-item is to use the new visit_detail table. In the claims world, visit_detail is the detail line item of the claim, and visit_occurrence is the summary-header claim. In use, visit_detail cannot exist without a corresponding record in visit_occurrence, and visit_occurrence is the required parent of visit_detail. The record in visit_detail can then be linked to the cost table to get the detail/component/line cost, while the header/visit_occurrence would contain the ‘total’ cost.
@Gowtham_Rao I hear what you’re saying, and the visit_occurrence and cost tables are related, to a certain extent.
For line-level costs, the cost table allows you to assign costs to any record in the CDM. So you can assign a line-level cost directly to a procedure_occurrence record. No controversy. Easy for physician claims. And the visit_detail information is not needed.
However, you run into issues with claim-level or aggregated costs. If visit_occurrence records are assigned at the claim-level (as you propose above), then the aggregated or “total” cost could be assigned to the visit_occurrence record. Easy. And the visit_detail table is not needed.
However, since there is some controversy as to how to create visit_occurrence records in different datasets (one visit_occurrence_id could contain information from multiple medical claims as in one hospitalization; or, one visit_occurrence_id could contain only a part of a claim, as in creating ER and Inpatient visits from one inpatient facility claim), a visit_occurrence_id may not represent the aggregated cost. So at this point, the visit_detail table might be useful, in that the visit_occurrence_id represents multiple claims (i.e. a patient’s hospitalization) and then the visit_detail table represents the claims header - where aggregate costs can now be assigned. Then assign line-item costs to those line items in the procedure_occurrence, device, drug_exposure, etc. tables.
In reviewing this “fix” in incorporating the visit_detail table, I’m unsure how analysts will understand when to include line-item costs or aggregate costs or both. In health economics research, analysts are most interested in how much the payer paid for services to understand burden. It seems they would need to know to look at the line-item paid amounts for physician (HCFA) claims, but then the aggregated or “total paid” amounts for facility (UB04) claims. Since, in general, researchers are not familiar with the differences between claim types, I start to get concerned about how easily the cost table can be understood and used.
@Christian_Reich the bottom line is, to do health-economic analysis, you really need to know what you are doing. There are so many use-cases and so many variations. We think the cost-table as proposed can address a lot of use-cases, but not all. Also, if you don’t know what you are doing, you can mess up.
Over time, we can put up guard rails (using Themis, constrained vocabulary, etc.). To put up the guard rails, we first have to understand and burn our fingers a few times. As we learn, we will make these tables available as part of standard OHDSI tools like Atlas and the R packages.
Current state: these tables are rarely used and most people don’t know how to use them. This proposal is just the beginning of making them available in the standard toolkit. In the future we will need a workgroup just for health-economic analysis.
So, we need a specific use case (or use cases) to use to make these decisions. Perhaps that is the best way to start. Then we can evaluate proposals against those use cases.
I don’t know about the controversy - but if the source-data has both claim header and claim-line-detail, then I would say – ETL the claim header to visit_occurrence, and ETL the claim detail to visit_detail. (ensure data lineage/provenance to the source data).
If the source data has costs at the claim header, then the cost.cost_event_table_concept_id would be the concept_id of the visit_occurrence table, and the cost.cost_event_id would be the visit_occurrence.visit_occurrence_id. If your source data has cost at the line-item-detail record, you could map that cost.cost_event_id to visit_detail.visit_detail_id or procedure_occurrence.procedure_occurrence_id or both, depending on how your source data is structured. Visit_detail is optional - you don’t have to use it.
To understand whether a cost is line-item or summary/total: if the cost_event_id points to a visit_detail or procedure_occurrence record, then it is a component of the cost. If the cost_event_id points to a visit_occurrence or drug_exposure record, it is a total/header cost. Plus, we will use the appropriate concept_ids, which will clearly distinguish between component cost and total cost.
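The rule of thumb above could be sketched in Python like this. Table names stand in for the table concept_ids, and `cost_level` is a hypothetical helper; the drug table is written as drug_exposure, which is the CDM’s name for it.

```python
# Table names used as stand-ins for cost_event_table_concept_id values.
COMPONENT_TABLES = {"visit_detail", "procedure_occurrence"}
TOTAL_TABLES = {"visit_occurrence", "drug_exposure"}  # CDM drug table is drug_exposure

def cost_level(event_table):
    """Classify a cost row by the table its cost_event_id points to."""
    if event_table in COMPONENT_TABLES:
        return "component"
    if event_table in TOTAL_TABLES:
        return "total"
    return "unknown"  # tables the rule above does not mention

levels = [cost_level(t) for t in ("visit_detail", "visit_occurrence", "specimen")]
```

This only encodes the convention as stated in this post. Whether it holds for a given dataset depends on how the ETL assigned visits and claims.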
See the concepts proposed as part of the proposal
Proposed concept_id
Proposal on github
Note the use of concept_class to distinguish between costs that may be total or component.
The proposed concepts use the concept_class to unambiguously differentiate between header cost and component cost. @ericaVoss and I talked about this approach at the symposium, because this would put up a lot of guardrails to prevent the confusion of summing over both the total/header costs and the component/line-detail costs.
I think the controversy you are referring to is that the standard concepts available for use with visit_concept_id are not granular enough. That is out of scope for this proposal, but we need to address creating better concept_ids to support claims data. This, however, needs to be done separately.
Is that the general consensus? VISIT_OCCURRENCE=header and VISIT_DETAIL=detail? You get only one header for an entire hospital visit? Not two or more? Never? And VISIT_DETAIL is not aggregated the way the provider happened to be organized commercially and split up the claims?
I feel like you are alluding to - me!
Do you have them listed somewhere? That would help.
Just a few simple use cases. I am using reimbursed amount to be cost, but that might vary by dataset.
Calculate the annual average cost for patients in a cohort
Calculate the cumulative cost for patients in a cohort
Estimate the effects of covariates on the cumulative cost in a cohort
Calculate the annualized cost (cost / observation period duration during year)
Calculate the cost of specific utilization per person: drugs, hospitalizations, office visits, emergency room visits, etc
Some of the above might be by subgroup (e.g., diabetes, cancer, osteoporosis, etc)
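For what it’s worth, a couple of these use cases can be sketched in a few lines of Python. The rows, index dates, and helper names are hypothetical, and the annualized formula is one reading of “cost / observation period duration during year” (cost divided by the fraction of the year observed).

```python
from collections import defaultdict
from datetime import date

# Hypothetical reimbursed amounts: (person_id, service_date, paid).
costs = [
    (1, date(2013, 1, 15), 100.0),
    (1, date(2013, 6, 1), 50.0),
    (1, date(2012, 12, 1), 999.0),  # before the index date, so excluded
    (2, date(2013, 3, 1), 200.0),
]
index_dates = {1: date(2013, 1, 1), 2: date(2013, 1, 1)}

def cumulative_cost(costs, index_dates):
    """Sum each person's costs on or after their index date."""
    totals = defaultdict(float)
    for person_id, service_date, paid in costs:
        if person_id in index_dates and service_date >= index_dates[person_id]:
            totals[person_id] += paid
    return dict(totals)

def annualized_cost(cost_in_year, observed_days, days_in_year=365):
    """Cost divided by the fraction of the year the person was observed."""
    return cost_in_year / (observed_days / days_in_year)

cumulative = cumulative_cost(costs, index_dates)
# Average over persons with at least one qualifying cost row.
average = sum(cumulative.values()) / len(cumulative)
```

Note that cohort members with no cost rows at all drop out of `cumulative` here, so a real average would need to add them back with a cost of zero.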
It all sounds like cumulative stuff, which means you sum up everything that comes after an index date. Correct? If so, why all the hassle with the header and detail?