
Proposal for Cost Table Adjustment

Hi Jennifer,

Ideally I would like to have the same event date that is used in the data table, for example drug_exposure_start_date in the case of the drug_exposure table.

Best,
Klaus


@DTorok:
I see your valid point.
However, if we want to focus on normalization, we wouldn’t design the cost table that way. We would rather use a cost_type and one cost_value instead of many COST entries, which are partly used in databases outside the U.S.

With a COST table that is already massive and de-normalized, an additional person_id would not add much size. It would, however, enable us to process huge datasets without spending on expensive hardware and software resources to build up huge databases for analyzing the data.

@klaus @chris_knoll I support the idea of adding person_id for multiple reasons, including performance in database appliances that use hashing and continuing the person-centric convention of every table.

We should make this a proposal that is a topic for the F2F. We really like the cost table, but the absence of the person_id makes it difficult to use.

Is it true that OHDSI applications like Atlas don’t use the cost table because of the absence of person_id? Currently it is not possible to use Atlas to build a cohort of people who have had visits with an amount > $100,000, for example.

I think the actual service dates e.g. procedure date, visit date or drug dispensation date should be in their respective tables as is.

I do, however, see value in having two dates in this table: the date billed and the date paid. These are important for health plan actuarial analyses, which use them to estimate IBNR. https://en.m.wikipedia.org/wiki/Incurred_but_not_reported

Happy to make a proposal in a different thread to request this. There are a lot of use cases for this date enhancement from health economics and actuarial standpoint.
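To illustrate why a billed date matters for this kind of analysis, here is a minimal sketch in Python. It is a hindsight calculation only, not an actuarial IBNR method: it sums claims that were incurred on or before a valuation date but billed after it. The field names (service_date, billed_date, amount) and the sample claims are assumptions for illustration; in the CDM the service date would come from the domain table, as noted above.

```python
from datetime import date

# Made-up claims; field names are hypothetical, not CDM columns.
claims = [
    {"service_date": date(2015, 1, 10), "billed_date": date(2015, 1, 25), "amount": 100.0},
    {"service_date": date(2015, 3, 20), "billed_date": date(2015, 4, 15), "amount": 250.0},
    {"service_date": date(2015, 4, 5), "billed_date": date(2015, 4, 20), "amount": 75.0},
]


def incurred_not_reported(claims, valuation_date):
    """Sum claims incurred on or before valuation_date but billed after it."""
    return sum(
        c["amount"]
        for c in claims
        if c["service_date"] <= valuation_date < c["billed_date"]
    )


# Only the March 20 service counts: it happened before the valuation date
# but was billed afterwards.
ibnr = incurred_not_reported(claims, date(2015, 3, 31))
```

Without a billed (or paid) date in the cost record, this lag between service and reporting cannot be measured at all.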

No, the tools could use cost ID, it would just have to join to the appropriate domain table to figure out who the cost is associated with. It just hasn’t been implemented yet because there hasn’t been a defined analytical use case for it and that sort of thing drives the priorities.
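To make that join concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The reduced column lists, the sample rows, and the choice of total_paid as the "amount" are assumptions for illustration; this is not how Atlas works today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    visit_start_date TEXT
);
CREATE TABLE cost (
    cost_id INTEGER PRIMARY KEY,
    cost_event_id INTEGER,   -- id of the row in the domain table
    cost_domain_id TEXT,     -- which domain table cost_event_id points to
    total_paid REAL
);
INSERT INTO visit_occurrence VALUES
    (1, 101, '2015-03-01'), (2, 102, '2015-04-10'), (3, 101, '2015-06-20');
INSERT INTO cost VALUES
    (10, 1, 'Visit', 250000.0),
    (11, 2, 'Visit', 900.0),
    (12, 3, 'Visit', 1200.0),
    (13, 2, 'Drug', 500000.0);  -- refers to a drug record, not visit 2
""")

# Without person_id on cost, the person is found through the domain table.
# The cost_domain_id filter keeps the drug cost from matching visit 2.
cohort = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT v.person_id
        FROM cost c
        JOIN visit_occurrence v ON v.visit_occurrence_id = c.cost_event_id
        WHERE c.cost_domain_id = 'Visit' AND c.total_paid > 100000
        ORDER BY v.person_id
    """)
]
```

A similar join would be needed for each domain that cost records can point to, which is the overhead that a person_id column on the cost table would avoid.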

Thank you @chris_knoll and group

In addition to what you already mentioned above, what are the reasons to add person_id to cost table?

  • query efficiency and ease of use from analyst point of view
  • hashing in appliances that distribute based on person_id
  • general convention of representing person_id in OMOP tables

What else? Would like to propose adding person_id at F2F.

I think all of those are reason enough. I think for it to get incorporated into the actual CDM schema, we just need to drum up enough support for it and push it through the WG.

Yup - see proposal

http://www.ohdsi.org/web/wiki/doku.php?id=documentation:next_cdm:add_person_to_cost

Please make changes/improve

@DTorok
when you say normalized cost table - are you referring to something like this?

Normalized cost table

What would the table look like?

@Gowtham_Rao

Added Klaus’ last name. Other than that - pretty clean. The only other caveat is the possibility of a database integrity conflict: the person_id reached through the event table might be different from the one in the COST table. We need to address it through constraints (slow), a warning in the description, or data quality tools like ACHILLES HEEL, which should watch for that kind of thing. Otherwise it will happen somewhere.
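A data quality check for that conflict could look something like the following sketch, written with Python's standard-library sqlite3 module. It assumes the proposed person_id column already exists on the cost table; the reduced schemas and sample rows are made up for illustration, and this is not an existing ACHILLES rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug_exposure (
    drug_exposure_id INTEGER PRIMARY KEY,
    person_id INTEGER
);
CREATE TABLE cost (
    cost_id INTEGER PRIMARY KEY,
    person_id INTEGER,       -- the proposed new column
    cost_event_id INTEGER,
    cost_domain_id TEXT
);
INSERT INTO drug_exposure VALUES (1, 101), (2, 102);
INSERT INTO cost VALUES
    (10, 101, 1, 'Drug'),    -- consistent with drug_exposure 1
    (11, 999, 2, 'Drug');    -- conflicts with drug_exposure 2
""")

# Report cost rows whose person_id disagrees with the linked event row.
mismatches = conn.execute("""
    SELECT c.cost_id, c.person_id, d.person_id
    FROM cost c
    JOIN drug_exposure d ON d.drug_exposure_id = c.cost_event_id
    WHERE c.cost_domain_id = 'Drug' AND c.person_id <> d.person_id
""").fetchall()
```

The same query would have to be repeated per domain table, since cost_event_id can point to any of them.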

Can you add that to the proposals? I think it’s the right thing to do? I hate those strings. Same in FACT_RELATIONSHIP. Which I also hate.

Will add normalized table to cost table proposal. Agree, this is probably how it should be
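For readers who want to see the shape of the change, here is a minimal sketch in Python of pivoting one wide cost row into tall rows with a cost_type and a single cost_value. The short list of amount columns and the use of the column name as cost_type are assumptions for illustration; the actual proposal uses concept ids.

```python
# Amount columns to pivot; an illustrative subset, not the full COST layout.
WIDE_AMOUNT_COLUMNS = ("total_charge", "total_paid", "paid_by_payer", "paid_by_patient")


def pivot_wide_to_tall(wide_row):
    """Return one tall row per amount column that has a value."""
    tall_rows = []
    for column in WIDE_AMOUNT_COLUMNS:
        value = wide_row.get(column)
        if value is None:
            continue  # an amount missing in the source produces no row
        tall_rows.append({
            "cost_event_id": wide_row["cost_event_id"],
            "cost_domain_id": wide_row["cost_domain_id"],
            "cost_type": column,  # hypothetical; a real design would use a concept_id
            "cost_value": value,
        })
    return tall_rows


wide = {
    "cost_event_id": 42,
    "cost_domain_id": "Drug",
    "total_charge": 120.0,
    "total_paid": 95.0,
    "paid_by_payer": 80.0,
    "paid_by_patient": None,
}
tall = pivot_wide_to_tall(wide)
```

One wide row with three populated amounts becomes three tall rows, and amounts that a source does not provide simply do not appear.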

Can we make cost an agenda item at F2F next week?

You got it.

Thanks for supporting this.
Can we make one of the datetime fields mandatory to fully satisfy our design principle of having a person_id and a date?


I would say that the majority of secondary data sources will not have financial transaction dates. If we make it mandatory, then a default date will need to be forced through some form of imputation.

Why is date mandatory?

It doesn’t have to be the financial date. It can instead be the same date that is used in the record identified by cost_event_id.
The reason is the same as for person_id. The design principle for all data tables suggests a person_id and a date. This makes searching by iterating over records more efficient.

Ok - we could add service_date to represent the start date of a visit, procedure, condition, specimen or other domain represented in cost table. I don’t know how it will

However, we recently changed from representing dates to datetime. Should we use datetime or date? It is clean for billed_datetime and paid_datetime, since they are new without legacy implications. We don’t need _date and can start with _datetime for these two. But what about service_date vs service_datetime vs both?

See proposal

Sorting the events by date within a patient makes finding events more efficient.
Personally, I prefer to use the same structure as in the other tables: required date and optional datetime.
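As a rough illustration of the efficiency argument, here is a minimal Python sketch: once records are sorted by person_id and date, a date window for one person can be located with binary search instead of a full scan. The tuple layout (person_id, date, cost_id) and the sample records are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Made-up cost records as (person_id, date, cost_id).
events = [
    (102, date(2015, 4, 10), 11),
    (101, date(2015, 6, 20), 12),
    (101, date(2015, 3, 1), 10),
]
events.sort()  # orders by person_id first, then by date
keys = [(person_id, day) for person_id, day, _ in events]


def events_in_window(person_id, start, end):
    """Return one person's events with start <= date <= end via binary search."""
    lo = bisect_left(keys, (person_id, start))
    hi = bisect_right(keys, (person_id, end))
    return events[lo:hi]


first_quarter = events_in_window(101, date(2015, 1, 1), date(2015, 3, 31))
```

This only works if every record carries both a person_id and a date, which is the point of asking for both on the cost table.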

@Gowtham_Rao - great discussion today! I’m hoping we can continue the conversation about pivoting the cost data. I like the pivot idea but I’d like to think more about storing both the summary cost data and its parts in the same table. I get that the TYPE will help you split it up, and regardless of what we decide we just need to be clear. My initial concern is that people aren’t careful about reviewing types, and I’m nervous about what people might blindly do when summing up costs. But I’m open to hearing what other people think.

Also, I would like if @jenniferduryea weighed into this.

Yes, if we sum up across cost-concepts, then we will be double counting.

Total cost may be argued to be a classification concept over its component concepts. Total cost may also be argued to be a value available in source data, which would make it an ‘S’ standard concept_id, not a ‘C’ classification concept_id. Concepts can be used in a normalized cost table.

Defining total cost as a classification concept is next to impossible, because which components make up total cost is highly variable. It depends on the level of granularity in the source data, the definitions used in the source data, or the definitions used by the researcher. E.g., is total cost ingredient + dispensing fee, or ingredient + dispensing fee + discount?

The proposal uses cost_source_concept_id and cost_concept_id, which allows for some form of standardization. At analysis time, it is up to the investigator to define which set of concept_ids should be rolled up to total cost. If we formalize a classification concept and standard cost concepts, then it will force standardization.
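A small Python sketch of the double-counting risk and of an investigator-defined roll-up follows. The concept ids are made-up placeholders, not real vocabulary concepts, and the row layout (cost_event_id, cost_concept_id, cost_value) is an assumption for illustration.

```python
# Placeholder concept ids; NOT real OMOP vocabulary concepts.
INGREDIENT, DISPENSING_FEE, TOTAL_FROM_SOURCE = 1, 2, 3

# Tall cost rows as (cost_event_id, cost_concept_id, cost_value).
rows = [
    (42, INGREDIENT, 80.0),
    (42, DISPENSING_FEE, 15.0),
    (42, TOTAL_FROM_SOURCE, 95.0),  # summary value stored next to its parts
]


def roll_up(rows, concept_ids):
    """Sum cost_value per event, restricted to the chosen concept ids."""
    totals = {}
    for event_id, concept_id, value in rows:
        if concept_id in concept_ids:
            totals[event_id] = totals.get(event_id, 0.0) + value
    return totals


# Summing blindly across every concept counts the drug cost twice.
naive = roll_up(rows, {INGREDIENT, DISPENSING_FEE, TOTAL_FROM_SOURCE})

# The investigator picks the components that define total cost.
by_components = roll_up(rows, {INGREDIENT, DISPENSING_FEE})
```

Here the naive sum is 190.0 while the component roll-up is 95.0, matching the total reported by the source in this made-up example.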

Just to add a small comment or two on this. The cost table was primarily designed to store the information available in most commercially available claims data, particularly Medicare data. The wide layout makes it easier to output results for analytics, and is particularly easy for humans to use. One thing we have seen in our data is that the total cost provided in the data does not always equal the sum of the costs in the row. In some global sense, it always adds up, but the view that exists in claims data is not complete. Payers have much more granularity than researchers. In their data, it has to add up (I hope).

The wide vs. tall debate is one that we have had several times here at Outcomes Insights. We ended up keeping both, and separating them into two cost tables (payer reimbursement table and cost table). There are many, many costs that organizations have, and the tall cost table, with type information, is much more flexible. The wide payer reimbursement table seemed more amenable to standard analytics. Others may disagree, of course.

Also, the use cases in research are quite varied. The simplest economic use case is simply to average costs by resource type (e.g., the average cost of office visits in 2013). Many published economic studies look at the total (cumulative) reimbursed amount as a function of patient characteristics and/or time. There is a need to have the date of service in order to be able to inflate the cost to the desired year of the analysis. More sophisticated analyses also need location to incorporate geographic inflation factors. In short, costs almost require their own “CDM”.
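As a minimal sketch of why the date of service is needed, the Python below inflates a cost to a target analysis year by scaling with a price index. The index values are invented for illustration and do not come from any published series; a real analysis would choose an appropriate index and possibly geographic factors as well.

```python
from datetime import date

# Hypothetical price index by year; values invented for illustration.
PRICE_INDEX = {2013: 100.0, 2014: 102.0, 2015: 105.0}


def inflate(amount, service_date, target_year, index=PRICE_INDEX):
    """Scale amount from the year of service to target_year prices."""
    return amount * index[target_year] / index[service_date.year]


# A 200.00 cost incurred in 2013, expressed in 2015 prices.
adjusted = inflate(200.0, date(2013, 6, 1), 2015)
```

Without a service date on (or reachable from) each cost record, the year of service is unknown and this adjustment cannot be applied.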

I can’t comment on what payers want to do, but from what I gather from reading some of the posts on this topic, they are generally quite different. Most likely this has to do with payers having much more detail than most researchers. And also, payers look at things internally without regard to publishing the results.

I guess this is no longer a short note, but we are fully supportive of improving the tables for more use cases. @jenniferduryea is not available for the next week or so, but she will chime in if she has other things to add.
