**Please see Gowtham's [updated Cost table proposal](https://github.com/OHDSI/Co…mmonDataModel/issues/81#issuecomment-333811290) in the comments below**
# Cost Table Changes (Add Person_id, Dates and normalize)
**Proposal Owner:** Gowtham Rao, Chris Knoll, Klaus Bonadt
**Discussion:** [forum post](http://forums.ohdsi.org/t/proposal-for-cost-table-adjustment/1842/12)
**Cost table description:** [COST](https://github.com/OHDSI/CommonDataModel/wiki/cost)
**Proposal overview:**
- Add person_id
- Add billed_datetime and paid_datetime
- Change Cost table structure - to "normalized" version
- Add new field cost_domain_concept_id
**1. Add Person_id to Cost table**
* **Design justification:** We have a design principle that all domain related data tables are person-centric where for each record the person_id and date are captured at a minimum. We deviate from this design principle in the COST table. Addition of person_id will make the SQL statement in database systems faster. A missing person_id prevents other lightweight non-database systems to process all transaction tables chunk wise. In certain database environments (specifically Massively Parallel Processing (MPP) that we see in redshift and MS PDW (and possibly HiveDB on Hadoop)) we want to be able to hash on a field of the table. By NOT having a person id in the cost table, we can't hash the data such that the data for a given person can be co-located on the same cluster. By HAVING this person_id field, we can define the table as hashed on person_id.
* **Discussion for and against:** However, adding the person id and date will further de-normalize the OMOP CDM Schema. Local deployments of OMOP may add person_id on their own. This was countered by: but that will fail the standardization principle. The query optimizer may not leverage a hashing column if the column isn't applied in the query - standard OHDSI applications will not use person_id if it is not a standard OMOP CDM standard. While trying to calculate an average cost of something by person from the cost table, we have to go to the drug_exposure table, join back to cost (on the domain="drug") and the drug_exposure_id (neither of these are hashed). This will cause a sort of 'shuffle' move of data from the cost table (which can only be hashed on a cost_id) UNLESS we say where cost.person_id = drug_exposure.person_id. Cost-table is already denormalized with many cost entries (allowed amount, paid by payer, paid by patient etc). Alternative would be to redesign the cost table to cost_type and one cost_value instead of many COST entries. Addition of person_id may not impact the already denormalized table, but would rather enable us to process huge datasets without the need of spending cost expensive hardware and software resources for building up complex queries thru multiple inefficient joins.
* **Data integrity checks:** To avoid the possibility of database integrity conflicts, e.g. the person_id passed through the event table might be different from the one in the COST table. We need to add it to the constraints (slow), a warning in the description or data quality tools like ACHILLES HEELS, which should watch that kind of thing.
**2. Add incurred_date, incurred_datetime, billed_datetime, paid_datetime:**
* Add three new datetime fields in the cost table. billed_datetime and paid_datetime
* Incurred_date or service_date (match event table) - This is in-line with the design principle.
* Both fields are optional (Required = No), they are date time fields (Type = DateTime - following OMOP conventions)
* Costs are associated with a visit_occurrence, procedure_occurrence, drug_occurrence, observation, device etc.
* There are generally three types of dates associated with costs. Incurred date, Billed Date and Paid date.
* Incurred date is the date of the service. They are captured in the respective visit, procedure etc.
* Billed date and paid date are not captured in OMOP CDM. Use cases for these are listed below
**Use cases:**
* Health economics and actuarial analysis use incurred date, billed date and paid date in the formulas
* Incurred But not reported: https://en.wikipedia.org/wiki/Incurred_but_not_reported amount owed by an insurer to all valid claimants who have had a covered loss but have not yet reported it. Very important for claim reserves estimation (represent the money which should be held by the insurer so as to be able to meet all future claims arising from policies currently in force and policies written in the past.)
* Completion factor trend analysis
**Analytic questions:**
* Trend analysis for operational efficiency - claims adjudication rate, completion factor, IBNR, claims reserve estimation
**Importance:**
* New use cases around financial and actuarial departments of organizations
* This will expand the OHDSI/OMOP footprint
**Consequence of doing it:**
* Adoption of Cost table may increase.
* New use cases that serve the needs of financial entities in an organization will expand the adoption of the OMOP CDM.
**Consequences of not doing it:**
* Query optimization is difficult.
* Analyst has to write complex queries. Development of standardized tools may be delayed
**3. Leverage cost_type_concept_id and remove cost type columns**
* Proposed structure of new table is below
* The key idea here is that instead of making each cost-type a column (i.e. wide representation), lets convert to long representation. We will leverage cost_type_concept_id for this.
* cost_type_concept_id will be used to identify standard cost types: total_charge, total_cost, total_paid, paid_by_payer, paid_by_patient, paid_patient_coinsurance, paid_patient_deductible, paid_by_primary, paid_ingredient_cost, paid_dispensing_fee, payer_plan_period_id, amount_allowed. This allows to generalize the cost table and we represent arbitrary number of cost types. We can also represent international and custom use cases. The concept_ids that would need to be created are:
- Copayment amount
- Coinsurance amount
- Charged by the provider
- Recovered by the provider
- Allowed by the primary payer
- Paid by the primary payer
- Allowed by the secondary payer
- Paid by the secondary payer
- Allowed by all payers
- Paid by all payers
- Charged to the patient
- Paid by the patient total out of pocket
- Paid by the patient towards co-insurance
- Paid by the patient towards copay
- Paid by the patient towards deductible
- Fee for pharmacy dispensing
- Cost of pharmacy ingredient
- Average Wholesale Price amount
**4. Add new field cost_domain_concept_id**
* The original cost_domain_id is 'character' field, we want a new integer field called cost_domain_concept_id which is an integer that currently may be one of the following:
* 8 (Visit)
* 10 (Procedure)
* 13 (Drug)
* 17 (Device)
* 19 (Condition)
* 21 (Measurement)
* 27 (Observation)
* 36 (Specimen)
Field | Required | Type | Description
:----------------|:-----------------|:------------|:----------------------------------- |
| cost_id | Yes | integer | A unique identifier for each COST record. |
| person_id | Yes | integer | A unique identifier for each person. |
| cost_event_id | Yes | integer | A foreign key identifier to the event (e.g. Measurement, Procedure, Visit, Drug Exposure, etc) record for which cost data are recorded. |
| cost_domain_id | Yes | string(50) | The concept id representing the domain of the cost event, from which the corresponding table can be inferred that contains the entity for which cost information is recorded. |
| cost_domain_concept_id |Yes| integer| A foreign key identifier to a concept in the CONCEPT table representing the domain of the cost event|
| cost_type_concept_id | Yes | integer | A foreign key identifier to a concept in the CONCEPT table for the provenance or the source of the COST data: Calculated from insurance claim information, provider revenue, calculated from cost-to-charge ratio, reported from accounting database, etc. |
| currency_concept_id | Yes | integer | A foreign key identifier to the concept representing the 3-letter code used to delineate international currencies, such as USD for US Dollar. |
| total_cost | Yes | float | The actual financial cost amount |
| incurred_date | Yes | date | The first date of service of the clinical event as in table capturing the information (e.g. date of visit, date of procedure, date of condition, date of drug etc). |
| incurred_datetime | No | datetime | |
| billed_datetime | No | datetime | The date and time a bill was generated for a service or encounter |
| paid_datetime | No | datetime | The date and time payment was received for a service or encounter |
| revenue_code_concept_id | No | integer | A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for Revenue codes. |
| drg_concept_id | No | integer | A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for DRG codes. |
| revenue_code_source_value | No | string(50) | The source code for the Revenue code as it appears in the source data, stored here for reference. |
| drg_source_value | No | string(50) | The source code for the 3-digit DRG source code as it appears in the source data, stored here for reference. |