Mapping multiple payers' claims files into OMOP

Tstaub · April 23, 2019, 9:16pm

I am working on mapping multiple different kinds of payer files into OMOP and did not see this topic already in here. Wanted to know if anyone else has done this and if I could get any tips/best practices or examples.

The main file types I am concerned with are:

Medicare: CCW files

Medicaid: HCFA 1500/ 834/ UB92/ NCPDP

Commercial Payers: 834/ 837i/ 837p / NCPDP

gregk · April 24, 2019, 11:56am

We have done many OMOP CDM ETL conversions of just about every data type available, including linking separate data sets and converting them into a single integrated OMOP CDM instance. We could definitely provide you with hints and best practices, but since this is quite a broad question, it would be difficult to advise on specifics without knowing more details.

Could you please share a few more details? Also, I would be happy to jump on a call to better understand what you are trying to do and share our experience/tips.

tom.white.md · September 24, 2019, 2:22pm

As a Health Plan (payer) I have a similar set of questions.

Are there documented best practices/FAQ for mapping from US claims to OMOP? There are several important business decisions that need to be made about the mappings, and they may impact the reproducibility of analyses if different claims data owners/submitters select different mapping strategies.

I reviewed the Book of OHDSI, and the Wiki (https://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:data_etl), but don’t feel that I’ve seen a definite answer on best practices for claims data. If such documentation already exists, please point me to it.

For example:
(1) When coded claims data (e.g. ICD-9, ICD-10, CPT4, HCPCS, NDC, GPI) do not map through concept_relationship to a standard concept, is _concept_id set to 0, set to an ancestor code, or set to the source_concept_id (which then makes queries more complicated)?
(2) When a concept has a "Maps to" relationship to multiple SNOMED codes, should/must all of them be stored in the relevant tables, or can organizations pick which single SNOMED code to use?
(3) Are there quality assurance techniques to ensure that ICD, CPT, and HCPCS mappings to standard codes land in the correct tables (e.g. Observation vs. Measurement vs. Drug Exposure)?
(4) Are there plans to enhance the Data Quality Dashboard to assess the appropriateness of the mapping of source_code_ids to standard_codes and target tables? Or are there tools planned to automate that process to avoid the risk of inconsistent ETL strategies?

Christian_Reich · September 24, 2019, 4:00pm

Should be the Book. Will use your questions to improve.

That’s the one. Just to relax you: Only really silly codes that are mostly useless for analytics fall into this rare category.

The former. Otherwise the analytics cannot find them.

That’s part of most of the DQ methods, including the DQ Dashboard.

Mapping is an exercise to determine semantic equivalence, and unfortunately no typical quality measure can check the correctness. At least not one that works on simple quality metrics.

Not sure what you mean. It is pretty straightforward:

Take a code
Look up its Concept
Look up what it maps to (could be itself)
Place it where the domain_id tells you to.

This can be completely automated.
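
For readers who want to see the four steps above in code, here is a minimal Python sketch. It is not an official OHDSI implementation. The in-memory `CONCEPT` and `CONCEPT_RELATIONSHIP` structures are toy stand-ins for the vocabulary tables, every concept_id and concept_code in them is a made-up placeholder, and the function name `map_source_code` is hypothetical. Sending codes with no usable domain to `observation` is an assumption of this sketch. It sets the standard concept_id to 0 when no "Maps to" exists and emits one row per target when a code maps to several.

```python
# Toy stand-ins for the OMOP vocabulary tables. All ids and codes are
# placeholders for illustration; real values come from the vocabulary download.
CONCEPT = {
    # concept_id: (vocabulary_id, concept_code, domain_id, standard_concept)
    1001: ("ICD10CM", "SRC-A", "Condition", None),
    1002: ("ICD10CM", "SRC-B", "Condition", None),   # has no "Maps to"
    2001: ("SNOMED", "STD-1", "Condition", "S"),
    2002: ("SNOMED", "STD-2", "Observation", "S"),
}

CONCEPT_RELATIONSHIP = [
    # (concept_id_1, relationship_id, concept_id_2)
    (1001, "Maps to", 2001),
    (1001, "Maps to", 2002),   # one source code, two standard targets
    (2001, "Maps to", 2001),   # a standard concept maps to itself
]

DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
    "Drug": "drug_exposure",
    "Device": "device_exposure",
    "Measurement": "measurement",
    "Observation": "observation",
}

# Index for step 2: (vocabulary_id, concept_code) -> concept_id
SOURCE_INDEX = {(v, code): cid for cid, (v, code, _, _) in CONCEPT.items()}


def map_source_code(vocabulary_id, concept_code):
    """Return one target row per standard concept the source code maps to."""
    # Steps 1-2: take a code and look up its concept (0 if unknown).
    source_id = SOURCE_INDEX.get((vocabulary_id, concept_code), 0)
    # Step 3: look up what it maps to.
    targets = [c2 for (c1, rel, c2) in CONCEPT_RELATIONSHIP
               if c1 == source_id and rel == "Maps to"]
    if not targets:
        # Unmapped: standard concept_id is 0, source_concept_id is kept.
        # Assumption: route by the source concept's domain, else observation.
        domain = CONCEPT[source_id][2] if source_id else "Observation"
        return [{"table": DOMAIN_TO_TABLE.get(domain, "observation"),
                 "concept_id": 0, "source_concept_id": source_id}]
    # Step 4: place each target where its own domain_id says.
    return [{"table": DOMAIN_TO_TABLE.get(CONCEPT[t][2], "observation"),
             "concept_id": t, "source_concept_id": source_id}
            for t in targets]


rows = map_source_code("ICD10CM", "SRC-A")
```

Note that the two rows for `SRC-A` land in different tables, because the target table is chosen from each standard concept's domain rather than from the source code's.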

Adam_Black · November 22, 2019, 3:22pm

I am working on a similar ETL (Maine State All Payer Claims data). One of the challenges I'm facing is the creation of visits from claim lines. This is important because many questions require that I know the provider and care site where a procedure or service occurred. One claim can contribute to multiple visits (e.g. physical therapy billed monthly), and multiple claims can contribute to a single visit (e.g. dual payer). Furthermore, for outpatient services a visit may have only a provider's professional bill without any care site, in which case I need to make a best guess about the care_site, provided I am somewhat confident in that guess. My plan is to map every claim line to a visit_detail record and then roll them up to visits, using dates, provider, and visit_type to define which visit_detail records should belong to one visit. If anyone has implemented this type of logic before, I would love to discuss ideas and prior experience.
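
As a starting point for discussion, the roll-up described above could be sketched in Python roughly as follows. This is only one possible reading of the plan, not a tested ETL. The `VisitDetail` class, the `roll_up` function, and the `max_gap_days` parameter are hypothetical names, and the rule used here (same person, provider, and visit type, with overlapping or touching date ranges) is an assumption. Care site inference is left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby


@dataclass
class VisitDetail:
    visit_detail_id: int
    person_id: int
    provider_id: int
    visit_type: str
    start: date
    end: date


def roll_up(details, max_gap_days=0):
    """Group visit_detail records into visits.

    Records with the same person, provider, and visit type are merged when
    the next record starts no later than max_gap_days after the running end.
    """
    def key(d):
        return (d.person_id, d.provider_id, d.visit_type)

    visits = []
    ordered = sorted(details, key=lambda d: (key(d), d.start))
    for k, group in groupby(ordered, key=key):
        current = None
        for d in group:
            if current and d.start <= current["end"] + timedelta(days=max_gap_days):
                # Same visit: extend the end date and attach the detail record.
                current["end"] = max(current["end"], d.end)
                current["detail_ids"].append(d.visit_detail_id)
            else:
                # Gap too large (or first record for this key): start a new visit.
                current = {"key": k, "start": d.start, "end": d.end,
                           "detail_ids": [d.visit_detail_id]}
                visits.append(current)
    return visits


# Two claim lines for the same encounter (e.g. dual payer) plus a later one.
details = [
    VisitDetail(1, 10, 500, "outpatient", date(2019, 1, 5), date(2019, 1, 5)),
    VisitDetail(2, 10, 500, "outpatient", date(2019, 1, 5), date(2019, 1, 5)),
    VisitDetail(3, 10, 500, "outpatient", date(2019, 2, 5), date(2019, 2, 5)),
]
visits = roll_up(details)
```

With these example rows, the two same-day records collapse into one visit and the February record becomes its own visit. A claim that spans a month of physical therapy would need to be split into per-service-date visit_detail records before this step.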