Hi all,
I’m hoping to get some clarity on a topic I find confusing, and I think your expertise could help me understand it better.
I’m currently building the healthcare utilization segment of an OMOP CDM 5.4 database and working out how to define the visit_occurrence grain when source claims bundle multiple real-world encounters under a single claim_id. We’ve identified three distinct data patterns in our medical claim detail lines (detail lines are usually nested under a parent claim) and want to validate our proposed approach with the community.
Context
Our source data has one row per claim detail line, with columns for claim_id, medical_claim_detail_id, service_from_date, service_to_date, place_of_service_code, medical_code_type (ICD-10, CPT-4, HCPCS, REV), and medical_code. A single claim can contain a mix of ICD-10 diagnosis headers and CPT-4/HCPCS/REV service lines.
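To make the code-type rule concrete, here is a minimal Python sketch that sorts a claim's detail lines into diagnosis headers and visit-driving service lines. It assumes each row is a dict keyed by the column names above; classify_line and split_claim are hypothetical helper names, not part of any OHDSI tool.

```python
# Hypothetical helpers; rows are assumed to be dicts keyed by the source column names.
HEADER_CODE_TYPES = {"ICD-10"}
SERVICE_CODE_TYPES = {"CPT-4", "HCPCS", "REV"}


def classify_line(row):
    """Return "header", "service" or "unknown" for one claim detail line."""
    code_type = row["medical_code_type"]
    if code_type in HEADER_CODE_TYPES:
        return "header"  # diagnosis header: never drives a visit
    if code_type in SERVICE_CODE_TYPES:
        return "service"  # service line: drives the visit grain
    return "unknown"


def split_claim(rows):
    """Partition one claim's detail lines by role."""
    parts = {"header": [], "service": [], "unknown": []}
    for row in rows:
        parts[classify_line(row)].append(row)
    return parts
```

Returning "unknown" instead of defaulting to one bucket keeps unexpected code types (a stray NDC or ICD-9 value, say) visible for review rather than letting them silently drive or skip visits.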
The naïve approach of one visit_occurrence per claim_id collapses distinct encounters. For example, 47 separate psychotherapy sessions billed under one claim would become a single visit spanning six months, when they should be 47 separate visit_occurrence rows.
Proposed encounter grain
We are refining a new visit grain: one visit_occurrence per unique combination of person_id + claim_id + place_of_service_code + service_from_date, derived only from service lines (CPT-4, HCPCS, REV). ICD-10 lines never drive visits; instead, they fan out as condition_occurrence rows (or whichever table the vocabulary domain_id dictates) linked to every visit on their parent claim.
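A sketch of the proposed four-part grain, under the same dict-per-row assumption. The end-date rule (latest service_to_date among the lines that share a key) is an assumption for illustration, not part of the proposal above, and derive_visits is a hypothetical name.

```python
from datetime import date

SERVICE_CODE_TYPES = {"CPT-4", "HCPCS", "REV"}


def visit_key(row):
    """Proposed four-part grain: person_id + claim_id + POS + service_from_date."""
    return (row["person_id"], row["claim_id"],
            row["place_of_service_code"], row["service_from_date"])


def derive_visits(rows):
    """Group service lines into candidate visit_occurrence rows.

    ICD-10 rows are skipped here; they are attached to visits in a separate
    fan-out step. The end date is assumed to be the latest service_to_date
    among the lines that share a key.
    """
    visits = {}
    for row in rows:
        if row["medical_code_type"] not in SERVICE_CODE_TYPES:
            continue
        visit = visits.setdefault(visit_key(row), {
            "visit_start_date": row["service_from_date"],
            "visit_end_date": row["service_to_date"],
            "line_ids": [],
        })
        visit["visit_end_date"] = max(visit["visit_end_date"], row["service_to_date"])
        visit["line_ids"].append(row["medical_claim_detail_id"])
    return visits


if __name__ == "__main__":
    def line(detail_id, code_type, day, pos):
        # Hypothetical rows for one claim; dates and POS values are illustrative only.
        return {"person_id": 1, "claim_id": "C1", "medical_claim_detail_id": detail_id,
                "service_from_date": date(2023, 1, day), "service_to_date": date(2023, 1, day),
                "place_of_service_code": pos, "medical_code_type": code_type}

    demo = [line("1", "ICD-10", 2, None), line("2", "CPT-4", 2, "11"),
            line("3", "CPT-4", 9, "11"), line("4", "CPT-4", 9, "02")]
    print(len(derive_visits(demo)))  # 3: the ICD-10 header creates no visit
```

Because Python treats None as an ordinary key value, a claim whose service lines all have a null POS collapses to person_id + claim_id + service_from_date, which matches the Pattern C fallback. A claim that mixes null and non-null POS on the same date would, however, split into separate visits.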
However, we’ve observed three patterns:
Pattern A: POS present on service lines (most common). ICD-10 headers have no POS and carry spanning date ranges, but CPT/HCPCS lines carry discrete dates and POS codes. The four-part grain works cleanly. Example: 47 CPT lines across 47 dates at two different POS codes produce 47 visits, and the single ICD header fans out to all 47.
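A sketch of the ICD fan-out, taking the visit keys produced by the grain step as input. Dating each condition row to its visit's service_from_date (rather than to the header's spanning dates) is an assumption, routing by domain_id is omitted, and the output field names only loosely follow condition_occurrence.

```python
from datetime import date, timedelta


def fan_out_diagnoses(icd_rows, visit_keys):
    """Attach each ICD-10 header to every visit derived from the same claim.

    visit_keys are (person_id, claim_id, place_of_service_code,
    service_from_date) tuples, i.e. the proposed four-part grain.
    """
    visits_by_claim = {}
    for key in visit_keys:
        visits_by_claim.setdefault((key[0], key[1]), []).append(key)

    condition_rows = []
    for icd in icd_rows:
        for key in visits_by_claim.get((icd["person_id"], icd["claim_id"]), []):
            condition_rows.append({
                "person_id": icd["person_id"],
                "condition_source_value": icd["medical_code"],
                # Assumption: date the condition to the visit, not the header's spanning dates.
                "condition_start_date": key[3],
                "visit_key": key,
            })
    return condition_rows


if __name__ == "__main__":
    # Pattern A shape: 47 weekly sessions split across two illustrative POS codes.
    first = date(2023, 1, 2)
    keys = [(1, "C1", "11" if i < 20 else "02", first + timedelta(weeks=i)) for i in range(47)]
    header = {"person_id": 1, "claim_id": "C1", "medical_code": "F33.1"}
    print(len(fan_out_diagnoses([header], keys)))  # 47
```

With the Pattern A shape this yields 47 condition rows from one header, which is the visit-level overcount the fan-out accepts. An ICD header on a claim with no service lines produces no rows at all, so such claims would need separate handling.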
Pattern B: POS present on service lines, single date. Example: an ER encounter where all 32 service lines share the same date and POS collapses to 1 visit, and its three ICD headers fan out to that single visit.
Pattern C: POS missing on all lines. Every line on the claim, including the CPT/HCPCS service lines, has a null POS. The grain drops to person_id + claim_id + service_from_date, and visit_concept_id must be inferred from the E&M or service codes (e.g., CPT 99285 signals an ER visit → concept 9203; HCPCS G0378 signals observation).
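To route each claim to the right grain, it can first be labelled by pattern. This sketch treats a blank POS string as missing (an assumption about the feed) and flags claims that mix null and non-null POS, since none of the three observed patterns covers them. Patterns A and B share the same key, so the label mainly helps with QA counts; classify_claim and grain_key are hypothetical names.

```python
from datetime import date

SERVICE_CODE_TYPES = {"CPT-4", "HCPCS", "REV"}


def _pos_missing(pos):
    # Assumption: blank strings in the feed mean the same as NULL.
    return pos is None or str(pos).strip() == ""


def classify_claim(rows):
    """Label one claim's lines as pattern "A", "B" or "C", or flag it for review."""
    services = [r for r in rows if r["medical_code_type"] in SERVICE_CODE_TYPES]
    if not services:
        return "NO_SERVICE_LINES"
    missing = [_pos_missing(r["place_of_service_code"]) for r in services]
    if all(missing):
        return "C"
    if any(missing):
        return "MIXED"  # outside the three observed patterns
    distinct = {(r["place_of_service_code"], r["service_from_date"]) for r in services}
    return "B" if len(distinct) == 1 else "A"


def grain_key(row, pattern):
    """Four-part key for patterns A and B; drop POS for pattern C."""
    if pattern == "C":
        return (row["person_id"], row["claim_id"], row["service_from_date"])
    return (row["person_id"], row["claim_id"],
            row["place_of_service_code"], row["service_from_date"])


if __name__ == "__main__":
    # Illustrative ER claim: two service lines, same date, POS 23.
    er = [{"person_id": 1, "claim_id": "C9", "medical_code_type": "CPT-4",
           "place_of_service_code": "23", "service_from_date": date(2023, 3, 4)}] * 2
    print(classify_claim(er))  # B
```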
My questions for the community
- Is the code-type-based classification (ICD-10 = always header, everything else = service line that drives visit grain) a sound general rule, or are there known edge cases where ICD lines should drive visits?
- For the ICD fan-out, we attach each diagnosis to every visit on the same claim. Is this consistent with how other CDM builders handle claim-level diagnoses that lack encounter-level attribution? We recognize this may overcount condition prevalence at the visit level, but we see no reliable way to attribute a diagnosis to a specific encounter within the claim.
- For Pattern C (no POS anywhere), is inferring visit_concept_id from E&M codes an accepted convention? We’re considering a hierarchy: ER E&M (99281–99285) → 9203, inpatient indicators → 9201, default → 9202. Has anyone implemented something similar?
- The core question: what constitutes an encounter?
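For reference, the Pattern C hierarchy from the third question could be sketched as below, with checks run in the order listed. The ER E&M range and concept IDs 9203, 9201 and 9202 come from the question; the inpatient indicators (CPT 99221–99223, 99231–99233, 99238–99239 and revenue codes 0100–0219) are assumptions, and G0378 is left unmapped because no target concept is given.

```python
# Visit concept IDs named in the question.
ER_VISIT = 9203
INPATIENT_VISIT = 9201
OUTPATIENT_VISIT = 9202

ER_EM_CODES = {str(c) for c in range(99281, 99286)}  # 99281-99285
# Assumed inpatient indicators (not specified in the question).
INPATIENT_EM_CODES = {str(c) for c in (*range(99221, 99224), *range(99231, 99234), 99238, 99239)}


def is_room_and_board_rev(code):
    """Assumed inpatient signal: accommodation revenue codes 0100-0219."""
    return code.isdigit() and 100 <= int(code) <= 219


def infer_visit_concept_id(service_lines):
    """Pattern C fallback: ER E&M first, then inpatient indicators, else outpatient."""
    codes = [(r["medical_code_type"], r["medical_code"]) for r in service_lines]
    if any(t == "CPT-4" and c in ER_EM_CODES for t, c in codes):
        return ER_VISIT
    if any((t == "CPT-4" and c in INPATIENT_EM_CODES) or (t == "REV" and is_room_and_board_rev(c))
           for t, c in codes):
        return INPATIENT_VISIT
    return OUTPATIENT_VISIT
```

Because the checks run in order, a visit carrying both an ER E&M code and an inpatient indicator is labelled 9203. If ER-to-admission stays matter, the standard Visit vocabulary also has 262 (Emergency Room and Inpatient Visit).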
Any experience or guidance would be appreciated.
Thanks in advance.