I’m not sure how to provide standardized analyses when we allow the idiosyncrasies of the source data into the model. I like how you depicted the different elements in your picture, Christian, but I’m inclined to go the #1 route by introducing CDM concepts that we can all standardize our analytics on, and leave the idiosyncrasies of the data as an ETL exercise to standardize it into the CDM form.
You are right. If we could, that is. The problem is that we usually cannot. The way a claim comes in, you can’t tell whether it came from a hospital, a department, or a single provider (which color it is). Therefore, it is better to keep them as is (like we do in DRUG_EXPOSURE) and then figure out a standardized procedure to convert them. That will get us regular critique, but at the end of the day everybody uses it.
I’ll draw the options out. You’ll see they are not that terribly different.
Friends,
The discussion has dropped off, so I will continue with my opinion.
I agree with @bailey:
Especially the impact on performance and query overhead if the ADT data is merged into the VISIT_OCCURRENCE table.
Let’s continue the conversation or fine tune this as a proposal on the wiki. I’d like to get this nailed down in the next week or so. The F2F is only a couple weeks away.
I think we, as a community, need to first agree on some conventions (applicable when using claims data). I would like to propose a convention based on the question “Whose point of view?”
Please see the Google doc. Please review, edit, or offer comments. This point-of-view-based approach may be generalizable to US and non-US settings.
Matching to HL7 world: https://www.hl7.org/fhir/encounter.html#examples
Encounter = “An interaction between a patient and healthcare provider(s) for the purpose of providing healthcare service(s) or assessing the health status of a patient.” A patient encounter is further characterized by the setting in which it takes place. Among them are ambulatory, emergency, home health, inpatient and virtual encounters. An Encounter encompasses the lifecycle from pre-admission, the actual encounter (for ambulatory encounters), and admission, stay and discharge (for inpatient encounters). During the encounter the patient may move from practitioner to practitioner and location to location.
For example, each single visit of a practitioner during a hospitalization may lead to a new instance of Encounter, but depending on local practice and the systems involved, it may well be that this is aggregated to a single instance for a whole hospitalization.
We may need to word-smith the ideas in the Google doc above to be as close as possible to the descriptions in HL7. I think, as a community, agreeing on conventions is important - and this will help us represent the data with semantic accuracy while retaining lineage to source.
I think this is tough to source from claims data.
Agree - we need to preserve claims level referencing.
i.e. whose point of view? Claims will help answer some of the points of view as posted above.
@bailey @MPhilofsky Reading your use cases, I think the unit of analysis is care_site (site of care), where the care site generally refers to the ‘level of care’, e.g. ICU vs step down.
When representing data in the OMOP common data model, I think it is critical to represent the data in such a way that it semantically represents the source data and is at the same atomic level of granularity as the source data. This will retain lineage/provenance to the source 1:1. Any ‘inferred’ information should be derived at analysis time. If we represent inferred or calculated information in the OMOP CDM, then we are saying that information came from the source - which is not true.
@Mark_Danese - can we use the Contexts and Collections approach to derive visits, encounters, services as described above? @jenniferduryea this is an opportunity to be able to represent claims data accurately in OMOP CDM like @Patrick_Ryan said
The visit vs. micro-visit lingo may confuse some - what do you think about service, encounter, visit framework above using whose point of view? More word-smithing?
You can know the following:
- Facility claim vs professional claim, at least in the USA, by using UB04 vs CMS 1500 claim information, or pharmacy/vision/dental claim etc.
- Other important things in claims are HIPAA place of service, Type of bill, revenue code, admit type that help define it further
Yes finally!
I think we should first agree on conventions, then find an efficient way to represent the data
Just to answer the question you directed at me. Yes, visits can be derived in the way we handle data. For EHR data, a Collection is a visit. For claims data, a Collection is a claim. If we need to count unique visits for a patient for some reason, we group items using specific rules related to the research question (e.g., using unique dates, places of service, and providers).
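A minimal sketch of the grouping rule described above, assuming a simplified claim line with hypothetical field names (`person_id`, `service_date`, `place_of_service`, `provider_id`, `claim_id`). It is not the Outcomes Insights implementation. It only illustrates counting unique visits by grouping on date, place of service, and provider at analysis time.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class ClaimLine(NamedTuple):
    # Hypothetical, simplified claim line; field names are illustrative only.
    person_id: int
    service_date: date
    place_of_service: str
    provider_id: int
    claim_id: str


def derive_visits(lines):
    """Group claim lines into derived visits keyed by
    (person, service date, place of service, provider)."""
    visits = defaultdict(list)
    for line in lines:
        key = (line.person_id, line.service_date,
               line.place_of_service, line.provider_id)
        visits[key].append(line.claim_id)
    return dict(visits)


lines = [
    ClaimLine(1, date(2017, 3, 1), "office", 10, "A"),
    ClaimLine(1, date(2017, 3, 1), "office", 10, "B"),  # same key as claim A
    ClaimLine(1, date(2017, 3, 2), "office", 10, "C"),  # different date
]
visits = derive_visits(lines)
print(len(visits))  # number of derived visits for this toy input
```

A different research question would simply swap the grouping key, which is the flexibility being argued for here.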
In fact, in my experience a uniquely defined visit is almost never needed in research. Usually, we want a specific clinical event – a diagnosis, procedure, drug, or lab – and the date it occurred. The notion of a visit in most data models is generally used as a convention to link information together.
I think the process begins with establishing what we are trying to do with visits.
Thanks @Mark_Danese
I think, if I understand you correctly, your way gives the community the flexibility of ‘deriving’ the visit, encounter, service etc. based on rules defined by the researcher at analysis time rather than at ETL time.
Is the collection/contexts approach something we can work more on?
If the community wants to go this direction, I am happy to help. We built our own data model so we could more easily convert raw data to OMOP and Sentinel. If you want to see how it all fits together, it is publicly available: GitHub - outcomesinsights/generalized_data_model: Outcomes Insights' Data Model for Clinical Research
Purely selfishly, the more similar the data models are, the easier my job is. But what we are doing is very specific to our needs and I recognize that OMOP has a fundamentally different goal. That is, OMOP is designed to be an “analysis ready” representation of the data. By definition, decisions related to analyses are baked into the data model. Otherwise it isn’t “analysis ready”.
Our mission is to be able to do reproducible, collaborative, large-scale research. We use a common data model to standardize the structure, content, and representation of data. I don’t think that implies “analytics ready”. We have packages and tools that convert the data in the CDM into analytics-ready form, but the CDM itself is not analytics ready. I don’t think our mission calls for baking analytic use cases into the CDM so that those use cases are easier; what we do is collaboratively understand multiple use cases and enhance the CDM such that multiple use cases may be supported.
At least that’s my understanding. @Patrick_Ryan thoughts?
You are abusing @MPhilofsky’s thread to define visits and its hierarchy here for a general ideological discussion. I am going to move it to a new Forum posting, if you don’t mind. See you there.
Use cases: For tracking health care resource utilization, health plans are interested in being able to calculate the following from the OMOP common data model:
- Visits per 1,000 per year (Numerator: number of unique visits in a certain period Denominator: Person-time during that period)
- Encounters per 1,000 per year (Numerator: number of unique encounters in a certain period Denominator: Person-time during that period)
- Claims per 1,000 per year (Numerator: number of unique claims in a certain period Denominator: Person-time during that period)
- (Inpatient) Days per 1,000 per year (Numerator: number of unique (inpatient) dates in a certain period Denominator: Person-time during that period)
- (Inpatient) average length of stay (Numerator: total number of unique (inpatient) dates in a certain period Denominator: Number of persons during that period)
[Dates may be visit days, encounter days or claim days - if using from and to dates]
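The metrics above share one shape: a count of unique events over person-time. A minimal sketch, assuming person-time is supplied in days, a year is taken as 365.25 days, and from/to dates are inclusive (all three are assumptions, not conventions agreed in this thread). The function names are hypothetical.

```python
from datetime import date, timedelta


def rate_per_1000_per_year(event_count, person_days):
    """Unique events per 1,000 person-years (person-time given in days)."""
    if person_days <= 0:
        raise ValueError("person-time must be positive")
    person_years = person_days / 365.25  # assumed year length
    return 1000 * event_count / person_years


def unique_days(stays):
    """Expand (from_date, to_date) pairs into a set of unique calendar days,
    so overlapping records are not double counted."""
    days = set()
    for start, end in stays:
        current = start
        while current <= end:  # end date treated as inclusive
            days.add(current)
            current += timedelta(days=1)
    return days


# Two overlapping inpatient records for one person.
stays = [(date(2017, 1, 1), date(2017, 1, 3)),
         (date(2017, 1, 3), date(2017, 1, 5))]
inpatient_days = len(unique_days(stays))
visit_rate = rate_per_1000_per_year(event_count=50, person_days=36525)
```

Whether the numerator counts visits, encounters, or claims only changes what is passed as `event_count`, which is why the definition of each level matters so much.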
These are generally then stratified by site of service, condition/diagnosis (primary), primary procedure, patient demographics, etc. The metric and stratifications are used for reporting and visualization in Business Intelligence platforms.
I think health plans would be very interested in being able to measure this consistently and reproducibly.
That is the problem. An encounter can be a 5 minute stop-by in a walk-in clinic to get a flu shot, or a month-long stay in a hospital with tons of diagnostic procedures and treatments. Within such a jumbo encounter you can have little encounters embedded, because those providers bill separately. And each encounter has a bunch of claims attached.
Bottom line: We have to either infer the true healthcare experience of the patient from how the claims are submitted, and that depends on how the provider is organized. Or ignore it and keep it the way it is (like in your use cases).
Or both. My proposal 2. is supporting that: Put your encounters in a new ENCOUNTER table where we dump everything as it is given to us. That’s where you would do your use cases. In the VISIT_OCCURRENCE table we infer the standard visits.
Works?
@Christian_Reich Thank you - trying hard to
Did you see this Conventions
Can we use Visit - Encounter - Service (wordsmithing?), where visit is our visit occurrence table. Encounter is new (table). Service is represented in visit and/or encounter as a person-level concept for a datetime.
Ok - @Christian_Reich now I see this clearly. Makes sense - there is precedent for representing inferred information in the CDM.
The more I think about it - I don’t think we need another encounter table. We could use the existing visit_occurrence table, and concept_ids to represent the level (visit, micro-visit, mini-visit, encounter etc.).
See Proposal
visit_level_concept_id: A foreign key to the predefined concepts in the Visit Level Vocabulary reflecting the granularity concept of a nest of information.
visit_level_relationship_concept_id: A foreign key to the predefined concepts (HL7’s partOf links) Vocabulary
related_level_visit_id: A foreign key to the VISIT_OCCURRENCE table record of the visit that is related to visit but not in the same level/nest.
visit_level_definition_id: A foreign key to a new Definition table that records descriptions and syntax that led to the instantiation of the visit_occurrence record.
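To make the four proposed fields concrete, here is a minimal sketch of a visit_occurrence record carrying them. The concept id values (`LEVEL_VISIT`, `LEVEL_MICRO_VISIT`, `REL_PART_OF`) are hypothetical placeholders, not real vocabulary concepts, and the helper `parts_of` is only an illustration of how nesting could be queried.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder concept ids; no such vocabulary exists yet.
LEVEL_VISIT = 1
LEVEL_MICRO_VISIT = 2
REL_PART_OF = 100


@dataclass
class VisitOccurrence:
    visit_occurrence_id: int
    person_id: int
    visit_level_concept_id: int
    visit_level_relationship_concept_id: Optional[int] = None
    related_level_visit_id: Optional[int] = None
    visit_level_definition_id: Optional[int] = None


def parts_of(visits, parent_id):
    """Return records linked to parent_id through a partOf relationship."""
    return [v for v in visits
            if v.related_level_visit_id == parent_id
            and v.visit_level_relationship_concept_id == REL_PART_OF]


visits = [
    VisitOccurrence(1, 42, LEVEL_VISIT),                        # hospitalization
    VisitOccurrence(2, 42, LEVEL_MICRO_VISIT, REL_PART_OF, 1),  # ER segment
    VisitOccurrence(3, 42, LEVEL_MICRO_VISIT, REL_PART_OF, 1),  # ICU segment
]
nested_ids = [v.visit_occurrence_id for v in parts_of(visits, 1)]
```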
Alternatively, we could use an encounter table like proposed above - but why an additional table when visit_occurrence table is able to solve the use-cases?
@Christian_Reich - may I get a spot to propose this to the CDM?
This does not solve the need for ‘services’, i.e. the service_concept_id used by PEDSnet. That would be a different proposal.
How does this:
Solve this use case:
More specifically:
@bailey also has a use case:
The impetus for my original proposal at the top of this thread is to solve level of care, location of care and transfer of care within an inpatient visit in greater granularity than is currently available in the VISIT_OCCURRENCE table. The proposal @Gowtham_Rao links above is very thorough, but neglects my use case. Is there a way to solve everyone’s use case with one change?
To answer @Gowtham_Rao’s question:
Patient + severity of medical condition.
Explanation: The level of care a patient received. Yes, the patient may continue to have the same provider in an ICU or on a standard medical unit, but the level of direct care (staffing ratio, certifications of direct care staff, complexity and number of procedures, measurements and drugs, etc.) given to a patient is vastly different. We’d like to look at outcomes associated with these different levels of care. We would also like to research attributes of patients that went from the ER > medical floor > ICU within a short period of time. What do they have in common? Maybe certain attributes should be considered in the level of care placement of a patient within an inpatient hospital stay.
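A minimal sketch of the ER > medical floor > ICU question, assuming each inpatient stay is available as a list of (level of care, start datetime) segments. The text labels and the 48-hour window are hypothetical; a real implementation would use concept ids and whatever window the investigators define.

```python
from datetime import datetime, timedelta

# Hypothetical level-of-care labels; a real ETL would use concept ids.
ESCALATION = ["ER", "medical floor", "ICU"]


def escalated_quickly(segments, window=timedelta(hours=48)):
    """True if three consecutive segments go ER -> medical floor -> ICU
    and the ICU segment starts within `window` of the ER segment."""
    ordered = sorted(segments, key=lambda s: s[1])
    for i in range(len(ordered) - 2):
        labels = [label for label, _ in ordered[i:i + 3]]
        elapsed = ordered[i + 2][1] - ordered[i][1]
        if labels == ESCALATION and elapsed <= window:
            return True
    return False


stay = [
    ("ER", datetime(2017, 5, 1, 8, 0)),
    ("medical floor", datetime(2017, 5, 1, 14, 0)),
    ("ICU", datetime(2017, 5, 2, 3, 0)),
]
flagged = escalated_quickly(stay)
```

This kind of query needs one record per level-of-care segment, which is the granularity the current VISIT_OCCURRENCE table does not provide.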
Per @Gowtham_Rao Proposal and by others, a consequence of combining this new data with the current VISIT_OCCURRENCE table is “Performance impacts for extremely large data”. Those of us with EHR data will be adding a lot of data to our instances of the CDM. Not every query will be asking for this level of granularity, so I propose we keep the new table separate to lighten the load.
@MPhilofsky I think we should distinguish between a service and location - and that will probably show how the proposal addresses most of your use case.
The proposal does not address the use-cases that involve the concept of a ‘service’ . But it does address two of your three focus areas - it does not address level of care, it does address location of care and transfer of care.
An example: When I was training as an internist in the hospital, we would have many days when the ICU beds were full. On those days, if the ER was still open and a patient came in to the ER who needed ICU level of care, it would be very difficult for everyone. Since the ICU had no physical space, the patient would stay in the ER and the ICU nurse/physician would take care of the patient in the ER. This is an example of ICU service in the ER location. In this example, care_site_id would capture the physical location, i.e. ER, and service_concept_id would capture the ICU service. In the real world, there will be many such examples: dialysis service in a rehabilitation floor location, surgery service in the ER location, etc.
The visit_occurrence table with the proposal addresses the use cases listed that involve location or transfers between locations: e.g.
These are not addressed:
To address these we have to vet out what is a service and what is a level-of-care or care-intensity. We will have to introduce service_concept_id and have a new domain, concept_id etc for service. This is important, happy to support you on that work.
This is also addressed in the proposal - because the change is physical location (a micro visit).
Yes - I am no data modeler; there are others who would know best. It’s a trade-off.
You make some good points. Were you the debate club captain in high school?
Our unit of analysis is physical location and level of care. Generally, the questions have asked about patients receiving ICU level care which includes the medical ICU, neuro ICU, etc. Sometimes they ask about a specific care_site (who received aaa procedure in the xxxICU vs the yyyICU). For us, the service_concept_id could be a way to group the differing levels of care (ICU care, acute care, rehab) and the care_site would be a specific site within a location. Unfortunately, the current vocabularies available for use aren’t granular enough to represent the care_site at the level our investigators require. The service_concept_id value set that the PEDSnet folks created would help us distinguish the level of care.
I’m going to let this rest while our friends mull it over. Hopefully, this can be finalized at the F2F so we can add this data to our instance of OMOP.
Melanie
Yes, sounds like a good idea. Let me understand: Essentially, instead of a fixed Visit-Encounter-Procedure hierarchy we use one table and Concepts to clarify what it is, and the hierarchical relationships between these Concepts are predefined and live outside, correct?
If so, I like it. It’s no different than in DRUG_EXPOSURE. There, we have only one Concept for the Drug, but that concept tells us whether it is a product, or a Component, or an Ingredient. And you have the full hierarchical relationships to ask for “Give me all records with treatment with Ingredient XYZ”, and all you have to do is to join the CONCEPT_ANCESTOR table. We could do the same thing here: “Give me all the records with a hospitalization”, which would pull up hospital-based services.
Except: The derived DRUG_ERAs are in a different table, and they are predefined on the Ingredient level.
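A minimal sketch of that CONCEPT_ANCESTOR join, run against an in-memory SQLite database so it is self-contained. The tables are cut down to the columns needed, and the concept ids (1 = hospitalization, 2 = ICU stay, 3 = office visit) are hypothetical, since no such visit hierarchy exists in the vocabulary yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_ancestor (
    ancestor_concept_id   INTEGER,
    descendant_concept_id INTEGER
);
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER,
    person_id           INTEGER,
    visit_concept_id    INTEGER
);
-- Hypothetical concept ids: 1 = hospitalization, 2 = ICU stay, 3 = office visit.
-- Each concept is also listed as its own ancestor.
INSERT INTO concept_ancestor VALUES (1, 1), (1, 2), (2, 2), (3, 3);
INSERT INTO visit_occurrence VALUES (10, 100, 2), (11, 100, 3), (12, 101, 1);
""")

# "Give me all the records with a hospitalization": one join, one filter.
rows = conn.execute("""
    SELECT v.visit_occurrence_id
    FROM visit_occurrence v
    JOIN concept_ancestor ca
      ON ca.descendant_concept_id = v.visit_concept_id
    WHERE ca.ancestor_concept_id = ?
    ORDER BY v.visit_occurrence_id
""", (1,)).fetchall()
hospital_visit_ids = [row[0] for row in rows]
```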
We may think of replicating this approach. Here is the reason: There are all these @MPhilofsky and @bailey use cases trying to figure out what goes right and wrong inside the hospital. But 90% of the use cases for the visit table is “Give me all patients with a hospitalization for Condition XYZ”, where “hospitalization” really is a measure of severity. @MPhilofsky has that even inside her hospital world.
So, how about this: We create a new table for any of the different levels Episode, Visit, Encounter, Service (not Procedure), and folks dump in there whatever they find, together with a Concept that defines the level. This is the equivalent of DRUG_EXPOSURE. And VISIT_OCCURRENCE gets inferred and cobbled together, and becomes (stays really) the equivalent of the DRUG_ERA. In all the event tables we add another field: newtable_occurrence_id, which is the equivalent of visit_occurrence_id.
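How VISIT_OCCURRENCE would get inferred is not specified in this thread. As one possible illustration only, here is a minimal sketch that treats it like an era: raw encounter intervals for one person are merged when they overlap or sit within an assumed one-day gap. The gap rule and the function name `infer_visits` are hypothetical.

```python
from datetime import date, timedelta


def infer_visits(encounters, gap=timedelta(days=1)):
    """Merge (start, end) encounter intervals for one person into inferred
    visits when they overlap or are separated by at most `gap`."""
    merged = []
    for start, end in sorted(encounters):
        if merged and start <= merged[-1][1] + gap:
            # Extend the current inferred visit to cover this encounter.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


raw_encounters = [
    (date(2017, 1, 1), date(2017, 1, 10)),   # facility stay
    (date(2017, 1, 3), date(2017, 1, 4)),    # embedded professional encounter
    (date(2017, 1, 11), date(2017, 1, 12)),  # adjacent record
    (date(2017, 2, 1), date(2017, 2, 1)),    # unrelated later encounter
]
inferred = infer_visits(raw_encounters)
```

The raw records stay untouched in the new table, so the inference rule can be critiqued and changed without another ETL of the source.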
Thoughts?
A new table is fine I guess, but do we really have to add a new column to every data table (drug, condition, etc.)? It doesn’t meet the need anyway if the encounters and services are not hierarchical. I.e., you both switch services while on a floor and switch floors while on the service. Therefore one column is not enough to link to both encounters and services. Isn’t this what the fact_relationship table is supposed to do?
Similar issues are surfacing as we figure out storing microbiology results, although that one is more hierarchical. In our clinical data repository, we have “component” tables that nest with child rows pointing to parent rows within the table. It has worked for almost 30 years, but you have to infer the hierarchy. With fact_relationship we can be more explicit about the parents and children and it need not be a strict hierarchy. Nevertheless, everyone is nervous about relying on the fact_relationship table to store micro results, and I wonder if it is the same for visit information.
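A minimal sketch of the fact_relationship idea, using the table's column names (`domain_concept_id_1`, `fact_id_1`, `domain_concept_id_2`, `fact_id_2`, `relationship_concept_id`). The concept id values are hypothetical placeholders. It shows one drug record linked to both a floor encounter and a service record, which a single foreign-key column could not express.

```python
from typing import NamedTuple


class FactRelationship(NamedTuple):
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


# Hypothetical placeholder concept ids for the domains and relationship.
DOMAIN_DRUG = 1
DOMAIN_VISIT = 2
REL_OCCURRED_DURING = 3

links = [
    # Drug exposure 500 was given on floor encounter 20 ...
    FactRelationship(DOMAIN_DRUG, 500, DOMAIN_VISIT, 20, REL_OCCURRED_DURING),
    # ... and also while under service record 30.
    FactRelationship(DOMAIN_DRUG, 500, DOMAIN_VISIT, 30, REL_OCCURRED_DURING),
]


def related_visits(links, drug_exposure_id):
    """Return ids of all visit-level records linked to a drug exposure."""
    return [link.fact_id_2 for link in links
            if link.domain_concept_id_1 == DOMAIN_DRUG
            and link.fact_id_1 == drug_exposure_id
            and link.domain_concept_id_2 == DOMAIN_VISIT]


linked = related_visits(links, 500)
```

The cost is that every query needing this context has to go through the link table, which may be part of why people are nervous about relying on it.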
George