In evaluating data quality, the visit date disparity metric currently flags every row whose date differs from the date of the associated visit_occurrence_id. However, in some situations, especially in the ambulatory care setting, the date of an exposure can legitimately be one or two days off from the visit date. Here are three examples:
In the case of telehealth, the patient might call in for a prescription, and the actual drug start date is not until the next day
For an ambulatory visit, which can be fairly short, there might be medications to be taken a day before the actual visit
For a care management visit, sometimes the medication does not get started until the day after
Since these cases are all valid data, should the visit date disparity metric be altered to accommodate a window of acceptable time between data points and the encounter, especially for ambulatory care visits? Should we look at ambulatory care visits separately in terms of visit date disparity?
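To make the idea concrete, here is a minimal Python sketch of what such a tolerance window could look like. The one-day ambulatory tolerance and the function name are my own assumptions for illustration, not an OHDSI convention:

```python
from datetime import date, timedelta

# Assumed tolerance: widen the visit window by 1 day for ambulatory visits only.
AMBULATORY_TOLERANCE = timedelta(days=1)

def is_date_disparity(event_date: date,
                      visit_start: date,
                      visit_end: date,
                      ambulatory: bool) -> bool:
    """Return True if event_date lies outside the (possibly widened) visit window."""
    tolerance = AMBULATORY_TOLERANCE if ambulatory else timedelta(0)
    return not (visit_start - tolerance <= event_date <= visit_end + tolerance)

# A telehealth prescription started the day after a same-day visit is no longer flagged:
print(is_date_disparity(date(2020, 6, 2), date(2020, 6, 1), date(2020, 6, 1), True))   # False
# An exposure four days out still is:
print(is_date_disparity(date(2020, 6, 5), date(2020, 6, 1), date(2020, 6, 1), True))   # True
```

The same record would still be flagged for a non-ambulatory visit, so the existing strictness is preserved everywhere else.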
This is timely for me because we are looking closely into the Visit Date Disparity issue right now.
Another scenario we’ve run into is when a patient dies but is kept on life support so that organs can be harvested. The discharge date may be, say, 6/2, but the organs are harvested on 6/4. The patient is discharged on the 2nd but has multiple measurements, procedures, drugs, and notes after “discharge”. However, they are all tied to the inpatient Visit record in the EHR.
I know this is addressed in the Visit Exposure After 30 days error, but it also crops up in Visit_Date_Disparity as well.
We also have a considerable number of inpatient events that cross midnight. For instance, the first blood pressure of an inpatient visit is recorded at 2020-05-29 23:50:00, but the Visit_Start_Date (admission) is 2020-05-30 00:24:00. Or a 57751-0:Hemoglobin test is run at 2019-04-24 20:00:00, but the admission starts at 2019-04-25 01:23:00.
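For these midnight-crossing cases, comparing the datetime fields rather than the date fields would separate a genuine disparity from a small overnight gap. A rough Python sketch, where the six-hour tolerance and the function name are assumptions of mine, not part of any check definition:

```python
from datetime import datetime, timedelta

# Assumed tolerance: an event this close to admission is treated as part of
# the visit even though its calendar date differs.
MAX_GAP = timedelta(hours=6)

def crosses_midnight_only(event_dt: datetime, visit_start_dt: datetime) -> bool:
    """True when a date-level comparison would flag the row,
    but the actual time gap is within the tolerated window."""
    date_mismatch = event_dt.date() != visit_start_dt.date()
    small_gap = abs(visit_start_dt - event_dt) <= MAX_GAP
    return date_mismatch and small_gap

bp = datetime(2020, 5, 29, 23, 50)       # first blood pressure
admit = datetime(2020, 5, 30, 0, 24)     # admission (Visit_Start_Datetime)
print(crosses_midnight_only(bp, admit))  # True: only 34 minutes apart
```

Both examples above (the blood pressure and the hemoglobin test) would pass under this rule, while an event many hours before admission would still be flagged.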
Well, as an ETL developer, I can think of four ways to handle the Visit Date Disparity:
1. Since Visit_Occurrence_ID is not required in any of the “clinical data” tables, simply don’t send it. Poof. No disparities.
Problem: there won’t be any way to tie clinical data to visits.
2. Check the dates of related clinical data measures against the Visit_Occurrence table, and extend the visit start/end dates to encompass them.
Problem: the visit date range becomes an artifact, artificially created to avoid this data error.
3. Check the dates of related clinical data measures against the Visit_Occurrence table, send Visit_Occurrence_IDs only for those measure records that fall within the visit start/end dates, and set any outside that range to NULL.
Problem: you may lose the relation of data that is tied to a visit simply because the date is outside the range. I don’t know how much research is done where the visit is important.
4. Suppress any clinical data measures that fall outside the visit start/end date range.
Problem: obviously, you are losing valuable data, much of which may be valid, just to exclude possible errors.
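For what it’s worth, the third option is simple to express. A minimal Python sketch, where the per-row dict shape and the helper name are illustrative assumptions of mine (the column names follow the OMOP CDM):

```python
from datetime import date

def nullify_out_of_range(row: dict, visit_start: date, visit_end: date) -> dict:
    """Return a copy of row with visit_occurrence_id set to None
    when the event date falls outside the visit window."""
    if not (visit_start <= row["drug_exposure_start_date"] <= visit_end):
        # Keep the clinical data, drop only the link to the visit.
        return {**row, "visit_occurrence_id": None}
    return row

row = {"drug_exposure_start_date": date(2020, 6, 4), "visit_occurrence_id": 42}
cleaned = nullify_out_of_range(row, date(2020, 6, 1), date(2020, 6, 2))
print(cleaned["visit_occurrence_id"])  # None
```

Note this is exactly where the stated problem bites: the row survives, but any downstream analysis that needed the visit context loses it.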
This issue may also be EHR-dependent. For instance, Epic will often show a drug that was prescribed at an office visit with a start_date on the following day. However, another EHR could show it starting on the date of the visit (I don’t know that this is so, but it’s possible).
I’ve also got cases where a lab is ordered 10 days before an office visit, but it is ordered for that visit and is tied to the visit.
It just feels like the tail is wagging the dog here. While it makes logical sense that a clinical measure tied to a visit should fall within the visit start/end date range, there appear to be legitimate cases where it does not. Enforcing the rule could invalidate valid data.
Here is the situation: the OMOP CDM generally does not make much use of explicit crosslinks between tables. The overall logic is to link tables through person_id and timing, because observational data rarely have crosslinks like that. For example, people are always asking to have Drugs linked to the Conditions they are supposed to be treating. We rarely get that from the source data, and it is not a clean one-to-one relationship anyway. For Visits we decided to have crosslinks, because it appeared straightforward to link whatever happened during the Visit to the Visit.
Turns out it is not that easy either. We could do what you said: Drop the crosslinks and push the problem to the analyst. Makes life easy for the ETLer. I personally am getting more and more in favor of this solution since these crosslinks are more hassle than value, and we never have them reliably, which means the analysts ignore them.
But there are downsides. For example, you can easily have two or more ambulatory Visits in a day, and knowing in which of them you got the flu shot vs. the chemo might feed a use case.
Agreed. And it violates the idea that the Visit is what happens to the patient, rather than in an organization.
Right. That’s arbitrary. Then we may as well do solution 1.
Agreed. Not good.
Agreed. The data per se are the important information, not the perfect representation of the processes behind the healthcare system.
Bottom line: sounds like we need to decide between dropping the links (1.) or leaving them as they are (no clear logic). Thoughts?
This touches on another issue I’ve been struggling with, namely what constitutes a valid face-to-face visit.
When I first created my OMOP instance, I made visit_occurrence_id a required field in all of the clinical data tables. I couldn’t imagine that you would want data without a visit. I was very surprised when I was told visit_occurrence_id was not required and there can be data without a visit.
It was a very visit-oriented perspective instead of person-oriented. But that’s because our EHR is visit (or rather encounter) oriented. Every clinical event has to have an encounter. Some are real visits, but some are just internal events like Clinical_Support, Orders_Only, or Documentation. So I was including a lot of encounters that were not really face-to-face visits.
But without those encounters, it’s extremely difficult to validate my OMOP data against the EHR. This isn’t an OMOP standards issue so much as a process issue, but it’s real for me nonetheless (I’m getting to the point, I promise).
So I am left with several options:
1. Leave the non-visit encounters in the OMOP database with a type_concept of 0. My thinking was that researchers could just ignore any “visits” with a type concept of 0 for any queries that needed visits. I’m told that isn’t feasible.
2. Keep an intermediate OMOP database that retains the visit_occurrence_id in each table so I can use it for validation, then extract that to another OMOP instance without those non-visit encounters, nullifying visit_occurrence_id in the related clinical data records.
3. Modify my OMOP instance to include visit_source_value in every clinical data table, so I can do my validation with that. That is, of course, an extension of the OMOP standard, and I’m not suggesting it for wider acceptance.
I know that other people are also struggling with this. Neither one of your bottom-line solutions helps this problem, but it would be nice if some solution could be devised that would solve both.
We do a combination similar to #2 & #3. Colorado adds MANY columns to the OMOP tables for validation efforts. Then we create proper CDM views for use by others and keep the very wide OMOP tables for internal use only. Our extra columns generally consist of keys, IDs, and source table names to tie the CDM row back to the source table, field, and row.
The philosophy of OHDSI is to keep your CDM pure. By using a combination of #2 and #3, we have been able to keep our CDM pure (with special Colorado modifications, of course) and retain the ability to validate our CDM data.
I’m interested to hear other solutions from the community.