
Visit date disparity in ambulatory setting

Hi,

In evaluating data quality, the visit date disparity metric currently looks at all rows whose date differs from the date of the affiliated visit_occurrence_id. However, in some situations, especially in an ambulatory care setting, the date of an exposure can be one or two days off from the visit date. Below are three examples:

  • In the case of telehealth, the patient might call in for a prescription, where the actual drug start date is not until the next day
  • For an ambulatory visit, which can be fairly short, there might be medications to be taken the day before the actual visit
  • For a care management visit, sometimes the medication does not get started until the day after

Since these cases are all valid data, should the visit date disparity metric be altered to accommodate a window of acceptable time between the data points and the encounter, especially for ambulatory care visits? Should we look at ambulatory care visits separately in terms of visit date disparity?
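A tolerance window like the one proposed could be sketched as follows. This is a minimal illustration; the function name, signature, and defaults are assumptions, not the actual metric's implementation:

```python
from datetime import date, timedelta

def has_date_disparity(event_date: date, visit_start: date, visit_end: date,
                       tolerance_days: int = 0) -> bool:
    """Flag an event whose date falls outside the visit window,
    optionally widened by a tolerance on each side.
    Illustrative sketch only, not DQD/Achilles code."""
    window = timedelta(days=tolerance_days)
    return not (visit_start - window <= event_date <= visit_end + window)

# An ambulatory visit on 2020-06-03; the drug starts the next day.
visit_start = visit_end = date(2020, 6, 3)
drug_start = date(2020, 6, 4)

print(has_date_disparity(drug_start, visit_start, visit_end))                    # True: strict check flags it
print(has_date_disparity(drug_start, visit_start, visit_end, tolerance_days=1))  # False: a ±1 day window accepts it
```

With `tolerance_days=0` this reproduces the strict behavior described above; setting it to 1 would accommodate all three ambulatory examples.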

@Christian_Reich @mvanzandt @DTorok @cukarthik

@hongjue.wang: Agreed. This needs to be settled. Either all events have to be inside the visit period, or we allow them to fall outside it. Will put it on the list.

This is timely for me because we are looking closely into the Visit Date Disparity issue right now.

Another scenario we've run into is when a patient dies but is kept on life support to harvest organs. The discharge date may be, say, 6/2, but the organs are harvested on 6/4. The patient is discharged on the 2nd, but has multiple measurements, procedures, drugs, and notes after "discharge". However, they are all tied to the inpatient Visit record in the EHR.

I know this is addressed in the Visit Exposure After 30 Days error, but it crops up in Visit_Date_Disparity as well.

We also have a considerable number of Inpatient events that straddle midnight. For instance, the first blood pressure of an Inpatient visit is at 2020-05-29 23:50:00, but the Visit_Start_Date (admission) is 2020-05-30 00:24:00. Or a 57751-0 Hemoglobin test is at 2019-04-24 20:00:00, but the admission starts at 2019-04-25 01:23:00.
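For these midnight cases, comparing full timestamps with a small grace window, rather than calendar dates, would avoid the false positive. A hypothetical sketch; the 6-hour window and the function name are assumptions:

```python
from datetime import datetime, timedelta

def within_admission_tolerance(event_ts: datetime, admit_ts: datetime,
                               hours_before: int = 6) -> bool:
    """Treat events recorded shortly before the admission timestamp as
    belonging to the visit. Illustrative only, not DQD code."""
    return admit_ts - timedelta(hours=hours_before) <= event_ts

# First blood pressure at 23:50, admission recorded 34 minutes later.
bp_ts = datetime(2020, 5, 29, 23, 50)
admit_ts = datetime(2020, 5, 30, 0, 24)
print(within_admission_tolerance(bp_ts, admit_ts))  # True: inside the grace window
```

A date-only comparison flags this record as a disparity (05-29 vs. 05-30), while the timestamp comparison with a small grace window does not.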

Can you put the Visit end date on the 4th? It's not actually a discharge date, it's a Visit end date: when the current healthcare setting ends.

Yes, will include that in the debate. If you ask me, we should allow events outside the formal Visit period, but let's have the conversation.

@Christian_Reich

Well, as an ETL developer, I can think of four ways to handle the Visit Date Disparity:

  1. Since Visit_Occurrence_ID is not required in any of the "clinical data" tables, simply don't send it. Poof. No disparities.
    Problem: there won't be any way to tie clinical data to visits.

  2. Check dates of related clinical data measures against the Visit_Occurrence table, and extend the visit start/end dates to encompass them.
    Problem: The visit date range will be an artifact, artificially created to avoid this data error.

  3. Check the dates of related clinical data measures against the Visit_Occurrence table and only send Visit_Occurrence_IDs for those measure records which are within the visit start/end dates, and set any outside that range to NULL.
    Problem: Possible to lose the relation of data that is tied to a visit simply because the date is outside the range. I don't know how much research is done where the visit is important.

  4. Suppress any clinical data measures that fall outside the visit start/end date range.
    Problem: Obviously, you are losing valuable data, much of which may be valid, just to exclude possible errors.
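Option 3 above could be sketched as a simple per-record rule (illustrative names; not an actual ETL implementation):

```python
from datetime import date
from typing import Optional

def scrub_visit_link(event_date: date,
                     visit_occurrence_id: Optional[int],
                     visit_start: date,
                     visit_end: date) -> Optional[int]:
    """Option 3: keep the visit link only when the event date falls
    inside the visit window; otherwise set it to NULL (None).
    Hypothetical sketch, not ETL production code."""
    if visit_occurrence_id is None:
        return None
    if visit_start <= event_date <= visit_end:
        return visit_occurrence_id
    return None

# A lab ordered 10 days before a one-day office visit loses its link:
print(scrub_visit_link(date(2020, 5, 20), 42, date(2020, 5, 30), date(2020, 5, 30)))  # None
# An in-window measurement keeps it:
print(scrub_visit_link(date(2020, 5, 30), 42, date(2020, 5, 30), date(2020, 5, 30)))  # 42
```

The second print line illustrates exactly the stated problem: the lab in the first call really was tied to the visit, but the rule severs the link anyway.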

This issue may also be EHR dependent. For instance, Epic will often show a drug that was prescribed at an office visit with a start_date on the following day. However, another EHR could show it starting on the date of the visit (I don't know that this is so, but it's possible).

I've also got cases where a lab is ordered 10 days before an office visit, but it is ordered for that visit and is tied to the visit.

It just feels like the tail is wagging the dog here. While it makes logical sense that a clinical measure tied to a visit should fall within the visit start/end date range, there appear to be legitimate cases where it does not. Enforcing the rule could invalidate valid data.

I so like your "poof" solution, man. :slight_smile:

Here is the situation: The OMOP CDM generally does not engage much in these explicit crosslinks between tables. The overall logic is to link tables through person_id and timing. That is because observational data rarely have crosslinks like that. For example, people are always asking to have Drugs linked to the Conditions they are supposed to be treating. We rarely get that from the source data, and it is not a clean one-to-one anyway. For Visits we decided to have them, because it appeared straightforward to link whatever happened during the Visit to the Visit.

Turns out it is not that easy either. We could do what you said: Drop the crosslinks and push the problem to the analyst. Makes life easy for the ETLer. I personally am getting more and more in favor of this solution since these crosslinks are more hassle than value, and we never have them reliably, which means the analysts ignore them.

But there are downsides. For example, you can easily have two or more ambulatory Visits in a day, and knowing in which one you got the flu shot vs. the chemo might feed a use case.

Agreed. And it violates the idea that the Visit is what happens to the patient, rather than in an organization.

Right. That's arbitrary. Then we may as well do solution 1.

Agreed. Not good.

Agreed. The data per se are the important information, not the perfect representation of the processes behind the healthcare system.

Bottom line: Sounds like we should decide between dropping the links (solution 1) or leaving them as-is (no clear logic). Thoughts?

This touches on another issue I've been struggling with, namely what constitutes a valid face-to-face visit.

When I first created my OMOP instance, I made visit_occurrence_id a required field in all of the clinical data tables. I couldn't imagine that you would want data without a visit. I was very surprised when I was told visit_occurrence_id was not required and there can be data without a visit.

It was a very visit-oriented perspective instead of person-oriented. But that's because our EHR is visit (or rather encounter) oriented. Every clinical event has to have an encounter. Some are real visits, but some are just internal events like Clinical_Support, Orders_Only, or Documentation. So I was including a lot of encounters that were not really face-to-face visits.

But without those encounters, it's extremely difficult to validate my OMOP data against the EHR. This isn't an OMOP standards issue so much as a process issue, but it's real for me nonetheless (I'm getting to the point, I promise).

So I am left with several options:

  1. Leave the non-visit encounters in the OMOP database with a type_concept of 0. My thinking was that researchers could just ignore any "visits" with a type concept of 0 for any queries that needed visits. I'm told that isn't feasible.

  2. Have an intermediate OMOP database that keeps the visit_occurrence_id in each table so I can use it for validation, and then extract that to another OMOP instance without those non-visit encounters, nullifying visit_occurrence_id in the related clinical data records.

  3. Modify my OMOP instance to include visit_source_value in every clinical data table, so I can do my validation with that. That is, of course, an extension of the OMOP standard, and I'm not suggesting it for wider acceptance.

I know that other people are also struggling with this. Neither one of your bottom-line solutions helps this problem, but it would be nice if some solution could be devised that would solve both.

We do a combination similar to #2 and #3. Colorado adds MANY columns to the OMOP tables for validation efforts. Then we create proper CDM views for use by others and keep the very wide OMOP tables for internal use only. Our extra columns generally consist of keys, IDs, and source table names that tie the CDM row back to the source table, field, and row.

The philosophy of OHDSI is to keep your CDM pure. By using a combo of 2 & 3, we have been able to keep our CDM pure (with special Colorado modifications, of course :wink: ) and retain the ability to validate our CDM data.

I'm interested to hear other solutions from the community.

Is the "visit date disparity metric" from Achilles, the DQD, or something else?

In our own data quality checks, we categorize failures as follows:

  • Fatal: records that won't even load into an OMOP database
    • NULL in a concept_id field
    • Non-standard concept in a concept_id field
  • Warning: may or may not be wrong
    • usually source_to_concept_map mapping issues
    • if these jump rapidly, there may be new concepts to map
  • Expected: errors I know will always happen
    • usually zero in *_source_concept_id fields that I know our system does not have
  • Invalid Data: data that will physically go into the database, but will throw a Data Quality Dashboard error
    • End Date before Start Date
    • Visit Date Disparity

Like this:

| Run_date | QA_Metric | Metric_field | Error_Type | Standard_Data_Table | QA_Errors |
| --- | --- | --- | --- | --- | --- |
| 10/29/2020 | zero concept | specimen_concept_id | warning | specimen | 6 |
| 10/29/2020 | zero concept | specimen_type_concept_id | warning | specimen | 7722 |
| 10/29/2020 | zero concept | unit_concept_id | warning | specimen | 7722 |
| 10/29/2020 | Zero Concept | discharge_to_concept_id | Warning | visit_occurrence | 419435 |
| 10/29/2020 | EndBeforeStart | drug_exposure_end_date | invalid data | drug_exposure | 925 |
| 10/29/2020 | Visit_Date Disparity | drug_exposure_start_date | invalid data | drug_exposure | 9999 |
| 10/29/2020 | Visit_Date Disparity | measurement_date | invalid data | measurement | 13112 |
| 10/29/2020 | Visit_Date Disparity | observation_date | invalid data | observation | 5 |
| 10/29/2020 | Visit_Date Disparity | procedure_date | invalid data | procedure_occurrence | 44691 |
| 10/29/2020 | zero concept | cause_source_concept_id | expected | death | 33 |
| 10/29/2020 | zero concept | device_source_concept_id | expected | device_exposure | 900 |
| 10/29/2020 | zero concept | value_as_concept_id | expected | measurement | 5508403 |
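The four failure categories above could be modeled as a simple severity enum. The check names in the mapping below are hypothetical, chosen only to mirror the examples in the list, and are not DQD identifiers:

```python
from enum import Enum

class Severity(Enum):
    FATAL = "fatal"                # won't even load into an OMOP database
    WARNING = "warning"            # may or may not be wrong; watch for trends
    EXPECTED = "expected"          # known, accepted gaps in the source
    INVALID_DATA = "invalid data"  # loads, but fails a DQD-style check

# Hypothetical mapping of check names to the severities described above.
CHECK_SEVERITY = {
    "null_concept_id": Severity.FATAL,
    "non_standard_concept": Severity.FATAL,
    "unmapped_source_code": Severity.WARNING,
    "zero_source_concept_id": Severity.EXPECTED,
    "end_before_start": Severity.INVALID_DATA,
    "visit_date_disparity": Severity.INVALID_DATA,
}

print(CHECK_SEVERITY["visit_date_disparity"].value)  # invalid data
```

Under the proposal below, "visit_date_disparity" would move from INVALID_DATA to WARNING once a ±1 day window is applied.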

So I would change the Visit_Date_Disparity metric to include both the day before and the day after. Those seem to be the most commonly reported validation errors involving valid data.

Then I would categorize it as a Warning error. One you should keep an eye on for trends, but not to be concerned about on a daily basis.

I know the Data Quality Dashboard does not have those sort of failure categories, but I think they would be a useful addition.

You should post this to the DQD GitHub! It's a great enhancement idea.

I added an issue on GitHub.


Back to valid data that causes an error: drug_exposure.

A doctor writes three prescriptions on 6/3. One starts on 6/3 and ends 6/30. The second starts on 7/1 and ends 7/31. The third starts on 8/1 and ends 8/31.

I find that these prescriptions are mostly for controlled substances like oxycodone, zolpidem, Ritalin, and the like, where refills are problematic.

The second and third prescriptions are definitely tied to that visit, but their start dates are a month or two afterwards.
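A quick calculation shows why a small tolerance window cannot rescue these records: the gaps run to a month or two, far beyond any ±1 day allowance. The dates are the illustrative ones from above:

```python
from datetime import date

visit_date = date(2020, 6, 3)
# Start dates of the three prescriptions written at that visit:
rx_starts = [date(2020, 6, 3), date(2020, 7, 1), date(2020, 8, 1)]

# Days between the visit and each prescription's start date.
gaps = [(start - visit_date).days for start in rx_starts]
print(gaps)  # [0, 28, 59]
```

Only the first prescription would pass even a generous window; the others need some mechanism other than a date tolerance to stay linked to the visit.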
