(THEMIS 2) How do you handle missing visit end dates?

MPhilofsky · October 13, 2020, 2:25pm

Yes, agreed. That is now the stated convention. The post quoted above is from last year, before the convention was formalized.

bailey · October 13, 2020, 2:44pm

FWIW, option 1 seems to make the most sense to me. Right now, we have discharge_to_concept_id NULL until a patient is actually discharged to somewhere, so that might also be an easy way to identify ongoing visits without assigning a value to this field.

Chris_Knoll · October 13, 2020, 3:13pm

With all respect, I do not agree with this assertion: if I’m in the hospital for 7 days and I’m still in the hospital, my total hospital time is 7 days … and if I hand over my patient level data to someone for analysis as of that day, they should know that I was in the hospital for 7 days regardless of whether the analysis was performed a week later or a month later.

If we want to know that the patient is ‘still in hospital setting’, I’d recommend some sort of visit_status (like we have with condition-status) to indicate that. Or an observation if we feel that for the most part, visit_status will always be ‘discharged’.

I’m not clear on the use case where we need to know that the visit is completed and only completed visits are valid? A person who is in the hospital for 90 days doesn’t count for anything until they discharge?

Yes, but ongoing as of when? See prior points: if I run the query to get the total time a person is in hospital, using get_date() on a null column to ‘fill it in’ gives me a different duration based on the day I run the query… If I want to know how much time people are spending in the hospital, is your argument that people currently in the hospital don’t count?
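To make the run-date dependence concrete, here is a minimal sketch. The function name `length_of_stay` and the dates are hypothetical and only illustrate the point; they do not come from any OHDSI tool.

```python
from datetime import date


def length_of_stay(start, end, as_of):
    """Days from start to end; an open visit (end is None) is filled with as_of."""
    effective_end = end if end is not None else as_of
    return (effective_end - start).days


admit = date(2020, 10, 1)

# The same open visit, measured on two different run dates.
los_week1 = length_of_stay(admit, None, as_of=date(2020, 10, 8))
los_week2 = length_of_stay(admit, None, as_of=date(2020, 10, 15))
print(los_week1, los_week2)  # 7 14
```

A completed visit is unaffected by `as_of`; only the open visit changes with the run date.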

Maybe the answer is to not populate the CDM with incomplete visits…(your Option 3). Not a fan of that choice tho, I don’t like ignoring data…

Why? I’m not trying to be obtuse. I am genuinely interested in knowing the reason, because there is some gap in my analytical thinking: this is apparently an important point, and I don’t understand it.

Sorry, I looked up that ID, and it’s from the UB04 Pt dis status vocabulary; I don’t see how that indicates ‘still patient’. I think that if we have a discharge_* column on the visit, any visit with a NULL discharge value could be interpreted as ‘not discharged’. For ETL’ers who aren’t populating that column, everyone would appear ‘not discharged’, but I think this is where the value of NULL (or lack of a value) would make logical sense (like @bailey described).

Christian_Reich · October 13, 2020, 9:39pm

I think this is the source of confusion, @Chris_Knoll. The reason is simple: The length of a visit is a typical use case for outcome of a disease (the longer, the more severe the condition, or the higher the healthcare utilization). While a patient is still in the hospital we don’t know how long that is going to be the case, so having the NULL value kick out those cases is exactly what we need. So, the length is not from beginning to some arbitrary “now”, but from beginning to the actual discharge.
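As an illustration of how a NULL end date "kicks out" ongoing visits, here is a small sketch using an in-memory SQLite database. The table is a simplified stand-in for VISIT_OCCURRENCE (the real table has more columns, and CDM v5.4 does not currently permit a NULL end date); the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical subset of VISIT_OCCURRENCE columns.
conn.execute("""
    CREATE TABLE visit_occurrence (
        visit_occurrence_id INTEGER PRIMARY KEY,
        visit_start_date TEXT NOT NULL,
        visit_end_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO visit_occurrence VALUES (?, ?, ?)",
    [
        (1, "2020-10-01", "2020-10-04"),  # completed, 3 days
        (2, "2020-10-02", "2020-10-09"),  # completed, 7 days
        (3, "2020-10-06", None),          # ongoing, no end date
    ],
)

# Date arithmetic on a NULL end date yields NULL, and the aggregates
# COUNT(expr) and AVG(expr) skip NULLs, so the ongoing visit drops out.
n_visits, n_with_los, mean_los = conn.execute("""
    SELECT COUNT(*),
           COUNT(julianday(visit_end_date) - julianday(visit_start_date)),
           AVG(julianday(visit_end_date) - julianday(visit_start_date))
    FROM visit_occurrence
""").fetchone()
print(n_visits, n_with_los, mean_los)  # 3 2 5.0
```

The ongoing visit is still present in the table; it is simply excluded from the length-of-stay aggregate without any explicit filter.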

The discharge_to_concept_id is not meant to indicate whether or not a patient is discharged. That’s what the visit_end_date should do. The discharge* field indicates whether or not (NULL in this case) the patient maintains engagement with the healthcare system. If the patient goes home, or bolts against medical advice to go home, by definition there is no healthcare. Otherwise, if the patient goes to a rehab, or hospice, or some kind of home service the concept from the Visit domain will indicate so.

Chris_Knoll · October 13, 2020, 11:21pm

Still not clear why you require a discharge to know that a person has been in the hospital at least X days in order to certify a certain level of healthcare utilization. In the prior example, I was in the hospital for 7 days, at which point I developed an AE of a GI bleed, and we’re going to say ‘must have at least 3 days of healthcare utilization’. Wouldn’t I qualify? (rhetorical) I feel like looking for the end date in the future of some outcome leads to a certain type of bias…

Another way of saying: I don’t think the only use case of visits is ‘until they discharge’. You could start a study (and I have) looking for VTE/DVT events after completion of a surgery (TKA). If the context of this was prospective capture of the data, people go from ER to a hospital bed in the same week, which I want to study. But I can’t because they haven’t been discharged yet. I’m baffled, but I don’t want to beat a dead horse…if the model doesn’t support the use case, that’s fine, just need to be aware of it.

Christian_Reich · October 14, 2020, 10:12pm

@Chris_Knoll:

If the use case is about things happening during a Visit or after admission then it’s fully supported. You wouldn’t look for the end date. You’d take all the Procedures/Measurements/Conditions affiliated with the Visit. If the cross-link isn’t there, like in some data assets, you can still use the open-ended visit, because no end_date means the Visit was still going on at observation_period_end_date. Don’t see an issue.

MPhilofsky · October 19, 2020, 1:46am

‘Total visit time’ = Admission to discharge.

Your use case for an ongoing visit is valid and analysis can be completed on the unfinished Visit.

My use cases: length of stay, healthcare utilization, disease outcome, further utilization of healthcare services with a discharge to SNF or rehab, etc. Who leaves the hospital to go home and who goes to the morgue? These can’t be determined until the Visit ends.

concept_id = 32220 has concept_name = ‘Still patient’ with domain_id = ‘Visit’ and per conventions we are to use standard concepts with domain_id = ‘Visit’. Obviously, the ‘standard’ part is an issue since the concept is non-standard, but I’d like to fix this.

According to Christian, discharge to a non-healthcare institution will not be represented in this field because it is assumed the Person went home. Also, this is NOT a required field and not all collaborators have this data.

Option #2 should satisfy all use cases. It’s just against the rules right now. And I would like the rule to be changed to allow NULL Visit end_dates ONLY when a Visit is ongoing. If the Visit is not ongoing, then an end_date needs to be populated. Populating the date of ETL as the Visit end_date for an ongoing Visit is not necessary and will affect analysis

Chris_Knoll · October 19, 2020, 3:48am

Whew I’m glad that we’re on the same page that this is indeed a valid use case.

Yes, of course: if your analysis depends on the visit ending, then the visit must end. But the primary point of this thread is leaving visit_end_date null, and for your study those records wouldn’t have an impact. I would prefer that you could use discharge status to determine…discharge status, and I appreciate the logical attraction to ‘visits that didn’t end don’t have an end date’, but I need some help resolving the following:

I’m calculating an incidence rate where I’m going to calculate the number of cases / total follow up time. We’re only going to use visits for total knee replacement, and we’re looking for an outcome of deep vein thrombosis.

In this data cut, I have people that are discharged and people who are ongoing recovery. For one person, they stayed in the hospital for 7 days without DVT issues, so they contribute 7 days of follow up. For another person, they developed DVT on day 4 of their visit, and left on day 8. This person contributes 4 days of follow up, and is a case.

But, I find a person who started their visit, and developed DVT on day 3…this person hasn’t been discharged, so I am able to find their follow up time (3 days) and put them into the denominator of my IR calculation. However, there is another person who started their visit, did not develop a DVT…and there is no visit end date. I don’t think I can just ignore the person, as not including their follow up time will bias my IR calculation. But they are contributing time without a DVT…How many days should they contribute to my IR calculation?

Ok, so taking this direction, we would take the observation_period_end_date as the visit_end_date if visit_end_date is null. If we want to make this a rule, we should make sure that our tools apply this logic.
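A rough sketch of that rule applied to the four people described above, assuming follow-up ends at the first of DVT date, visit_end_date, or observation_period_end_date. The flattened `tka_cohort` table, its column names, and all dates are hypothetical; in a real CDM these values would come from VISIT_OCCURRENCE, CONDITION_OCCURRENCE, and OBSERVATION_PERIOD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened cohort table, one row per TKA visit.
conn.execute("""
    CREATE TABLE tka_cohort (
        person_id INTEGER PRIMARY KEY,
        visit_start_date TEXT NOT NULL,
        visit_end_date TEXT,
        dvt_date TEXT,
        observation_period_end_date TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO tka_cohort VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-10-01", "2020-10-08", None, "2020-10-14"),          # 7 days, no DVT
        (2, "2020-10-01", "2020-10-09", "2020-10-05", "2020-10-14"),  # DVT on day 4
        (3, "2020-10-10", None, "2020-10-13", "2020-10-14"),          # ongoing, DVT on day 3
        (4, "2020-10-09", None, None, "2020-10-14"),                  # ongoing, no DVT
    ],
)

# Follow-up stops at the outcome if there is one; otherwise at discharge;
# otherwise (ongoing visit) at observation_period_end_date.
cases, person_days = conn.execute("""
    SELECT SUM(dvt_date IS NOT NULL),
           SUM(julianday(COALESCE(dvt_date, visit_end_date,
                                  observation_period_end_date))
               - julianday(visit_start_date))
    FROM tka_cohort
""").fetchone()
rate = cases / person_days
print(cases, person_days)  # 2 19.0
```

Under this assumption the fourth person contributes 5 days (admission to observation_period_end_date) rather than being dropped from the denominator.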

roger.carlson · October 28, 2020, 6:46pm

I’ve been following this issue, but have refrained from commenting so far because, as an ETL developer, I’m more interested in getting data in than getting data out. Our standard is not to put any hospital visit into OMOP until it is completed (discharged).

I have a background in Quality Improvement, and we did the same thing there. Our model was to use Statistical Process Control charts to track retrospective trends in data to avoid special cause variation. We required between 18 and 24 data points (usually monthly), so patients currently in the hospital were statistically insignificant.

It seems to me, research is much the same. It’s looking at data statistically; not so much individually. Christian has often said that OMOP is observational data for research, not to run your hospital or insurance company. Is it also fair to say it’s not for near-term clinical decision making?

Now, if you’re using an OMOP database as a reporting database, then you certainly have a use case for including patients who are not yet discharged. You can use it however you like, but I don’t think that should influence the standard.

guyt · January 15, 2024, 5:33am

Sorry to be re-opening this topic after more than 3 years but it seems (as far as I can google) that it is still unresolved. Concept 32220 is still not standard and NULLs are still not permitted according to the 5.4 documentation. I think the consensus, for many use cases (albeit not all), is to allow ongoing visits.

There seem to be a variety of use cases for the data but we are looking for one rule for ETL so can I suggest option #6:

NULLs are allowed for visit_end_dates only for ongoing visits (similar to several previous suggestions but without making any other change to ETL)

This means that discharge_to_concept_id is not used to indicate the visit status so we don’t need to worry about 32220.

Because the researcher is the only one that knows their own use cases, it is their responsibility to decide what to do with such visits. If they want to exclude them, then that’s easy to do. If they want to include them and count the LoS from admission date to the ETL date, then they can coalesce with the ETL date from the METADATA table.

KSimon · July 15, 2024, 4:56pm

I would use the CDM_Source table’s source_release_date column since it is a required table that all implementations should include vs the MetaData table that may or may not be utilized by everyone.

Although I understand that the Discharge To Concept ID is not meant to hold a patient’s actual status, it seems as if this column would more accurately describe the patient’s status than the Visit Type Concept ID.

If we were to make the Discharge To Concept ID column required (0 if unknown) and then have a standard concept for “still patient” it would enable us to write a data quality check that requires the Visit End Date to be populated unless it’s an ongoing visit.

MPhilofsky · July 16, 2024, 2:20pm

Good point. I don’t know how many collaborators use the Metadata table.

Correct, the visit_type_concept_id, and all type concept_ids, hold the provenance of the record. Where did the record originate? This field is not about the person or their status.

The CDM only requires fields necessary for analytics: start dates for all clinical events, length of exposure to a drug, person_id in all clinical event tables, length of stay, etc. Where a person goes after a visit isn’t necessary for the majority of use cases and many data sources don’t have this information. We could make “still patient” a standard concept_id and allow the visit_end_date to be NULL. Or we could make “still patient” a standard concept_id and still require the visit_end_date to be populated. Or we could come up with another solution. The one thing we as a community need to do is decide on a solution which satisfies the use cases, doesn’t break any OHDSI tools and doesn’t break any studies. Or if something must be broken, then we must give it very thoughtful consideration from a wide audience to ensure it is absolutely necessary.

@KSimon, Thank you for your input! This has been an ongoing issue for quite some time. Let’s continue the discussion on this CDM GitHub issue.

kescox · September 30, 2025, 4:59pm

Reviving this issue again because I’ve taken on Themis sponsorship of – specifically – the issue of what we do with visits that are ongoing at the time of ETL.

The current convention we have for this appears to date to 2018, and has not changed in that time despite ongoing discussion. It is:

Set the Visit End Date to the date of data extraction.
Set Visit Type Concept ID to 32220, ‘Still patient.’

Issues with this convention:

32220 is not a Type concept and is non-standard, so its use in the type concept field breaks CDM standards.
32220 additionally doesn’t have the “meaning” of a Type concept; Type concepts are meant to describe the provenance of the data (e.g. from EHR records).
A valid date (vs NULL) in Visit End Date for ongoing visits may cause researchers to include patients in their study where this is not appropriate – e.g. where length of hospitalization before discharge or death is an outcome, ongoing visits should not be included. Current documentation on this convention might not be clear enough to prevent researchers from including these visits.

Proposed alternatives:

Do not bring ongoing visits into the Visit table.
Allow NULL Visit End Dates to cover all cases of missing end dates for visits – inclusive of ongoing visits.
Allow implausible Visit End Dates (‘2099-12-31’ or similar) as a sentinel value for a missing end date – inclusive of ongoing visits.
Allow NULL Visit End Date only in cases of ongoing visits.
Allow NULL Visit End Date only in cases of ongoing visits and set Discharge To Concept ID to 32220, which should be made a standard concept.
Populate Visit End Date as the date of data extraction for ongoing visits and set Discharge To Concept ID to 32220, which should be made a standard concept.

After reviewing discussion in this thread, the alternative that would seem – to me – to break the fewest (current) studies, not introduce unnecessary error to future studies, AND be the easiest to update OHDSI tools for is to do the following for ongoing visits:

Set the Visit End Date to NULL.
Set Discharge To Concept ID to 32220.
(Optionally?) Add a convention that directs researchers to use COALESCE(visit_end_date,CDM_source.source_release_date) when determining visit duration for patients still in hospital at the time of data extract.
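A minimal sketch of what that COALESCE might look like in practice, run against in-memory SQLite. The tables are cut down to the columns needed, the rows are invented, and `discharge_to_concept_id` uses the column name as written in this thread. This illustrates the proposal and is not an adopted convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical subsets of the CDM tables.
conn.executescript("""
    CREATE TABLE cdm_source (source_release_date TEXT NOT NULL);
    CREATE TABLE visit_occurrence (
        visit_occurrence_id INTEGER PRIMARY KEY,
        visit_start_date TEXT NOT NULL,
        visit_end_date TEXT,
        discharge_to_concept_id INTEGER
    );
    INSERT INTO cdm_source VALUES ('2025-09-30');
    INSERT INTO visit_occurrence VALUES
        (1, '2025-09-01', '2025-09-05', NULL),   -- completed visit
        (2, '2025-09-20', NULL, 32220);          -- ongoing at extract
""")

# cdm_source holds a single row here, so CROSS JOIN simply attaches the
# release date to every visit; the flag keeps ongoing visits identifiable.
rows = conn.execute("""
    SELECT v.visit_occurrence_id,
           julianday(COALESCE(v.visit_end_date, s.source_release_date))
               - julianday(v.visit_start_date) AS los_days,
           v.visit_end_date IS NULL AS is_ongoing
    FROM visit_occurrence v
    CROSS JOIN cdm_source s
    ORDER BY v.visit_occurrence_id
""").fetchall()
print(rows)  # [(1, 4.0, 0), (2, 10.0, 1)]
```

Returning `is_ongoing` alongside the duration lets a researcher either use the truncated length of stay or filter those visits out, depending on the use case.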

In order to support this change, we would need to:

Change CDM standards to permit NULL Visit End Dates in the case of ongoing visits in v5.4/v6 (CDM WG).
Standardize concept 32220 (Vocabulary WG).
Update existing conventions (Themis WG).
Update any OHDSI tools that assume a non-NULL Visit End Date, and/or add options to treat the Visit End Date for NULLs as the source release date per the convention above.
Develop a data quality check that confirms NULL Visit End Dates always correspond with concept 32220 in Discharge To Concept ID.
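One possible shape for such a data quality check, sketched with in-memory SQLite. It flags visits where a NULL end date and concept 32220 do not go together. Checking both directions (a NULL end date without 32220, and 32220 with an end date populated) is an assumption on my part, as are the simplified table and sample rows.

```python
import sqlite3

STILL_PATIENT = 32220  # 'Still patient' concept discussed in this thread

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical subset of VISIT_OCCURRENCE columns.
conn.execute("""
    CREATE TABLE visit_occurrence (
        visit_occurrence_id INTEGER PRIMARY KEY,
        visit_end_date TEXT,
        discharge_to_concept_id INTEGER
    )
""")
conn.executemany(
    "INSERT INTO visit_occurrence VALUES (?, ?, ?)",
    [
        (1, "2025-09-05", None),           # completed, passes
        (2, None, STILL_PATIENT),          # ongoing and flagged, passes
        (3, None, None),                   # NULL end date without 32220, fails
        (4, "2025-09-07", STILL_PATIENT),  # 32220 but end date set, fails
    ],
)

# SQLite evaluates comparisons to 0/1, so the two conditions can be compared
# directly; other SQL dialects may need CASE expressions instead.
violations = [
    row[0]
    for row in conn.execute(
        """
        SELECT visit_occurrence_id
        FROM visit_occurrence
        WHERE (visit_end_date IS NULL)
              <> (COALESCE(discharge_to_concept_id, 0) = ?)
        ORDER BY visit_occurrence_id
        """,
        (STILL_PATIENT,),
    )
]
print(violations)  # [3, 4]
```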

Would appreciate all feedback on this suggestion – including studies this suggestion might break, and/or use cases not served by this convention.

prasida767 · October 16, 2025, 11:15am

Apologies to bring this up again.
Any official changes to how we handle missing visit end dates? I can see lots of suggestions being brought up in the thread.
Currently using extraction end dates for patients without visit end dates.
OHDSI still recommends visit_type_concept_id = 32220 for unfinished admissions; however, it is still a non-standard concept.
Love to hear the summary of the progress here, or any alternatives.

MPhilofsky · October 16, 2025, 2:10pm

@prasida767

@kescox is sponsoring this issue. That means she is gathering and organizing the use cases for:

Do you have a need to know if a person is still in the hospital / the visit is ongoing at the time of data extraction? If yes, connect with Kim. If not, keep using

The following is incorrect documentation and needs to be updated:

The type_concept_id is the provenance of the record. Where did the record come from? It shouldn’t be recording the “status” of a visit.

kescox · October 16, 2025, 3:12pm

@MPhilofsky

It seems like at the very least getting the documentation updated to move 32220 to discharge_to_concept_id vs visit_type_concept_id is an easy first target – who do we contact to make that documentation update?

MPhilofsky · October 16, 2025, 6:31pm

@kescox

The CDM v5.4 specifications are under the CDM WG’s domain. I have an open issue on their GitHub located here. I’m pretty sure this is on their roadmap for updating. I have an upcoming call with @clairblacketer to sync on CDM & Themis issues. I’ll be sure to bring it up then. Stay tuned!