Person location over time?

rtmill · June 9, 2016, 8:19pm

Greetings.

My question is an extension of a discussion found here: CDM person record not structured to capture patient mobility

I am trying to work out the details of how GIS capabilities can be applied to the CDM.

From how I understand it, the static nature of the location_id within the Person table does not allow multiple residencies for a single patient. We want to be able to know where a person’s residence was for a given date. A basic use case would be ‘where did x live the day they were diagnosed with y’. Our source data (ETL in the works) provides a record for each residency a person has had, along with starting and end dates.

In a separate health demographic system I’m working on, this is solved using a Residency table (residency_id, person_id, location_id, start_date, end_date). My thought was that you could take the date from a given observation and use the residency table to get the person’s location for that point in time.
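
A minimal sketch of that point-in-time lookup, using Python's built-in sqlite3 module with an in-memory database. The table and column names follow the residency table described above. The sample rows, the `location_on` helper, the inclusive treatment of end_date, and the use of NULL for an open-ended residency are assumptions made for illustration and are not part of the CDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE residency (
    residency_id INTEGER PRIMARY KEY,
    person_id    INTEGER NOT NULL,
    location_id  INTEGER NOT NULL,
    start_date   TEXT NOT NULL,  -- ISO 8601 dates sort correctly as text
    end_date     TEXT            -- NULL is assumed to mean still living there
);
-- Hypothetical sample data: person 100 moved once, in mid-2010
INSERT INTO residency VALUES
    (1, 100, 10, '2005-01-01', '2010-06-30'),
    (2, 100, 20, '2010-07-01', NULL);
""")

def location_on(conn, person_id, on_date):
    """Return the location_id where the person lived on on_date, or None."""
    row = conn.execute(
        """
        SELECT location_id
        FROM residency
        WHERE person_id = ?
          AND start_date <= ?
          AND (end_date IS NULL OR end_date >= ?)
        ORDER BY start_date DESC
        LIMIT 1
        """,
        (person_id, on_date, on_date),
    ).fetchone()
    return row[0] if row else None
```

For the 'where did x live the day they were diagnosed with y' case, the diagnosis date would be passed as `on_date`, for example `location_on(conn, 100, '2008-03-15')`.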

Is there an easier or more conventional way of doing this?

jon_duke · June 9, 2016, 8:26pm

I would second the value of having more temporally granular location data. I would also be interested in having the option for lat / lng information rather than address for applications where this is the more relevant means of documenting location.

Christian_Reich · June 9, 2016, 9:38pm

Friends:

So, here is what you want to do: Put the proposal (with fields and all) into the Working Group page. And then we can discuss it.

@rtmill’s idea: We could do something smooth: the location_id would either point to an entry in the LOCATION table, containing the location, or the latest location, and then we would add a location_history_id in the location table, which would point to a new LOCATION_HISTORY table with location_ids (pointing back to LOCATION) and time intervals. That would be the most backwards compatible way to deal with that. What do you think?

@jon_duke: Put your lat/long right there.

Chris_Knoll · June 9, 2016, 10:29pm

I’d drop the location_id, introduce a location_history table, with a start_date and end_date. ‘current location’ doesn’t have any meaning when there’s nothing in the cdm to say what the ‘current timeindex’ is. I’m using the same reasoning here as to why we don’t have an ‘age’ column in the person table…age as of when? So, I suggest we treat the location the same way: a location id, and a timewindow that the person was located at that location in a location_history table.

Christian_Reich · June 10, 2016, 11:26am

@Chris_Knoll:

Well, I was thinking that in 90% of the cases there is no history. Only some “current” location. For those implementations, the history table would be complete overkill. And even if the history exists, most analytical use cases will only want the current address to do some stratification.

So, leave it all intact for the existing functioning use cases, and add one little field in the LOCATION table plus another rather small optional LOCATION_HISTORY table to get it done. Called “backwards compatible”.

Chris_Knoll · June 10, 2016, 2:02pm

@Christian_Reich gasp! You’re playing the “backwards compatible” card after the last cost table update?!?

rtmill · June 10, 2016, 2:53pm

@jon_duke It would appear we are on the same page. We view adding lat / long to the location table as the most straightforward way to enable GIS capabilities. I’m a bit new to the OHDSI realm so I wasn’t sure if I should bring it up in this discussion or not.

@Chris_Knoll Are you considering location_history to be equivalent to the residency table described above? From how I understand it, you are considering a location to be a residency, not a distinct location itself. In other words, what if a person moves from location a to location b, then back to location a. Would that be two or three separate records in the location table?

@Christian_Reich : I agree that it would make sense to leave the location_id within the person table, most recent address or something similar, for general purposes. However, I don’t see putting a location_history_id in the location table to be the right move. I view the location_history (or residency) as a many to many relationship from person to location.

I appreciate the help with putting the proposal in order but on our end we are still working through the details of how this could work. This post was intended to get some advice to help prepare for a conversation and/or proposal towards the end of the month. If you would like me to submit just this portion of it, either the residency table with or without the lat/long, please let me know and we will put something together.

Christian_Reich · June 10, 2016, 4:07pm

Those cost tables didn’t work. There is nothing compatible to “not working”. So, we fixed things.

Yes, yes, you are totally right. It wouldn’t work the way I suggested. We’d have to put a key to a LOCATION_HISTORY table into the PERSON table, in addition to the location field. Ugly.

That is totally ok. The proposal is being worked on. You don’t have to have a final perfect solution to start the discussion. Just start it.

Chris_Knoll · June 10, 2016, 6:09pm

@rtmill, yes, the location_history I’m describing is more of a ‘residency’ table, where a person_id has a location_id, start_date, and end_date. It’s one-to-many (one person has many residencies), and the foreign key would point to a LOCATION table holding the unique set of locations (the physical location of a residency). If a person goes from a to b and then back to a, that would be three separate records, each representing the duration of the person’s residency at that location.

I’m not sure if we should handle or could handle a case where a person has two active residencies (like a main home and a vacation home) but I’m assuming we’d make this the person’s ‘primary residency’ (a person can only have a single primary at a time), so that would eliminate the possibility of a person existing at 2 locations during the same time window.
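
As a rough illustration of the two points above, the sketch below stores an a, b, a history as three residency rows and checks the 'single primary residency at a time' rule by looking for overlapping date windows. The tuple layout, the sample dates, and the `find_overlaps` helper are hypothetical. An end date of None is assumed to mean an open-ended residency.

```python
from datetime import date

# Hypothetical rows: (person_id, location_id, start_date, end_date).
# Moving a -> b -> a yields three rows, two of them pointing at location "A".
residencies = [
    (1, "A", date(2000, 1, 1), date(2004, 12, 31)),
    (1, "B", date(2005, 1, 1), date(2009, 12, 31)),
    (1, "A", date(2010, 1, 1), date(2015, 12, 31)),
]

def find_overlaps(rows):
    """Return consecutive (by start date) row pairs per person that overlap.

    If any two windows for a person overlap, at least one consecutive pair
    overlaps too, so an empty result means the windows are disjoint.
    """
    by_person = {}
    for row in rows:
        by_person.setdefault(row[0], []).append(row)

    overlaps = []
    for person_rows in by_person.values():
        person_rows.sort(key=lambda r: r[2])
        for prev, curr in zip(person_rows, person_rows[1:]):
            prev_end = prev[3] or date.max  # None is treated as open-ended
            if curr[2] <= prev_end:
                overlaps.append((prev, curr))
    return overlaps
```

Adding a second active residency, such as a vacation home with a window inside the first row, would make `find_overlaps` return a non-empty list, which is the situation the 'primary residency' rule is meant to exclude.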

Gowtham_Rao · October 12, 2016, 11:36pm

Location history is very important - thank you for proposing this @rtmill .

@Christian_Reich agree with keeping location_id in person table - to represent current location. This may make it easier for simple analysis + the backward compatibility.
@jon_duke lat/long in location table would be amazing.

Has this been discussed in prior workgroup meetings? I did not find it at http://www.ohdsi.org/web/wiki/doku.php?id=documentation:next_cdm

This would be very valuable to health plans or hospitals - who have location history.

rtmill · October 12, 2016, 11:58pm

I put in a proposal to the workgroup which can be found here, though it hasn’t left the ‘Need Preparation’ list (title is ‘Dealing with more than one location over time’).

@Christian_Reich If there is specific content that the proposal is missing please let me know and I’ll adjust accordingly.

Christian_Reich · October 13, 2016, 12:35pm

Looking good. We’ll bring it on.

Gowtham_Rao · October 14, 2016, 12:44pm

@rtmill @Christian_Reich

Thank you for bringing this forward. Location_history is definitely relevant and important around a person. But it may also be relevant around a provider, a care site; maybe even a device, or many other entities that may move with time.

Could we generalize this table by adding a column that represents ID type - to represent care site, provider etc?
What are the thoughts on increasing the granularity of the period by adding an optional time to the table (start time and end time)?
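
One way the generalization could look, sketched with Python's built-in sqlite3 module. All names here (`entity_type`, `entity_id`, `start_datetime`, `end_datetime`, `location_at`) are hypothetical and are not taken from the proposal. The sketch only shows a type column selecting which entity a row belongs to, with timestamps in place of dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_history (
    entity_type    TEXT NOT NULL,     -- assumed values: person, provider, care_site, device
    entity_id      INTEGER NOT NULL,  -- key in the table named by entity_type
    location_id    INTEGER NOT NULL,
    start_datetime TEXT NOT NULL,     -- ISO 8601 timestamp
    end_datetime   TEXT               -- NULL is assumed to mean still there
);
-- Hypothetical sample data: person 1 and care site 1 share an id but not a type
INSERT INTO location_history VALUES
    ('person',    1, 10, '2010-01-01T00:00:00', '2014-12-31T23:59:59'),
    ('person',    1, 20, '2015-01-01T00:00:00', NULL),
    ('care_site', 1, 30, '2012-03-01T08:00:00', NULL);
""")

def location_at(conn, entity_type, entity_id, at):
    """Return the location_id of the given entity at timestamp `at`, or None."""
    row = conn.execute(
        """
        SELECT location_id
        FROM location_history
        WHERE entity_type = ? AND entity_id = ?
          AND start_datetime <= ?
          AND (end_datetime IS NULL OR end_datetime >= ?)
        ORDER BY start_datetime DESC
        LIMIT 1
        """,
        (entity_type, entity_id, at, at),
    ).fetchone()
    return row[0] if row else None
```

The trade-off of this design is that `entity_id` can no longer be an ordinary foreign key to one table, since its meaning depends on `entity_type`.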

Andrew · October 14, 2016, 1:19pm

@Gowtham_Rao
I agree that change in location over time is relevant to more than patients and a more general solution is preferable to one that is exclusively tied to patient-specific records.

Re granularity: If there is to be a general shift toward date and time across the CDM, it makes sense to build this in now. I don’t know of use cases where the time would matter, but perhaps you are thinking of things like devices which we didn’t consider in our initial thinking about this.

DTorok · October 14, 2016, 2:17pm

Agree with leaving location_id in the person table, with the idea that the location is the last known one (or something like that). Then we can add a person_location table with the fields:

person_id
location_id
start_date
end_date

Maybe we need to define optional tables that present a consistent way of handling certain requirements that are not universal enough to require the tables be added to the ‘Standard’ CDM, but documented so that the same problem is not solved 10 different ways.
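
A small sketch of how an optional person_location table could sit alongside the existing location_id: use the history when it has been loaded for a person, and fall back to the last known location otherwise. The in-memory data, the `location_as_of` helper, and the use of None for an open end_date are assumptions for illustration only.

```python
from datetime import date

# Hypothetical data. person holds the last known location_id;
# person_location is the optional history table with the fields listed above.
person = {1: {"location_id": 20}, 2: {"location_id": 30}}
person_location = [
    {"person_id": 1, "location_id": 10,
     "start_date": date(2005, 1, 1), "end_date": date(2010, 6, 30)},
    {"person_id": 1, "location_id": 20,
     "start_date": date(2010, 7, 1), "end_date": None},
]

def location_as_of(person_id, on_date):
    """Return the location_id for a person on on_date.

    Uses the history table when the person has any rows in it, and the
    last known location from the person table when they have none.
    """
    history = [r for r in person_location if r["person_id"] == person_id]
    if not history:
        # No history loaded for this person: fall back to last known location
        return person[person_id]["location_id"]
    for r in history:
        end = r["end_date"] or date.max  # None is treated as open-ended
        if r["start_date"] <= on_date <= end:
            return r["location_id"]
    return None  # history exists but does not cover on_date
```

Sites without residency history would simply leave person_location empty, and every lookup would return the same value they get today.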

Gowtham_Rao · October 14, 2016, 3:51pm

@rtmill could you please review my updates to your original proposal here? I hope they are acceptable and reflect our discussion in this thread.

Group, especially @Christian_Reich -I am not sure how to update the proposal to reflect the need to generalize to device, provider or care site location history. What do you think?

Christian_Reich · October 14, 2016, 4:22pm

@gowtham_rao:

Do it right here.

Gowtham_Rao · October 14, 2016, 4:31pm

@Christian_Reich I updated it for all the topics we discussed on this thread, except for generalizing it to device, provider or care site. I don’t know how to represent vocabulary for device, person, provider, care site (still learning).

Chris_Knoll · October 14, 2016, 4:52pm

So, sorry to throw some water on a part of this idea, but having the ‘last known’ value in the person table is akin to putting their last known age in the table. When doing longitudinal research, you are going to be asking for their location at a time index in the past, never the ‘last known’, for the same reason you would be looking at someone’s age at the time index and not their last known age. So, I’m all for leaving the location field in the table for backwards compatibility, but eventually it should be phased out in favor of a one-to-many relationship in the CDM to reflect that locations change over time. I wouldn’t put a ‘last known’ field in the table and have people depend on that.

-Chris

Christian_Reich · October 14, 2016, 5:32pm

@Chris_Knoll:

Hm. I hear you. From a modelling perspective you are right. The problem is this: 90% of the use cases don’t care whether somebody moved 5 years ago from Arizona to North Dakota. They want to summarize or stratify data by location. Yes, if the data are 5 years old, then the patient was living in Arizona and should be counted there. But hardly anybody does that, and the resulting numbers for the few movers wouldn’t change much. That’s why the “last known location” is an approximation good enough for most use cases, but much much easier to handle. Plus: Age does change over time, the location doesn’t. I have been here in the 02138 zip code for 10 years.

@Gowtham_Rao:

Not sure what the problem is. The location table is the same for all dimensions. It’s kind of a reference table, and it gets referred to by location_id.

BTW: Why do you need location in a device?