
Themis Topic: Location Table: Non U.S. address/locations

Is it possible/desired to modify the location table to capture non-US locations?

Does having a latitude and longitude field solve this?

We have previously discussed this briefly in the GIS WG. I think establishing country-specific conventions would go a long way. We could leave the field names in the US format but I’d lean towards using more universal terms.

What about something along the lines of :

Paired with country-specific conventions detailed in the wiki :

@Gowtham_Rao Whether needed in this context or not, adding coordinate fields makes sense to me, so I added them in (maybe they won’t notice?).


Robert and Gowtham, Thank you for the feedback. I will bring your input to the Themis working group as a topic for consideration.

http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:themis:focusgroup4

I think it would be nice to refer to the GADM system.

The Database of Global Administrative Areas (GADM) is a spatial database of the world’s administrative areas. It includes 294,430 administrative areas in about 250 countries (virtually the whole world). Most administrative areas in GADM are classified into three hierarchical levels.

In the United States, for example, areas are classified into three levels: Level 1 - Nation (e.g. US), Level 2 - State (e.g. Texas), and Level 3 - County (e.g. Harris County, which contains Houston).

What would be the license considerations for GADM? Can it be used by non-academic organizations?

@Gowtham_Rao
The relevant phrase from the GADM site:
"This dataset is freely available for academic use and other non-commercial use. Redistribution, or commercial use is not allowed without prior permission. You are free to create maps and use the data in other ways for publication in academic journals, books, reports, etc. "


Thank you @SCYou.

Any experience using GADM for commercial purposes? Are there fees for the license? How up to date is the data source? E.g. the website does not seem to be regularly updated.

@Gowtham_Rao
Unfortunately, I don’t have any experience with commercial use.
They said that version 3 was expected to be available in August 2017, but it has not been released yet.
We don’t know how regularly the data are updated, but we found that two level 3 (smallest) districts are missing in GADM and sent them a message asking them to fix it. We will see whether that works.

If OHDSI decides to use GADM, we can contact GADM to suggest a collaboration. Then we can figure out how to work together and develop a global location system.


Thank you @scyou. I know we are deviating from the topic of this thread, but this is an important topic.

Have you tried other mapping API services? I.e., if your addresses are already geocoded with lat and long, could you use the API services of Google Maps, OpenStreetMap, Baidu Maps, Mapbox, etc. to render the background tiles for your map?
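
To make the tile idea concrete, here is a minimal sketch of how a geocoded point maps to a background tile under the common OpenStreetMap "slippy map" scheme. The function names are made up for illustration, and other providers use their own URL layouts and API keys, so treat this as one possible approach rather than what anyone in this thread is running.

```python
import math

# Standard OpenStreetMap tile URL layout: zoom / x / y.
OSM_TILE_URL = "https://tile.openstreetmap.org/{z}/{x}/{y}.png"


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point.

    Assumes -180 <= lon < 180 and a latitude inside the Web Mercator
    range (roughly +/-85.05 degrees).
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(lat_deg, lon_deg, zoom):
    """Build the URL of the tile that would sit behind the given point."""
    x, y = latlon_to_tile(lat_deg, lon_deg, zoom)
    return OSM_TILE_URL.format(z=zoom, x=x, y=y)
```

In practice a mapping library such as Leaflet does this lookup for you; the point is only that stored coordinates are enough to pick the background tiles.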

Hello @Gowtham_Rao !

We are already using Google Maps for the background tiles, and we are also considering using Leaflet.

example :

For the spatial data we use GADM. GADM is easy to download and manipulate in R, and has the advantage that it can be used for various spatio-temporal statistics (the raster package, etc.) in conjunction with the CDM. (This part is under development.)


All,

Will you guys be able to attend the next THEMIS working group? We can talk a bit more about this if you have the time. The meeting is this coming Thursday at 2 PM EST. @Tom_Galia Can you please provide the meeting info?

@SCYou @Jaehyeong_Cho

Refer to GADM in what way? I think it could make sense if we are leveraging their coding system to create standard concepts for countries (although the proposal here appears to be leaning towards using ISO).

From my point of view, referencing a specific region within the location table causes things to get messy in a hurry. Which region type do you choose as the associated region of a location? What if the regions don’t roll up cleanly? Assuming you choose the most granular region type, GADM would not be viable for US implementations, as census blocks/tracts are common representations for region statistics and are not included in GADM. It’s a similar situation to the relationship between person and location right now: trying to force a 1:1 relationship when really it’s many:many.

Unless you mean having a different field for each region level? (e.g. state_gadm, country_gadm etc).

@rtmill
We suggested one way to capture non-US locations.
We should consider how to associate non-US GIS data with the CDM.

Currently, we’re using the fact_relationship table to link the location information in the person table to the GADM system. We’ve already linked air pollution data and the CDM on the basis of GADM in Korea, and this model can work worldwide.

We didn’t suggest that OHDSI should use GADM as the location system, but OHDSI can take a look at GADM.

@SCYou @Jaehyeong_Cho

I misunderstood; I had thought you wanted to maintain references to regions within the location table itself.

I’m hoping you can elaborate on this a little more. I understand it as using the FACT_RELATIONSHIP table to store relationships between records in the person table and region identifiers.

  • I don’t see any date fields in FACT_RELATIONSHIP. How is the temporal nature of the relationship maintained?
  • Is there a relationship record for every record type or do they roll up? (n = # person * # region type VS. n = #person)
  • How is GADM referenced within FACT_RELATIONSHIP? Are you storing the GADM identifiers as concepts or putting them directly into the domain_concept_id_2 field or something else?
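
For anyone following along, here is a rough sketch of what such a link could look like. FACT_RELATIONSHIP in the CDM has only five columns (two domain/fact pairs plus a relationship concept), which is why the date question comes up. The concept ids, person ids and region ids below are placeholders invented for illustration; they are not real OMOP concepts or GADM identifiers, and this is not necessarily how @SCYou and @Jaehyeong_Cho populate the table.

```python
import sqlite3

# Placeholder ids for illustration only; NOT real OMOP concept ids.
PERSON_DOMAIN = 1
REGION_DOMAIN = 2
LOCATED_IN = 3

fr_db = sqlite3.connect(":memory:")
fr_db.execute(
    """
    CREATE TABLE fact_relationship (
        domain_concept_id_1     INTEGER NOT NULL,
        fact_id_1               INTEGER NOT NULL,
        domain_concept_id_2     INTEGER NOT NULL,
        fact_id_2               INTEGER NOT NULL,
        relationship_concept_id INTEGER NOT NULL
    )
    """
)

# Person 101 is linked to two (hypothetical) region ids at different
# levels; person 102 is linked to a single region.
fr_db.executemany(
    "INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
    [
        (PERSON_DOMAIN, 101, REGION_DOMAIN, 50, LOCATED_IN),
        (PERSON_DOMAIN, 101, REGION_DOMAIN, 5001, LOCATED_IN),
        (PERSON_DOMAIN, 102, REGION_DOMAIN, 5002, LOCATED_IN),
    ],
)


def regions_for_person(conn, person_id):
    """Return the region ids linked to a person, in ascending order."""
    rows = conn.execute(
        "SELECT fact_id_2 FROM fact_relationship "
        "WHERE domain_concept_id_1 = ? AND fact_id_1 = ? "
        "AND domain_concept_id_2 = ? AND relationship_concept_id = ? "
        "ORDER BY fact_id_2",
        (PERSON_DOMAIN, person_id, REGION_DOMAIN, LOCATED_IN),
    ).fetchall()
    return [row[0] for row in rows]
```

With one row per person per region level, the row count grows as # person * # region type. Nothing in the row says when the link was valid, so any temporal information would have to live in another table.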

@rtmill I’m sorry I’m late.

If the location table contains coordinate information, etc., it will be too complicated, and I think it will differ between institutions. (For example, if you want to store the local information about New York, do you store the centroid coordinates of New York, or the coordinates of the entire boundary of New York? It’s too complicated.)

So, I think it’s good to keep the location table in its current form.

In the case of our air-pollution study, we are proceeding by creating an ad hoc temporal table.

I still think I need a table to record the change of location.

The reason we use GADM is as follows.

However, since the identification number for a region differs between the location table and the GADM database, we use the fact_relationship table for analysis as follows.

example (arbitrary value) :

@Jaehyeong_Cho @SCYou

How to best represent regions was a hotly debated topic in the GIS WG that was set aside while we work on schema and vocabulary decisions. Due to the current focus on location data from THEMIS, we’ll plan to revisit this conversation in our next meeting and relay the results back to the forums. That said…

I believe there should be a strong distinction between location and region. A location is a place, represented by a point, whereas a region is an area, represented by a polygon. Trying to store and reference the two as equivalent unnecessarily creates issues, such as…

If the location table solely contains locations, or points, this is not an issue.
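
To illustrate the point-versus-polygon distinction, here is a minimal sketch of the usual ray-casting test for whether a location (a point) falls inside a region (a polygon). The function name and the toy square are invented for illustration; a real implementation would more likely lean on a spatial library or a spatially enabled database.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point (lon, lat) inside the polygon?

    `polygon` is a list of (lon, lat) vertices in order; the last vertex
    is implicitly joined back to the first. Points exactly on an edge
    are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # one more crossing to the right
    return inside


# Toy region: a square with corners (0, 0) and (4, 4).
toy_region = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

A point is a single coordinate pair, while a region needs a whole vertex list plus a test like this one. That difference in data type is a large part of why mixing the two in one table gets awkward.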

The plan is to submit a detailed proposal once we nail things down but in the meantime, here’s a snippet from an old schema draft:

First, the biggest obstacle in designing this was that (I believe) a majority of the OHDSI community uses the location table to refer to regions. In other words, their source data does not contain street-level address data. I have encountered ETLs where the location table only stores unique zip codes and maps thousands of individuals to the same ‘location’. Initially I attempted to reconcile this by including a ‘location_type’ field in the location table which designated the location as either a place or a region. Building off of this, queries and attribute assignment get messy and inefficient.

I’m proposing a different model in which any record in the location table refers to a specific location, never a region. If the source data only contains zip codes, then a record in the location table should not refer to the region itself but to ‘a location somewhere within the region’. It seems like a trivial distinction but the effects echo throughout the rest of the design.

If you have a location table that contains only locations, a region table that contains only regions and a mapping table to go between the two, it significantly simplifies things (we then don’t have to check the location table for regions, store varying data types, reference polygons vs. coordinates, etc).
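
As a rough sketch of that three-table idea (this is not the GIS WG draft; the table names, column names and values are all made up for illustration), the layout might look something like this:

```python
import sqlite3

gis_db = sqlite3.connect(":memory:")
gis_db.executescript(
    """
    -- Points only. Coordinates may be NULL when the source has no address,
    -- meaning "a location somewhere within the mapped region(s)".
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        latitude    REAL,
        longitude   REAL
    );
    -- Areas only (polygon geometry omitted in this sketch).
    CREATE TABLE region (
        region_id   INTEGER PRIMARY KEY,
        region_type TEXT NOT NULL,
        name        TEXT NOT NULL
    );
    -- Many-to-many mapping between locations and regions.
    CREATE TABLE location_region (
        location_id INTEGER NOT NULL REFERENCES location(location_id),
        region_id   INTEGER NOT NULL REFERENCES region(region_id),
        PRIMARY KEY (location_id, region_id)
    );
    """
)

# Illustrative values: one zip-only location mapped to three region types.
gis_db.execute("INSERT INTO location VALUES (1, NULL, NULL)")
gis_db.executemany(
    "INSERT INTO region VALUES (?, ?, ?)",
    [(1, "zip", "77002"), (2, "county", "Harris County"), (3, "state", "Texas")],
)
gis_db.executemany(
    "INSERT INTO location_region VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)]
)


def regions_for_location(conn, location_id):
    """Return (region_type, name) pairs for a location, by region_id."""
    return conn.execute(
        "SELECT r.region_type, r.name FROM location_region lr "
        "JOIN region r ON r.region_id = lr.region_id "
        "WHERE lr.location_id = ? ORDER BY r.region_id",
        (location_id,),
    ).fetchall()
```

Because the mapping table is many-to-many, a location can belong to any number of region types (zip, county, census tract, GADM level, ...) without those regions having to roll up cleanly into one another.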
