Non-US Location/Geography Hierarchy

dmyers3 · August 22, 2016, 9:31am

Hello-

I am based in Sweden, and we are working on converting some Swedish pharmacy data into OMOP. We would like advice on location mapping, since Sweden doesn’t fit the city-state-ZIP structure of the US system.

The Swedish system is organized very well hierarchically with 6-digit codes.

The first 2 digits represent the county, but this is more like state in the US. In other words, it is the way the whole country of Sweden is divided.
The next 2 digits represent community and could be considered similar to city in the US. For example, Stockholm proper would have one 4 digit code. Each of its suburbs would have its own.
The final 2 digits represent the local district within a city. A small city may only have one, whereas a large city like Stockholm has almost 30. Using NYC as an example, these would be more like the Upper East Side, Midtown West, East Village, etc. than the entire borough of Manhattan.
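
To make the three levels concrete, here is a minimal sketch of splitting one code into its county, community, and district prefixes. The names `SwedishAreaCode` and `split_area_code` are hypothetical, and the sample code `123456` is made up rather than taken from the real mapping table.

```python
from typing import NamedTuple


class SwedishAreaCode(NamedTuple):
    """Hypothetical container for the three nested levels of a 6-digit code."""
    county: str      # first 2 digits
    community: str   # first 4 digits (county + community)
    district: str    # all 6 digits (county + community + district)


def split_area_code(code: str) -> SwedishAreaCode:
    """Split a 6-digit code into its cumulative 2-, 4-, and 6-digit prefixes."""
    code = code.strip()
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit code, got {code!r}")
    return SwedishAreaCode(county=code[:2], community=code[:4], district=code)


# "123456" is an invented example, not a real Swedish code.
parts = split_area_code("123456")
print(parts.county, parts.community, parts.district)
```

Keeping the codes as strings rather than integers preserves any leading zeros.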

Each prescription will come with the full 6 digit code and we have a comprehensive mapping table of all 6 digit codes.

While one approach could be to put the 6 digit code in the zip field, use state for the 2 digit level, county for the 4 digit level, and city for the 6 digit level, that seems rather crude.

Is there any other approach that could be recommended? Anything that’s been done for other non-US data converted to OMOP?

Thanks for any insights!

Christian_Reich · August 22, 2016, 12:56pm

@dmyers3:

The LOCATION table has all these mostly US-derived fields, but in essence it is intended to let you stratify by location. So each location entry should be some reasonable geographic aggregation of patients or providers. In other words, you would write something like group by location_id. If you put individual addresses in there, then apart from making the data very sensitive because of identifiable information, it will kill that idea.

Having said all that: I would make a location_id either for each of the full 8 digit postal codes, or only for the first 6. The choice depends on what you think is a reasonable level of aggregation. Given that US ZIP codes have 5 digits, and there are a whole lot more Americans than Swedes on this planet, I’d go with the 6. In the US, folks often even roll up to the first 3 digits of the ZIP code. So you may even consider only 4 digits (county + town).
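
One way to compare aggregation levels before committing to one is to truncate each code to a chosen prefix length and count the records that fall into each resulting location. This is roughly what a group by location_id would give you. The sketch below assumes codes are stored as digit strings; `rollup_counts` is a hypothetical helper and the sample codes are invented.

```python
from collections import Counter
from typing import Iterable


def rollup_counts(codes: Iterable[str], prefix_len: int) -> Counter:
    """Count records per location after truncating each code to prefix_len digits."""
    if prefix_len <= 0:
        raise ValueError("prefix_len must be positive")
    return Counter(code[:prefix_len] for code in codes)


# Invented sample codes, one per prescription record.
codes = ["123401", "123402", "123501", "129901", "450101"]

by_community = rollup_counts(codes, 4)  # coarser: one location per 4-digit prefix
by_county = rollup_counts(codes, 2)     # coarsest: one location per 2-digit prefix

print(by_community)
print(by_county)
```

If most locations at a given level hold only a handful of records, that level is probably too fine for stratification.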

What we don’t have is a way to do those aggregations in a database-independent way. The addition of European data is nicely pushing this idea, since from a US-centric point of view all you think about is ZIP codes.