
Location table: uniqueness and inclusion requirements

(Andrey Ilatovskiy) #1

Hello all,

I’m trying to refine our existing Location table for better conformance with CDM v5.3.1. The current implementation maintains a 1:1 correspondence with the Person table.

There are several requirements in the specification that are a bit unclear:

  • Each address or Location is unique and is present only once in the table.
  • All fields in the Location tables contain the verbatim data in the source, no mapping or normalization takes place.
  • Analytical methods should expect and utilize only the first 3 digits of the ZIP code.
  • No country information is expected as source data are always collected within a single country.

Specific questions:

  1. The majority of the source data is for the US, but there are a few cases of non-US locations. Should these locations be removed from the table?
  2. The source data is not normalized (e.g. ST vs Street), and the DDL doesn’t provide any explicit uniqueness constraints. How should uniqueness be defined in this case?
  3. If the locations are not unique (by ZIP code or otherwise), what impact does that have on common use cases and existing tools?

Thank you.

@clairblacketer @rtmill @Andrew @krfeeney

(Robert Miller) #2

Hi Andrey,

There’s a lot here but I’ll try to touch on each point.

On the 1:1 correspondence with the Person table: yes, hence the need for the location_history table, since this relationship is not static in the real world or in the data. If I remember correctly, the location_id FK in the person table was left in for backwards compatibility.

On uniqueness: this was meant more as a best-practice recommendation than as a requirement. In other words, I would not try to enforce it via database constraints or similar mechanisms. The recommendation exists for efficiency, to avoid duplicating locations; equivalence is most easily tested by comparing the coordinates.
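A minimal sketch of that coordinate comparison, assuming each location row has already been geocoded and carries hypothetical `latitude` and `longitude` fields (they are assumptions for illustration, as are the function name and the rounding precision). Rows whose rounded coordinates match are treated as the same location, and an ID map is returned so that referencing tables can be repointed:

```python
def dedupe_by_coordinates(locations, precision=5):
    """Collapse rows that share the same rounded (latitude, longitude).

    `locations` is a list of dicts with hypothetical keys
    'location_id', 'latitude', 'longitude'.
    Returns (unique_rows, id_map), where id_map maps every original
    location_id to the location_id of the row that was kept.
    """
    first_seen = {}   # rounded coordinate pair -> surviving location_id
    unique_rows = []
    id_map = {}
    for row in locations:
        key = (round(row["latitude"], precision),
               round(row["longitude"], precision))
        if key not in first_seen:
            first_seen[key] = row["location_id"]
            unique_rows.append(row)
        id_map[row["location_id"]] = first_seen[key]
    return unique_rows, id_map
```

Rounding is a crude equivalence test: two points that are very close can still fall on opposite sides of a rounding boundary, so a distance threshold may be a better fit for some data.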

On the verbatim source data: as I understand it, this was decided long ago by the CDM working group and is just to say “we don’t map these strings to concepts”. Normalization can take place and, more often than not, is needed if you wish to geocode your location data.
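As a rough illustration of the kind of normalization meant here (e.g. ST vs Street), the sketch below builds a comparison key from a street line. The suffix table and the function name are assumptions for illustration, not part of the CDM or of any OHDSI tool. Only the final token is expanded, so that a leading "St." as in "St. James Rd" is not mistaken for "Street":

```python
import re

# Illustrative, deliberately incomplete suffix table (an assumption).
STREET_SUFFIXES = {
    "ST": "STREET",
    "AVE": "AVENUE",
    "RD": "ROAD",
    "BLVD": "BOULEVARD",
    "DR": "DRIVE",
}

def normalize_address(address):
    """Return an upper-cased, punctuation-free key for comparing addresses."""
    # Replace punctuation with spaces, then split on whitespace.
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    if not tokens:
        return ""
    # Expand a trailing abbreviation such as ST -> STREET.
    tokens[-1] = STREET_SUFFIXES.get(tokens[-1], tokens[-1])
    return " ".join(tokens)
```

The key is meant for comparison and geocoding input only; the source strings themselves can be stored unchanged.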

On the ZIP code: this suggestion didn’t come from the GIS group, so I’m not positive, but the three-digit restriction is rooted in HIPAA.
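For analyses that follow the three-digit convention, a small helper along these lines could derive the prefix. It is only a sketch: the function name is hypothetical, it assumes US ZIP codes stored as 3, 5, or 9 digits (with or without a hyphen), and it does not implement any population-threshold check that a de-identification policy may additionally require:

```python
import string

def zip3(zip_code):
    """Return the first three digits of a US ZIP code, or None if malformed.

    Values with an unexpected number of digits (for example a ZIP that
    lost its leading zero when stored as an integer) are rejected rather
    than guessed at.
    """
    digits = "".join(ch for ch in str(zip_code) if ch in string.digits)
    if len(digits) not in (3, 5, 9):
        return None
    return digits[:3]
```

Returning `None` for malformed values keeps a bad ZIP from silently turning into a valid-looking but wrong prefix.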

On country information: there have been discussions about changing the fields of the location table to an international representation, but they haven’t resulted in anything thus far. If this is an appealing change, we can revisit the topic.

Happy to elaborate on anything.