Hello all,
I’m trying to refine the existing Location table for better conformance with the CDM v5.3.1. The current implementation maintains a 1:1 correspondence with the Person table.
There are several requirements in the specification that are a bit unclear:
- Each address or Location is unique and is present only once in the table.
- All fields in the Location tables contain the verbatim data in the source, no mapping or normalization takes place.
- Analytical methods should expect and utilize only the first 3 digits of the ZIP code.
- No country information is expected as source data are always collected within a single country.
Specific questions:
- The majority of the source data is for the US, but there are a few cases of non-US locations. Should these locations be removed from the table?
- The source data is not normalized (e.g. ST vs Street), and the DDL doesn’t provide explicit uniqueness constraints. How should uniqueness be defined in this case?
- If the locations are not unique (by ZIP code or otherwise), what impact does that have on the common use cases / existing tools?
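
To make the uniqueness question concrete, here is a minimal Python sketch of one possible definition: two rows are the same Location only if all of the verbatim address fields match exactly. The helper names (`location_key`, `deduplicate_locations`) are hypothetical, and the choice of key columns is my assumption, not something the specification states. Under this definition "ST" and "Street" stay as separate locations, since no normalization is applied.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Address columns of the CDM v5.3.1 Location table.
# Assumption: uniqueness is judged on these verbatim values only;
# location_id and location_source_value are not part of the key.
ADDRESS_FIELDS = ("address_1", "address_2", "city", "state", "zip", "county")

Row = Dict[str, Optional[str]]
Key = Tuple[Optional[str], ...]


def location_key(row: Row) -> Key:
    """Build a uniqueness key from the verbatim source values (no normalization)."""
    return tuple(row.get(field) for field in ADDRESS_FIELDS)


def deduplicate_locations(rows: Iterable[Row]) -> Tuple[List[Row], List[int]]:
    """Collapse exact-duplicate addresses into single Location rows.

    Returns the unique Location rows and, for each input row, the
    location_id it maps to (e.g. to populate person.location_id).
    """
    key_to_id: Dict[Key, int] = {}
    locations: List[Row] = []
    row_location_ids: List[int] = []
    for row in rows:
        key = location_key(row)
        if key not in key_to_id:
            # First time this exact address is seen: assign the next id.
            key_to_id[key] = len(key_to_id) + 1
            location: Row = {"location_id": key_to_id[key]}
            location.update(zip(ADDRESS_FIELDS, key))
            locations.append(location)
        row_location_ids.append(key_to_id[key])
    return locations, row_location_ids
```

With this approach the Location table would no longer be 1:1 with Person: several persons could share one `location_id`, while spelling variants of the same physical address would still yield separate rows.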
Thank you.