Hello everyone,
We are currently working on an OMOP CDM implementation and need guidance on how to properly model a set of geographic and environmental variables. We recently found the GIS workgroup and would greatly appreciate your help in understanding the best practices.
Our dataset contains a large number of geographic, demographic, social, and environmental attributes. Each patient is assigned to two different types of municipalities:
- A national municipality based on the administrative division used in their country (field: cusec).
- A European-level municipality based on a different spatial classification system (field: eu_cusec).
For each municipality, we have several associated characteristics such as population size, living conditions, socioeconomic indicators, air quality metrics, and other environmental exposures. Consequently, for most environmental variables we have two parallel values, for example:
- nox = NOx annual average (national municipality)
- eu_nox = NOx annual average (European municipality)
Our goal is to construct the external_exposure table using these data.
Our main questions:
1. Can municipalities themselves be considered exposure factors?
Since cusec and eu_cusec represent geographic locations rather than exposures per se, we are unsure whether they should be recorded as exposures in external_exposure or modeled differently.
2. Should we create separate exposure records for each variable (e.g., nox and eu_nox)?
Our initial idea was to create a separate record for each exposure factor relevant to each patient (e.g., one entry for nox, one for eu_nox, etc.).
However, this raises concerns because environmental measures like NOx conceptually represent the same phenomenon, even when derived from different geographic classification systems.
How should such cases be represented to avoid duplicate exposures with identical concept IDs?
Is the recommended approach to use the same standard concept ID but differentiate the source or method using:
- exposure_source_value,
- exposure_concept_id vs. exposure_source_concept_id,
- or by adding geographic context as an attribute?
We are not sure what the community considers best practice.
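To make the concern in question 2 concrete, here is a minimal Python sketch of our initial "one record per variable" idea. It is only an illustration, not a proposed implementation: the concept ID 0 is a placeholder (not a real Athena concept for NOx), the sample patient row and its values are made up, and the fields value, municipality_field, and municipality_code are hypothetical names we invented for this example rather than confirmed external_exposure columns.

```python
from collections import Counter

# Placeholder only: NOT a real Athena concept ID for NOx.
NOX_CONCEPT_ID = 0

# Source column -> (concept ID placeholder, municipality column the value is based on)
VARIABLES = {
    "nox": (NOX_CONCEPT_ID, "cusec"),
    "eu_nox": (NOX_CONCEPT_ID, "eu_cusec"),
}


def to_exposure_records(row):
    """Unpivot one wide patient row into one record per exposure variable."""
    records = []
    for column, (concept_id, municipality_field) in VARIABLES.items():
        if row.get(column) is None:
            continue  # no value for this variable, so no record
        records.append({
            "person_id": row["person_id"],
            "exposure_concept_id": concept_id,
            "exposure_source_value": column,
            "value": row[column],                          # hypothetical field name
            "municipality_field": municipality_field,      # hypothetical field name
            "municipality_code": row[municipality_field],  # hypothetical field name
        })
    return records


# Made-up example row; codes and values are illustrative only.
patient = {
    "person_id": 1,
    "cusec": "NAT-0001",
    "eu_cusec": "EU-0001",
    "nox": 21.4,
    "eu_nox": 19.8,
}

records = to_exposure_records(patient)

# Count how many records share the same (person, concept) pair.
concept_counts = Counter(
    (r["person_id"], r["exposure_concept_id"]) for r in records
)

for r in records:
    print(r)
print(concept_counts)
```

Both records end up with the same person and the same exposure_concept_id; only exposure_source_value (plus the non-standard municipality fields) tells them apart, which is exactly the situation we are unsure how to model.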
3. How should geographic distinctions be encoded within the external_exposure structure?
If the exposure type is the same (e.g., NOx) but the geographic basis differs (national vs. European municipality), what is the intended way in OMOP to represent this difference?
Is there a standard for storing spatial granularity, administrative levels, or region definitions?
4. How can we handle variables or exposure types that do not have concept IDs in Athena?
We would appreciate suggestions on:
- How to identify relevant concepts when none seem to exist.
- Whether there is a single place where all GIS concept IDs are listed.
- Whether we should map such variables to existing broader categories instead.