Guidance Needed on Modeling Geographic and Environmental Variables using the GIS Workgroup

Hello everyone,

We are currently working on an OMOP CDM implementation and need guidance on how to properly model a set of geographic and environmental variables. We recently found the GIS workgroup and would greatly appreciate your help in understanding the best practices.

Our dataset contains a large number of geographic, demographic, social, and environmental attributes. Each patient is assigned to two different types of municipalities:

  1. A national municipality based on the administrative division used in their country (field: cusec).
  2. A European-level municipality based on a different spatial classification system (field: eu_cusec).

For each municipality, we have several associated characteristics such as population size, living conditions, socioeconomic indicators, air quality metrics, and other environmental exposures. Consequently, for most environmental variables we have two parallel values—for example:

  • nox = NOx annual average (national municipality)
  • eu_nox = NOx annual average (European municipality)

Our goal is to construct the external_exposure table using these data.

Our main questions:

1. Can municipalities themselves be considered exposure factors?

Since cusec and eu_cusec represent geographic locations rather than exposures per se, we are unsure whether they should be recorded as exposures in external_exposure or modeled differently.

2. Should we create separate exposure records for each variable (e.g., NOx, EU_NOx)?

Our initial idea was to create a separate record for each exposure factor relevant to each patient (e.g., one entry for nox, one for eu_nox, etc.).
However, this raises concerns because environmental measures like NOx conceptually represent the same phenomenon, even when derived from different geographic classification systems.

How should such cases be represented to avoid duplicate exposures with identical concept IDs?
Is the recommended approach to use the same standard concept ID but differentiate the source or method using:

  • exposure_source_value,
  • exposure_concept_id vs. exposure_source_concept_id,
  • or by adding geographic context as an attribute?

We are not sure what the community considers best practice.

3. How should geographic distinctions be encoded within the external_exposure structure?

If exposure type is the same (e.g., NOx), but the geographic basis differs (national vs. European municipality), what is the intended way in OMOP to represent this difference?
Is there a standard for storing spatial granularity, administrative levels, or region definitions?

4. How can we handle variables or exposure types that do not have concept IDs in Athena?

We would appreciate suggestions on:

  • How to identify relevant concepts when none seem to exist,
  • Whether there is a single place where all GIS concept IDs can be found,
  • Whether we should map such variables to existing broader categories.

Hi @manoshatzak:

You know that the likelihood of a response is inversely proportional, and the response time proportional, to the length of your post? :slight_smile:

That is not a problem. The OMOP model is used to that. You can have drug exposure records derived from e-prescribing, the pharmacy, EHR administration records, or patient medication lists. Theoretically, you could have the same drug exposure 3 or 4 times in parallel. The DRUG_ERA table is designed to deduplicate that, but in reality most analysts decide to take care of that problem themselves.

The OMOP way is the former. You generate the exposure records from the geographical location of the patient and your knowledge of the “municipality” (I take this as a geographic region). It is up to the ETL to get that done, but you may want to publish guidelines.

You can do the latter, but then you kick the can down to the poor analyst, who will have to cobble together the exposures. We tend to want the data to be analysis ready.
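To make the first option concrete, here is a minimal Python sketch of an ETL step that turns one patient's source row into one exposure record per variable. It is only an illustration, not an official GIS workgroup recipe: the concept IDs are made-up placeholders, the `ExposureRecord` class and `build_exposures` function are hypothetical, and `person_id` and `value_as_number` are assumed column names used alongside the `exposure_concept_id` and `exposure_source_value` fields from the question.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Placeholder concept IDs (assumption): real IDs would come from Athena
# or from concepts added through a community contribution.
SOURCE_TO_CONCEPT: Dict[str, int] = {
    "nox": 2_000_000_001,     # NOx, national municipality
    "eu_nox": 2_000_000_002,  # NOx, European municipality
}


@dataclass
class ExposureRecord:
    """Hypothetical subset of the columns of an external_exposure row."""
    person_id: int
    exposure_concept_id: int
    exposure_source_value: str
    value_as_number: float


def build_exposures(patient_row: Dict[str, Optional[float]]) -> List[ExposureRecord]:
    """Emit one exposure record per source variable that has a value."""
    records: List[ExposureRecord] = []
    for column, concept_id in SOURCE_TO_CONCEPT.items():
        value = patient_row.get(column)
        if value is None:
            continue  # no measurement for this municipality type
        records.append(
            ExposureRecord(
                person_id=int(patient_row["person_id"]),
                exposure_concept_id=concept_id,
                exposure_source_value=column,
                value_as_number=float(value),
            )
        )
    return records


# Example: a patient with both a national and a European NOx value.
example = build_exposures({"person_id": 42, "nox": 21.5, "eu_nox": 19.0})
```

In this sketch the source column name is kept in `exposure_source_value`, so the origin of each value stays traceable after the ETL.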

This usually gets encoded in the concept. You have the same thing in MEASUREMENT records, for example. You can have glucose measurements by test strip, glucometer, etc. In reality, these method details end up not being essential in detecting associations between interventions or exposures and outcomes. Which means you would have a concept “NOx”, “NOx by national municipality method”, “NOx by European municipality method”, where the latter two would be hierarchical descendants of the former.
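As a rough sketch of why the hierarchy helps: an analyst who only cares about NOx can select all exposure records whose concept descends from the generic “NOx” concept, whatever the method. The example below assumes ancestor/descendant pairs shaped like the rows of OMOP's CONCEPT_ANCESTOR table, including self-referencing rows. All concept IDs are invented placeholders, and `descendants` and `select_exposures` are hypothetical helper names.

```python
from typing import Dict, List, Set, Tuple

# Placeholder concept IDs (assumption, not real Athena IDs).
NOX = 2_000_000_000
NOX_NATIONAL = 2_000_000_001
NOX_EU = 2_000_000_002

# (ancestor_concept_id, descendant_concept_id) pairs, mimicking
# CONCEPT_ANCESTOR rows; each concept is also its own descendant.
CONCEPT_ANCESTOR: List[Tuple[int, int]] = [
    (NOX, NOX),
    (NOX, NOX_NATIONAL),
    (NOX, NOX_EU),
    (NOX_NATIONAL, NOX_NATIONAL),
    (NOX_EU, NOX_EU),
]


def descendants(ancestor_id: int) -> Set[int]:
    """Return every concept that rolls up to the given ancestor."""
    return {d for a, d in CONCEPT_ANCESTOR if a == ancestor_id}


def select_exposures(records: List[Dict], ancestor_id: int) -> List[Dict]:
    """Keep the exposure records whose concept descends from ancestor_id."""
    wanted = descendants(ancestor_id)
    return [r for r in records if r["exposure_concept_id"] in wanted]


records = [
    {"person_id": 1, "exposure_concept_id": NOX_NATIONAL, "value_as_number": 21.5},
    {"person_id": 1, "exposure_concept_id": NOX_EU, "value_as_number": 19.0},
]

all_nox = select_exposures(records, NOX)          # both records
eu_only = select_exposures(records, NOX_EU)       # only the European one
```

Querying by the parent concept returns both parallel values, while querying by a child concept keeps the geographic basis distinct.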

You get them added through the Community Contribution mechanism.

Does that answer the questions?