
New GIS Workgroup

Greetings,

We would like to announce the start of a GIS workgroup!

This summer, a collaboration between the University of Southern Maine and Maine Medical Center Research Institute started bringing GIS capabilities to the OHDSI platform. We have reached the point in the project where we would like to reach out and see if anyone is interested in joining the endeavor. A short presentation on where we are with the project is available in the recording of the September 6th OHDSI Collaborator meeting (the relevant portion starts at ~2:50).

Qualities of our dev team:

  • Supported by consultation from a cross-disciplinary academic cluster, including faculty from bioinformatics, statistics, computer science, public health, and GIS
  • Ambitious, creative, team players
  • Aim to make significant improvements to OHDSI
  • There is one of us so far

While I’m sure most anyone can find a way to contribute, here is a list of skills we are actively seeking:

  • Database design/integration - how does all of this fit together?
  • OHDSI tool implementation - how can we leverage existing functionality?
  • Geocoding - which APIs should we incorporate? How should we approach data privacy concerns?
  • Mapping/Visualization software - QGIS, GeoServer, etc.
  • Geospatial variable generation - which function is the best fit for situation x?
  • R programming, specifically package design
  • OHDSI design standards

Options for more information:

  • Watch the WebEx video linked above
  • Flip through slides of aforementioned presentation
  • Contact via email (robert.miller@maine.edu) or forum message
  • Come say hello at the symposium

Thanks!


I’m interested in helping if I can, yet our data is fully de-identified,…
The question or use case is how to utilize/integrate de-identified data…

I have worked with GeoServer and state-based GIS data in the past. I am a novice, but I enjoy the space…

Hello, @rtmill

I’m also very interested in developing a GIS system on the OHDSI platform.
We have two databases in the OMOP CDM: one from a tertiary hospital and the other from Korean national administrative data.
I read your poster and abstract from the 2016 OHDSI Symposium.

I’m still a novice in this area, but I would be very happy to join this project.

Cheers!

@rtmill

Great workgroup meeting and discussion last Friday. Some thoughts for the workgroup: we need to create a framework for geo-referencing, e.g.:

  • Take an address and parse it into components. Here is a good solution that we could fork and adopt: https://github.com/datamade/usaddress
  • Map the parsed address to Census addresses (https://www.census.gov/geo/maps-data/data/geocoder.html). This will require the ability to download Census files into a local environment (PostGIS?). Do some form of probabilistic matching, and take the latitude and longitude of the most accurate match. Use a hierarchy that starts with the finest-grained match and moves up (roof, street, block, census tract, city, zip, county, etc.). Create some form of confidence metric for the match.
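The hierarchical matching step above can be sketched roughly as follows. This is a minimal illustration, not the workgroup's implementation: it assumes the Census reference data have already been loaded into hypothetical per-level lookup tables of `(name, lat, lon)` records (for coarse levels such as city or county the coordinates would be centroids), and it uses `difflib.SequenceMatcher` string similarity as a stand-in for a real probabilistic record-linkage model. It walks the hierarchy from the finest level to the coarsest and returns the first match that clears a threshold, reporting the level and score as a crude confidence metric.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Finest to coarsest; level names follow the post, the data layout is hypothetical.
HIERARCHY = ["roof", "street", "block", "census_tract", "city", "zip", "county"]


@dataclass
class Match:
    level: str         # hierarchy level at which the match was accepted
    key: str           # reference record that matched
    lat: float
    lon: float
    confidence: float  # similarity in [0, 1]; a stand-in for a real match probability


def similarity(a: str, b: str) -> float:
    """Normalized string similarity; a placeholder for probabilistic matching."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_address(
    query: Dict[str, str],
    reference: Dict[str, List[Tuple[str, float, float]]],
    threshold: float = 0.85,
) -> Optional[Match]:
    """Return the finest-level match whose best similarity meets the threshold.

    query maps a hierarchy level to the parsed component for that level,
    e.g. {"street": "123 main st", "zip": "04101"}.
    reference maps a level to candidate records (name, lat, lon).
    """
    for level in HIERARCHY:
        value = query.get(level)
        candidates = reference.get(level, [])
        if not value or not candidates:
            continue  # nothing to compare at this level; move up the hierarchy
        best = max(candidates, key=lambda c: similarity(value, c[0]))
        score = similarity(value, best[0])
        if score >= threshold:
            return Match(level, best[0], best[1], best[2], score)
    return None  # no level produced an acceptable match
```

The fixed threshold is a simplification; a probabilistic linker (e.g. Fellegi-Sunter style weights) would replace `similarity` and the cutoff with calibrated match probabilities.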

Hi,

I am also interested in geocoding. While researching geocoders, I found that OpenStreetMap (OSM) data can be downloaded and installed on an internal server. OSM has its own API, Nominatim, which is available as a Docker container and can be installed and set up on an internal server, removing the data privacy issues. The only downside of OSM is that the latitude/longitude can be off by about a mile, because the data are based on how addresses are tagged. We planned to geocode in batches, so we stored the addresses in CSV files. Using the same API, we parsed the addresses in the file and looped requests to the internal server for all of them. The program was written in Python to read the CSV and make the API requests.
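A minimal Python sketch of that batch workflow, assuming a local Nominatim instance reachable at `http://localhost:8080` (the host and port are assumptions; adjust to your deployment) and an input CSV with a column named `address` (also an assumption). It calls Nominatim's `/search` endpoint with `format=json` and `limit=1`. The HTTP call is passed in as a function so the loop can be tested without a server.

```python
import csv
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List, Optional

# Assumption: Nominatim running locally (e.g. in Docker) and exposed on port 8080.
NOMINATIM_URL = "http://localhost:8080/search"


def nominatim_fetch(address: str, base_url: str = NOMINATIM_URL) -> list:
    """Query the local Nominatim /search endpoint and return the parsed JSON list."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    with urllib.request.urlopen(f"{base_url}?{query}", timeout=10) as resp:
        return json.load(resp)


def geocode_rows(
    rows: Iterable[dict],
    fetch: Callable[[str], list] = nominatim_fetch,
    address_field: str = "address",
) -> List[dict]:
    """Attach lat/lon to each row; rows with no result get None."""
    out = []
    for row in rows:
        results = fetch(row[address_field])
        lat: Optional[float] = None
        lon: Optional[float] = None
        if results:
            # Nominatim returns lat/lon as strings.
            lat, lon = float(results[0]["lat"]), float(results[0]["lon"])
        out.append({**row, "lat": lat, "lon": lon})
    return out


def geocode_csv(in_path: str, out_path: str, fetch=nominatim_fetch) -> None:
    """Read addresses from in_path, geocode them, and write the result to out_path."""
    with open(in_path, newline="") as f:
        geocoded = geocode_rows(csv.DictReader(f), fetch)
    if not geocoded:
        return
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(geocoded[0].keys()))
        writer.writeheader()
        writer.writerows(geocoded)
```

Because every request goes to a server inside the network, no address leaves the institution; swapping `fetch` for a stub makes the loop easy to check offline.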
I would like to be part of the group and contribute.
Thanks.

Hello @rtmill

I am a research manager at Vanderbilt University involved with our EMR research database. The de-identified version is called BioVU, and it is linked to patient DNA. The identified version is called the research derivative. We are in the initial stages of geocoding patient addresses and linking GIS data to patient records. We are also in the process of converting our databases to OMOP/OHDSI. I would be happy to contribute to this WG in any way I can.

I also have questions about where OHDSI currently stands regarding patient location data and geodata. From the slide deck on v5.0.1, will ‘location_id’ be the domain entity where geographic identifier variables are housed, or will it be a different domain entity? The deck states that location histories are not available. Are there plans to allow multiple locations to be associated with a person_id? And finally, has there been discussion regarding vocabularies for geocoded data (census data, for example)?

Thanks for your time. Happy to chat over the phone or provide more background if needed.
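To make the question about multiple locations per person_id concrete: one common way to model it is a per-person history of date-bounded location records, resolved by date. The sketch below is hypothetical; the class and field names are invented for illustration and are not the OMOP CDM definition.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LocationPeriod:
    """Hypothetical record linking a person to a location for a date range."""
    person_id: int
    location_id: int
    start_date: date
    end_date: Optional[date]  # None means the period is still open


def location_on(history: List[LocationPeriod], person_id: int, on: date) -> Optional[int]:
    """Return the location_id a person was associated with on a given date, if any."""
    for period in history:
        if period.person_id != person_id:
            continue
        ends_after = period.end_date is None or on <= period.end_date
        if period.start_date <= on and ends_after:
            return period.location_id
    return None
```

With a structure like this, an exposure on a given date can be linked to the residence at that time rather than only to a single current address.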

@dconway We’d love to have you on board.

Those are about as pertinent as questions get at this stage. Your timing is also right on point: the topic of our next meeting (Monday) will be incorporating the AEGIS group into the WG, where we will likely start to discuss the content of your inquiry.

I’ll send you a direct message with my contact information so we can get caught up beforehand.

Hello everyone, hope you are doing well! My name is Sneha Ravi, I’m a Biomedical Informatics Data Scientist at Stanford and new to the GIS-OHDSI working group! I am just trying to get an understanding of the geocoding tools available, and had a few questions:

I was just wondering:

  • Are there any geocoding / census tract tools in development by the GIS workgroup?
  • Which tools are you currently using that you like?
  • Do you have any experience working with the DeGAUSS geocoder (https://degauss.org/)? We are currently testing it here.

Thank you so much for your help!

Hi @sneha,

I know it’s been almost a year since your post, but we have just set our 2023 Q1 OKRs, and one of them deals specifically with our geocoding recommendations:

Overview
Develop a privacy-preserving geocoding mechanism that runs locally to relate geographic and person-level data. This process attaches longitude and latitude to patient residence data from electronic health records or other sources. With proper metadata, this can be used on arbitrarily defined sources of data on geographic attributes.
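Once residences have latitude/longitude, relating them to geographic attribute data amounts to finding the region that contains each point and joining that region's attributes. A minimal pure-Python sketch using the ray-casting point-in-polygon test is below; the region names, polygons, and attribute keys are hypothetical, and a production pipeline would more likely do this in a spatial database such as PostGIS.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray to the right of pt crosses an edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def region_for(pt: Point, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first region containing pt, or None."""
    for name, poly in regions.items():
        if point_in_polygon(pt, poly):
            return name
    return None


def attach_attributes(
    persons: Dict[int, Point],
    regions: Dict[str, List[Point]],
    attributes: Dict[str, Dict[str, float]],
) -> Dict[int, Dict[str, float]]:
    """Link each person's geocoded residence to the attributes of its region."""
    linked = {}
    for person_id, pt in persons.items():
        region = region_for(pt, regions)
        linked[person_id] = attributes.get(region, {}) if region else {}
    return linked
```

Only the derived region identifier and attributes need to leave the geocoding step, which fits the privacy-preserving goal of running the lookup locally.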

Highlights

  • Finalize a geocoding method and recommendation
  • Define a test to compare group’s original work with an open-source solution (degauss.org)
  • Documentation for geocoding methods (at least documenting the fitness-for-use test)
  • Define what a successfully geocoded data point is
  • Strategize: how to define returns in international jurisdictional units, at levels of increasing granularity from lat/long
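One way to make the fitness-for-use test and the "successfully geocoded" definition concrete is to run a set of addresses with known coordinates through each geocoder and report, per geocoder, the share of addresses that returned any coordinates (match rate) and the share that landed within a distance threshold of the reference point (success rate). The sketch below does this with a haversine distance; the 100 m threshold and the geocoder labels are hypothetical placeholders, not the group's definition.

```python
import math
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]  # (lat, lon) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def compare_geocoders(
    reference: Dict[str, LatLon],
    results_by_geocoder: Dict[str, Dict[str, Optional[LatLon]]],
    max_error_m: float = 100.0,  # hypothetical "success" threshold
) -> Dict[str, Dict[str, float]]:
    """Per geocoder: fraction of addresses returned, and fraction within max_error_m."""
    n = max(len(reference), 1)  # avoid division by zero on an empty test set
    summary = {}
    for geocoder, results in results_by_geocoder.items():
        matched = success = 0
        for addr_id, truth in reference.items():
            point = results.get(addr_id)
            if point is None:
                continue  # geocoder returned nothing for this address
            matched += 1
            if haversine_m(point, truth) <= max_error_m:
                success += 1
        summary[geocoder] = {"match_rate": matched / n, "success_rate": success / n}
    return summary
```

Reporting both rates separately keeps "returned something" distinct from "returned something usable", which is the distinction a success definition has to pin down.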

The group did develop a functional privacy-preserving geocoding mechanism (see this branch of our GH repo) that essentially dockerizes the PostGIS/TIGER geocoder. Some have found our homegrown solution a little cumbersome (~100 GB of data), especially compared to DeGAUSS (~6 GB).
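For reference, a query against the PostGIS TIGER geocoder can be issued from Python roughly as below. This is a sketch under assumptions: it takes any DB-API 2.0 cursor that uses `%s` placeholders (as psycopg2 does), connected to a database with the `postgis_tiger_geocoder` extension and TIGER data loaded. The geocoder's `geocode(address, max_results)` returns a normalized address (`addy`), a point (`geomout`), and a `rating` where lower is better; the `max_rating` cutoff of 20 is a hypothetical choice, not a group recommendation.

```python
from typing import Any, Optional, Tuple

# Ask the TIGER geocoder for its single best candidate for one address.
GEOCODE_SQL = """
SELECT g.rating,
       ST_Y(g.geomout) AS lat,
       ST_X(g.geomout) AS lon,
       pprint_addy(g.addy) AS normalized
FROM geocode(%s, 1) AS g;
"""


def tiger_geocode(
    cursor: Any, address: str, max_rating: int = 20
) -> Optional[Tuple[float, float, int, str]]:
    """Geocode one address with an open DB-API cursor; None if no acceptable match."""
    cursor.execute(GEOCODE_SQL, (address,))
    row = cursor.fetchone()
    if row is None:
        return None  # the geocoder found no candidate
    rating, lat, lon, normalized = row
    if rating > max_rating:  # hypothetical cutoff: treat poor ratings as failures
        return None
    return float(lat), float(lon), int(rating), normalized
```

Passing the address as a bound parameter rather than formatting it into the SQL string avoids quoting problems with addresses containing apostrophes.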

If you’d like to share your experience testing DeGAUSS (any criteria used for your test, general results, whether you’re using DeGAUSS now) or other experience with geocoders, we’d greatly appreciate it!

Also, consider this an invitation to re-join the OHDSI GIS WG :grinning: The group meets every Friday at 9 a.m. ET, but subgroups meet sporadically throughout the week.
