Implement OMOP CDM Bioimage Extension + Python Backend/Frontend interface

Eduard_Korchmar · February 20, 2024, 9:18am

The normal mode of working with an OMOP CDM “backend” is the OHDSI family of R packages. That is not ideal: R is a wonderful language when it comes to tabular data operations and statistical analysis, but it is not a good general-purpose language the way Python is. And in 2024, it is way easier to find a Python developer than an R one. I would imagine that image data is also not a prime use-case for R libraries.

I am currently gathering use-cases for a Python toolset built specifically for interacting with an OMOP CDM instance. There are a number of OMOP-space projects that rely on logic and libraries which are not well supported in R, and as such are written in Python. Two I know of:

PyOMOP, which interfaces with an LLM to write queries against OMOP-converted data
Jackalope core, which provides semantic post-coordination support in OMOP CDM following the SNOMED model

Both use the SQLAlchemy ORM to abstract interaction with the database. There are also some in-house tools that use simple dbt-style SQL templating. I think that, for a general use-case, Jackalope’s implementation of the database interface is generic enough to be copied and extended for other use-cases. Otherwise, SQL templating can take you surprisingly far: 99.9% of the time, people who choose an ORM for compatibility purposes find that they never actually use the compatibility features.
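To make the templating option concrete, here is a minimal sketch using only the Python standard library. It is not taken from PyOMOP, Jackalope, or any in-house tool: the template, the `render` and `concepts_by_vocabulary` helpers, and the toy rows are all made up for illustration, and SQLite stands in for whatever database actually hosts the CDM. Only the `concept` table and its column names come from OMOP itself.

```python
import sqlite3
from string import Template

# Identifiers (here, the schema prefix) are substituted as text;
# values are bound as parameters by the driver, never interpolated.
CONCEPTS_BY_VOCABULARY = Template(
    "SELECT concept_id, concept_name "
    "FROM ${schema}concept "
    "WHERE vocabulary_id = :vocabulary_id "
    "ORDER BY concept_id"
)


def render(template: Template, schema: str = "") -> str:
    """Fill in the schema prefix; an empty schema means an unqualified table."""
    if schema and not schema.isidentifier():
        raise ValueError(f"unsafe schema name: {schema!r}")
    prefix = f"{schema}." if schema else ""
    return template.substitute(schema=prefix)


def concepts_by_vocabulary(conn, vocabulary_id, schema=""):
    """Run the rendered query and return (concept_id, concept_name) rows."""
    sql = render(CONCEPTS_BY_VOCABULARY, schema)
    return conn.execute(sql, {"vocabulary_id": vocabulary_id}).fetchall()


def demo():
    """Query an in-memory table filled with toy rows (not real concepts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept ("
        "concept_id INTEGER PRIMARY KEY, "
        "concept_name TEXT, "
        "vocabulary_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [
            (1, "Toy concept A", "SNOMED"),
            (2, "Toy concept B", "RxNorm"),
            (3, "Toy concept C", "SNOMED"),
        ],
    )
    try:
        return concepts_by_vocabulary(conn, "SNOMED")
    finally:
        conn.close()
```

The point of the sketch is how little machinery is involved: one template per query, one function to render it, and the driver's own parameter binding for values.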

But there absolutely should be a Python library to interface with OMOP. Python tools that interact with OMOP already exist, and each of them has to reinvent its own boilerplate-clad bicycle just to connect to the OMOP instance. There are also many other universal use-cases that such a library could serve:

Management of local concepts (concept IDs in the 2-billion range)
Simplifying updating vocabularies from Athena
Vocabulary authoring support for community contributions
Transitioning custom mappings from source_to_concept_map to modern management model
AI interfacing
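As an illustration of the local concept item in the list above, here is a rough sketch of what a helper for it might look like. It assumes the usual OMOP convention that concept IDs above 2,000,000,000 are reserved for site-local concepts. The function names, the `MY_SITE` vocabulary, and the trimmed-down `concept` table are hypothetical, and SQLite is only a stand-in for the real database.

```python
import sqlite3

# Assumption: local concepts take IDs above 2,000,000,000,
# so the first usable local ID is 2,000,000,001.
FIRST_LOCAL_CONCEPT_ID = 2_000_000_001


def next_local_concept_id(conn):
    """Return the lowest unused concept_id at the top of the local range."""
    (highest,) = conn.execute(
        "SELECT MAX(concept_id) FROM concept WHERE concept_id >= ?",
        (FIRST_LOCAL_CONCEPT_ID,),
    ).fetchone()
    return FIRST_LOCAL_CONCEPT_ID if highest is None else highest + 1


def add_local_concept(conn, concept_name, vocabulary_id):
    """Insert a local concept and return the concept_id it was given."""
    concept_id = next_local_concept_id(conn)
    conn.execute(
        "INSERT INTO concept (concept_id, concept_name, vocabulary_id) "
        "VALUES (?, ?, ?)",
        (concept_id, concept_name, vocabulary_id),
    )
    return concept_id


def make_toy_vocabulary():
    """Build an in-memory concept table holding one non-local toy row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept ("
        "concept_id INTEGER PRIMARY KEY, "
        "concept_name TEXT, "
        "vocabulary_id TEXT)"
    )
    conn.execute("INSERT INTO concept VALUES (1, 'Toy standard concept', 'SNOMED')")
    return conn
```

A real implementation would also have to fill in the remaining required `concept` columns and deal with concurrent writers; the sketch only shows the ID allocation step.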

I would argue that most of the code for such a tool is actually already written, and just needs a PyPI home and a license. I plan to raise this issue at the next Open-Source Workgroup call, but I’ll tag people who could be interested for now:
@Daniel_Smith @beapen @Denys_Kaduk @Alexdavv @yupengli