
Representing OMOP in a graph database

(Luke Rasmussen) #1

I am collaborating on a project that is looking at representing OMOP patient data in Neo4J, and we were hoping to learn from the experiences of others in the OHDSI community who have done this already. In particular, we are interested in lessons learned with different models for nodes/edges/attributes.

I came across a 2017 software demonstration abstract (https://www.ohdsi.org/web/wiki/lib/exe/fetch.php?media=resources:jose_alvarado_rd2gd_ohdsi_submission_2017.pdf) that described this general approach, but after looking at the RD2GD repository on GitHub (https://github.com/Sapphirine/RD2GD), it seemed that while this was demonstrated for clinical data, it is not in the OMOP format.

I’ve seen a few other posts where graph databases are mentioned, and was hoping to see what experiences the community might have and be willing to share.

(Keesvanbochove) #2

Dear Luke,

If you have a lot of OMOP-type data, I’m not sure a graph database would add much value. In fact, you are probably better off with an RDBMS or possibly an OLAP database, as these are more optimized for the types of queries you would run on OMOP data. But of course it’s technically possible to express the OMOP model in Neo4J.
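
As a rough illustration of what expressing OMOP rows as nodes and edges could look like, here is a minimal sketch in plain Python. It assumes rows shaped like the OMOP `condition_occurrence` table (`person_id`, `condition_concept_id`, `condition_start_date`); the `Node` and `Edge` classes, the `Person`/`Concept` labels, the `HAS_CONDITION` edge type, and the sample values are all hypothetical choices made for this example, not a model anyone in the thread has proposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # hypothetical node label, e.g. "Person" or "Concept"
    key: int    # the OMOP identifier the node was built from


@dataclass(frozen=True)
class Edge:
    source: Node
    rel_type: str           # hypothetical edge type
    target: Node
    properties: tuple = ()  # (name, value) pairs stored on the edge


def condition_rows_to_graph(rows):
    """Turn condition_occurrence-style rows into a node set and an edge list."""
    nodes, edges = set(), []
    for row in rows:
        person = Node("Person", row["person_id"])
        concept = Node("Concept", row["condition_concept_id"])
        # Nodes are deduplicated by the set; every row still yields one edge.
        nodes.update((person, concept))
        edges.append(
            Edge(person, "HAS_CONDITION", concept,
                 (("start_date", row["condition_start_date"]),))
        )
    return nodes, edges


# Made-up sample rows; the concept IDs are placeholders, not real OMOP concepts.
rows = [
    {"person_id": 1, "condition_concept_id": 1001, "condition_start_date": "2019-03-01"},
    {"person_id": 1, "condition_concept_id": 1002, "condition_start_date": "2020-01-15"},
    {"person_id": 2, "condition_concept_id": 1001, "condition_start_date": "2018-07-09"},
]
nodes, edges = condition_rows_to_graph(rows)
```

One design question this surfaces is whether each clinical event should be an edge with attributes (as here) or a node of its own; the sketch picks the former only because it is shorter.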

That being said, if you also have other types of data to integrate, this could be very interesting. Please check out the webinar from my colleague Ilaria Maresi last week, where she discussed a graph model for clinical trials data and also answered a question about the relation to OMOP. Basically, I think you can have high-volume medical history data in OMOP but interlink it with a broader context in a graph database.

The webinar recording is at https://vimeo.com/411366798 and slides are here: https://www.slideshare.net/pistoiaalliance/knowledge-graphs-ilaria-maresi-the-hyve-23apr2020.



(Mark Seal) #3

Thank you for looking into this. I have personally run into frustration after frustration when mapping non-relational data into a relational data structure (if all one has is a hammer, everything looks like a nail).

Are you wedded to Neo4J? The reason I ask is that OrientDB can run in two different modes: pure graph or a graph/document hybrid. The OMOP data looks like it would map more easily into a graph/document hybrid structure, which would theoretically allow faster lookups as well. I am not sure how this would affect graph traversals, though.

(Christian Reich) #4

@Mark and @lrasmussen:

Do you have a particular use case you want to develop by using graph databases? I understand your frustration since this subject has been brought up over and over again and nothing seems to move forward, but you must have some purpose. You mentioned “theoretically, faster lookup”. Lookup of what?

(Mark Seal) #5

TLDR: The vocabulary is graph data and should be stored as such. The current recursive lookups are slow, and it is complex to write general-purpose queries that find all matches.
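
To make the "vocabulary is graph data" point concrete, here is a minimal sketch of treating relationship rows as a graph and walking it to collect every ancestor of a concept. It assumes rows shaped like `(concept_id_1, relationship_id, concept_id_2)` from the OMOP `concept_relationship` table and follows only `Is a` edges; the function names and the tiny sample data are made up for illustration.

```python
from collections import defaultdict, deque


def build_parent_index(relationship_rows):
    """Index 'Is a' rows as child -> set of direct parents."""
    parents = defaultdict(set)
    for child, relationship_id, parent in relationship_rows:
        if relationship_id == "Is a":
            parents[child].add(parent)
    return parents


def all_ancestors(concept_id, parents):
    """Breadth-first walk up the hierarchy, returning every reachable ancestor."""
    seen = set()
    queue = deque([concept_id])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:  # guards against cycles and repeated paths
                seen.add(parent)
                queue.append(parent)
    return seen


# Made-up rows: 3 -> 2 -> 1 and 4 -> 2, plus one non-hierarchical row.
rows = [
    (3, "Is a", 2),
    (2, "Is a", 1),
    (4, "Is a", 2),
    (3, "Maps to", 9),
]
parents = build_parent_index(rows)
ancestors_of_3 = all_ancestors(3, parents)
```

In a relational database the same walk typically needs a recursive query or a precomputed closure table, whereas a graph store exposes it as a variable-length traversal.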

My job is to build an automatic ETL of our EHR to OMOP instance. My frustration is that EHR data, outside of the actual demographic information, is not relational data, it is document data. The vocabulary data is not relational either, it is graph.
Whilst all of the EHR and vocabulary data can be modeled in either a graph structure or a relational structure, both take more steps and add complexity. This is why I suggested OrientDB, as it allows one to use a graph-document hybrid model. This would mean that the data can live in its natural environment, which would allow more logical searches of the data and an easier ETL.
We (Cherokee Health Systems) are one of the primary grant recipients for AOU, which means that we have to run the entire ETL process every 6 weeks (it appears that this is going to move to a faster iteration). This is a tremendous amount of data to reprocess on a regular basis.

As to the speed comment I made, I was referring to OrientDB, which according to benchmarks should be faster than Neo4J because it allows either graph traversals from one node to another or direct lookups using an index, much like standard relational systems. @Christian_Reich I am sorry, I was not clear on that point.

I realize that our pain point is not the same that most will have; I do not expect the entire process to change to fit into our needs. I am encouraged that others are interested in cleaning up the process.

(Luke Rasmussen) #6

Thanks all for your replies and helpful resources!!

@keesvanbochove - I will definitely check out that presentation, thank you! And we are integrating other types of data along with the OMOP, sorry for not clarifying that.

@Mark - the team I’m working with has past experience working with Neo4J so I think we will be progressing with that. But thank you for the suggestion of OrientDB. I’ll mention it to the team and investigate some myself.

@Christian_Reich - we’re hoping to be able to traverse and discover more complex relationships/patterns across a graph structure, but I certainly appreciate your question “why?” We are tracking performance and fit to purpose of this as we progress with the project, but getting started with some hands-on experience will help us validate it further.

(Andrew Williams) #7

I suspect that the current work in N3C to link COVID data in OMOP form to the knowledge bases and ontologies used in NIH Translator projects will build some of what’s needed to do the work you are interested in. Whether OMOPed vocabs get represented in Neo4J or not, this work will enable analyses that require a graph representation to utilize clinical data in OMOP form. It’s evolving work, but I think it has a variety of important use cases of interest to the OHDSI community. Others at your shop are involved. It would be great to keep on top of its implications and value for OHDSI and help this community decide whether and how to push mature solutions developed in N3C and Translator toward new or extended OHDSI tools, CDM modules, etc.