Hi all, I just started to learn about the OMOP CDM and I am trying to start working with it.
I downloaded the vocabularies from Athena, they come as TAB separated files, I have a PostreSQL database. This brand new database I build following the scripts provided in GitHub for V5.
I am trying to import them, however when importing the vocabulary file I get the message of the constrain vocabulary_concept being violated, and when I try to import the concept, I get the constrain concept_vocabulary being violated, so… how do I do this?
So when inserting into Vocabulary I get:
ERROR: Insert or update on table “vocabulary” violates foreign key constrain “fpk_vocabulary_concept”
DETAIL: Key (vocabulary_concept_id)=(44819096) is not present in table “concept”
When inserting into concept:
ERROR: Insert or update on table “concept” violates foreign key constrain “fpk_concept_domain”
DETAIL: Key (domain_id)=(Metadata) is not present in table “domain”
When inserting into domain:
ERROR: Insert or update on table “domain” violates foreign key constrain “fpk_domain_concept”
DETAIL: Key (domain_concept_id)=(1) is not present in table “concept”
I got same error as @Emanuel_Villa.
Should I reload all the data from the files into the tables?..or can I reload only new vocabulary file into the concept table if I create the constraints and indexes on the tables?
and
Why the constraints are important to this? If the constraints isn’t there, what happened?
@MichealJeong here is some general info on why database constraints are used which may be helpful:
The database constraints specify some referential integrity rules between tables to ensure that the data loaded into all the tables is consistent. One consequence of this is that the tables must be loaded in a specific order.
Micheal, Constraints are pretty important to maintaining the quality of a database. Constraints help to ensure that data is properly constructed without dangling references to things that don’t exist or codes that were typo’d. It also ensures you don’t delete something that is referenced elsewhere. Furthermore, constraints serve to document valid navigation in your database, there are many 3rd party tools that use this additional information to improve user experience. You want to use them. In fact, proper constraints are what make OMOP so attractive.
On the other hand, enforcing constraints while loading a large database can slow things down quite a bit. In these cases, you typically drop/disable the constraints (and the corresponding indexes that maintain them), and then enable the constraints and recreate the indexes after the data-set is loaded. This way you enjoy the knowledge that your data is coherent, but don’t have to suffer slow loading speeds. If your source data has reference problems, say there’s missing codes, then you’ll need to fix these before the constraints are enabled.
Furthermore, some data, for example, a top-level cyclic relationship in a vocabulary may be meaningful, but cannot be constructed while the constraints are enforced. I don’t know of OMOP has terminology cycles like this. This may be another reason to temporarily disable constraints. Regardless, you really want to get those constraints back up before you put the database into production.
Because of this, before you can insert a record into the Domain table, you must have a matching record in the Concepts table. BUT before you can insert a record into the Concepts table, you have to have a matching record in the Domain table. Catch-22
This is the same with Vocabulary-Concept and Concept_Class-Concept.
The only way around this is to remove the constraints, load the data, and re-add the constraints.
Oh, I realized why the constraints are important. thank you @cce
Okay, definitely yes I have to create the constraints.
Now what I am doing is BI(Business Intelligence) with Synpuf data which is converted to CDM.
For instance for understanding easier, drawing the graphs, distribution about male/female, medicine frequency from the data. So to speak, I want to analyze them myself.
But CDM data is already converted, and nobody can touch my data except for me.
even though that, I’m not sure if the constraints are needed in analysis CDM in Business Intelligence as CDW.
I imported 39 table CDM data for two weeks.
If I need to create the constraints. I should reload all the data for a really really long time, not kidding…
So I have considered that how I make the decision now.
Even just for analytics, constraints can be quite helpful: a) some tools use the information in the foreign key and primary key constraints to help you visualize and construct queries, b) query performance, after creating the indexes that enforce the constraints, could improve significantly, and c) you could rest assured that you’ve loaded the data correctly … so that your results are accurate.
You should be able to create the constraints without having to reload the data. The main pattern for loading a database is: a) create the tables, b) load the data, c) create the indexes and constraints. Typically, the constraints/indexes are in a separate SQL file. If you are using PostgreSQL and you read the manual for pg_dump, you’ll see dump files are done in this order; further, pg_dump could let you dump all 3 sections independently, so you could apply them sequentially in an independent manner.
Looks like we tripped up ourselves. You are totally right.
The usual solution is to create Junction tables. That shouldn’t be too hard to add into the data model, but at this point would probably cause a lot of problems with existing OMOP databases.
I’ve considered creating my own, which would allow me to avoid the drop/add constraints.
You technically don’t need constraints for the Vocabulary tables (Concept, Domain, etc), that you get with the vocabulary download. The referential integrity should be provided by the download as long as you are not modifying the data in those tables.
You also don’t technically need the constraints for analyzing the data. (aside from performance gains provided by indexes and constraints), because you’re not entering data there either.
HOWEVER, you absolutely need constraints for the Standard Clinical Data Tables and others that you will fill with data from your EHR. If you don’t have constraints on those, you can have zero confidence in your data.