OHDSI Home | Forums | Wiki | Github

Newbie Vocabulary import

Hi all, I just started to learn about the OMOP CDM and I am trying to start working with it.

I downloaded the vocabularies from Athena, they come as TAB separated files, I have a PostreSQL database. This brand new database I build following the scripts provided in GitHub for V5.

I am trying to import them, however when importing the vocabulary file I get the message of the constrain vocabulary_concept being violated, and when I try to import the concept, I get the constrain concept_vocabulary being violated, so… how do I do this?

Best regards.

Emanuel:

Which constraint? Can you provide the error message?

So when inserting into Vocabulary I get:
ERROR: Insert or update on table “vocabulary” violates foreign key constrain “fpk_vocabulary_concept”
DETAIL: Key (vocabulary_concept_id)=(44819096) is not present in table “concept”

When inserting into concept:
ERROR: Insert or update on table “concept” violates foreign key constrain “fpk_concept_domain”
DETAIL: Key (domain_id)=(Metadata) is not present in table “domain”

When inserting into domain:
ERROR: Insert or update on table “domain” violates foreign key constrain “fpk_domain_concept”
DETAIL: Key (domain_concept_id)=(1) is not present in table “concept”

Thanks for your reply.

@Emanuel_Villa try loading the Vocabulary and then applying the constraints.

So go through, get rid of all indexes and keys, load the Vocab, then apply them back.

The problem may be related to the order in which the tables are loaded.

I suggest trying this approach:

  1. drop all the constraints and indexes on the tables
  2. load the data from the files into the tables
  3. create the constraints and indexes on the tables
1 Like

Yes, I thought about that, dropping the indexes, but just wanted to be sure that is the common practice.

I will give it a try and keep you posted.

Thanks again.

Hello @lee_evans

I got same error as @Emanuel_Villa.
Should I reload all the data from the files into the tables?..or can I reload only new vocabulary file into the concept table if I create the constraints and indexes on the tables?
and
Why the constraints are important to this? If the constraints isn’t there, what happened?

@MichealJeong here is some general info on why database constraints are used which may be helpful:

The database constraints specify some referential integrity rules between tables to ensure that the data loaded into all the tables is consistent. One consequence of this is that the tables must be loaded in a specific order.

1 Like

Micheal, Constraints are pretty important to maintaining the quality of a database. Constraints help to ensure that data is properly constructed without dangling references to things that don’t exist or codes that were typo’d. It also ensures you don’t delete something that is referenced elsewhere. Furthermore, constraints serve to document valid navigation in your database, there are many 3rd party tools that use this additional information to improve user experience. You want to use them. In fact, proper constraints are what make OMOP so attractive.

On the other hand, enforcing constraints while loading a large database can slow things down quite a bit. In these cases, you typically drop/disable the constraints (and the corresponding indexes that maintain them), and then enable the constraints and recreate the indexes after the data-set is loaded. This way you enjoy the knowledge that your data is coherent, but don’t have to suffer slow loading speeds. If your source data has reference problems, say there’s missing codes, then you’ll need to fix these before the constraints are enabled.

Furthermore, some data, for example, a top-level cyclic relationship in a vocabulary may be meaningful, but cannot be constructed while the constraints are enforced. I don’t know of OMOP has terminology cycles like this. This may be another reason to temporarily disable constraints. Regardless, you really want to get those constraints back up before you put the database into production.

1 Like

The issue is that there are direct Many-to-Many relationships between some of the tables.

Domain<–>Concept
Vocabulary<–>Concept
Concept_Class<–>Concept.

Because of this, before you can insert a record into the Domain table, you must have a matching record in the Concepts table. BUT before you can insert a record into the Concepts table, you have to have a matching record in the Domain table. Catch-22

This is the same with Vocabulary-Concept and Concept_Class-Concept.

The only way around this is to remove the constraints, load the data, and re-add the constraints.

Capture

2 Likes

Looks like we tripped up ourselves. You are totally right.

Oh, I realized why the constraints are important. thank you @cce
Okay, definitely yes I have to create the constraints.
Now what I am doing is BI(Business Intelligence) with Synpuf data which is converted to CDM.
For instance for understanding easier, drawing the graphs, distribution about male/female, medicine frequency from the data. So to speak, I want to analyze them myself.
But CDM data is already converted, and nobody can touch my data except for me.
even though that, I’m not sure if the constraints are needed in analysis CDM in Business Intelligence as CDW.

Is concept as master table?

I imported 39 table CDM data for two weeks.
If I need to create the constraints. I should reload all the data for a really really long time, not kidding…
So I have considered that how I make the decision now.

Even just for analytics, constraints can be quite helpful: a) some tools use the information in the foreign key and primary key constraints to help you visualize and construct queries, b) query performance, after creating the indexes that enforce the constraints, could improve significantly, and c) you could rest assured that you’ve loaded the data correctly … so that your results are accurate.

You should be able to create the constraints without having to reload the data. The main pattern for loading a database is: a) create the tables, b) load the data, c) create the indexes and constraints. Typically, the constraints/indexes are in a separate SQL file. If you are using PostgreSQL and you read the manual for pg_dump, you’ll see dump files are done in this order; further, pg_dump could let you dump all 3 sections independently, so you could apply them sequentially in an independent manner.

Happy Querying

1 Like

Looks like we tripped up ourselves. You are totally right.

The usual solution is to create Junction tables. That shouldn’t be too hard to add into the data model, but at this point would probably cause a lot of problems with existing OMOP databases.

I’ve considered creating my own, which would allow me to avoid the drop/add constraints.

You technically don’t need constraints for the Vocabulary tables (Concept, Domain, etc), that you get with the vocabulary download. The referential integrity should be provided by the download as long as you are not modifying the data in those tables.

You also don’t technically need the constraints for analyzing the data. (aside from performance gains provided by indexes and constraints), because you’re not entering data there either.

HOWEVER, you absolutely need constraints for the Standard Clinical Data Tables and others that you will fill with data from your EHR. If you don’t have constraints on those, you can have zero confidence in your data.

1 Like

Which they are. Look into https://github.com/OHDSI/CommonDataModel/tree/master/PostgreSQL (or another database), and you will find

  • OMOP CDM postgresql constraints.txt
  • OMOP CDM postgresql ddl.txt
  • OMOP CDM postgresql pk indexes.txt

It also tells you in which order to install them. But we do need to add the explanation for the vocabulary loading here.

Exactly.

I’ve added a topic to CDM Builders which addresses this issue:

Wouldn’t that just disable the constraint, @roger.carlson? Now you can have domains in the CONCEPT table that don’t exist in the DOMAIN table.

t