Has anyone experienced any issues while uploading? I tried tab, comma, ;, and | as separators, but nothing works. Can somebody let us know whether the issue is with the files? I could upload other files with the tab delimiter, but concept.csv doesn't work with either the tab or the comma separator.
I use pgAdmin to upload the files. These are the settings that I use:
I was expecting to see the data in a column-wise format (after double-clicking), cleanly separated.
@Dymshyts - Thanks for the links, I had a look at it.
Quick question regarding constraints: do we have to drop the constraints before adding the vocab files? I see certain constraints being violated, and a few other posts in the forum discuss the same thing. So: drop constraints -> add vocabs -> set constraints again?
I also faced another issue; just thought of reporting it.
@Dymshyts - Actually, I tried to follow the order given here, but it still threw a constraint error.
I started with the drug_strength table, as shown in the code, but it expects the concepts to already be present in the concept table, so it resulted in a foreign key constraint violation error.
If we look at the order in the script that you shared, the concept table is loaded after drug_strength.
But unfortunately it still throws the error shown below.
ERROR: insert or update on table "drug_strength" violates foreign key constraint
"fpk_drug_strength_concept_1"
DETAIL: Key (drug_concept_id)=(37594088) is not present in table "concept".
Am I missing anything here? Can you correct me if I am making any mistakes?
Vocabulary data files are tab-separated, so you should use the '\t' delimiter. Also, make sure you set the quote character; without it, some concepts from the 'GCN_SEQNO' vocabulary fail to load (like on your 2nd screenshot).
Verified this with Jan 2020 vocabularies on a fresh PostgreSQL 10 install using the script that @Dymshyts pointed you to:
-- QUOTE E'\b' sets the quote character to backspace, which never occurs in the
-- data, so embedded double quotes in concept names are loaded verbatim.
COPY omop_voc.concept
FROM '/data/vocabs/CONCEPT.csv'
WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
Not sure why DRUG_STRENGTH is supposed to be loaded first; I'd propose starting with the CONCEPT table and loading DRUG_STRENGTH last. Create the tables using the provided DDL, load data into them, then apply the foreign key constraints.
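A minimal sketch of that sequence, reusing the schema and paths from the example above (the full set of vocabulary files is abbreviated):

-- 1. Create the tables from the OHDSI DDL, without foreign keys yet.
-- 2. Load CONCEPT before any table that references it:
COPY omop_voc.concept
FROM '/data/vocabs/CONCEPT.csv'
WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
-- ...load the remaining vocabulary files the same way...
-- 3. Load DRUG_STRENGTH last, since it references CONCEPT:
COPY omop_voc.drug_strength
FROM '/data/vocabs/DRUG_STRENGTH.csv'
WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
-- 4. Only now apply the foreign key constraints from the DDL.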
Hi @rookie_crewkie, I have already loaded my data into the CDM tables and would like to update my vocab version to Jan 2020 now. Mine is not a fresh PostgreSQL install.
Since my vocab tables already had the Aug 2019 data, I have truncated them now, so the constraints are still in place.
So now, to re-upload the new vocab files, are you suggesting that I drop the constraints and add them again after the upload?
Or will just uploading the concept table first, instead of drug_strength, resolve the issues?
You'll have to drop the foreign key constraints at least, because for some tables they reference each other. For example, see 'fpk_concept_domain' and 'fpk_domain_concept' here: since you've truncated both tables already, Postgres won't allow you to load either CONCEPT (because it would break 'fpk_concept_domain') or DOMAIN (which breaks 'fpk_domain_concept'). The readme file mentions in #5 that foreign key constraints should be applied after loading the data; the 'Note' section there is actually misleading, because it's not feasible to load the data with the constraints in place.
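For example, breaking the CONCEPT/DOMAIN cycle could look like this (a sketch assuming the standard CDM v5 constraint definitions and the omop_voc schema from above; check your DDL for the exact names and columns):

-- Drop the mutually referencing constraints before loading:
ALTER TABLE omop_voc.concept DROP CONSTRAINT fpk_concept_domain;
ALTER TABLE omop_voc.domain DROP CONSTRAINT fpk_domain_concept;
-- ...load CONCEPT and DOMAIN via COPY as shown above...
-- Re-apply them once both tables are populated:
ALTER TABLE omop_voc.concept
ADD CONSTRAINT fpk_concept_domain
FOREIGN KEY (domain_id) REFERENCES omop_voc.domain (domain_id);
ALTER TABLE omop_voc.domain
ADD CONSTRAINT fpk_domain_concept
FOREIGN KEY (domain_concept_id) REFERENCES omop_voc.concept (concept_id);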
A side note on vocabulary updates: since you use ATLAS, you might want to keep the vocabularies that were used during the ETL next to the clinical data, and keep them read-only. The reason is that CDMs are tightly coupled with the vocabularies they were built with, especially when you use not only the OHDSI-provided concepts but also custom ones created specifically for your data.
But vocabularies constantly change: new concepts are added, some old ones are deprecated, some are made non-standard, some change domains, etc. (you can track the details of the changes here). Therefore, it's usually safer to rerun the ETL with the new vocabs if you want to use the latest ones, because you won't run into hard-to-debug situations where, e.g., a concept has changed domain but the records in your CDM have not. You can, of course, write a migration script which will update the CDM according to the vocabulary changes, but this is a more error-prone path in general (though it might be preferable if your ETL takes ages).
I think this whole thing was described somewhere in the Book of OHDSI; you might want to check it out.
So, instead of truncating the tables, consider the following approach:
1. Create a new, separate schema for the new version of the vocabularies and load the new data there (see the sketch after this list).
2. Rerun the ETL of your clinical data using the new vocabularies instead of the ones used before.
3. If you don't need the old vocabularies, drop them and use the updated ones going forward.
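A quick sketch of the first step (the schema name here is just an illustration):

-- Create a separate schema for the new vocabulary version:
CREATE SCHEMA omop_voc_202001;
-- Run the OHDSI DDL against this schema to create the tables, then load:
COPY omop_voc_202001.concept
FROM '/data/vocabs/CONCEPT.csv'
WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
-- ...repeat for the other vocabulary files, then apply the constraints...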
This process may look too cumbersome, but it saves you a lot of potential headache with the updates.