OHDSI Home | Forums | Wiki | Github

Run the VocabImport script fail for CMD v5

Hi there,

I’m a fresh men to setup CMD v5. Today I try to import the vocabulary to postgresql database on my macbook pro and windows, both failed. the error is below (I have stopped the script on windows and the mac is till running). Does anyone encounter the same problem as me? I just search on the forums and google it, but did not find any clue. It is appreciated if anyone can give me some suggestion!

[WARNING ] COPY DRUG_STRENGTH FROM ‘C:\CDMV5VOCAB\DRUG_STRENGTH.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’
ERROR: extra data after last expected column
CONTEXT: COPY drug_strength, line 2: "19101475 19080375 50 8576 19700101 20991231 "
[WARNING ] COPY CONCEPT FROM ‘C:\CDMV5VOCAB\CONCEPT.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’
ERROR: extra data after last expected column
CONTEXT: COPY concept, line 2771153: “2313807 Percutaneous balloo2314026 Pressurized or nonpressurized inhalation treatment for acute airw…”
[WARNING ] COPY CONCEPT_RELATIONSHIP FROM ‘C:\CDMV5VOCAB\CONCEPT_RELATIONSHIP.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’
ERROR: insert or update on table “concept_relationship” violates foreign key constraint “fpk_concept_relationship_c_1”
DETAIL: Key (concept_id_1)=(40177103) is not present in table “concept”.
… …

@davidwang:

Looks like the input files are corrupt. I don’t know what Postgres does when it reports these errors, but the drug_strength is missing 5 null values between the 8576 and 19700101. The concept line is truncated in the middle of the concept_name (“Percutaneous baloo”), and the concept_relationship fails because it expects the concepts to be loaded. You need to debug the loading process. Take each file and try loading one single line. Then you try many, till you got them all.

Thanks @Christian_Reich !

For the first error:
[WARNING ] COPY DRUG_STRENGTH FROM ‘C:\CDMV5VOCAB\DRUG_STRENGTH.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’
ERROR: extra data after last expected column
CONTEXT: COPY drug_strength, line 2: "19101475 19080375 50 8576 19700101 20991231 "
I found there is an extra column “box_size” after “denominator_unit_concept_id”. After I add that column to the DRUG_STRENGTH then run the script, the issue is fixed for error.

Following error is still debugging. The question is will we add ‘box_size’ in DRUG_STRENGTH or remove the box_size in data file DRUG_STRENGTH.csv?

@davidwang:

Oh! Then the postgres DDL files have aged. We will fix it. Yes, keep the box_size. In US drug data this is an irrelevant field, in international data it is very much needed. Where is your data from?

@Christian_Reich:
I downloaded the file from http://www.ohdsi.org/web/athena/. I just fill the form and then download the file from the link which send to my email. My data is from UK and China.

And for the CONCEPT.csv, I think the content is corrupt since I rerun the .sh script. I try to export a new one and see what happened today.

@Christian_Reich:

I have another question: Should I run the VocabImport script before run “OMOP CDM constraints - PostgreSQL.sql” and “OMOP CDM indexes required - PostgreSQL.sql” or after?

Either way. The data are clean (if you can get them loaded properly). But if you do it before it will load a lot slower.

box_size does not appear to be documented in the 5.0.1 version of the vocabulary for DRUG_STRENGTH. I’ve been receiving bug reports that our ETL-CMS load instructions no longer work with the new DRUG_STRENGTH.csv files that have this field. What version of the vocabulary adds this field? Is its format documented somewhere?

Thanks!

@Christophe_Lambert:

Agreed. We need that cleaned up. Will do shortly.

t