Skewed columns in CONCEPT and CONCEPT_SYNONYM

rsanka · March 17, 2020, 9:32pm

Hello, I am new to OHDSI and am trying to create an instance of the CDM with the POSTGRES control files.

After the first 2 steps:

Create an empty schema.
Execute the script OMOP CDM postgresql ddl.txt to create the tables and fields.

I attempted to load the Standardized Vocabularies (v5.0 13-MAR-20) into the corresponding tables by following the statements in “VocabImport/OMOP CDM vocabulary load - PostgreSQL.sql”:

COPY DRUG_STRENGTH FROM ‘C:\CDMV5VOCAB\DRUG_STRENGTH.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY CONCEPT FROM ‘C:\CDMV5VOCAB\CONCEPT.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY CONCEPT_RELATIONSHIP FROM ‘C:\CDMV5VOCAB\CONCEPT_RELATIONSHIP.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY CONCEPT_ANCESTOR FROM ‘C:\CDMV5VOCAB\CONCEPT_ANCESTOR.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY CONCEPT_SYNONYM FROM ‘C:\CDMV5VOCAB\CONCEPT_SYNONYM.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY VOCABULARY FROM ‘C:\CDMV5VOCAB\VOCABULARY.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY RELATIONSHIP FROM ‘C:\CDMV5VOCAB\RELATIONSHIP.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY CONCEPT_CLASS FROM ‘C:\CDMV5VOCAB\CONCEPT_CLASS.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;
COPY DOMAIN FROM ‘C:\CDMV5VOCAB\DOMAIN.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;

However, the commands for CONCEPT and CONCEPT_SYNONYM fail due to lack of delimiters in certain rows of those tables’ CSV files. This results in the process trying to load multiple columns of a row’s data under a single column. Thus, an error message like “Value too long for varchar(255)” appears.

It’s a little difficult to view these files in Excel, perhaps due to their size, but as far as I can tell, the delimiters (tabs) are there.

Is this due to something on my end, or have others had this issue with v5.0 13-MAR-20?

prquinlan · April 19, 2020, 7:35am

Yes, same problem experienced. I guess it is possible to strip out the quotes from the text file, but it would be good if this was controlled.

prquinlan · April 19, 2020, 12:23pm

psql -U [username] -c “\copy concept FROM ‘CONCEPT.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ encoding ‘UTF8’” -d [db_name]