OHDSI Home | Forums | Wiki | Github

SQL Scripts for PostgreSQL (COPY error)

Hi there!

Currently to create a cdm schema I’m copying the synpuf 5% sample .csv files (https://www.ohdsi.org/data-standardization/) using the Postgresql scripts here (https://github.com/OHDSI/CommonDataModel/tree/v5.2.2/PostgreSQL)

However I have encountered the following errors:

device_exposure:
COPY cdm.device_exposure FROM ‘/tmp/CDM5VOCAB/device_exposure.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

    ERROR: invalid input syntax for integer: "C1894"
    SQL state: 22P02
    Context: COPY device_exposure, line 2, column visit_detail_id: "C1894"

condition_occurrence:
COPY cdm.condition_occurrence FROM ‘/tmp/CDM5VOCAB/condition_occurrence.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

ERROR: missing data for column "condition_status_concept_id"
SQL state: 22P04
Context: COPY condition_occurrence, line 2: "12    1    134736    2009-07-25    2009-07-25 00:00:00    2009-07-25        38000230        6    6    7245    44825708    7245    0"

drug_exposure
COPY cdm.drug_exposure FROM ‘/tmp/CDM5VOCAB/drug_exposure.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

ERROR: missing data for column "dose_unit_source_value"
SQL state: 22P04
Context: COPY drug_exposure, line 2: "97    1    19040307    2008-04-10    2008-04-10 00:00:00    2008-07-09    2008-07-09 00:00:00        38000175            30    90                        0..."

measurement:
COPY cdm.measurement FROM ‘/tmp/CDM5VOCAB/measurement.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

ERROR:  missing data for column "unit_source_value"
CONTEXT:  COPY measurement, line 2: "1    1    2212731    2008-09-04        45754907            0                2    2    85610    2212731        "

observation:
COPY cdm.observation FROM ‘/tmp/CDM5VOCAB/observation.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

ERROR:  invalid input syntax for integer: "V4502"
CONTEXT:  COPY observation, line 2, column visit_detail_id: "V4502"

payer_plan_period:
COPY cdm.payer_plan_period FROM ‘/tmp/CDM5VOCAB/payer_plan_period.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

ERROR: missing data for column "plan_concept_id"
SQL state: 22P04
Context: COPY payer_plan_period, line 2: "1    1    2008-01-01    2010-12-31        Medicare Part A    "

procedure_occurrence:
COPY cdm.procedure_occurrence FROM ‘/tmp/CDM5VOCAB/procedure_occurrence.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

ERROR: invalid input syntax for integer: "V5841"
SQL state: 22P02
Context: COPY procedure_occurrence, line 2, column visit_detail_id: "V5841"

vocabulary:
COPY cdm.vocabulary FROM ‘/tmp/CDM5VOCAB/vocabulary.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’;

ERROR: null value in column "vocabulary_version" violates not-null constraint
SQL state: 23502
Detail: Failing row contains (Cohort Type, OMOP Cohort Type, OMOP generated, null, 44819234).
Context: COPY vocabulary, line 3: "Cohort Type    OMOP Cohort Type    OMOP generated        44819234"

For the invalid input syntax, I’m guessing the data dosen’t match the datatype specified for the column, e.g. for procedure_occurence, visit_detail_id should be an integer but the data is a string “VS841”?

I’m puzzled over the error regarding null values as I’ve looked at the .csv files and the number of columns of the data seem to match the number of columns in the table?

Thanks in advance,
Hui Xing

@hui_xing_tan

Try using the below example \COPY command in psql instead (it’s based on the postgres \COPY example in the synpuf 5% ‘README.md’ file).

\COPY cdm.vocabulary FROM 'vocabulary.csv' DELIMITER E'\t' CSV ENCODING 'UTF8'; 

The synpuf 5% data file is UTF8 and doesn’t have header records.

Hi @lee_evans

hmm, I still faced the same errors with the code but what I did was to modify the tables specs in the ddl instead of the COPY code:

  • for missing data, omit the column
  • for invalid input syntax, change from integer to varying character

Thanks,
Hui Xing

t