OHDSI Home | Forums | Wiki | Github

COPY command error loading vocab v5 into postgres on windows

Hey All,

Just trying to dip a toe in the OMOP waters here, and am trying to get a functioning vocabulary install on postgres (v9.6, x64) running on my windows 7 laptop.

The $unzip\vocabulary_5_0_Unrestrict_csv\Postgres\ddl_V5_Postgres.sql script ran just fine in pgAdmin, but when I try to run load_Postgres.sql I get:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf2
********** Error **********

ERROR: invalid byte sequence for encoding "UTF8": 0xf2
SQL state: 22021

Some googling led me to try adding an encoding 'windows-1251' option on the call to COPY, which had no effect. Ditto adding the TEXT modifier.

I’m a real noob w/pg, and google is not availing me. Can any of you kind ppl throw me a clue?

Thanks!

-Roy

@rpardee:

Welcome to the family.

This is strange. Usually, the downloaded zip should filter all special characters (here the “ò”) and replace them with the simple ASCII equivalents. Not sure why this is not happening. We’ll investigate.

In the mean time, you can take them out yourself. These are rare. A bunch of accented Es, Os and As.

Thanks for the response @Christian_Reich!

It looks to me like those òs are the delimiters. They’re on every line–and specified in the COPY command. Here’s what I’m looking at:

copy vocabulary_5_0.domain
	from '$unzip\vocabulary_5_0_Unrestrict_csv\domain.txt000'
	delimiter E'\0362' 
	;

Should I write a script to sub those out for pipe characters or some such?

Strange. I just looked at the zip files folks downloaded yesterday, and they all have as the delimiter tab (0x09) as it should be the case. I don’t know Postgres, but it looks something fishy going on on your side.

Feh–strange. The link I downloaded from is

http://75.101.131.161/downloads/registration?docname=Vocabulary%20Data%20CSV%20V5

Is that the recommended source? Where are you downloading from?

Thanks!

use http://athena.ohdsi.org/

@mkhayter: Can you take down that site?

Maybe even better would be re-pointing the links on the right-hand column of this page: http://omop.org/Vocabularies. I believe that’s what led me down the path. I don’t recognize the ‘athena’ stuff at all–very glad to know about it.

FWIW, I was able to get a copy of what I d/l’d installed & running on mssql. So assuming the thing I grabbed was neither heinously out of date nor pre-release-and-not-yet-ready-for-actual-use, I should be set up. Can anybody comment on that?

Thanks All!

-Roy

@rpardee:

We updated http://omop.org over the weekend. It has been stale for 3 years now, shoing some ancient V4 data tables and vocabulary files. It’s now all either fixed, redirected, removed or a big fat warning was added. Bottom line: don’t use anything from omop.org. It’s a historic snapshot in time.

Regarding MySQL: Don’t use it. Almost all use cases will break, becasue it cannot do partitioned queries. We don’t support MySQL. If you need a cheapo database system use Postgres, which is fully supported.

Ah, excellent–many thanks!

I’m using Microsoft SQL Server, so no worries there.

I’ll try a download from the athena site & look at record counts & such to see if there are big differences. I wouldn’t like to abandon what I’ve already got (it was a PITA getting the attention of our admins to help load the data) but if it’s out of date I surely will.

I downloaded from Athena and am running into the same error in postgresql
ERROR: invalid byte sequence for encoding “UTF8”: 0xc3 0x3f
SQL state: 22021
Context: COPY concept, line 100759
issue again for COPY vocab.CONCEPT FROM ‘\Vocab\CONCEPT.csv’ WITH DELIMITER E’\t’ CSV HEADER QUOTE E’\b’ ;

Seems to be an issue with the ÿ…

Looks like these lines are the culprit:
43222776 FLECAÿNE L.P. 100 mg, gélule à libération prolongée Drug BDPM Drug Product 3580564 20011218 20991231

43228638 CHIROCAÿNE 0,625 mg/ml, solution pour perfusion Drug BDPM Drug Product 5668240 20050705 20991231

43226396 FLECAÿNE L.P. 150 mg, gélule à libération prolongée Drug BDPM Drug Product 3580541 20011218 20991231

43225430 LYSOPAÿNE MAUX DE GORGE CETYLPYRIDINIUM LYSOZYME FRAISE SANS SUCRE, comprimé à sucer édulcoré au sorbitol et à la saccharine Drug BDPM Drug Product 2802643 20150126 20991231

43224822 ROPIVACAÿNE KABI 10 mg/ml, solution injectable en ampoule Drug BDPM Drug Product 5774166 20101122 20991231
43224823 ROPIVACAÿNE KABI 10 mg/ml, solution injectable en ampoule Drug BDPM Drug Product 5774232 20101122 20991231

43221417 ROPIVACAÿNE KABI 2 mg/ml, solution injectable en ampoule Drug BDPM Drug Product 5773899 20101122 20991231

43228599 FLECAÿNIDE SANDOZ 100 mg, comprimé sécable Drug BDPM Drug Product 3824859 20071123 20991231

43248434 ROPIVACAÿNE KABI 2 mg/ml, solution pour perfusion en poche Drug BDPM Drug Product 5774338 20101122 20991231

43234289 LYSOPAÿNE MAUX DE GORGE AMBROXOL MENTHE 20 mg SANS SUCRE, pastille édulcorée au sorbitol et à la saccharine sodique Drug BDPM Drug Product 4998372 20030611 20991231
43236897 CHIROCAÿNE 1,25 mg/ml, solution pour perfusion Drug BDPM Drug Product 5668317 20050705 20991231

43232548 LYSOPAÿNE MAUX DE GORGE AMBROXOL CITRON 20 mg SANS SUCRE, pastille édulcorée au sorbitol et au sucralose Drug BDPM Drug Product 2228635 20130621 20991231

43240718 CHIROCAÿNE 1,25 mg/ml, solution pour perfusion Drug BDPM Drug Product 5668352 20050705 20991231

43245280 LYSOPAÿNE MAUX DE GORGE CETYLPYRIDINIUM LYSOZYME SANS SUCRE, comprimé à sucer édulcoré au sorbitol et à la saccharine Drug BDPM Drug Product 3903501 19940112 20991231

43241778 LYSOPAÿNE MAUX DE GORGE AMBROXOL CASSIS 20 mg SANS SUCRE, pastille édulcorée au sorbitol et au sucralose Drug BDPM Drug Product 3925916 20030611 20991231

among others…

t