
Byte 0xc3 not recognised in UTF8 encoding (PostgreSQL)

Dear colleagues, I am building the concept table in PostgreSQL after downloading the vocabularies from Athena and successfully running the java -jar script. All was fine up to this point.
Populating the table in Postgres with the SQL script “OMOP CDM vocabulary load - PostgreSQL.sql” fails because some bytes, such as “0xc3”, are rejected as invalid for the UTF8 encoding.
Can anyone suggest workarounds for the unrecognized byte sequences?
Thanks in advance.

Hi Leonard

True.
Sadly, I cannot find a trace of it in my bash_history, but I remember that I had to use the “sed” command to remove a number of such characters. After that I ran into other problems, such as constraint checking, so I recommend loading the data without any constraints and then adding them carefully afterwards.
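Since I no longer have the original sed command, here is a rough Python sketch of the same idea: find the lines that are not valid UTF-8 and write a copy of the file with the offending bytes dropped. This is not the exact command I ran. The function names and the file names in the usage note are hypothetical, and it assumes the file is meant to be UTF-8 in the first place.

```python
from __future__ import annotations

from pathlib import Path


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop every byte that is not part of a valid UTF-8 sequence."""
    # errors="ignore" skips undecodable bytes instead of raising.
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def find_bad_lines(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that fail strict UTF-8 decoding."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


def clean_file(src: str, dst: str) -> list[int]:
    """Write a cleaned copy of src to dst; return the affected line numbers."""
    raw = Path(src).read_bytes()
    bad = find_bad_lines(raw)
    Path(dst).write_bytes(strip_invalid_utf8(raw))
    return bad
```

For example, `clean_file("CONCEPT.csv", "CONCEPT_clean.csv")` would leave the original untouched and return the line numbers worth inspecting by hand. Note that dropping bytes loses the affected characters. If the file turns out to be in another encoding, such as Latin-1, converting it to UTF-8 is probably a better fix than deleting bytes.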
