I vote we standardize on some form of UTF for the vocabulary file format (the database already supports it).
Can someone please confirm that Sql Server can load a file in UTF-8 format and provide some examples of the bcp or bulk insert statements? Or does it only support UTF-16?
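For reference, here is roughly what loading a UTF-16 ("widechar") file looks like; the table and file names are hypothetical, and the `\t`/`\n` terminators assume a tab-delimited export. At the time of this discussion, `CODEPAGE = '65001'` (UTF-8) was rejected by BULK INSERT, which is the crux of the problem:

```sql
-- Sketch only: hypothetical table/file names.
-- DATAFILETYPE = 'widechar' tells SQL Server to expect UTF-16LE data.
BULK INSERT dbo.concept
FROM 'C:\vocab\CONCEPT.csv'
WITH (
    DATAFILETYPE    = 'widechar',
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2
);
```

The bcp equivalent would use the `-w` flag (wide/UTF-16 character format), e.g. `bcp dbo.concept in CONCEPT.csv -w -S myserver -T`, versus `-c` for single-byte character format in the client code page.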
If SQL Server requires UTF-16, the impact would be that we either roughly double the download file size for all DBMS users (if we wanted to produce a single file), or develop additional code to generate UTF-16 files for SQL Server and UTF-8 files for Oracle and Postgres.
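To make the size impact concrete: our vocabulary content is overwhelmingly ASCII-range text, and every such character is 1 byte in UTF-8 but 2 bytes in UTF-16, hence the near-doubling. You can see the same effect inside SQL Server by comparing `varchar` (single-byte code page) against `nvarchar` (UTF-16):

```sql
-- DATALENGTH returns storage size in bytes.
-- 'vocabulary' is 10 characters: 10 bytes as varchar, 20 as nvarchar (UTF-16).
SELECT DATALENGTH(CAST('vocabulary'  AS varchar(50)))  AS single_byte_bytes,  -- 10
       DATALENGTH(CAST(N'vocabulary' AS nvarchar(50))) AS utf16_bytes;        -- 20
```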
Urgh! As far as I can see, SQL Server does not support UTF-8, and in fact dropped support for importing UTF-8 data ( https://connect.microsoft.com/SQLServer/feedback/details/370419/ ). By default, SQL Server uses Latin-1 (ISO 8859-1), which supports Western European characters but not Asian characters.
That may be ok, Martijn. We don’t have Asian characters in there. The only characters we bump into are accented or otherwise altered Latin. So, when exporting we could produce a different character set depending on which platform people choose.
Well, I guess what I asked is off topic here. My concern was the Notes entity, where we need encoding_concept_id to map to a standard concept_id in the concept table. But I cannot find any encoding concept ID. Can you suggest something here?
I am sorry, I am new to OMOP.
@ambuj: No, you are right. There is the field encoding_concept_id. Unbelievable. It snuck in.
Ok, let’s do this: We create the UTF-8 concept so you can do your job, and in the meantime I will try to convince the community to drop that field. It should never have been there to begin with. They may push back with some good reason why it is needed for NLP.
So that means I can proceed with setting encoding_concept_id = 0?
Also, if you can shed some light on notes_class_concept_id, that would be great.