I vote we standardize on a Unicode encoding (UTF-8 or UTF-16) for the vocabulary file format (the database already supports it).
Can someone please confirm whether SQL Server can load a file in UTF-8 format, and provide some examples of the bcp or BULK INSERT statements? Or does it only support UTF-16?
For bcp, see: http://msdn.microsoft.com/en-us/library/ms162802.aspx
For BULK INSERT, see: http://msdn.microsoft.com/en-us/library/ms188365.aspx
The documentation says that “SQL Server does not support code page 65001 (UTF-8 encoding).”
If SQL Server requires UTF-16, we have two options. We could produce a single UTF-16 file, which would roughly double the download size for all DBMS users (our content is mostly ASCII, which takes one byte per character in UTF-8 and two in UTF-16). Or we could develop additional code to generate UTF-16 files for SQL Server and UTF-8 files for Oracle and Postgres.
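
To give a feel for the second option, here is a minimal Python sketch of the conversion step, assuming we keep UTF-8 as the master format and derive the SQL Server file from it. The function name utf8_to_utf16 and the file names are made up for illustration; this is not an existing part of our tooling. It writes little-endian UTF-16 with a byte order mark, which I believe is what SQL Server expects for Unicode data files, but someone should verify that against a real load.

```python
import os
import tempfile


def utf8_to_utf16(src_path, dst_path, chunk_chars=65536):
    """Re-encode a UTF-8 text file as UTF-16 little-endian with a BOM.

    Hypothetical helper for illustration only. Reads in chunks so that
    large vocabulary files do not have to fit in memory.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-16-le", newline="") as dst:
        dst.write("\ufeff")  # byte order mark, encoded as bytes FF FE
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)


if __name__ == "__main__":
    # Small demonstration with a made-up tab-delimited row.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "vocab_utf8.txt")
    dst = os.path.join(workdir, "vocab_utf16.txt")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("1\tAspirin\r\n" * 1000)
    utf8_to_utf16(src, dst)
    print("UTF-8 bytes: ", os.path.getsize(src))
    print("UTF-16 bytes:", os.path.getsize(dst))
```

For ASCII-only content the UTF-16 output is twice the UTF-8 size plus two bytes for the byte order mark, which is where the "double the file size" estimate comes from. Rows with accented or non-Latin characters grow by less, since those already take two or more bytes in UTF-8.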