What is the recommended collation for Microsoft SQL server?

clairblacketer · January 5, 2021, 5:49pm

Moving this to the forums instead of the CDM github

I came across a problem when loading the vocabularies (CONCEPT_SYNONYM):

Violation of UNIQUE KEY constraint 'uq_concept_synonym'. Cannot insert duplicate key in object 'omop.concept_synonym'. The duplicate key value is (9188, final report, 4180186)

After some searching the duplicate entries where:

9188	Final report	4180186
9188	final report	4180186

This lead me to the problem that my SQL server is Case Insensitive (collation = SQL_Latin1_General_CP1_CI_AS).
After changing to SQL_Latin1_General_CP1_CS_AS, this problem goes away.

In the book of OHDSI the only thing I found is:

Case Sensitivity in String Comparisons
Some DBMS platforms such as SQL Server always perform string comparisons in a case-insensitive way, while others such as PostgreSQL are always case sensitive. It is therefore recommended to always assume case-sensitive comparisons, and to explicitly make comparisons case-insensitive when unsure about the case. For example, instead of (…)

Is there a recommended collation to use when setting up new CDM databases for our clients (when using SQL server)?

github user w0ut0