SQL Server data types

quinnt · August 14, 2020, 3:53pm

We’re preparing to instantiate an OMOP CDM v5.3.1 on SQL Server and load it from our Epic Caboodle database. I want to ensure we’re getting off on the right foot, especially with regard to “future proofing”. I have some SQL Server data type questions that are probably best posed to @clairblacketer, @Christian_Reich, and @Patrick_Ryan as the authors of the SQL Server DDL.

Why VARCHAR(1) for Boolean flags instead of TINYINT or even CHAR(1)? (I can understand hesitancy over using SQL Server’s BIT data type.)
Did OHDSI make a conscious choice to use VARCHAR instead of NVARCHAR? (Epic uses NVARCHAR in Caboodle.)
Why FLOAT in the COST table instead of NUMERIC? Any concerns with aggregating monetary amounts?
Why FLOAT in the LOCATION table instead of NUMERIC(10,8) for latitude (+/-90 degrees) and NUMERIC(11,8) for longitude (+/-180 degrees)?
I imagine there is no harm in future proofing our CDM v5.3.1 by using BIGINT now for the non-concept ID primary keys?
Can you confirm that we are free to increase the length of any VARCHAR column if we need to?
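For anyone weighing the FLOAT versus NUMERIC question for the COST table, here is a minimal sketch of the aggregation concern. It is Python rather than T-SQL: Python's `float` is an IEEE 754 double, like SQL Server's default FLOAT, and `Decimal` stands in for NUMERIC. The amounts and the helper names `sum_as_float` and `sum_as_decimal` are made up for illustration.

```python
from decimal import Decimal

def sum_as_float(amounts):
    """Add amounts using binary floating point (analogous to FLOAT)."""
    total = 0.0
    for amount in amounts:
        total += float(amount)
    return total

def sum_as_decimal(amounts):
    """Add amounts using exact decimal arithmetic (analogous to NUMERIC)."""
    total = Decimal("0")
    for amount in amounts:
        total += Decimal(amount)
    return total

# Ten hypothetical charges of 0.10 each, kept as strings so that
# neither path starts from an already-rounded value.
amounts = ["0.10"] * 10

float_total = sum_as_float(amounts)
decimal_total = sum_as_decimal(amounts)

print(float_total == 1.0)                # False: binary rounding error accumulates
print(decimal_total == Decimal("1.00"))  # True: the decimal sum is exact
```

The drift is tiny for ten rows, but it can surface in equality checks and in totals over many rows, which is the usual argument for a fixed-precision type for money.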

Like many academic organizations, we will use our OMOP instance both for our internal researchers and for contributing data to various research networks. I do not want to make any technical decisions that would get us into trouble later for either of these use cases. Thanks so much!

Sanjay_Udoshi · March 22, 2022, 4:42pm

Hi Tim,

Did you find answers to these questions? I am just starting out implementing CDM 5.4 and am running into issues on SQL Server as well.

-Sanjay

quinnt · March 27, 2022, 6:24pm

Hello @Sanjay_Udoshi,

As you can see from this thread, no one replied to my original post.

We elected to make “compatible” data type changes and have experienced no problems sharing OMOP data extracts with research partners & collaboratives.

By “compatible”, I mean that the data type can be converted implicitly without throwing an error.

INTEGER to BIGINT – for all id columns (but not the concept_id columns)
INTEGER to SMALLINT or TINYINT – for columns like person.year_of_birth and person.day_of_birth
VARCHAR to NVARCHAR
VARCHAR(1) to NCHAR(1) – because there is no need to spend the 2 extra bytes of variable-length overhead on a single character
FLOAT to NUMERIC(p,s)
DATETIME to DATETIME2 – per Microsoft’s recommendation and for ANSI / ISO 8601 compliance
Lengthen (N)VARCHAR columns as needed to match the length of our source system’s columns
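
The narrowing changes in the list above (INTEGER to SMALLINT or TINYINT) only convert cleanly when every existing value fits the target range. Below is a rough Python sketch of that check, using SQL Server's documented integer ranges. The `RANGES` table and the `fits` helper are hypothetical names for illustration, not part of any OHDSI tool.

```python
# Value ranges of SQL Server's integer types.
# Note that TINYINT is unsigned (0 to 255).
RANGES = {
    "TINYINT": (0, 255),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INTEGER": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
}

def fits(value, type_name):
    """Return True if value is within the range of the named integer type."""
    low, high = RANGES[type_name]
    return low <= value <= high

# A year of birth needs SMALLINT; a day of birth fits in TINYINT.
print(fits(2020, "TINYINT"))   # False
print(fits(2020, "SMALLINT"))  # True
print(fits(31, "TINYINT"))     # True

# An id one past the INTEGER maximum still fits in BIGINT.
print(fits(2**31, "INTEGER"))  # False
print(fits(2**31, "BIGINT"))   # True
```

Widening (INTEGER to BIGINT) can never overflow, so the range check only matters for the narrowing cases.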

Sanjay_Udoshi · March 27, 2022, 6:46pm

Thank you @quinnt !
This is very helpful.

One other point I found is to set the collation of the underlying SQL Server database to SQL_Latin1_General_CP1_CS_AS. I ran into problems when I started out with the default, SQL_Latin1_General_CP1_CI_AS.

-Sanjay

DTorok · March 28, 2022, 12:34pm

The changes you have identified will not create any problems in creating your ETL and should not be a problem when running the Data Quality Dashboard, Achilles or Atlas. But you will be the ones testing these changes, so please update this post as you start running the QC and analytic programs.

quinnt · March 28, 2022, 9:45pm

I can confirm that the data type changes I enumerated above have not caused us any problems over the past 2 years with the OHDSI tools that @DTorok mentioned:

ATLAS
ACHILLES (including HEEL)
Data Quality Dashboard (DQD)

For the record, we’re using Microsoft SQL Server 2019 Enterprise Edition (version 15.0.4153.1).

Sanjay_Udoshi · March 28, 2022, 10:23pm

Tim,

Are you using the Broadsea containers for general use? How many users do you have?

-Sanjay

Wilfried · March 3, 2023, 12:51am

@quinnt
I am the head of IT at the Cliniques Universitaires Saint-Luc Hospital in Brussels, Belgium.

We are at the beginning of the OMOP implementation project with our EMR EPIC and you seem to have already done this exact same project. Can we contact you for feedback and some guidance?

Kind regards.