Support for BIGINT (64bit integers) and R

(Clair Blacketer) #1

I am moving a question from the CDM github over here as I think it will get more traction on the forums:

A general question on designing a database that will be accessed via R: should BIGINT columns be changed to VARCHAR (for example, holding the hex representation of the big integer)?

Base R does not support 64-bit integers. The bit64 package is often used to fill this gap, and I believe much of the dplyr / dbplyr functionality works with integer64 objects as well as integer. But in base R, large integers end up as doubles, which represent integers exactly only up to 2^53 (the IEEE 754 double has a 53-bit significand, of which 52 bits are stored explicitly), so distinct BIGINT IDs in the database can get munged to the same real / double in R.
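To make the precision concern concrete, here is a minimal sketch in Python, whose `float` is the same IEEE 754 double that R uses for its numeric type. The two ID values are made up for illustration; the point is only that two distinct integers just above 2^53 become the same double.

```python
# Two distinct (hypothetical) 64-bit IDs at and just above 2**53.
id_a = 2**53        # 9007199254740992
id_b = 2**53 + 1    # 9007199254740993

# Converting to a double is what effectively happens when a BIGINT
# is read into a plain numeric column.
double_a = float(id_a)
double_b = float(id_b)

# The integers differ, but their double representations do not.
ids_differ = id_a != id_b
doubles_collide = double_a == double_b

# Integers below 2**53 still survive a round trip through a double.
safe_value = 2**53 - 1
round_trips = int(float(safe_value)) == safe_value

print(ids_differ, doubles_collide, round_trips)
```

IDs below 2^53 are unaffected, so the problem only shows up for schemes that actually use the upper bits of the 64-bit range.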

Since the OHDSI suite includes R packages, I’m wondering if there is a known policy / best practices on this.

Thank you!

-github user Brianrepko

(Brian Repko) #2

Thank you @clairblacketer - I’ve signed up for the OHDSI forums as well since we are doing work similar to CDM based on Oncology, Clinical Trials and Genomics.

(Martijn Schuemie) #3

Thanks @clairblacketer for considering us poor R users :wink:

The develop versions of many of our packages now support 64-bit integers, including DatabaseConnector, Andromeda, FeatureExtraction, Cyclops, and CohortMethod. Some packages (including the skeletons) still need some minor tweaks, but after that we should be able to release everything and have full support for 64-bit integers.

So I recommend using BIGINTs in the CDM where appropriate. I would ask that concept IDs remain 32-bit integers, or else FeatureExtraction has a problem.
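As a rough illustration of the 32-bit constraint, the sketch below checks whether a set of IDs would still fit in an R integer. The helper name `fits_r_integer` and the sample values are hypothetical; the bound assumes R's integer type, which is 32-bit signed with the minimum value reserved for `NA`, so the usable range is ±(2^31 - 1).

```python
# Largest magnitude an R integer can hold; -2**31 is reserved for NA_integer_.
R_INTEGER_MAX = 2**31 - 1

def fits_r_integer(value: int) -> bool:
    """Return True if value can be stored in a 32-bit R integer."""
    return -R_INTEGER_MAX <= value <= R_INTEGER_MAX

# Hypothetical IDs: the first two fit, the last needs a 64-bit type.
sample_ids = [0, 2_000_000_000, 3_000_000_000]
too_large = [i for i in sample_ids if not fits_r_integer(i)]

print(too_large)
```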

(Brian Repko) #4

Thank you @schuemie - we hit this years ago with a high-low bigint ID scheme and R, and ended up changing the fields to a hex string (16 characters) as a quick workaround. Now I see that most folks are using the bit64 package. Cheers
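For readers curious what the 16-character hex workaround could look like, here is a minimal sketch. The function names are hypothetical, and it assumes signed 64-bit IDs encoded as two's complement; the actual scheme used in that project is not described in this thread.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF

def bigint_to_hex(value: int) -> str:
    """Encode a signed 64-bit integer as a fixed-width 16-char hex string."""
    if not -2**63 <= value < 2**63:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Masking maps negative values to their two's-complement bit pattern.
    return format(value & MASK_64, "016x")

def hex_to_bigint(text: str) -> int:
    """Decode a 16-char hex string back into a signed 64-bit integer."""
    unsigned = int(text, 16)
    # Values with the top bit set represent negative numbers.
    return unsigned - 2**64 if unsigned >= 2**63 else unsigned

# An ID above 2**53 survives the round trip exactly, unlike a double.
original = 2**53 + 1
encoded = bigint_to_hex(original)
decoded = hex_to_bigint(encoded)

print(encoded, decoded == original)
```

The string form keeps IDs exact and usable as join keys in base R, at the cost of losing arithmetic on the column and using more storage than a native BIGINT.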