OHDSI Home | Forums | Wiki | Github

Splitting table for efficiency

Problem
As I understand, When DRUG_EXPOSURE table moves from SQL SERVER or PostgreSQL to SAS, varchar variables become character string variables of length sufficient to accommodate the maximum length observation. Thus, every row in DRUG_EXPOSURE reserves a big chunk of storage for SIG. We rarely have really long SIGs, and usually do not need to use them, I would like to trim things down a bit.

Options that come to mind:

  1. reduce the variable’s length to something reasonable and accept that the field may be truncated occasionally.
  2. On the SAS side, set COMPRESS=YES on the DRUG_EXPOSURE table, accepting that there will be in increase in I/O overhead when reading the table,
  3. Drop the SIG field from DRUG_EXPOSURE, creating a local table X_DRUG_EXPOSURE which contains only SIG and an FK referring back to DRUG_EXPOSURE's PK (Drug_Exposure_ID). This may complicate coding a bit and add a bit of overhead when SIG is needed. However, SIG is not needed all that often.

I expect that the first two would be the most common strategies.

Questions:

  1. Are there reasons, beyond what I mentioned, to rule out the third approach?
  2. Are there other approaches to consider?

    Thanks,
    Gerry

option 1 seems ok.

Many OHDSI folks use R. So there is there is also option 4. avoid SAS and 5. keep data in the database (avoid exporting the whole table outside the database)

@Pulver, Option 1 or 3 seems ok for me, too. As you said, you don’t need SIG often.

Openness is one of the objectives of OHDSI. Although I much prefer R to SAS or python and I agree more or less with @Vojtech_Huser (‘keep data in the database’), SAS should be encouraged, too.

t