As I understand, When DRUG_EXPOSURE table moves from SQL SERVER or PostgreSQL to SAS, varchar variables become character string variables of length sufficient to accommodate the maximum length observation. Thus, every row in DRUG_EXPOSURE reserves a big chunk of storage for SIG. We rarely have really long SIGs, and usually do not need to use them, I would like to trim things down a bit.
Options that come to mind:
- reduce the variable's length to something reasonable and accept that the field may be truncated occasionally.
- On the SAS side, set COMPRESS=YES on the DRUG_EXPOSURE table, accepting that there will be in increase in I/O overhead when reading the table,
- Drop the SIG field from DRUG_EXPOSURE, creating a local table X_DRUG_EXPOSURE which contains only SIG and an FK referring back to DRUG_EXPOSURE's PK (Drug_Exposure_ID). This may complicate coding a bit and add a bit of overhead when SIG is needed. However, SIG is not needed all that often.
I expect that the first two would be the most common strategies.
- Are there reasons, beyond what I mentioned, to rule out the third approach?
- Are there other approaches to consider?