Hi @clairblacketer, @rvilloria
Clair, back to our conversation on that: I believe we really need to redo the Impala DDL altogether. The tables should be created with explicit Impala data types in the CREATE TABLE statement, and each statement should end with a STORED AS PARQUET clause, for example:
```sql
CREATE TABLE parquet_table_name (x INT, y STRING, z TIMESTAMP) STORED AS PARQUET;
```
Right now our Impala scripts do some really strange manipulations: tables are first created with incorrect types for the TIMESTAMP columns (VARCHAR), then the data is loaded, and only then is it converted into the correct TIMESTAMP format. This is not needed. I believe it was done because the SynPUF OMOP CDM sample data files didn't contain data in a format ready to be loaded into Impala, and it was assumed that all data would be like that. On the contrary, with a proper approach, the data should be prepared in a format that is ready to be loaded into the target database.
I propose we create a valid Impala DDL file with TIMESTAMP types applied and PARQUET specified as the storage format.
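As a rough sketch of what one table in such a file might look like, here is the OMOP CDM `observation_period` table (column names as in CDM v5.x). Its date columns are declared as TIMESTAMP rather than VARCHAR, since Impala releases before 3.3 have no DATE type. The BIGINT/INT choices and the HDFS path in the commented-out load statement are assumptions for illustration, not taken from our existing scripts.

```sql
-- Sketch only: column types are assumptions, not the final DDL.
CREATE TABLE observation_period (
  observation_period_id         BIGINT,
  person_id                     BIGINT,
  observation_period_start_date TIMESTAMP,  -- DATE in the CDM spec
  observation_period_end_date   TIMESTAMP,  -- DATE in the CDM spec
  period_type_concept_id        INT
)
STORED AS PARQUET;

-- Hypothetical load step, assuming the data has already been prepared
-- as Parquet files in this (made-up) HDFS directory:
-- LOAD DATA INPATH '/hypothetical/cdm/observation_period' INTO TABLE observation_period;
```

With the types declared up front, there is no intermediate VARCHAR table and no post-load conversion step; the data just has to arrive already typed.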