@tomwhite @Chris_Knoll @gregk Do you have any updates to share on the current status of Atlas on Impala? Does Atlas cohort generation work end to end, or is there still an open issue with deleting prior results because of the HDFS append-only write limitation?
If HDFS deletes are still an issue, one solution could be a separate cohort 'soft delete' key table, to which the keys of deleted cohort rows are appended.
For Impala, SqlRender could translate deletes against the cohort table into inserts of the deleted rows' keys into the 'soft delete' table.
A cohort view, created just for Impala deployments, could join the cohort table against the 'soft delete' table and return only the rows with no match, so that 'soft deleted' cohort rows are transparently ignored.
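To make the idea concrete, here is a minimal sketch of the soft delete table and the view. It uses Python's built-in sqlite3 module purely as a stand-in for Impala, so the SQL dialect will differ in a real deployment. The names `cohort_base`, `cohort_soft_delete` and `row_key` are my own assumptions: the cohort table would need some per-row surrogate key (a UUID here), because keying on `cohort_definition_id` alone would also hide rows from a later regeneration of the same cohort.

```python
import sqlite3
import uuid


def create_schema(conn):
    # cohort_base holds the append-only data; cohort is the view Atlas would query.
    conn.executescript("""
        CREATE TABLE cohort_base (
            row_key              TEXT,    -- hypothetical surrogate key, unique per row
            cohort_definition_id INTEGER,
            subject_id           INTEGER,
            cohort_start_date    TEXT,
            cohort_end_date      TEXT
        );
        CREATE TABLE cohort_soft_delete (row_key TEXT);
        -- Anti-join: keep only base rows with no matching soft delete key.
        CREATE VIEW cohort AS
            SELECT c.*
            FROM cohort_base c
            LEFT JOIN cohort_soft_delete d ON c.row_key = d.row_key
            WHERE d.row_key IS NULL;
    """)


def insert_cohort_rows(conn, cohort_definition_id, rows):
    # rows: iterable of (subject_id, cohort_start_date, cohort_end_date)
    conn.executemany(
        "INSERT INTO cohort_base VALUES (?, ?, ?, ?, ?)",
        [(str(uuid.uuid4()), cohort_definition_id, s, start, end)
         for s, start, end in rows],
    )


def soft_delete_cohort(conn, cohort_definition_id):
    # Stands in for DELETE FROM cohort WHERE cohort_definition_id = ?:
    # append the keys of the affected rows instead of removing them.
    # Re-flagging an already deleted key is harmless for the anti-join.
    conn.execute(
        "INSERT INTO cohort_soft_delete (row_key) "
        "SELECT row_key FROM cohort_base WHERE cohort_definition_id = ?",
        (cohort_definition_id,),
    )


def visible_subjects(conn, cohort_definition_id):
    cur = conn.execute(
        "SELECT subject_id FROM cohort "
        "WHERE cohort_definition_id = ? ORDER BY subject_id",
        (cohort_definition_id,),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
create_schema(conn)
insert_cohort_rows(conn, 1, [(101, "2015-01-01", "2015-06-30"),
                             (102, "2015-02-01", "2015-07-31")])
insert_cohort_rows(conn, 2, [(201, "2015-03-01", "2015-08-31")])

soft_delete_cohort(conn, 1)                                     # "delete" prior results
insert_cohort_rows(conn, 1, [(103, "2016-01-01", "2016-06-30")])  # regenerate cohort 1

print(visible_subjects(conn, 1))  # [103]
```

Only INSERT and SELECT statements touch the data tables, which is the property that matters for an append-only store. Whether SqlRender could produce this kind of rewrite through its existing translation rules is something I have not verified.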
A separate batch SQL process could be scheduled to re-create the cohort table periodically (nightly or weekly) without the soft deleted rows. It would be similar to running a table 'vacuum' process in PostgreSQL/Netezza.
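A rough sketch of what that batch job could look like, again with Python's sqlite3 standing in for Impala. The table and column names (`cohort_base`, `cohort_soft_delete`, `row_key`, `cohort_compacted`) are assumptions for illustration, and the rebuild-then-swap sequence shown is just one way to do it; on Impala the equivalent might be an INSERT OVERWRITE or a CREATE TABLE AS SELECT followed by a rename.

```python
import sqlite3


def compact_cohort_table(conn):
    """Rebuild cohort_base without soft-deleted rows, then empty the key table."""
    conn.executescript("""
        -- Copy only the rows that have no soft delete key.
        CREATE TABLE cohort_compacted AS
            SELECT c.row_key, c.cohort_definition_id, c.subject_id,
                   c.cohort_start_date, c.cohort_end_date
            FROM cohort_base c
            LEFT JOIN cohort_soft_delete d ON c.row_key = d.row_key
            WHERE d.row_key IS NULL;
        -- Swap the compacted copy in under the original name.
        DROP TABLE cohort_base;
        ALTER TABLE cohort_compacted RENAME TO cohort_base;
        -- Drop and re-create the key table instead of deleting rows from it.
        DROP TABLE cohort_soft_delete;
        CREATE TABLE cohort_soft_delete (row_key TEXT);
    """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cohort_base (
        row_key              TEXT,    -- hypothetical surrogate key, unique per row
        cohort_definition_id INTEGER,
        subject_id           INTEGER,
        cohort_start_date    TEXT,
        cohort_end_date      TEXT
    );
    CREATE TABLE cohort_soft_delete (row_key TEXT);
    INSERT INTO cohort_base VALUES
        ('a', 1, 101, '2015-01-01', '2015-06-30'),
        ('b', 1, 102, '2015-02-01', '2015-07-31'),
        ('c', 2, 201, '2015-03-01', '2015-08-31');
    INSERT INTO cohort_soft_delete VALUES ('a'), ('b');
""")

compact_cohort_table(conn)

remaining = conn.execute(
    "SELECT row_key, cohort_definition_id, subject_id FROM cohort_base"
).fetchall()
pending = conn.execute("SELECT COUNT(*) FROM cohort_soft_delete").fetchone()[0]
print(remaining, pending)  # [('c', 2, 201)] 0
```

The swap is not atomic in this sketch, so the job would need to run in a window when no cohort generation is writing to either table.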
Are there any other solutions currently being investigated?