Hi everyone,
Following last week’s successful Hadoop Hackathon (https://www.ohdsi.org/photos-from-2017-hadoop-hack-a-thon/), I’ve been looking at ATLAS on Impala a bit more. I’ve made some progress, but have hit a blocker (probably in my understanding of the database schemas).
The progress is that I’ve got ATLAS running on a combination of Postgres (for the OHDSI tables) and Impala (for the CDM and Achilles tables), at least for some parts. In particular, Data sources and Vocabulary work. Cohort generation is working better than it did at the Hackathon: the generated Impala SQL now executes (I’ve fixed the bugs we hit, see https://github.com/OHDSI/Atlas/issues/418).
However, I’m having trouble getting the table mapping right, in particular for the cohort_inclusion table: which schema/database it should be in, and how it is managed. This may be a consequence of using two databases. If someone could explain how it is meant to work, that would be very helpful. I’ve added a comment with some more detail here: https://github.com/OHDSI/Atlas/issues/418#issuecomment-313396510
Thanks,
Tom