Verbose logging of the SQL that Atlas plans to execute (ideally before it runs the sequence of queries) would really help debugging.
We’ve been using Spark/Databricks for a while and are enjoying the performance, but we sometimes find Atlas features that aren’t working. For most user interactions in Atlas, multiple SQL commands are generated (from a template). I’d like to log the full set of SQL Atlas plans to run before it starts running any of it, so that I can more easily detect when and where queries fail. Ideally those could be logged complete with parameter substitution, but I’d be OK with just logging the set of SQL templates if that is more feasible.
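To illustrate the kind of logging I mean (this is only a rough sketch of my own, not existing Atlas/WebAPI code): assuming the SqlRender Java library is on the classpath, one could render and translate a template and then log every resulting statement before executing any of them. The wrapper class and method names here are hypothetical.

```java
import org.ohdsi.sql.SqlRender;
import org.ohdsi.sql.SqlSplit;
import org.ohdsi.sql.SqlTranslate;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingSqlRunner {
    private static final Logger log = LoggerFactory.getLogger(LoggingSqlRunner.class);

    /**
     * Render and translate a SQL template, then log every resulting statement
     * before any of them are handed off for execution.
     * (Method signatures assume a recent SqlRender release.)
     */
    public static String[] prepare(String template,
                                   String[] paramNames,
                                   String[] paramValues,
                                   String targetDialect) {
        String rendered = SqlRender.renderSql(template, paramNames, paramValues);
        String translated = SqlTranslate.translateSql(rendered, targetDialect);
        String[] statements = SqlSplit.splitSql(translated);
        for (int i = 0; i < statements.length; i++) {
            log.info("Planned statement {} of {}:\n{}", i + 1, statements.length, statements[i]);
        }
        return statements;
    }
}
```

Whether this is logged at INFO or hidden behind a DEBUG flag doesn’t matter much to me; the key point is being able to see the fully rendered statements before execution begins.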
This would have a side benefit of letting me archive certain queries so that I can re-run them on a schedule.
For example, issues we’re currently trying to debug with Spark include:
- Navigating the concept hierarchy and concept relationships is slower with Spark than with SQL Server
- Running Heracles reports fails near the last step (inserting data into the heracles_results tables)
- Running Cohort Pathways fails partway through, but the Tomcat/Catalina logs flag the wrong query as the culprit (e.g. one that actually runs properly), so we’re having trouble tracking down which downstream query is causing the issue.
Alternatively, or in addition: I presume that all Atlas interactions ultimately orchestrate a set of WebAPI calls (which perform parameter substitution and SqlRender translation to produce the final SQL before submitting it for execution). Is there documentation that shows what sequence of API calls is generated for each type of interaction in Atlas?
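In the meantime, the per-interaction call sequence can at least be observed in the browser’s developer-tools network tab, and individual calls can be replayed outside Atlas. A minimal sketch of replaying one call (the base URL is a placeholder for your own WebAPI deployment, and /info is just a simple example endpoint that returns the WebAPI version):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebApiProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder base URL; point this at your own WebAPI deployment.
        String baseUrl = "http://localhost:8080/WebAPI";
        HttpClient client = HttpClient.newHttpClient();

        // Swap in whatever call you captured from the browser's network tab.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/info"))
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

That only shows one call at a time, though; documentation of the full per-interaction sequence would still be very helpful.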