When running multiple concurrent cohort generation jobs from Atlas (connected to a Databricks SQL Warehouse), we sometimes get job failures when Atlas tries to write to the following tables in the results schema:
- cohort_cache
- cohort_inclusion
- cohort_inclusion_result_cache
- cohort_summary_stats_cache
An example query is:
```sql
INSERT OVERWRITE TABLE results.cohort_summary_stats_cache
SELECT * FROM results.cohort_summary_stats_cache
WHERE NOT (design_hash = 1070835750 AND mode_id = 0)
```
And the error we get is:
```
Files were added to the root of the table by a concurrent update. Please try the operation again.
Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.
```
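Reading the linked page, the "root of the table" wording suggests the conflict arises because these cache tables are unpartitioned, so any two concurrent INSERT OVERWRITE statements contend for the same set of files. One mitigation described there is to partition the table on the column used in the rewrite condition. A minimal sketch, assuming design_hash and mode_id are the only known columns (the rest of the DDL is hypothetical):

```sql
-- Sketch only: the actual column list of cohort_summary_stats_cache is assumed.
-- With the table partitioned by design_hash, a rewrite scoped to a single
-- design_hash only touches that partition, so concurrent generations of
-- *different* cohorts no longer add files "to the root of the table".
CREATE TABLE results.cohort_summary_stats_cache (
  design_hash INT,
  mode_id     INT
  -- ... remaining summary-stats columns ...
) USING DELTA
PARTITIONED BY (design_hash);
```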
These queries are not part of the SQL visible under Export for the cohorts. What is the intent behind that query? It appears to be a workaround for:
```sql
DELETE FROM results.cohort_summary_stats_cache
WHERE design_hash = 1070835750 AND mode_id = 0
```
presumably followed by an INSERT statement to add the updated content for that design hash.
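For completeness, a sketch of that presumed second step (the staging source and SELECT below are pure assumptions, not anything taken from WebAPI):

```sql
-- Hypothetical follow-up: re-insert the regenerated stats for that
-- design hash once the old rows have been cleared out.
INSERT INTO results.cohort_summary_stats_cache
SELECT *
FROM results.new_summary_stats  -- hypothetical staging table
WHERE design_hash = 1070835750 AND mode_id = 0;
```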
Where should I look in the code to find those queries?
Are other Databricks users facing a similar challenge?