
ATLAS not refreshing cache


A couple of weeks ago, I created a patient database and was able to view it successfully in Atlas. Two days ago I updated the patient, visit, and condition occurrence tables, but ATLAS is still showing me information from the old database. I have tried the following:

(1) wiping my CDM schema clean and re-creating the tables
(2) refreshing WebAPI by stopping and starting the service
(3) clearing cookies and cache in my browser
(4) clicking “Clear configuration cache” in the Atlas Configuration section

I’m running Atlas 2.8.2 on an Ubuntu 20.x server.

I realize that my post is similar to the thread below, Atlas Data Source Caching Issue. The original poster there has not responded, but I’m facing the same issue.

Am I missing something? Or is this a bug?

Thanks in advance

I do know that sometimes toggling between sources can show the previous source’s dashboard. This is an Atlas bug that I believe was addressed after 2.8. Is this similar to what you’re seeing?

Yes, this is the same issue I am facing. So, is the resolution to move to a newer version of Atlas?

I do still see it even in 2.11.1. Can you please raise an issue in the Atlas Github?

To confirm, instead of toggling between sources, can you open both in 2 tabs, and ensure you’ve refreshed both?

I did not create two sources. I overwrote the existing source and ATLAS still shows data from my previous version.

Ah okay. That is strange. Did you try the “Clear Server Cache” button?

No. Sorry, where is this button located? All I see is a “Clear Configuration Cache” in ATLAS’ Configuration menu.

So, updating/overwriting a source with new connection settings is a tricky edge case: when we generate cohorts, we record generation stats in WebAPI for the execution. If you change the source info, there could be generation statistics that remain in the db that would appear in the UI and look confusing.

Personally, if you update a source’s data, I think you should register it as a new source in WebAPI (and you can delete the old source). This will give you a new source_id against which we track generation info.
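For anyone wanting to script this check, sources are managed through WebAPI’s REST layer. A minimal Python sketch, assuming WebAPI is served at http://localhost:8080/WebAPI (adjust for your deployment) and that your WebAPI version exposes GET /source/sources (list registered sources) and GET /source/refresh (re-read the source tables into WebAPI’s cache) — verify both against your 2.8.x install:

```python
import json
import urllib.request

# Assumption: adjust to your deployment's WebAPI base URL.
BASE_URL = "http://localhost:8080/WebAPI"

def sources_url(base: str = BASE_URL) -> str:
    # GET /source/sources returns the sources WebAPI currently knows about.
    return f"{base}/source/sources"

def refresh_url(base: str = BASE_URL) -> str:
    # GET /source/refresh asks WebAPI to re-read its source registrations.
    return f"{base}/source/refresh"

def list_sources(base: str = BASE_URL):
    # Fetch and decode the registered-sources JSON from a live WebAPI.
    with urllib.request.urlopen(sources_url(base)) as resp:
        return json.loads(resp.read())

# Example usage against a running WebAPI (requires network access):
#   for src in list_sources():
#       print(src.get("sourceKey"), src.get("sourceName"))
```

After re-registering a source, hitting the refresh endpoint (or restarting WebAPI, as the original poster did) should make the new source_id visible to Atlas.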

Otherwise, you should delete the generation_info records for the source whenever you update your source info — tables like cohort_generation_info, ir_execution, cc_generation, etc. I think this is a “bad practice™” in that you’re confusing the system: you’re saying some generations happened on one source, then fooling it into pointing at a different source, so that some results for source A come from one dataset and other results for source A come from another.
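To illustrate that cleanup, here is a hedged sketch that only builds the DELETE statements. It assumes a WebAPI schema named `webapi` and that each of the tables named above keys its rows by a `source_id` column — verify the table and column names against your WebAPI version before running anything destructive:

```python
# Tables named in the post that hold per-source generation stats.
# Assumption: each has a source_id column; check your WebAPI schema.
GENERATION_TABLES = [
    "cohort_generation_info",
    "ir_execution",
    "cc_generation",
]

def cleanup_statements(source_id: int, schema: str = "webapi"):
    # Returns (sql, params) pairs using %s placeholders, suitable for
    # psycopg2-style cursor.execute() against a PostgreSQL WebAPI db.
    return [
        (f"DELETE FROM {schema}.{table} WHERE source_id = %s", (source_id,))
        for table in GENERATION_TABLES
    ]

# Example: print the statements for source_id 2 before running them manually.
for sql, params in cleanup_statements(source_id=2):
    print(sql, params)
```

Printing the statements first, rather than executing them blindly, is deliberate: deleting the wrong source_id’s rows is unrecoverable.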

Thanks for the explanation @Chris_Knoll . How about a scenario where I’m deploying ATLAS in a hospital setting and patient records are updated daily? In that scenario, the source data will change every day. Should I register it as a new source every day? Can you please explain how best to use ATLAS here?

Well, you’d have to turn off cohort caching (I’ll try to find the setting for that). Cohort generation can be an expensive operation, so we introduced cohort caching: you generate once, and the cohort records are reused in characterization, pathways, and incidence rates. There is a default timeout after which the results are considered ‘dirty’, and a cohort also becomes dirty if you change the cohort definition. But we don’t have any knowledge of underlying CDM changes.

I understand your context: you get daily updates, and that’s your environment. But the expected use case is that Atlas points to static datasets, such that when you run an analysis, executing it again on the same data source gives the same result. In your context, the results would (or could) change daily. So, in your environment, you’d disable cohort caching (by setting a very small timeout, like 0 days), but also realize that all the results you generate (for incidence rates, etc.) are effectively invalidated whenever you refresh your source data.

Thanks @Chris_Knoll . This gives me a lot to think about.

Again, I really appreciate the detailed answers. This is a great community.