
Love / Hate Relationship with Atlas

I think that the Atlas tool offers some great features (potentially) but it is very frustrating to use.
Perhaps I’m the only person feeling this way, but I’m continually mystified as to why certain features work and then don’t. For example, the reports under the Data Sources section sometimes return data, but most often they report that no data is available. However, if I display a specific profile for a person, I can see all the data. Another example: sometimes when searching for concepts, I can find the concept, but it reports that there are zero records in the CDM matching the concept, even though I can see thousands of records with these concepts in the database.

I’m sorry if this is a rant, and I am very grateful that the product exists - I just wish it worked consistently from day to day. Perhaps one small improvement would be to document what the source of data is for each section of the web app. For example, the Profile section pulls data from the CDM tables directly. Most of the other sections seem to use data from the Achilles tables, but it would be nice to know exactly which Achilles tables are used as the source.

I looked at the Leaf tool as a potential replacement to Atlas. I like the web interface, but it seems difficult to manually build out the Concept hierarchies. Does anyone have an application that they recommend?

I can’t speak to how Atlas actually performs, as I’m still very new to OMOP in general and to Atlas. I’m also still in the process of getting Atlas set up and pointing to my CDM.

I’m primarily a web developer, and I would say there are a lot of frustrations and shortcomings, especially during the setup phase, that are likely a large deterrent and could be automated by scripts. Setting up Atlas, WebAPI, and Achilles did have a lot of documentation, but having some sort of interface to install all of these would be wonderful. I know it’s much easier said than done, and I don’t want to sound like I’m complaining. I’ve spent the last two days experimenting with Electron and react-native-windows to get an idea of what I could do to help this process. So far… I haven’t got anything to contribute. :smile: I will say that react-native-windows is a huge pain when dealing with the file system; I made much more progress with Electron, but quickly got overwhelmed when I tried to add React to it.

Setting up the record count cache has been a challenge for a few reasons:

  1. The counts are generated by an R package which is outside of the WebAPI codebase
  2. We wanted to make it fast, so we implemented a caching layer that pulls counts from potentially two places: first from the source’s results schema, and second from the WebAPI cache. This means that when you update your underlying CDM data, you have to run Achilles to populate the raw data, run a concept_count query to build the RC and DRC (record count and descendant record count) from the raw results, and then let WebAPI refresh its cache.
  3. The caching and other analyses act as though the CDM source is used for a retrospective study (i.e., once you point it at the CDM source, we don’t expect it to change). Many of the issues have come from people wanting to do daily refreshes of the CDM source, making it behave like prospective data collection, and we didn’t account for data changes pulling the proverbial rug from beneath our feet unexpectedly.
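The RC/DRC derivation in step 2 can be sketched as a single aggregation over the per-concept Achilles counts joined to concept_ancestor. This is a hedged illustration only - the table names and analysis id below are assumptions, not the exact query WebAPI’s concept_count step runs:

```python
# Hedged sketch of deriving RC (record count) and DRC (descendant record count)
# per concept from per-concept Achilles counts. The self-row in concept_ancestor
# (ancestor = descendant) is what folds a concept's own count into its DRC.
# Table names and the analysis id are illustrative assumptions.
RC_DRC_SQL = """
SELECT ca.ancestor_concept_id AS concept_id,
       SUM(CASE WHEN ca.descendant_concept_id = ca.ancestor_concept_id
                THEN ar.count_value ELSE 0 END) AS record_count,
       SUM(ar.count_value) AS descendant_record_count
FROM achilles_results ar
JOIN concept_ancestor ca
  ON ca.descendant_concept_id = ar.stratum_1
WHERE ar.analysis_id = 401  -- per-concept counts for one domain (assumed id)
GROUP BY ca.ancestor_concept_id
"""
```

Note that in a real CDM, achilles_results.stratum_1 is a varchar and would need a cast before joining against concept_ancestor.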

So, it feels like the issues you raised initially (data sources having empty reports, record counts returning zero) all stem from the challenges of integrating Achilles results into the Atlas UI. What I think we should work on short term is a way to ‘one-click’ clear the cached data so that after you have CDM data updated and you’ve re-run the Achilles scripts, you can just push a button and those results can be refreshed.

What I think we should work on short term is a way to ‘one-click’ clear the cached data so that after you have CDM data updated and you’ve re-run the Achilles scripts, you can just push a button and those results can be refreshed.

I think this is a great idea. Also, having an option to enable/disable the caching as a checkbox would be great for developers as well. I am having to manually clear the achilles_cache table as I develop.
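For anyone in the same situation, the manual clear can be wrapped in a small helper. This is a hedged sketch: the achilles_cache table name comes from the post above, but the "webapi" schema name is an assumption about your install:

```python
# Hedged dev helper: wipe WebAPI's cached Achilles counts so they are rebuilt
# from the results schema on the next request. Works with any DB-API connection;
# the default "webapi" schema name is an assumption - adjust to your install.
def clear_achilles_cache(conn, webapi_schema="webapi"):
    cur = conn.cursor()
    cur.execute(f"DELETE FROM {webapi_schema}.achilles_cache")
    conn.commit()
    cur.close()
```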

Is it possible to execute the Achilles script from WebAPI? I feel like executing Achilles is something that could be called on the backend and would make it that much easier.

No, those queries are in the domain of Achilles, and Achilles doesn’t expose its queries in any way that WebAPI could consume (unless we provided a Plumber endpoint that would be able to invoke the routine).

But even if it did, I wouldn’t suggest it: first, it takes hours and hours (maybe days on some datasets) to process all the analyses. Yes, we handle async tasks in WebAPI, but I think this one is a bit out of scope… like saying ‘can I press a button in Atlas to execute my ETL?’. Second, Achilles was designed to get summary statistics at the database level for diagnostic purposes. We applied some data quality checks on top of Achilles (called the Achilles Heel report), and all of that is supposed to be evaluated before calling your ETL ‘done’. And you wouldn’t want to put a non-done CDM source on your WebAPI (at least, that was our perspective about how these tools would be used; that’s not to restrict anyone doing it their own way, just the perspective on what we thought the general use case was).

@BruceKissinger, what you are seeing might be related to a bug we’ve seen since Atlas 2.13.0. That link shows how we work around the bug to get the correct counts.

@Chris_Knoll , are there any benchmarks available in the OHDSI community for this? If the community were willing to share the Achilles and DQD processing time for their data sources (along with appropriate metrics about data volume and database specs), that might help identify opportunities and set expectations.

Our data is not as large as many (3.5M patients; 7.5 years of data; the largest tables are measurements and observations, each with ~8B records). However, we’re able to serially run all 730 SQL queries in the Achilles suite in about 45 minutes, and all 3,000+ queries in the DQD suite in less than 30 minutes. In both cases, we used the R packages for a one-time generation (but not execution) of parameterized SQL. Then we run that SQL as part of our ETL pipeline (via Python against a Databricks Small SQL Warehouse cluster).
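The “generate once, run in the pipeline” approach described above can be sketched as follows. This is a hedged illustration: it assumes the R packages were used once to render the parameterized SQL into a directory of .sql files, and the connection object is any DB-API 2.0 connection (names are mine, not from the pipeline above):

```python
# Hedged sketch of running pre-rendered Achilles/DQD SQL files as an ETL step.
# Assumes one statement per .sql file; the directory layout and function name
# are illustrative assumptions, not the actual pipeline code.
from pathlib import Path

def run_sql_files(conn, sql_dir):
    """Execute every pre-rendered .sql file in sql_dir, in sorted order."""
    for sql_file in sorted(Path(sql_dir).glob("*.sql")):
        cur = conn.cursor()
        cur.execute(sql_file.read_text())
        cur.close()
    conn.commit()
```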

Of note, for DQD, we union 100 sub-queries at a time (using sqlOnlyUnionCount), so the net is only 53 actual queries for DQD. This is a feature available since DQD 2.3.0, and documented here.

This makes sense, and I can understand that. I don’t think I fully understand the purpose of Achilles, but from my end it seems like whenever I want to update my graphs and data in Atlas, I need to execute the Achilles function on my CDM. It feels like executing those scripts would be a better/smoother experience if they could be called from Atlas; WebAPI already knows all the information (I think?) that I’m passing to the Achilles function. In designing my ETL, I am planning to use Atlas to verify that the CDM I create works properly, so I can see a lot of Achilles executions in my near future.

My biggest complaint is just that the setup for something like Atlas is rather laborious and not that straightforward. I know that development is ongoing and things are actively being worked on, so I thought this could be low-hanging fruit that might be easy to develop and add. Running Achilles is only two steps (at least in my case so far): I set the connectionDetails and then I execute Achilles. So perhaps the value versus the effort it would require isn’t worth it.

I think the setup for Atlas and all of its requirements could use some small improvements here and there, but I don’t know all the use cases and different types of environments the software is trying to encompass. I can understand the difficulty when a piece of software is trying to support multiple environments.

I do appreciate the insight and thanks again for all your help!

If you are able to run Docker, I’d invite you to look at Broadsea. We’ve made a lot of progress on making Atlas/WebAPI setup easier, and are targeting the next release to add a service to run Achilles, DQD, and AresIndexer.

With Broadsea, you can stand up the tools with some tweaks to the included environment variables file and a few Docker commands.

If you’re not able to run Docker, we are also evaluating alternatives like Podman.