Atlas/WebAPI 2.7.2 Released

anthonysena · June 29, 2019, 5:56pm

This release, along with detailed release notes and testing results, can be found on GitHub at the following links:

Atlas: https://github.com/OHDSI/Atlas/releases/tag/v2.7.2
WebAPI: https://github.com/OHDSI/WebAPI/releases/tag/v2.7.2

We encourage you to demo this new release at http://www.ohdsi.org/web/atlas and install it locally following the setup guides for Atlas (https://github.com/OHDSI/Atlas/wiki/Atlas-Setup-Guide) and WebAPI (https://github.com/OHDSI/WebAPI/wiki/WebAPI-Installation-Guide). If you run into any issues or have any suggestions for future ATLAS features, please tell us at: https://github.com/OHDSI/Atlas/issues

Many thanks to the Atlas working group team members that made this release possible. Here are some notes on what’s new with 2.7.2:

General

Ensure all design elements (e.g. concept sets, cohort definitions, etc.) have a unique name in the system. Any design elements with duplicated names will be renamed with a suffix (e.g. “(2)”) to distinguish them from the other entity that shared the same name.
Full Import/Export for all design elements: Characterization, Pathways, Incidence Rates, Estimation, Prediction.
Improved experience by disabling features when they are not appropriate to use (e.g. export is disabled when you have unsaved changes)
Reviewed and updated error messages where possible
Wiki updates - Atlas Tutorials (https://github.com/OHDSI/Atlas/wiki) & WebAPI setup documentation (https://github.com/OHDSI/WebAPI/wiki) on GitHub wikis for each code repository
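
The duplicate-name rule above can be pictured with a small Python sketch. This is not the WebAPI migration code: the function name, the space before the suffix, and the choice to keep counting upward ("(3)", "(4)", ...) are all assumptions made for illustration.

```python
def dedupe_names(names):
    """Return the names in order, renaming repeats with a ' (n)' suffix.

    Hypothetical helper: only the '(2)' suffix is mentioned in the release
    notes; the exact format and numbering are assumed here.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name
        n = 2
        # Keep bumping the suffix until the candidate is not already taken.
        while candidate in seen:
            candidate = f"{name} ({n})"
            n += 1
        seen.add(candidate)
        result.append(candidate)
    return result


print(dedupe_names(["Diabetes", "Diabetes", "Hypertension", "Diabetes"]))
# ['Diabetes', 'Diabetes (2)', 'Hypertension', 'Diabetes (3)']
```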

Data sources

Added caching to Data sources reports (person, dashboard, data density, and top-level domain reports). Additional caching optimizations will be applied in the 2.8 release.
Fixed precision/rounding on data density plot
Fixed ATC display on drug era reports

Vocabulary

Better handling of special characters in search (can search for "MG/DL" and it will replace it with "MG DL" as a work-around for now)
Advanced search is restored for use
Concept counts added for non-standard concepts (assumes we have run Achilles with new analyses to characterize non-standard concepts)
Using the "add all" concepts from the vocabulary search will no longer clear filters
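
As a rough illustration of the special-character work-around, here is a Python sketch that turns "MG/DL" into "MG DL". The release notes only confirm that one example, so treating every non-word, non-space character as "special" is an assumption, and the function name is made up.

```python
import re


def sanitize_search_term(term):
    """Replace runs of special characters with a single space.

    Assumption: 'special' means anything that is not a word character
    or whitespace. Atlas may use a narrower or different set.
    """
    cleaned = re.sub(r"[^\w\s]+", " ", term)
    # Collapse repeated whitespace and trim the ends.
    return " ".join(cleaned.split())


print(sanitize_search_term("MG/DL"))  # MG DL
```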

Concept sets

Added back ability to import concepts into an open concept set
Fixed a lot of items that were broken when adding concepts from the vocab, navigating to included concepts/source codes, exporting, etc. This experience should be much better in the repository concept sets and cohort definitions.

Cohort Definition

Provides flexibility for restricting events based on occurrence in the observation period
Export the print-friendly view in a single click
Added labels to the attrition view to display the inclusion rule information
Fixes the drill down display for condition and drug era reports in cohort reporting (aka Heracles reporting)

Characterization

Removed prompt for re-generation of failed jobs
Prevents generating a characterization with 0 cohorts
Revised the "explore" button for navigating results & fixed the proportions displayed
Revised the standardized difference scatterplot to hide (gray out) covariates with stddiff < 0.1. Revised the display of the mark on the plot upon hover-over and adjusted the precision of the x/y display.
Reformatted the design display in the results section to make it more human readable
Added the ID field to the list of all characterizations
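
For readers unfamiliar with the 0.1 cut-off on the scatterplot, the Python sketch below computes a standardized difference for a binary covariate and applies the threshold. The release notes do not state which formula Atlas uses, so the pooled-variance formula for proportions is an assumption, as is comparing the absolute value against 0.1. Both function names are hypothetical.

```python
import math


def standardized_difference(p_target, p_comparator):
    """Standardized difference between two proportions.

    Assumed formula: (p1 - p2) / sqrt((p1(1-p1) + p2(1-p2)) / 2).
    """
    diff = p_target - p_comparator
    pooled_var = (p_target * (1 - p_target)
                  + p_comparator * (1 - p_comparator)) / 2
    if pooled_var == 0:
        # Both proportions are exactly 0 or 1, so the ratio is undefined.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled_var)


def is_grayed_out(std_diff, threshold=0.1):
    """True when the covariate would fall below the display threshold."""
    return abs(std_diff) < threshold


for p1, p2 in [(0.30, 0.28), (0.30, 0.20)]:
    d = standardized_difference(p1, p2)
    print(p1, p2, round(d, 3), "gray" if is_grayed_out(d) else "highlighted")
```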

Pathways

Optimizations made for storing and retrieving results
Fixed search for target and event cohorts
Added the ID field to the display of all pathways

Incidence Rates

Fixed display issues when navigating to results (sometimes the screen appeared empty but would then show results after waiting a while).

Profiles

Fixed event highlighting and period selection

Estimation/Prediction

Fixed package name entry bug (you couldn’t remove the first character)
Properly refresh the study summary after editing a design element

SELVA_MUTHU_KUMARAN · June 29, 2019, 11:59pm

@anthonysena @Chris_Knoll - thanks for the update. I would like to know whether generating a custom feature under characterization is available and working in this version. In v2.7.1 it had a bug, so we couldn’t create one. I have opened a GitHub issue… Can you let us know?

anthonysena · June 30, 2019, 1:59pm

Hi @SELVA_MUTHU_KUMARAN - the problem you reported via the GitHub issue (https://github.com/OHDSI/WebAPI/issues/1215) should be resolved with this release. Could you give it a try on the 2.7.2 build and, if the problem persists, keep a record of it in that issue on GitHub? Thanks!

SELVA_MUTHU_KUMARAN · June 30, 2019, 2:01pm

Okay. It still occurs. Anyway I will update it in github. Thanks

anthonysena · June 30, 2019, 2:11pm

If you could post the full characterization, including the custom feature that is failing, to the GitHub issue that would help with the investigation. I was able to generate a characterization using the 2.7.2 build with the feature you described in this post: How to create own features - Atlas Characterization - Summary Statistics.

SELVA_MUTHU_KUMARAN · June 30, 2019, 2:18pm

@anthonysena - Ahh. I am waiting to use that feature for a demo. Were you able to make use of features from the measurement domain?

SELVA_MUTHU_KUMARAN · June 30, 2019, 2:19pm

@anthonysena - Are there any instructions on how to create a new feature? Are the steps that I mentioned in the post (How to create own features - Atlas Characterization - Summary Statistics) correct? Am I doing it the wrong way?

SELVA_MUTHU_KUMARAN · June 30, 2019, 2:25pm

Can you share your characterization and feature JSON code? That would be helpful. I have uploaded mine to the GitHub issue.

bfurey · July 5, 2019, 9:18am

Is it possible to make available a tagged version of the broadsea-webtools docker image to bump the version of Atlas/WebAPI to 2.7.2 from 2.6.0?
https://hub.docker.com/r/ohdsi/broadsea-webtools

lee_evans · July 5, 2019, 1:50pm

@bfurey Sure. I’m working on updating the broadsea-webtools Dockerfile to add the required ATLAS npm build step and then I’ll push the change to GitHub/Docker Hub

krfeeney · July 9, 2019, 1:37pm

New release looks great!

Is there a user guide for navigating the study package outputs? More specifically, is there a way to know what’s expected and how to tell if the package actually ran correctly?

For example, we now have 2.7.2 in our environment. I excitedly ran a PLE package and got multiple execution ZIPs. I’m looking at the outputs and suspect there’s a configuration issue with RedShift based on a file “ErrorReport.txt”.

Where I see:
"DBMS:
redshift

Error:
java.sql.SQLException: Amazon Invalid operation: relation "#id_set_1" does not exist;

SQL:
CREATE TABLE #cov_1
DISTSTYLE ALL
AS
SELECT
CAST(observation_concept_id AS BIGINT) * 1000 + 804 AS covariate_id,

row_id,
1 AS covariate_value

FROM
(
SELECT DISTINCT observation_concept_id,

	cohort.row_id AS row_id

FROM #cohort_person cohort
INNER JOIN full_201903_omop_v5.observation
	ON cohort.subject_id = observation.person_id

WHERE observation_date <= DATEADD(DAY,CAST(0 as int),cohort.cohort_start_date)
	AND observation_date >= DATEADD(DAY,CAST(-30 as int),cohort.cohort_start_date)
	AND observation_concept_id != 0


	AND observation_concept_id NOT IN (SELECT id FROM #id_set_1)

) by_row_id

R version:
R version 3.4.4 (2018-03-15)

Platform:
x86_64-pc-linux-gnu

Attached base packages:

methods
stats
graphics
grDevices
datasets
utils
base

Other attached packages:

CohortMethod (3.0.2)
FeatureExtraction (2.2.3)
Cyclops (2.0.2)
DatabaseConnector (2.4.0)
snow (0.4-3)
MASS (7.3-50)
godmode (0.0.1)
remotes (2.0.4)
usethis (1.5.0)
devtools (2.0.2)"

If there’s anywhere I can review a user guide on what expected outputs are, that would be immensely helpful. The GitHub is a little light on this documentation.
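
One detail in the SQL above that helps when reading such reports: the covariate_id expression packs two numbers into one value, the concept ID times 1000 plus 804. A minimal Python sketch of that packing and its inverse follows. It assumes the trailing three digits are an analysis identifier below 1000 (the error report itself does not say what 804 stands for), and the concept ID used is made up.

```python
def make_covariate_id(concept_id, analysis_id):
    """Mirror of the SQL: CAST(concept_id AS BIGINT) * 1000 + analysis_id."""
    return concept_id * 1000 + analysis_id


def split_covariate_id(covariate_id):
    """Recover (concept_id, analysis_id), assuming analysis_id < 1000."""
    concept_id, analysis_id = divmod(covariate_id, 1000)
    return concept_id, analysis_id


# 4000000 is a hypothetical concept ID used only for illustration.
cov_id = make_covariate_id(4000000, 804)
print(cov_id)                      # 4000000804
print(split_covariate_id(cov_id))  # (4000000, 804)
```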

Chris_Knoll · July 9, 2019, 2:05pm

Would you mind opening up an issue on OHDSI/SQLRender? It seems the temp table is not being handled properly, and I believe this is related to sql translation into redshift, so not a PLP package problem per se.

-Chris

krfeeney · July 9, 2019, 4:59pm

I, too, suspect it’s a SqlRender problem. But I don’t want to pass the hot potato just yet.

Is there a guide on what the expected output is for what the Estimation tab is generating back? I hadn’t expected the ErrorReport though I appreciate its existence. Trying to understand how the package execution runs and functions first. Will raise tickets in sub-packages as appropriate.

Chris_Knoll · July 9, 2019, 8:03pm

As it turns out, redshift supports # notation when defining temp tables! Who knew? So, not a sql render problem.

@anthonysena tried to trace through this, and it is possible that there’s a table name mismatch for the excluded concepts list in the query (the #id_set_1 table). We’re going to ask the devs on feature extraction to chime in, but I am pretty confident that this is a FeatureExtraction problem, and that would be the place to raise the issue in Git.

-Chris
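
For anyone who wants to check the "table name mismatch" hypothesis against their own generated SQL, a rough Python sketch is to list temp tables that a statement reads before any earlier statement created them. This is a regex heuristic, not a SQL parser: it only recognizes CREATE TABLE #name and FROM/JOIN #name, and the sample statements (including the name #id_set_2) are invented to mimic the suspected mismatch.

```python
import re

# Heuristic patterns; SELECT ... INTO #t and other forms are not handled.
CREATE_RE = re.compile(r"\bCREATE\s+TABLE\s+(#\w+)", re.IGNORECASE)
USE_RE = re.compile(r"\b(?:FROM|JOIN)\s+(#\w+)", re.IGNORECASE)


def find_undefined_temp_tables(statements):
    """Return temp tables read before an earlier statement created them."""
    created = set()
    missing = []
    for sql in statements:
        for name in USE_RE.findall(sql):
            key = name.lower()
            if key not in created and key not in missing:
                missing.append(key)
        # Register tables created by this statement for later statements.
        created.update(n.lower() for n in CREATE_RE.findall(sql))
    return missing


# Hypothetical script: the exclusion table is created under one name
# but the covariate query reads another.
script = [
    "CREATE TABLE #cohort_person AS SELECT 1 AS row_id, 1 AS subject_id",
    "CREATE TABLE #id_set_2 (id BIGINT)",
    "CREATE TABLE #cov_1 AS SELECT row_id FROM #cohort_person cohort "
    "WHERE 1 NOT IN (SELECT id FROM #id_set_1)",
]
print(find_undefined_temp_tables(script))  # ['#id_set_1']
```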

krfeeney · July 10, 2019, 12:27am

@chris_knoll Love the problem solving and appreciate you tapping in Mr. @anthonysena. I want to take a step back and be more holistic here because I really want to understand what we click in the UI and what happens under the hood.

My view of the problem: we’re experiencing a problem with FeatureExtraction in the Estimation package construction being generated from ATLAS.
My understanding based on your response: ATLAS is pulling a specific version of FeatureExtraction to execute the Estimation module. In this problem, we see evidence of a mismatch in table creation for how PLE is using FeatureExtraction.

But what I don’t totally get is… does ATLAS do anything on the backend in the sausage making process? Or is it simply executing the equivalent code of what you export from ATLAS? If so, I would like to trace what it does holistically because we’re actually struggling in multiple capacities to generate a successful R package using the ATLAS 2.7.2 Estimation module. I’m starting to wonder if it’s actually an Estimation packaging issue holistically.

But then, I’m also under the impression Characterization, Estimation and Prediction utilize FeatureExtraction. I have not tested Characterization or Prediction yet to see whether I would get a message of similar variety. You see my quandary?

Is this: A) a simple FeatureExtraction package issue for the database layer (Redshift) we use, B) a local configuration issue in how we deployed ATLAS 2.7.2 in our environment that propagates across all the times we use FeatureExtraction, C) a bug in how Estimation uses FeatureExtraction to generate results or D) I give up.

I’m also moderately alarmed that an Estimation study executed by ATLAS 2.7.2 ran for 95 minutes (seems like it was a greedy unbuildable cohort, I don’t know) before it packaged itself up with an ErrorReport that says it totally failed in the middle. How do I reconcile that? Is this not a good place we should maybe debug the utility of the module to have a better stage gate?

schuemie · July 10, 2019, 4:55am

I’m not a native English speaker. Would you mind rephrasing “debug the utility of the module to have a better stage gate”? I’m having a hard time understanding what you mean.

If you share your study definition JSON with me, I can debug the problem, create a fix, and it will be incorporated in WebAPI as soon as possible.

But I think that is not what you want?

Chris_Knoll · July 10, 2019, 5:18am

Hey, @schuemie,
Sena and I didn’t get too far, but it had to do with the creation of the temp table from the excluded concepts. Somewhere in the code, it’s naming the temp table that contains the excluded concepts as id_set_1, but sena and I were a little suspicious about the loop that creates the temporary tables and how the temporary table that is defined matches up with id_set_1 (we didn’t get far enough in the code to trace where the tables are actually created and inserted into based on excludedConcepts).

If you don’t want to wait for @krfeeney’s specific test case, I believe this issue comes up when you define a PLE analysis that uses features from any domain (it uses the DomainConcept.sql to build the query) and it injects the table name here: FeatureExtraction/inst/sql/sql_server/DomainConcept.sql at main · OHDSI/FeatureExtraction · GitHub. But, like I said, the trace path that led to the parameter @excluded_concept_table is not exactly clear. But we’re pretty sure it’s related to exclusions because of the error @krfeeney pointed out where the query goes:
AND observation_concept_id NOT IN (SELECT id FROM #id_set_1) .
And, actually, based on that, the specific feature was trying to pull from the Observation table.

Did you design the analysis in Atlas and then download the package as a zip? If so, then all Atlas did was build some R scripts, put it into a folder structure, compressed it as a zip, and then sent it down to you. I’m not sure if your environment is configured to execute analyses directly in atlas, but if so, it simply runs the R script that was bundled in the zip. If you did want to trace what it is doing, you could just open the script in RStudio, and set breakpoints.

They do. In the case of Characterization, we retrieve the SQL for features and execute them directly in the WebAPI layer. For estimation and prediction, the calls to the FeatureExtraction library are made from within the generated R script (which would execute in the context of an R session). But characterization doesn’t allow you to specify any of the exclusion parameters of feature extraction, while PLE does. It looks like you are using the exclusion parameter for this analysis, and that’s why I believe the bug is related to something where the excluded concepts are loaded into a temp table.

Depending on your environment and database indexes and configuration, it may run a long time. You can always generate the cohort in atlas to get an idea of what the cohort generation will take. You can also step through the R code to see the timing of cohort generation. I thought there was console output describing what was happening in the package, but I could be wrong.

Without really knowing anything about your environment or the cohorts/analysis you are trying to execute, it’s not easy to give you specific advice.

krfeeney · July 10, 2019, 1:21pm

Appreciate you clarifying my lazy English, @Chris_Knoll.

For context: I took @mattspotnitz’s cohort from public ATLAS (http://www.ohdsi.org/web/atlas/#/estimation/cca/81) and imported it via Utilities into my internal ATLAS that we just updated to 2.7.2. (@Konstantin_Yaroshove could speak to the configuration – it’s a RedShift data layer.) In the UI, I used the execution tab to run it against a data set.

You’re right. Though, if you’re using the execution tab and not dumping this into R, this file is quietly in the background and only becomes apparent after the code eventually knows to kill/stop itself. Totally fine but if you are trying to follow along from execution via UI, it’s not as straightforward. You’re on a bit of a treasure hunt for the information of what happened.

Ultimately, I downloaded the ZIP output to see where it landed.

For other context: @mattspotnitz has been passing around just the ZIP of the ATLAS study package. Many of us (@izzysrdks @George_Argyriou) are working on testing it in our local environments (sans running via ATLAS) and having issues. So my hunch is something is funky about this particular package.

I admittedly overgeneralized. I just looked at my runs for a different Estimation package (http://www.ohdsi.org/web/atlas/#/estimation/cca/80) that I imported into our ATLAS. In this case, there’s no ErrorReport so I think the package actually executed correctly… but again, trying to decode what spits out after using the Execution tab is a bit of walking in the dark. Not sure what I’m actually looking at but I guess the absence of an ErrorReport is a good thing? Let’s call that issue #2. Which is to say – if someone from the ATLAS team can explain to me what’s what, I’ll happily write up a blurb we can publish on the GitHub wiki so other folks can use the resource.
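
Until such a write-up exists, one quick check is to scan the downloaded execution ZIPs for an error report. The Python sketch below assumes the ZIPs sit together in one local directory and that the report is always named ErrorReport.txt, as in the run described earlier. The function name is made up, and an empty result only means no report was found, not that the study ran correctly.

```python
import zipfile
from pathlib import Path


def find_error_reports(output_dir, report_name="ErrorReport.txt"):
    """Map each ZIP in output_dir to the error-report entries it contains."""
    results = {}
    for zip_path in sorted(Path(output_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            # Match on the file name only, so nested folders are covered.
            hits = [name for name in archive.namelist()
                    if Path(name).name == report_name]
        results[zip_path.name] = hits
    return results
```

Calling find_error_reports on the folder holding the downloads returns a dictionary keyed by ZIP file name, where an empty list means that archive contains no file with that name.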

In short order, I’m going to email @schuemie the output I’ve got from this run and copy @mattspotnitz since it’s really his study. Big picture, it seems like something about this particular study is not rendering correctly when it’s spit into an ATLAS PLE package. Maybe it’s actually a cohort creation issue? Not sure whose issue queue this falls into; I’m just being a squeaky wheel til I find the right home for this problem.

gregk · July 10, 2019, 1:30pm

@krfeeney it looks like you are using ARACHNE Execution Engine to execute ATLAS PLE code. I am tagging @pavgra and @Konstantin_Yaroshove here to look into that and help to debug

schuemie · July 12, 2019, 10:24am

So the problem was in DatabaseConnector, and can be solved by upgrading to the latest development version:

devtools::install_github("ohdsi/DatabaseConnector", ref="develop")

Taking a step back, the reason we did not observe this issue in our unit tests is that they only run against SQL Server, Oracle, and PostgreSQL. @lee_evans: perhaps we could add a RedShift testing environment?