@anthonysena @Chris_Knoll - thanks for the update. I would like to know whether generating custom features under Characterization is available and working in this version. In v2.7.1 it had a bug, so we couldn't create them. I have opened a GitHub issue. Can you let us know?
Hi @SELVA_MUTHU_KUMARAN - the problem you reported via the GitHub issue (https://github.com/OHDSI/WebAPI/issues/1215) should be resolved with this release. Could you give it a try on the 2.7.2 build, and if the problem persists, keep a record of it in that issue on GitHub? Thanks!
Okay. It still occurs. Anyway, I will update it on GitHub. Thanks.
If you could post the full characterization, including the custom feature that is failing, to the GitHub issue that would help with the investigation. I was able to generate a characterization using the 2.7.2 build with the feature you described in this post: How to create own features - Atlas Characterization - Summary Statistics.
@anthonysena - Ahh. I am waiting to use that feature for a demo. Were you able to make use of features from the measurement domain?
@anthonysena - Is there any instruction on how to create a new feature? Are the steps that I have mentioned in the post (How to create own features - Atlas Characterization - Summary Statistics) correct? Am I doing it the wrong way?
Can you share your characterization and feature JSON code? It would be helpful. I have uploaded mine to the GitHub issue.
Is it possible to make available a tagged version of the broadsea-webtools docker image to bump the version of Atlas/WebAPI to 2.7.2 from 2.6.0?
@bfurey Sure. I'm working on updating the broadsea-webtools Dockerfile to add the required ATLAS npm build step, and then I'll push the change to GitHub/Docker Hub.
New release looks great!
Is there a user guide for navigating the study package outputs? More specifically, is there a way to know what’s expected and how to tell if the package actually ran correctly?
For example, we now have 2.7.2 in our environment. I excitedly ran a PLE package and got multiple execution ZIPs. I’m looking at the outputs and suspect there’s a configuration issue with RedShift based on a file “ErrorReport.txt”.
Where I see:

```
java.sql.SQLException: Amazon Invalid operation: relation "#id_set_1" does not exist;

CREATE TABLE #cov_1
  CAST(observation_concept_id AS BIGINT) * 1000 + 804 AS covariate_id,
  row_id, 1 AS covariate_value
  SELECT DISTINCT observation_concept_id,
    cohort.row_id AS row_id
  FROM #cohort_person cohort
  INNER JOIN full_201903_omop_v5.observation
    ON cohort.subject_id = observation.person_id
  WHERE observation_date <= DATEADD(DAY, CAST(0 as int), cohort.cohort_start_date)
    AND observation_date >= DATEADD(DAY, CAST(-30 as int), cohort.cohort_start_date)
    AND observation_concept_id != 0
    AND observation_concept_id NOT IN (SELECT id FROM #id_set_1)
```
```
R version 3.4.4 (2018-03-15)

Attached base packages:

Other attached packages:
- CohortMethod (3.0.2)
- FeatureExtraction (2.2.3)
- Cyclops (2.0.2)
- DatabaseConnector (2.4.0)
- snow (0.4-3)
- MASS (7.3-50)
- godmode (0.0.1)
- remotes (2.0.4)
- usethis (1.5.0)
- devtools (2.0.2)
```
If there's anywhere I can review a user guide on what the expected outputs are, that would be immensely helpful. The GitHub repo is a little light on this documentation.
Would you mind opening an issue on OHDSI/SqlRender? It seems the temp table is not being handled properly, and I believe this is related to SQL translation into RedShift, so not a PLE package problem per se.
I, too, suspect it’s a SqlRender problem. But I don’t want to pass the hot potato just yet.
Is there a guide on what the expected output is for what the Estimation tab generates? I hadn't expected the ErrorReport, though I appreciate its existence. I'm trying to understand how the package execution runs and functions first, and will raise tickets in sub-packages as appropriate.
As it turns out, RedShift supports `#` notation when defining temp tables! Who knew? So, not a SqlRender problem.
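For anyone following along, here's a minimal illustration of the SQL Server-style temp-table notation that RedShift also accepts (the table name here is made up for the example; it is not from the failing package):

```sql
-- RedShift, like SQL Server, accepts a leading # to mark a
-- session-local temporary table, so SqlRender can pass the
-- name through without translation.
CREATE TABLE #scratch_ids (id INT);
INSERT INTO #scratch_ids VALUES (1);
SELECT id FROM #scratch_ids;
```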
@anthonysena tried to trace through this, and it is possible that there's a table name mismatch for the excluded concepts list in the query (the `#id_set_1` table). We're going to ask the FeatureExtraction devs to chime in, but I am pretty confident that this is a FeatureExtraction problem, and that would be the place to raise the issue on GitHub.
@chris_knoll Love the problem solving, and appreciate you tapping in Mr. @anthonysena. I want to take a step back and be more holistic here, because I really want to understand what we click in the UI and what happens under the hood.
My view of the problem: we’re experiencing a problem with FeatureExtraction in the Estimation package construction being generated from ATLAS.
My understanding based on your response: ATLAS is pulling a specific version of FeatureExtraction to execute the Estimation module. In this problem, we see evidence of a mismatch in table creation for how PLE is using FeatureExtraction.
But what I don't totally get is: does ATLAS do anything on the backend in the sausage-making process? Or is it simply executing the equivalent of the code you export from ATLAS? If so, I would like to trace what it does holistically, because we're actually struggling in multiple capacities to generate a successful R package using the ATLAS 2.7.2 Estimation module. I'm starting to wonder if it's actually an Estimation packaging issue holistically.
But then, I'm also under the impression that Characterization, Estimation, and Prediction all utilize FeatureExtraction. I have not tested Characterization or Prediction yet to see whether I would get a message of a similar variety. You see my quandary?
Is this: A) a simple FeatureExtraction package issue for the database layer (Redshift) we use, B) a local configuration issue in how we deployed ATLAS 2.7.2 in our environment that propagates across all the times we use FeatureExtraction, C) a bug in how Estimation uses FeatureExtraction to generate results or D) I give up.
I’m also moderately alarmed that an Estimation study executed by ATLAS 2.7.2 ran for 95 minutes (seems like it was a greedy unbuildable cohort, I don’t know) before it packaged itself up with an ErrorReport that says it totally failed in the middle. How do I reconcile that? Is this not a good place we should maybe debug the utility of the module to have a better stage gate?
I’m not a native English speaker. Would you mind rephrasing “debug the utility of the module to have a better stage gate”? I’m having a hard time understanding what you mean.
If you share your study definition JSON with me, I can debug the problem, create a fix, and it will be incorporated in WebAPI as soon as possible.
But I think that is not what you want?
Sena and I didn't get too far, but it had to do with the creation of the temp table from the excluded concepts. Somewhere in the code, it's naming the temp table that contains the excluded concepts as `#id_set_1`, but Sena and I were a little suspicious about the loop that creates the temporary tables and how the temporary table that is defined matches up with `#id_set_1` (we didn't get far enough in the code to trace where the tables are actually created and inserted into).
If you don't want to wait for @krfeeney's specific test case, I believe this issue comes up when you define a PLE analysis that uses features from any domain (it uses DomainConcept.sql to build the query), and it injects the table name here: https://github.com/OHDSI/FeatureExtraction/blob/master/inst/sql/sql_server/DomainConcept.sql#L39. But, like I said, it's not exactly clear what trace path led to the parameter `@excluded_concept_table`. We're pretty sure it's related to exclusions, though, because of the error @krfeeney pointed out, where the query goes: `AND observation_concept_id NOT IN (SELECT id FROM #id_set_1)`.
And, actually, based on that, the specific feature was trying to pull from the Observation table.
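As a rough sketch of the mechanism being discussed (this is an illustrative simplification, not SqlRender's actual implementation), the `@excluded_concept_table` placeholder in DomainConcept.sql gets substituted with a generated temp-table name. If the name injected into the query ever differs from the name the table was actually created under, you'd get exactly the "relation does not exist" error from the ErrorReport:

```python
# Illustrative sketch of @parameter substitution; NOT SqlRender's real code.
def render(sql: str, **params: str) -> str:
    """Replace each @name placeholder with its supplied value."""
    for name, value in params.items():
        sql = sql.replace("@" + name, value)
    return sql

template = ("AND observation_concept_id NOT IN "
            "(SELECT id FROM @excluded_concept_table)")

# The query ends up referencing #id_set_1 ...
rendered = render(template, excluded_concept_table="#id_set_1")
print(rendered)
# ... but if the table-creation step used a different name, the database
# reports: relation "#id_set_1" does not exist.
```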
Did you design the analysis in Atlas and then download the package as a zip? If so, then all Atlas did was build some R scripts, put them into a folder structure, compress them as a zip, and send it down to you. I'm not sure if your environment is configured to execute analyses directly in Atlas, but if so, it simply runs the R script that was bundled in the zip. If you did want to trace what it is doing, you could just open the script in RStudio and set breakpoints.
They do. In the case of Characterization, we retrieve the SQL for features and execute it directly in the WebAPI layer. For Estimation and Prediction, the calls to the FeatureExtraction library are made from within the generated R script (which executes in the context of an R session). Characterization, however, doesn't allow you to specify any of FeatureExtraction's exclusion parameters, while PLE does. It looks like you are using the exclusion parameter for this analysis, and that's why I believe the bug is related to something where the excluded concepts are loaded into a temp table.
Depending on your environment, database indexes, and configuration, it may run a long time. You can always generate the cohort in Atlas to get an idea of how long the cohort generation will take. You can also step through the R code to see the timing of cohort generation. I thought there was console output describing what was happening in the package, but I could be wrong.
Without really knowing anything about your environment or the cohorts/analysis you are trying to execute, it’s not easy to give you specific advice.
Appreciate you clarifying my lazy English, @Chris_Knoll.
For context: I took @mattspotnitz’s cohort from public ATLAS (http://www.ohdsi.org/web/atlas/#/estimation/cca/81) and imported it via Utilities into my internal ATLAS that we just updated to 2.7.2. (@Konstantin_Yaroshove could speak to the configuration – it’s a RedShift data layer.) In the UI, I used the execution tab to run it against a data set.
You're right. Though, if you're using the execution tab and not dumping this into R, this file sits quietly in the background and only becomes apparent after the code eventually stops itself. Totally fine, but if you are trying to follow along from execution via the UI, it's not as straightforward; you're on a bit of a treasure hunt for the information of what happened.
Ultimately, I downloaded the ZIP output to see where it landed.
For other context: @mattspotnitz has been passing around just the ZIP of the ATLAS study package. Many of us (@izzysrdks @George_Argyriou) are working on testing it in our local environments (sans running via ATLAS) and having issues. So my hunch is something is funky about this particular package.
I admittedly overgeneralized. I just looked at my runs for a different Estimation package (http://www.ohdsi.org/web/atlas/#/estimation/cca/80) that I imported into our ATLAS. In this case, there’s no ErrorReport so I think the package actually executed correctly… but again, trying to decode what spits out after using the Execution tab is a bit of walking in the dark. Not sure what I’m actually looking at but I guess the absence of an ErrorReport is a good thing? Let’s call that issue #2. Which is to say – if someone from the ATLAS team can explain to me what’s what, I’ll happily write up a blurb we can publish on the GitHub wiki so other folks can use the resource.
In short order, I'm going to email @schuemie the output I've got from this run and copy @mattspotnitz, since it's really his study. Big picture, it seems like something about this particular study is not rendering correctly when it's spit into an ATLAS PLE package. Maybe it's actually a cohort creation issue? Not sure whose issue queue this falls in; just being a squeaky wheel till I find the right home for this problem.
So the problem was in DatabaseConnector, and can be solved by upgrading to the latest development version:
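For anyone landing here later: a typical way to install an OHDSI package's development version from GitHub (this exact invocation is my assumption; check the DatabaseConnector README for the current instructions) would be:

```r
# Install the development version of DatabaseConnector from GitHub.
# (Assumed invocation; consult the repo's README for the official command.)
install.packages("devtools")
devtools::install_github("OHDSI/DatabaseConnector")
```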
Taking a step back, the reason we did not observe this issue in our unit tests is because our unit tests only test against SQL Server, Oracle, and PostgreSQL. @lee_evans: perhaps we could add a RedShift testing environment?