Suggestion for Achilles

My database has billions of entries, and sometimes I get an error that can invalidate the whole Achilles execution by the time Achilles finishes the last analysis. In several cases it would be better if Achilles stopped on such an error, notified the user, and asked whether to retry the problematic analysis (after correcting the underlying error, if possible) or to halt the whole process, rather than continuing with all the analyses and then failing after they have all completed. For example, I just got this error:

Error:
org.postgresql.util.PSQLException: ERROR: column "payer_source_concept_id" does not exist
Position: 112

And Achilles continued running…

If Achilles could give me the opportunity to create this column and restart the related failed Analysis without cancelling Achilles then the whole process could have a “happy ending”.

Can this be implemented?

Thanks
Jose

Yes. For the moment, I think the only thing you can do is remove some analyses from the planned execution list. The error files show which ones failed, and you can exclude those from the next execution.
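
As a rough illustration of reading the failed IDs off the error files, here is a minimal R sketch. It assumes the files follow the achillesError_<analysisId>.txt naming seen in this thread; failedIdsFromFiles is a hypothetical helper, not part of Achilles.

```r
# File names as they might appear in the output folder. The naming pattern
# achillesError_<analysisId>.txt is an assumption based on this thread.
errorFiles <- c("achillesError_1425.txt",
                "achillesError_1900.txt",
                "errorReportSql.txt")

# Hypothetical helper: extract the numeric analysis ID from each matching name
# and ignore files that do not follow the pattern.
failedIdsFromFiles <- function(fileNames) {
  pattern <- "^achillesError_([0-9]+)\\.txt$"
  baseNames <- basename(fileNames)
  matched <- grepl(pattern, baseNames)
  as.integer(sub(pattern, "\\1", baseNames[matched]))
}

failedIds <- failedIdsFromFiles(errorFiles)
print(failedIds)
```

With a real run, the names could come from list.files(outputFolder, pattern = "^achillesError_"), where outputFolder stands for whatever folder your run writes to.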

Can you please tell me where to find the planned execution file? I wonder what would happen if I restarted Achilles with a planned execution that starts from the first error I got. So far I can see two error messages:

/achillesError_1425.txt 836/836 100%
DBMS:
postgresql

Error:
org.postgresql.util.PSQLException: ERROR: column "payer_source_concept_id" does not exist
Position: 112

SQL:
--HINT DISTRIBUTE_ON_KEY(stratum_1)
CREATE TEMP TABLE s_ta_1425

and

/achillesError_1900.txt 2693/7830 34%
DBMS:
postgresql

Error:
org.postgresql.util.PSQLException: ERROR: column "modifier_source_value" does not exist
Position: 1150

SQL:
--HINT DISTRIBUTE_ON_KEY(stratum_1)
CREATE TEMP TABLE s_ta_1900

AS
SELECT

So I would like to restart from analysis 1425.

Thanks

As far as I know, you cannot restart. To exclude the analyses that cannot be completed, do the following:

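allAnalyses <- Achilles::getAnalysisDetails()
elements_rem <- c(1425, 1900)
# getAnalysisDetails() returns a data frame, so filter on its ID column
# (assumed here to be ANALYSIS_ID) to get a plain vector of IDs
analysesToInclude <- allAnalyses$ANALYSIS_ID[!allAnalyses$ANALYSIS_ID %in% elements_rem]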

Then pass the analyses to include to Achilles (just add analysisIds = analysesToInclude as a new parameter to your achilles call) and launch it again.
In any case, make sure the errors really cannot be fixed, because they usually have further implications later on.
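
Before launching a long run, it may be worth checking the selection on a small stand-in first. The sketch below uses a made-up data frame in place of the output of Achilles::getAnalysisDetails() and assumes the ID column is called ANALYSIS_ID; check names(allAnalyses) in your installed version, since the column name is an assumption here.

```r
# Stand-in for Achilles::getAnalysisDetails(). The column name ANALYSIS_ID
# and the placeholder names are assumptions for illustration only.
allAnalyses <- data.frame(
  ANALYSIS_ID = c(1, 2, 1425, 1900, 2201),
  ANALYSIS_NAME = c("placeholder 1", "placeholder 2", "placeholder 3",
                    "placeholder 4", "placeholder 5")
)
failedIds <- c(1425, 1900)

# Filter the ID column, not the data frame itself, so the result is a
# plain vector of analysis IDs.
allIds <- allAnalyses$ANALYSIS_ID
analysesToInclude <- allIds[!allIds %in% failedIds]

# Sanity checks before handing the vector to a long-running job.
stopifnot(is.numeric(analysesToInclude),
          !any(failedIds %in% analysesToInclude))
print(analysesToInclude)
```

The same checks can be run on the real vector right before the achilles call; they cost nothing compared with a run that takes days.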

To solve the errors I got, I just added the respective columns to the affected tables with null values. This happens because my CDM database was built for a previous version of OMOP. So I always want to include all my analyses. Well, I hope my suggestion will be implemented. But, based on your code, is it possible to rerun with something like:

allAnalyses <- Achilles::getAnalysisDetails()
elements_rem <- c(0:1424)
# ID column name (ANALYSIS_ID) assumed
analysesToInclude <- allAnalyses$ANALYSIS_ID[!allAnalyses$ANALYSIS_ID %in% elements_rem]

I wonder whether this “strategy” will preserve the previous good results for any analysis with ID < 1425.

Thanks

Actually, each analysis that works generates a temporary table that is not deleted. So if you are OK with not regenerating those, you could run only those 2 analyses; in principle, since all the other partial tables have already been created, the process should finish correctly this time.

After nearly 2.5 weeks, Achilles showed the following message:

Executing SQL took 0.0464 secs
2022-05-27 09:30:45 [Main Analysis] [COMPLETE] 2201 (0.047198 secs)
2022-05-27 09:31:34 Merging scratch Achilles tables
|=================================== | 50%
2022-05-27 09:31:35 Merging scratch Achilles tables [ERROR] (Error in FALSE: Error executing SQL:
org.postgresql.util.PSQLException: ERROR: relation "s_ta_1425" does not exist
Position: 94226
An error report has been created at //errorReportSql.txt
)
2022-05-27 09:31:35 Done. Achilles results can now be found in schema atr2003_2020
Connecting using PostgreSQL driver
| | 0%
Error: Error executing SQL:
org.postgresql.util.PSQLException: ERROR: relation "atr2003_2020.achilles_results" does not exist
An error report has been created at //errorReportSql.txt
Run rlang::last_error() to see where the error occurred.

There is no reference to any failed analysis other than 1425 and 1900. Since I made the database corrections, will it be enough to run:

allAnalyses <- Achilles::getAnalysisDetails()
elements_inc <- c(1425, 1900)
# ID column name (ANALYSIS_ID) assumed
analysesToInclude <- allAnalyses$ANALYSIS_ID[allAnalyses$ANALYSIS_ID %in% elements_inc]

achilles(connectionDetails,
  cdmDatabaseSchema = "mdcr2003_2020",
  resultsDatabaseSchema = "atr2003_2020",
  scratchDatabaseSchema = "achilles_scratch",
  vocabDatabaseSchema = "omop_20220331",
  numThreads = 1,
  sourceName = "src_mdcr2003_2020",
  cdmVersion = "5.3",
  runHeel = FALSE,
  runCostAnalysis = FALSE,
  tempAchillesPrefix = "ta",
  optimizeAtlasCache = TRUE,
  createIndices = TRUE,
  analysisIds = analysesToInclude)

Are the analyses independent of each other? I mean, does the result of one not affect any other?
How can I make sure that the temporary results will not be deleted and Achilles will finish successfully this time?

I was reading “Running Achilles on Your CDM” in the Achilles documentation, and it is not clear what will happen to all the results if I just run the previous commands.

Thanks

Executing Achilles with default settings will delete previous results. Given your situation I would suggest running Achilles::listMissingAnalyses and Achilles::runMissingAnalyses. This would allow you to see which analyses have not yet completed and run only those analyses.

I would first recommend that you install the latest Achilles code from the main branch on GitHub, because the code you listed appears to be out of date: runHeel and runCostAnalysis are no longer parameters of the function in the latest version.

Running Achilles should take no more than a few hours to complete. If it is taking longer than that then I suggest we discuss your environment in more detail to understand what is causing performance issues.

Another approach would be to review the log for performance timings of each analysis or review performance with the performance report in ARES as shown in the screenshot below.
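
For the log-based review, a small R sketch along these lines could pull the per-analysis timings out of the log. It assumes completed analyses are logged as "[COMPLETE] <id> (<value> <unit>)", as in the log line quoted earlier in this thread. parseTimings is a hypothetical helper, the list of units is a guess, and the second sample line is invented for illustration.

```r
logLines <- c(
  # Copied from the log quoted earlier in this thread
  "2022-05-27 09:30:45 [Main Analysis] [COMPLETE] 2201 (0.047198 secs)",
  # Invented line (hypothetical analysis ID and timing)
  "2022-05-27 09:00:00 [Main Analysis] [COMPLETE] 9001 (2.5 mins)",
  # Not a timing line; should be ignored
  "2022-05-27 09:31:34 Merging scratch Achilles tables"
)

# Hypothetical helper: extract analysis ID and duration, convert the
# duration to seconds, and sort with the slowest analysis first.
parseTimings <- function(lines) {
  pattern <- "\\[COMPLETE\\] ([0-9]+) \\(([0-9.]+) (secs|mins|hours|days)\\)"
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 4]  # keep only lines that matched fully
  toSeconds <- c(secs = 1, mins = 60, hours = 3600, days = 86400)
  timings <- data.frame(
    analysisId = as.integer(vapply(hits, `[`, character(1), 2)),
    seconds = as.numeric(vapply(hits, `[`, character(1), 3)) *
      unname(toSeconds[vapply(hits, `[`, character(1), 4)])
  )
  timings[order(-timings$seconds), ]
}

timings <- parseTimings(logLines)
print(timings)
```

On a real log, readLines() on the log file would supply logLines, and the top rows of the result would point at the analyses worth investigating first.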

I tried:

Achilles::runMissingAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "mdcr2003_2020",
  resultsDatabaseSchema = "atr2003_2020",
  outputFolder = "/output",
  vocabDatabaseSchema = "omop_20220331",
  runCostAnalysis = FALSE
)
Error in Achilles::runMissingAnalyses(connectionDetails = connectionDetails, :
unused argument (runCostAnalysis = FALSE)

and also tried:

Achilles::runMissingAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "mdcr2003_2020",
  resultsDatabaseSchema = "atr2003_2020",
  outputFolder = "/output",
  vocabDatabaseSchema = "omop_20220331"
)
Error in Achilles::runMissingAnalyses(connectionDetails = connectionDetails, :
object ‘runCostAnalysis’ not found

So, I do not know how to proceed…
Any ideas?

Thanks

@jcabrerazuniga

Thanks for bringing this to our attention. An issue has been opened here:

This will be addressed sometime next week. We’ll update this thread when the fix is merged into master.

This error happened when I tried to use the latest version of Achilles against the temporary results produced by an older version (about 4 months older). Suspecting incompatibilities, I decided to restart the whole process, which is still running, using the newest version.

Thanks anyway
Jose
