
Achilles - Introducing new functionality

Hello Achilles community,

This note explains new functionality recently added to Achilles. No default parameter values have changed; if you run Achilles::achilles() as you always have, you will get the same results as before. The sections below describe the new features and how to use them if you choose to.

Summary of new features/changes:

  • A new parameter, updateGivenAnalysesOnly, was added to the achilles function. It lets a user update existing analyses, or insert new ones, without deleting prior analyses.
  • A new function, runMissingAnalyses, was added to the package. It lets a user find and run only the missing analyses without deleting prior analyses.
  • A new function, listMissingAnalyses, was added to the package. It lets a user find the analyses that are available to be run but do not yet exist in their results tables.

Details

New Feature: Achilles::listMissingAnalyses(connectionDetails,resultsDatabaseSchema)

Find the analyses that are available but exist in neither achilles_results nor achilles_results_dist

How it works

missingAnalyses <- Achilles::listMissingAnalyses(connectionDetails,resultsDatabaseSchema)

# missingAnalyses is a data frame; its columns give additional detail
# about each missing analysis
colnames(missingAnalyses)
# [1] "ANALYSIS_ID"   "DISTRIBUTION"  "COST"          "CATEGORY"      "IS_DEFAULT"    "ANALYSIS_NAME"

# If you simply want a vector of the missing IDs, use the following
missingAnalyses$ANALYSIS_ID
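
Because the result is an ordinary data frame, it can be filtered with base R before deciding what to run. The sketch below uses a small hand-made data frame with the same column names as above. Its rows, and the assumption that IS_DEFAULT and COST are coded as 0/1, are invented for illustration and are not real Achilles output.

```r
# Hypothetical stand-in for the result of Achilles::listMissingAnalyses();
# the rows are invented and only mimic the column names shown above.
missingAnalyses <- data.frame(
  ANALYSIS_ID   = c(101, 213, 1500),
  DISTRIBUTION  = c(0, 1, 0),
  COST          = c(0, 0, 1),
  CATEGORY      = c("Person", "Visit", "Cost"),
  IS_DEFAULT    = c(1, 1, 0),
  ANALYSIS_NAME = c("Example A", "Example B", "Example C"),
  stringsAsFactors = FALSE
)

# Keep only missing analyses flagged as default and not cost-related
# (assumes both flags are coded 0/1, which may differ in your version).
candidates <- subset(missingAnalyses, IS_DEFAULT == 1 & COST == 0)

# IDs that could then be passed on as analysisIds
idsToRun <- candidates$ANALYSIS_ID
print(idsToRun)
```

Check the actual column types in your own output (for example with str(missingAnalyses)) before relying on a filter like this.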

New Feature: Achilles::runMissingAnalyses(...)

It may also be the case that many new analyses were added since you last pulled and ran achilles, but you do not know which ones and would prefer not to inspect a potentially long list. In this case, you can execute the function runMissingAnalyses to find all the analyses you are missing and to run only those analyses without deleting prior data.

How it works

Achilles::runMissingAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  resultsDatabaseSchema = resultsDatabaseSchema,
  outputFolder = "/tmp"
)

New Feature: Achilles::achilles(..., updateGivenAnalysesOnly = TRUE)

By default, when running achilles and using the analysisIds parameter, all previous results will be deleted and only the specified analyses will be inserted into achilles_results and/or achilles_results_dist. An enhancement to this behavior is to optionally update only the analyses specified, rather than remove all prior analyses. This is particularly useful when working with very large datasets that require substantial time and resources to run the analyses.

How it works

A new parameter, updateGivenAnalysesOnly, has been introduced into the achilles function to enable the updating of only the specified analyses given by analysisIds, while preserving previous results for the analyses not specified. To support this enhancement while ensuring the current default behavior of achilles is not changed, three conditions must be satisfied when calling achilles to invoke the new behavior:

  1. The parameter analysisIds must be specified and non-empty
  2. The parameter createTable must be FALSE
  3. The parameter updateGivenAnalysesOnly must be TRUE

Unless all three conditions are met, achilles will run with the default behavior. By default, createTable is TRUE and updateGivenAnalysesOnly is FALSE, so the new functionality cannot be triggered accidentally.
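
The three conditions can be restated as a small predicate. This is only a sketch of the rule as described in this post, not the Achilles source code; usesUpdateMode is a hypothetical helper name, and the NULL default for analysisIds is an assumption made for the example.

```r
# Illustrative only: mirrors the three documented conditions.
# `usesUpdateMode` is a hypothetical helper, not part of Achilles.
usesUpdateMode <- function(analysisIds = NULL,
                           createTable = TRUE,
                           updateGivenAnalysesOnly = FALSE) {
  length(analysisIds) > 0 &&          # 1. analysisIds specified and non-empty
    isFALSE(createTable) &&           # 2. createTable is FALSE
    isTRUE(updateGivenAnalysesOnly)   # 3. updateGivenAnalysesOnly is TRUE
}

# Defaults only: the default (delete and rebuild) behavior applies
usesUpdateMode()

# analysisIds alone is not enough
usesUpdateMode(analysisIds = c(213))

# All three conditions met: only the given analyses are updated
usesUpdateMode(analysisIds = c(213),
               createTable = FALSE,
               updateGivenAnalysesOnly = TRUE)
```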

EXAMPLE

Suppose you expect changes to your VISIT_OCCURRENCE table and would like to re-run only analysis 213, without deleting prior results:

Achilles::achilles(
  connectionDetails       = connectionDetails,
  cdmDatabaseSchema       = cdmDatabaseSchema,
  resultsDatabaseSchema   = resultsDatabaseSchema,
  outputFolder            = "/tmp",
  analysisIds             = c(213),
  updateGivenAnalysesOnly = TRUE,
  createTable             = FALSE
)

NB: In the example above, if results for analysis 213 exist, they are deleted and recomputed. If results for analysis 213 do not exist, they are still computed. Therefore, this approach can also be used as a way to run new analyses that you know are missing, without deleting all previous results.
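
One way to combine this with listMissingAnalyses is to run only a chosen subset of the missing analyses. The wrapper below is a sketch under assumptions: runSelectedMissing and its keepIds argument are hypothetical names introduced here, and it uses only the Achilles arguments shown in this post. Defining it does not touch the database; calling it requires a working connection and an installed Achilles package.

```r
# Hypothetical wrapper; the function name and `keepIds` are ours, not Achilles'.
runSelectedMissing <- function(connectionDetails,
                               cdmDatabaseSchema,
                               resultsDatabaseSchema,
                               outputFolder,
                               keepIds) {
  # Find what is missing, then keep only the IDs the caller asked for
  missing <- Achilles::listMissingAnalyses(connectionDetails, resultsDatabaseSchema)
  ids <- intersect(missing$ANALYSIS_ID, keepIds)

  if (length(ids) == 0) {
    message("None of the requested analyses are missing; nothing to run.")
    return(invisible(ids))
  }

  # All three conditions for the update-only behavior are met here
  Achilles::achilles(
    connectionDetails       = connectionDetails,
    cdmDatabaseSchema       = cdmDatabaseSchema,
    resultsDatabaseSchema   = resultsDatabaseSchema,
    outputFolder            = outputFolder,
    analysisIds             = ids,
    updateGivenAnalysesOnly = TRUE,
    createTable             = FALSE
  )
  invisible(ids)
}
```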

Finally, when testing the new functionality, keep in mind that the specified analyses will run even if no data are found. Therefore, if you inspect the achilles log and see that an analysis was run, but do not see the corresponding data in either achilles_results or achilles_results_dist, the query returned no rows for your CDM. So it is possible to still have “missing analyses” even after you call runMissingAnalyses, simply because of the nature of your data.
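
A quick way to spot such analyses is to compare the IDs you requested with the IDs actually present in the two results tables. The sketch below uses invented ID vectors as stand-ins for what you would obtain by querying distinct analysis IDs from each table; only the comparison logic is the point.

```r
# IDs you asked Achilles to run (hypothetical values)
requestedIds <- c(213, 1500)

# Hypothetical stand-ins for the distinct analysis IDs found in
# achilles_results and achilles_results_dist after the run
idsInResults     <- c(101, 200)
idsInResultsDist <- c(213)

# Requested analyses that ran but left no rows in either table
producedNoRows <- setdiff(requestedIds, union(idsInResults, idsInResultsDist))
print(producedNoRows)
```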

Any and all feedback is welcome. Feel free to respond at: Discussions · OHDSI/Achilles · GitHub


Thanks @AnthonyMolinaro this is great! This will allow for quick analysis runs rather than having to re-run from scratch.

I tried:

Achilles::runMissingAnalyses(
  connectionDetails,
  cdmDatabaseSchema = "mdcr2003_2020",
  resultsDatabaseSchema = "atr2003_2020",
  scratchDatabaseSchema = "achilles_scratch",
  vocabDatabaseSchema = "omop_20220331",
  outputFolder = "output",
  defaultAnalysesOnly = TRUE
)
Error in Achilles::runMissingAnalyses(connectionDetails, cdmDatabaseSchema = "mdcr2003_2020", :
object ‘runCostAnalysis’ not found

How is runCostAnalysis specified now?
