Hello Achilles community,
This note explains new functionality recently added to Achilles. Please note that no default parameter values have changed: if you run Achilles::achilles() as you always have, you will get the same results as before. The sections below describe the new features and how to use them if you choose to.
Summary of new features/changes:
- A new parameter, updateGivenAnalysesOnly, was added to the achilles function so that you can update existing analyses, or insert new ones, without deleting prior results.
- A new function, runMissingAnalyses, was added to the package to find and run only the missing analyses without deleting prior results.
- A new function, listMissingAnalyses, was added to the package to find the analyses that are available to run but do not yet exist in your results.
Details
New Feature: Achilles::listMissingAnalyses(connectionDetails,resultsDatabaseSchema)
Find the analyses that are available but exist in neither achilles_results nor achilles_results_dist.
How it works
missingAnalyses <- Achilles::listMissingAnalyses(connectionDetails, resultsDatabaseSchema)
# missingAnalyses is a data frame with the following columns, which provide additional detail
# about the missing analyses
colnames(missingAnalyses)
# [1] "ANALYSIS_ID" "DISTRIBUTION" "COST" "CATEGORY" "IS_DEFAULT" "ANALYSIS_NAME"
# If you simply want a list of missing IDs, use the following
missingAnalyses$ANALYSIS_ID
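If the list is long, grouping the missing IDs by category can make it easier to scan. The following is a minimal sketch in base R that relies only on the ANALYSIS_ID and CATEGORY columns shown above. Note that missingIdsByCategory is a hypothetical helper, not part of Achilles, and the small data frame is a made-up stand-in for the real output of listMissingAnalyses (its IDs and category names are illustrative only).

```r
# Hypothetical helper (not part of Achilles): group missing analysis IDs by
# their CATEGORY value.
missingIdsByCategory <- function(missingAnalyses) {
  split(missingAnalyses$ANALYSIS_ID, missingAnalyses$CATEGORY)
}

# Made-up stand-in for the data frame returned by listMissingAnalyses();
# only the two columns used here are included.
exampleMissing <- data.frame(
  ANALYSIS_ID = c(213, 400, 401),
  CATEGORY = c("Visit", "Condition", "Condition"),
  stringsAsFactors = FALSE
)

idsByCategory <- missingIdsByCategory(exampleMissing)
idsByCategory$Condition  # 400 401
```

With real data, you would pass the missingAnalyses data frame from the call above instead of exampleMissing.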
New Feature: Achilles::runMissingAnalyses(...)
Many new analyses may have been added since you last pulled and ran Achilles, and you may not know which ones or want to inspect a potentially long list. In that case, you can call runMissingAnalyses to find all the analyses you are missing and run only those, without deleting prior data.
How it works
Achilles::runMissingAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  resultsDatabaseSchema = resultsDatabaseSchema,
  outputFolder = "/tmp"
)
New Feature: Achilles::achilles(..., updateGivenAnalysesOnly = TRUE)
By default, when you run achilles with the analysisIds parameter, all previous results are deleted and only the specified analyses are inserted into achilles_results and/or achilles_results_dist. The enhancement lets you optionally update only the specified analyses rather than remove all prior analyses. This is particularly useful when working with very large datasets that require substantial time and resources to run the analyses.
How it works
A new parameter, updateGivenAnalysesOnly, has been added to the achilles function so that only the analyses given by analysisIds are updated, while previous results for all other analyses are preserved. To support this enhancement without changing the current default behavior of achilles, three conditions must be satisfied when calling achilles to invoke the new behavior:
- The parameter analysisIds must be specified and non-empty
- The parameter createTable must be FALSE
- The parameter updateGivenAnalysesOnly must be TRUE
Unless all three conditions are met, achilles will run with the default behavior. By default, createTable is TRUE and updateGivenAnalysesOnly is FALSE, so the new functionality cannot be triggered accidentally.
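The three conditions can be restated as a single predicate. The sketch below is only an illustration in base R: usesUpdateMode is a hypothetical helper name, not a function in the Achilles package, and the package's own internal check may be written differently. The defaults for createTable and updateGivenAnalysesOnly mirror those described above; treating an unspecified analysisIds as NULL is an assumption made for this sketch.

```r
# Hypothetical helper (not part of Achilles): returns TRUE only when all
# three conditions for the update-only behavior hold.
usesUpdateMode <- function(analysisIds = NULL,   # NULL default is an assumption
                           createTable = TRUE,
                           updateGivenAnalysesOnly = FALSE) {
  length(analysisIds) > 0 &&
    isFALSE(createTable) &&
    isTRUE(updateGivenAnalysesOnly)
}

usesUpdateMode()                                  # defaults: FALSE
usesUpdateMode(analysisIds = c(213),
               createTable = FALSE,
               updateGivenAnalysesOnly = TRUE)    # all three met: TRUE
usesUpdateMode(analysisIds = c(213),
               updateGivenAnalysesOnly = TRUE)    # createTable still TRUE: FALSE
```

The last call shows why the new behavior cannot be triggered by accident: setting updateGivenAnalysesOnly alone is not enough while createTable keeps its default.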
EXAMPLE
You expect changes to your VISIT_OCCURRENCE table and would like to re-run only analysis 213, without deleting prior results:
Achilles::achilles(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  resultsDatabaseSchema = resultsDatabaseSchema,
  outputFolder = "/tmp",
  analysisIds = c(213),
  updateGivenAnalysesOnly = TRUE,
  createTable = FALSE
)
NB: In the example above, if results for analysis 213 exist, they are deleted and recomputed. If results for analysis 213 do not exist, they are still computed. Therefore, this approach can also be used as a way to run new analyses that you know are missing, without deleting all previous results.
Finally, when testing the new functionality, keep in mind that the specified analyses will run even if no data are found. Therefore, if you inspect the Achilles log and see that an analysis was run, but do not see the corresponding data in either achilles_results or achilles_results_dist, that means the query returned no rows for your CDM. So it is possible to still have "missing analyses" even after you call runMissingAnalyses, simply because of the nature of your data.
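One way to see which analyses fall into this situation is to call listMissingAnalyses before and after runMissingAnalyses and compare the two lists. The sketch below shows only the comparison step, in base R. Here compareMissing is a hypothetical helper, not part of Achilles, and the two ID vectors are made-up stand-ins for missingAnalyses$ANALYSIS_ID captured before and after the run.

```r
# Hypothetical helper (not part of Achilles): compare the missing analysis IDs
# captured before and after a run.
compareMissing <- function(idsBefore, idsAfter) {
  list(
    nowPresent   = setdiff(idsBefore, idsAfter),   # results appeared after the run
    stillMissing = intersect(idsBefore, idsAfter)  # still absent from both results tables
  )
}

# Made-up IDs standing in for listMissingAnalyses(...)$ANALYSIS_ID
idsBefore <- c(213, 400, 401)
idsAfter  <- c(401)

comparison <- compareMissing(idsBefore, idsAfter)
comparison$nowPresent    # 213 400
comparison$stillMissing  # 401
```

Any IDs left in stillMissing are candidates for the case described above, where the analysis ran but your data produced no results.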
Any and all feedback is welcome. Feel free to respond at: Discussions · OHDSI/Achilles · GitHub