How to Use CohortGenerator with CohortDiagnostics?

TheCedarPrince · January 5, 2022, 4:44pm

Hi @anthonysena, @schuemie, and @Gowtham_Rao,

Anthony and Martijn, I love CohortGenerator!
You released it at the perfect time for my work and ongoing study!
I went through the tutorial, which was very straightforward!
However, I am having trouble taking my generated cohort and using it in CohortDiagnostics.

In the following sections, I walk through how I create cohorts with CohortGenerator, then my attempt at using these cohorts in CohortDiagnostics, and finally some concluding thoughts.

Process Using `CohortGenerator`

First, I load the required libraries:

library("CohortGenerator") # Used for generating cohort
library("DatabaseConnector") # Connecting to OMOP Database
library("ROhdsiWebApi") # Used for ATLAS connection

Then I grab my ATLAS cohort definition from the public ATLAS Demo instance:

ATLASurl <- "http://api.ohdsi.org:8080/WebAPI" 
cohortID <- c(1778180) 
cohortDefinitionSet <- ROhdsiWebApi::exportCohortDefinitionSet(baseUrl = ATLASurl, cohortIds = cohortID, generateStats = TRUE)

Following the CohortGenerator tutorial, I then save my cohortDefinitionSet to disk:

saveCohortDefinitionSet(cohortDefinitionSet = cohortDefinitionSet,
                        settingsFolder = file.path("inst", "settings"),
                        jsonFolder = file.path("inst", "cohorts"),
                        sqlFolder = file.path("inst", "sql", "sql_server"))

Then, I connect to my database via DatabaseConnector.
The exact details are redacted, so just trust me when I say the connection does work:

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "server",
  user = "user",
  password = "password",
  port = 1234,
  pathToDriver = "utils"
)

Now, using CohortGenerator, I can generate my cohorts:

cohortTableNames <- getCohortTableNames(cohortTable = "cg_example")

createCohortTables(connectionDetails = connectionDetails,
                   cohortTableNames = cohortTableNames,
                   cohortDatabaseSchema = "test")

generateCohortSet(connectionDetails = connectionDetails,
                  cdmDatabaseSchema = "schema",
                  cohortDatabaseSchema = "test",
                  cohortTableNames = cohortTableNames,
                  cohortDefinitionSet = cohortDefinitionSet)
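Because `getCohortTableNames()` is called with only `cohortTable = "cg_example"`, every companion table name is derived from that one base name, so the cohort rows themselves land in `test.cg_example`; that is the table any downstream package would need to read. Below is a base R sketch of that naming. The suffixes are an assumption about CohortGenerator's defaults, and `sketchCohortTableNames()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: mimics, by assumption, the default table names that
# CohortGenerator::getCohortTableNames() derives from a single base name.
# It is NOT the package's own implementation.
sketchCohortTableNames <- function(cohortTable) {
  list(
    cohortTable                = cohortTable,
    cohortInclusionTable       = paste0(cohortTable, "_inclusion"),
    cohortInclusionResultTable = paste0(cohortTable, "_inclusion_result"),
    cohortInclusionStatsTable  = paste0(cohortTable, "_inclusion_stats"),
    cohortSummaryStatsTable    = paste0(cohortTable, "_summary_stats")
  )
}

tableNames <- sketchCohortTableNames("cg_example")

# Fully qualified names in the "test" schema used above. The first entry is
# the table that generateCohortSet() fills with cohort rows.
qualified <- paste("test", unlist(tableNames), sep = ".")
print(qualified)
```

The point of the sketch is only that the base name passed to `getCohortTableNames()` is the same name a later `cohortTable` argument should point at.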

Additionally, I can now insert the inclusion rule names and export the cohort statistics tables using CohortGenerator:

insertInclusionRuleNames(connectionDetails = connectionDetails,
                         cohortDefinitionSet = cohortDefinitionSet,
                         cohortDatabaseSchema = "test",
                         cohortInclusionTable = cohortTableNames$cohortInclusionTable)

exportCohortStatsTables(connectionDetails = connectionDetails,
                        cohortDatabaseSchema = "test",
                        cohortTableNames = cohortTableNames,
                        cohortStatisticsFolder = file.path("inst", "InclusionStats"))

In theory, I am now ready to use CohortDiagnostics!

Process for Utilizing `CohortDiagnostics`

After reading through the documentation for CohortDiagnostics, I surmised that one could do this after generating a cohort with CohortGenerator:

library("CohortDiagnostics")

runCohortDiagnostics(
  packageName = "SUPREME-DM",
  cohortToCreateFile = "CohortsToCreate.csv",
  baseUrl = NULL,
  cohortSetReference = NULL,
  connectionDetails = connectionDetails,
  connection = NULL,
  cdmDatabaseSchema = "schema",
  oracleTempSchema = NULL,
  tempEmulationSchema = "tmp",
  cohortDatabaseSchema = "test",
  vocabularyDatabaseSchema = "schema",
  cohortTable = "cg_example", # same table that generateCohortSet() populated
  cohortIds = NULL,
  inclusionStatisticsFolder = NULL,
  exportFolder = "testdiagnostics",
  databaseId = "ID",
  databaseName = NULL,
  databaseDescription = NULL,
  cdmVersion = 5,
  runInclusionStatistics = FALSE,
  runIncludedSourceConcepts = TRUE,
  runOrphanConcepts = TRUE,
  runTimeDistributions = TRUE,
  runVisitContext = TRUE,
  runBreakdownIndexEvents = TRUE,
  runIncidenceRate = TRUE,
  runTimeSeries = FALSE,
  runCohortOverlap =  FALSE,
  runCohortCharacterization = TRUE,
  covariateSettings = createDefaultCovariateSettings(),
  runTemporalCohortCharacterization = TRUE,
  temporalCovariateSettings = createTemporalCovariateSettings(useConditionOccurrence =
    TRUE, useDrugEraStart = TRUE, useProcedureOccurrence = TRUE, useMeasurement = TRUE,
    temporalStartDays = c(-365, -30, 0, 1, 31), temporalEndDays = c(-31, -1, 0, 30, 365)),
  minCellCount = 5,
  incremental = FALSE,
  incrementalFolder = NULL
)

Yet when I run this, I get the following error that I do not know how to debug:

Run Cohort Diagnostics started at 2022-01-05 11:42:40
Created folder at testdiagnostics
Error: '' does not exist in current working directory ('/Projects/PhenoEx/demos/ohdsi-r').

I do not know what this inscrutable error means, and I have spent several hours trying to debug it.
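One plausible reading of the error, offered as an assumption rather than something confirmed from the CohortDiagnostics source: with `packageName` set, CohortDiagnostics 2.1.4 appears to resolve `cohortToCreateFile` inside that installed package via `system.file()`, which silently returns `""` when the package or file cannot be found, and the empty path is then handed to a CSV reader. Separately, "SUPREME-DM" cannot be an installed R package at all, because R package names may only contain ASCII letters, numbers, and dots. The base R sketch below demonstrates both points; `isValidPackageName()` is a hypothetical helper.

```r
# Base R only. With the default mustWork = FALSE, system.file() does not error
# when the package or file is missing; it returns "" instead.
pathToCsv <- system.file("CohortsToCreate.csv", package = "SUPREME-DM")
print(pathToCsv)
print(file.exists(pathToCsv))

# Hypothetical helper encoding the package-name rule from "Writing R
# Extensions": only ASCII letters, numbers and dots, at least two characters,
# starting with a letter and not ending in a dot.
isValidPackageName <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

print(isValidPackageName("SUPREME-DM")) # the hyphen makes this invalid
print(isValidPackageName("SupremeDm"))
```

If that reading is right, `packageName` has to name an installed package that actually contains `cohortToCreateFile` at that relative path.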

Conclusion

I honestly think CohortGenerator could become the de facto tool in HADES for providing a standard cohort interface to the rest of the HADES ecosystem.
I just need a bit of help connecting it to the other packages.
Any thoughts or help here would be much appreciated!

Thanks!

~ tcp

P.S. Here are my package versions:

Package           Version
Andromeda         0.5.0
CohortDiagnostics 2.1.4
CohortGenerator   0.2.0
DatabaseConnector 5.0.0
FeatureExtraction 3.2.0
rJava             1.0-6
rlang             0.4.12
ROhdsiWebApi      1.3.0
SqlRender         1.8.1

schuemie · January 7, 2022, 7:35am

Hi @TheCedarPrince! Thanks for your enthusiasm. You are right that CohortDiagnostics currently does not play nicely with CohortGenerator. We’re furiously working on a release that is specifically focused on this. (You can take a peek at the CohortGeneratorDependency branch until then.)

TheCedarPrince · January 7, 2022, 2:14pm

This is wonderful to hear!
@schuemie, I took a look at the branch and read through some of the Rmd documentation, and I would be happy to help with the development if possible.
Happy to chat further!
Thanks!
