
Error seen running CohortDiagnostics::executeDiagnostics for "index_event_breakdown" table

I’m running executeDiagnostics following the instructions in the SOS video and associated slides.

The code below is giving the error shown.

The error seems to be related to a table called “index_event_breakdown”. I’m not sure where this table is supposed to exist; I don’t see it in my CDM instance or in any of the output files.

The full script, output, error log, and output directory (as a zip file renamed to .txt) are attached.
01-phenotype-evaluation.R.txt (3.3 KB)
ouput.txt (9.7 KB)
errorReportR.txt (1.4 KB)
output.zip.txt (25.2 KB)

Any thoughts @anthonysena, @jpegilbert?

Code and Error:

> library(CohortDiagnostics)
> 
> executeDiagnostics(
+   cohortDefinitionSet = cohortDefinitionSet,
+   connectionDetails = connectionDetails,
+   cdmDatabaseSchema = cdmDatabaseSchema,
+   cohortDatabaseSchema = cohortDatabaseSchema,
+   cohortTableNames = cohortTableNames,
+   exportFolder = file.path(dataFolder,databaseId),
+   databaseId = databaseId,
+   incremental = TRUE,
+   incrementalFolder = incrementalFolder,
+   minCellCount = 5,
+   runInclusionStatistics = TRUE,
+   runIncludedSourceConcepts = TRUE,
+   runOrphanConcepts = TRUE,
+   runTimeSeries = TRUE,
+   runVisitContext = TRUE,
+   runBreakdownIndexEvents = TRUE,
+   runIncidenceRate = TRUE,
+   runCohortRelationship = TRUE,
+   runTemporalCohortCharacterization = TRUE
+ )
Run Cohort Diagnostics started at 2023-08-10 15:52:36.41267
 - Databasename was not provided. Using CDM source table
 - Databasedescription was not provided. Using CDM source table
Created folder at D:\_YES_2023-05-28\workspace\SosExamples\_COVID\04-phenotype-evaluation\output\demo_db\demo_cdm
The following fields found in the cohortDefinitionSet will be exported in JSON format as part of metadata field of cohort table:                                  
    atlasId,
    generateStats,
    logicDescription
 - Unexpected fields found in table cohort - atlasId, logicDescription, generateStats. These fields will be ignored.                                              
Connecting using Spark JDBC driver                                                                                                                                
Saving database metadata                                                                                                     
Saving database metadata took 0.0716 secs                                                                                                                         
Counting cohort records and subjects
Counting cohorts took 0.38 secs
- Censoring 0 values (0%) from cohortEntries because value below minimum                                                                                          
- Censoring 0 values (0%) from cohortSubjects because value below minimum
Found 0 of 2 (0.00%) submitted cohorts instantiated. Beginning cohort diagnostics for instantiated cohorts.                  
Fetching inclusion statistics from files
Exporting cohort concept sets to csv                                                                                         
 - Unexpected fields found in table concept_sets - databaseId. These fields will be ignored.                                                                      
Starting concept set diagnostics                                                                                             
Instantiating concept sets
  |=====================================================================================================================| 100%
Creating internal concept counts table                                                                                       
  |=====================================================================================================================| 100%
Executing SQL took 5.55 secs
Fetching included source concepts                                                                                            
  |=====================================================================================================================| 100%
Executing SQL took 5.69 secs
Finding source codes took 9.94 secs                                                                                                                               
Breaking down index events                                                                                                   
- Breaking down index events for cohort 'Not Homeless (Draft 1)'                                                                                                  
- Breaking down index events for cohort 'Homeless (Draft 1)'
An error report has been created at  D:\_YES_2023-05-28\workspace\SosExamples\_COVID\04-phenotype-evaluation\output\demo_db\demo_cdm/errorReportR.txt             
Error in makeDataExportable(x = data, tableName = "index_event_breakdown",  : 
   - Cannot find required field index_event_breakdown - conceptId, conceptCount, subjectCount, cohortId, domainField, domainTable.
In addition: Warning messages:
1: Unknown or uninitialised column: `isSubset`. 
2: Unknown or uninitialised column: `isSubset`. 
3: Unknown or uninitialised column: `isSubset`. 
4: Unknown or uninitialised column: `isSubset`. 
5: There were 2 warnings in `dplyr::summarise()`.
The first warning was:
ℹ In argument: `conceptCount = max(.data$conceptCount)`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning. 
6: Unknown or uninitialised column: `isSubset`. 
7: In eval(expr) :
  No primary event criteria concept sets found for cohort id: 4
8: Unknown or uninitialised column: `isSubset`. 
9: In eval(expr) :
  No primary event criteria concept sets found for cohort id: 5
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : 
  invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : 
  attempt to use zero-length variable name
>

@greshje.gmail I don’t see any obvious errors with your code here. Which version of CohortDiagnostics are you running? I assume the latest released version?
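
For reference, you can check the installed version in your R session with:

packageVersion("CohortDiagnostics")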

Have you tried running with “runBreakdownIndexEvents = FALSE”? I’ve also got very little experience running CohortDiagnostics on a Databricks platform, so it could be something related to that.
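
Something like this, re-using the same objects from your script with only that one flag changed (just a sketch; everything else stays as in your call above):

executeDiagnostics(
  cohortDefinitionSet = cohortDefinitionSet,
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = cohortTableNames,
  exportFolder = file.path(dataFolder, databaseId),
  databaseId = databaseId,
  incremental = TRUE,
  incrementalFolder = incrementalFolder,
  minCellCount = 5,
  runInclusionStatistics = TRUE,
  runIncludedSourceConcepts = TRUE,
  runOrphanConcepts = TRUE,
  runTimeSeries = TRUE,
  runVisitContext = TRUE,
  runBreakdownIndexEvents = FALSE,  # skip the step that is failing
  runIncidenceRate = TRUE,
  runCohortRelationship = TRUE,
  runTemporalCohortCharacterization = TRUE
)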

Thanks @jpegilbert!

Running with “runBreakdownIndexEvents = FALSE” seems to have done the trick.

The script that ran successfully and its output are attached.
01-phenotype-evaluation.R.txt (3.6 KB)
ouput.txt (16.7 KB)

However, I did get these warnings:

Warning messages:
1: In getCdmDataSourceInformation(connection = connection, cdmDatabaseSchema = cdmDatabaseSchema) :
  CDM Source table does not have any records. Metadata on CDM source will be limited.
2: Unknown or uninitialised column: `isSubset`. 
3: Unknown or uninitialised column: `isSubset`. 
4: Unknown or uninitialised column: `isSubset`. 
5: Unknown or uninitialised column: `isSubset`. 
6: There were 2 warnings in `dplyr::summarise()`.
The first warning was:
ℹ In argument: `conceptCount = max(conceptCount)`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning. 
> dplyr::last_dplyr_warnings()
[[1]]
<warning/rlang_warning>
Warning in `dplyr::summarise()`:
ℹ In argument: `conceptCount = max(conceptCount)`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
---
Backtrace:
    ▆
 1. ├─CohortDiagnostics::executeDiagnostics(...)
 2. │ ├─CohortDiagnostics:::timeExecution(...)
 3. │ │ └─base::eval(expr)
 4. │ └─CohortDiagnostics:::runConceptSetDiagnostics(...)
 5. │   └─... %>% dplyr::ungroup()
 6. ├─dplyr::ungroup(.)
 7. ├─dplyr::summarise(., conceptCount = max(conceptCount), conceptSubjects = max(conceptSubjects))
 8. └─dplyr:::summarise.grouped_df(...)

[[2]]
<warning/rlang_warning>
Warning in `dplyr::summarise()`:
ℹ In argument: `conceptSubjects = max(conceptSubjects)`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
---
Backtrace:
    ▆
 1. ├─CohortDiagnostics::executeDiagnostics(...)
 2. │ ├─CohortDiagnostics:::timeExecution(...)
 3. │ │ └─base::eval(expr)
 4. │ └─CohortDiagnostics:::runConceptSetDiagnostics(...)
 5. │   └─... %>% dplyr::ungroup()
 6. ├─dplyr::ungroup(.)
 7. ├─dplyr::summarise(., conceptCount = max(conceptCount), conceptSubjects = max(conceptSubjects))
 8. └─dplyr:::summarise.grouped_df(...)

>
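
The first warning suggests the CDM_SOURCE table in this demo CDM is empty. A quick way I can confirm that (a minimal sketch with DatabaseConnector, re-using the connectionDetails and cdmDatabaseSchema objects from my script):

library(DatabaseConnector)

# Count the rows in CDM_SOURCE; zero rows would explain the
# "Metadata on CDM source will be limited" warning above.
connection <- connect(connectionDetails)
renderTranslateQuerySql(
  connection,
  "SELECT COUNT(*) AS row_count FROM @cdm_database_schema.cdm_source;",
  cdm_database_schema = cdmDatabaseSchema
)
disconnect(connection)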

However…

When I try to run launchDiagnosticsExplorer I get the error shown below. My SQLite file is attached.
SOS_STUDY_COVID_HOMELESS.sqlite.txt (580 KB)

> launchDiagnosticsExplorer(sqliteDbPath = "D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\04-phenotype-evaluation\\output\\covid_ohdsi\\SOS_STUDY_COVID_HOMELESS.sqlite")
Loading required package: shiny
Connecting using SQLite driver
Error in UseMethod("arrange") : 
  no applicable method for 'arrange' applied to an object of class "NULL"
>

Maybe this is a character-set/Windows issue? When I open the file in Notepad++ it looks like there are a lot of non-printing characters in it.
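
One thing I can try is opening the SQLite results file directly and listing its tables, to see whether the Diagnostics Explorer is choking on a missing or empty table (a minimal sketch using DBI/RSQLite, pointing at the same file as the launch call above):

library(DBI)
library(RSQLite)

# Open the CohortDiagnostics results file and list the tables it contains.
db <- dbConnect(SQLite(), "D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\04-phenotype-evaluation\\output\\covid_ohdsi\\SOS_STUDY_COVID_HOMELESS.sqlite")
dbListTables(db)
dbDisconnect(db)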


@greshje.gmail, were you able to solve that? I have the same issue.
