Hi folks and Happy New Year!
Summary: I seem to be encountering unexpected behavior with CohortDiagnostics when it is run on a phenotype definition I wrote: I receive results for concepts I did not specify in my definition. For discussion reference, here is the phenotype definition: Lung Cancer.
Click To See R sessionInfo Output
R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
LAPACK version 3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C
[3] LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C.UTF-8
[9] LC_ADDRESS=C.UTF-8 LC_TELEPHONE=C.UTF-8
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C.UTF-8
time zone: localtime
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] CohortGenerator_0.12.2 R6_2.6.1
[3] CohortDiagnostics_3.4.2 FeatureExtraction_3.11.0
[5] Andromeda_1.1.1 dplyr_1.1.4
[7] DatabaseConnector_6.4.0
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.5 rlang_1.1.6
[4] DBI_1.2.3 purrr_1.1.0 renv_1.1.5
[7] generics_0.1.4 rJava_1.0-11 glue_1.8.0
[10] bit_4.6.0 tibble_3.3.0 fastmap_1.2.0
[13] lifecycle_1.0.4 memoise_2.0.1 duckdb_1.4.0
[16] compiler_4.4.3 SqlRender_1.19.3 RSQLite_2.4.3
[19] blob_1.2.4 pkgconfig_2.0.3 tidyr_1.3.1
[22] tidyselect_1.2.1 pillar_1.11.1 magrittr_2.0.4
[25] tools_4.4.3 bit64_4.6.0-1 cachem_1.1.0
Steps: What I do is this:
1. Run exportCohortDefinitionSet and generateCohortSet from CohortGenerator – this works perfectly and the cohort gets generated
2. Generate cohort statistics via executeDiagnostics from CohortDiagnostics – this works as expected on the cohort with no errors (for details on execution settings, see below)
3. Prepare statistics for the Shiny viewer using createMergedResultsFile from CohortDiagnostics – works as expected and a SQLite DB is made
4. Review the results file and find additional concepts included that are not seen within my ATLAS instance – very confused here
For step 2, here is the configuration I used:
Click To View executeDiagnostics Settings
executeDiagnostics(
  cohortDefinitionSet,
  connectionDetails = connectionDetails,
  cohortTable = cohortTable,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cdmDatabaseSchema = cdmDatabaseSchema,
  exportFolder = exportFolder,
  databaseId = "Pharmetrics",
  databaseDescription = "Lab Database",
  minCellCount = 11,
  runInclusionStatistics = FALSE,
  runIncludedSourceConcepts = TRUE,
  runOrphanConcepts = FALSE,
  runTimeSeries = FALSE,
  runVisitContext = FALSE,
  runBreakdownIndexEvents = TRUE,
  runIncidenceRate = FALSE,
  runCohortRelationship = FALSE,
  runTemporalCohortCharacterization = FALSE,
  runFeatureExtractionOnSample = FALSE
)
Problem: To give some additional context to this, here is a screenshot of what I mean:
On the left of the image is my phenotype definition and on the right is the CohortDiagnostics results viewer. The viewer shows the code 35206086, which appears in my results file but is explicitly not present in my ATLAS instance: the Included Concepts tab for my definition (left) shows “No Matching Records Found” for it. The ATLAS and CohortDiagnostics results are produced from the exact same database.
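One way to narrow this down might be to check where 35206086 actually occurs in the merged SQLite results file, since the table and column it sits in (for example, a source concept column versus a standard concept column) could hint at which analysis introduced it. The following is only a sketch: `findConceptInSqlite` is a hypothetical helper, not part of CohortDiagnostics, it assumes the DBI and RSQLite packages are installed, and the file path in the usage comment is a placeholder.

```r
# Hypothetical helper (not part of CohortDiagnostics): list every table and
# column in a SQLite database where a given concept ID occurs.
findConceptInSqlite <- function(con, conceptId) {
  hits <- list()
  for (tbl in DBI::dbListTables(con)) {
    cols <- DBI::dbListFields(con, tbl)
    # Only inspect columns whose name ends in "concept_id"
    conceptCols <- grep("concept_id$", cols, value = TRUE, ignore.case = TRUE)
    for (col in conceptCols) {
      sql <- sprintf(
        "SELECT COUNT(*) AS n FROM %s WHERE %s = ?",
        as.character(DBI::dbQuoteIdentifier(con, tbl)),
        as.character(DBI::dbQuoteIdentifier(con, col))
      )
      n <- DBI::dbGetQuery(con, sql, params = list(conceptId))$n
      if (n > 0) {
        hits[[length(hits) + 1]] <- data.frame(
          table = tbl, column = col, rows = n, stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(hits) == 0) {
    # Empty result with the same columns when nothing matches
    return(data.frame(table = character(), column = character(), rows = integer()))
  }
  do.call(rbind, hits)
}

# Usage (placeholder path; point it at the file from createMergedResultsFile):
# con <- DBI::dbConnect(RSQLite::SQLite(), "path/to/merged_results.sqlite")
# findConceptInSqlite(con, 35206086)
# DBI::dbDisconnect(con)
```

The helper deliberately scans by column name rather than hard-coding table names, so it does not depend on any particular version of the results data model.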
The principal reason I am worried about this is that additional concepts are seemingly being added by CohortDiagnostics, which makes me question whether something is amiss with my definition in general. My thoughts about what is going on here are as follows:
- Is there some “strict” setting in executeDiagnostics that I am not thinking of and should toggle?
- Could it be some kind of underlying vocabulary issue where, although I don’t see this code in ATLAS, it gets added into analyses by executeDiagnostics?
- Is there some interface behavior I am not accounting for when the output of CohortGenerator::generateCohortSet is ingested downstream by CohortDiagnostics?
- Some other cause?
I am happy to provide additional information but I am quite stumped by this. Any thoughts?
Thanks!
~ tcp
