Capr error and Phenotype Feb idea (cohorts by code; may be faster)

Vojtech_Huser · February 14, 2025, 5:37pm

I am trying to learn the Capr package and use it for Phenotype February.
My goal is to write R code that creates a cohort exactly matching one defined via the GUI in Atlas.

At the end, I will check that the two JSON definitions are identical (or equivalent).

cohort_by_coding “equals” cohort_by_gui
(either on SQL level or JSON level or any other means)

I want to recreate the drug-based cohorts from one guideline study (led by Kevin), but the approach would apply to any study.

infliximab	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792089

adalimumab	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792086

golimumab	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792085

vedolizumab	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792094

tofacitinib	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792088

ustekinumab	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792087

methotrexate	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792092

natalizumab	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792098

certolizumab pegol	https://atlas-demo.ohdsi.org/#/cohortdefinition/1792090

My first victim is the first cohort on the list.
https://atlas-demo.ohdsi.org/#/cohortdefinition/1792089/conceptsets
937368 infliximab

I am running into an error in R. Any help is appreciated.

Capr::as.json seems to be the key problem. I have not looked at the R code of that method and opted for community help instead.

library(Capr)
ch <- cohort(
  entry = entry(
    drugExposure(cs(descendants(937368),name='infliximab'))
  ),
  exit = exit(endStrategy = observationExit())
)
#gives error
cohort_by_coding <-  Capr::as.json(ch)
#Error: unable to find an inherited method for function ‘as.json’ for signature ‘x = "Cohort"’
#also give same error


#other later code (planned)

assertCohortCompiles(ch)
library(CirceR)
sessioninfo::package_info()
#cohort_by_gui = Fetch via API, for now, manually
# build the SQL from the JSON string (not from the Capr cohort object itself)
sql_by_coding <- CirceR::buildCohortQuery(
  expression = CirceR::cohortExpressionFromJson(cohort_by_coding),
  options = CirceR::createGenerateOptions(generateStats = FALSE)
)

connectionDetails <- Eunomia::getEunomiaConnectionDetails()
cohortsToCreate <- tibble::tibble(cohortId = 1, cohortName = "one", sql = sql_by_coding)
cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = "my_cohort_table")
CohortGenerator::createCohortTables(
  connectionDetails = connectionDetails,
  cohortDatabaseSchema = "main",
  cohortTableNames = cohortTableNames
)
# Generate the cohorts
cohortsGenerated <- CohortGenerator::generateCohortSet(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  cohortDatabaseSchema = "main",
  cohortTableNames = cohortTableNames,
  cohortDefinitionSet = cohortsToCreate
)

@mdlavallee92 @Adam_Black
I need to read more of the documentation to replicate the other parts of the cohort as well (e.g., cohort exit at the end of continuous drug exposure).

I also anticipate that the equivalence evaluation will not be easy (perhaps do it at the R list level rather than as JSON).
The SQL generated is quite long and comparison will be tedious.
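
A minimal sketch of the list-level comparison idea, assuming both definitions are available as JSON strings. The helper `normalize_json()` is hypothetical (not part of Capr or CirceR), and the two JSON strings are toy stand-ins rather than real Atlas or Capr output. The helper parses each string into a nested R list and sorts named elements, so that key order alone does not count as a difference.

```r
library(jsonlite)

# Hypothetical helper: parse a JSON string into a nested R list and sort
# named elements alphabetically at every level, so key order is ignored.
normalize_json <- function(json_string) {
  sort_names <- function(x) {
    if (is.list(x)) {
      if (!is.null(names(x))) x <- x[order(names(x))]
      x <- lapply(x, sort_names)
    }
    x
  }
  sort_names(jsonlite::fromJSON(json_string, simplifyVector = FALSE))
}

# Toy stand-ins for the two definitions (same content, different key order).
cohort_by_gui <- '{"ConceptSets": [{"id": 0, "name": "infliximab"}], "cdmVersionRange": ">=5.0.0"}'
cohort_by_coding <- '{"cdmVersionRange": ">=5.0.0", "ConceptSets": [{"name": "infliximab", "id": 0}]}'

same <- identical(normalize_json(cohort_by_gui), normalize_json(cohort_by_coding))
print(same)
```

When the two lists are not identical, calling all.equal() on the normalized lists returns a character vector describing where they differ, which may be easier to read than a diff of the generated SQL. Fields that one tool fills in and the other leaves empty would still show up as differences and would need separate handling.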

edburn · February 14, 2025, 8:27pm

Hi @Vojtech_Huser, to chip in with another option alongside Capr and Atlas, perhaps you would also like to kick the tires on the CohortConstructor package? I am looking for user feedback on the early versions, so trying to replicate some Phenotype February cohorts could be a good opportunity! The vignette "Introduction • CohortConstructor" explains more about the package, and "CohortConstructor benchmarking results • CohortConstructor" covers benchmarking against other tools.

Below is some code that gets the codes and builds all the cohorts you mentioned at once (although in the example data all the cohort counts are zero):

library(DBI, warn.conflicts = FALSE)
#> Warning: package 'DBI' was built under R version 4.4.1
library(CDMConnector, warn.conflicts = FALSE)
#> Warning: package 'CDMConnector' was built under R version 4.4.2
library(CodelistGenerator, warn.conflicts = FALSE)
#> Warning: package 'CodelistGenerator' was built under R version 4.4.2
library(CohortConstructor, warn.conflicts = FALSE)
con <- dbConnect(duckdb::duckdb(), 
              dbdir = eunomiaDir(datasetName = "synthea-covid19-10k"))
cdm <- cdmFromCon(con = con, 
                  cdmSchema = "main", 
                  writeSchema = "main", 
                  cdmName = "Eunomia")
#> Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable',
#>  target signature 'duckdb_connection#Id'.
#>  "duckdb_connection#ANY" would also be valid
drug_codes <- getDrugIngredientCodes(cdm, 
                                     name = c("infliximab",
                                              "adalimumab",
                                              "golimumab",
                                              "vedolizumab",
                                              "tofacitinib",
                                              "ustekinumab",
                                              "methotrexate",
                                              "natalizumab",
                                              "certolizumab pegol"))
cdm$drug_cohorts <- conceptCohort(cdm = cdm, 
                                  conceptSet = drug_codes, 
                                  name = "drug_cohorts", 
                                  exit = "event_end_date") |> 
  collapseCohorts(gap = 30)
#> ✖ Domain regimen (119 concepts) excluded because it is not supported.
#> ℹ Subsetting table drug_exposure using 7713 concepts with domain: drug.
#> ℹ No cohort entries found, returning empty cohort table.
cohortCount(cdm$drug_cohorts)
#> # A tibble: 9 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1              0               0
#> 2                    2              0               0
#> 3                    3              0               0
#> 4                    4              0               0
#> 5                    5              0               0
#> 6                    6              0               0
#> 7                    7              0               0
#> 8                    8              0               0
#> 9                    9              0               0

Created on 2025-02-14 with reprex v2.1.0

katy-sadowski · February 15, 2025, 1:51am

Hi @Vojtech_Huser, this looks like a known bug in Capr. Until it is fixed, you can use this code snippet from the Capr docs to convert your Capr cohort to Circe JSON:

cohortJson <- ch |>
  toCirce() |>
  jsonlite::toJSON(pretty = TRUE, auto_unbox = TRUE) |>
  as.character()

I tested it on your code and it worked

Adam_Black · February 15, 2025, 3:04pm

@Vojtech_Huser Maybe this will help.

library(Capr)

# write a template function that parameterizes the things that change
drugCohortTemplate <- function(drugConceptSet, persistenceWindow = 30, daysSupplyOverride = NULL) {
  
  # return the cohort
  cohort(
    entry = entry(drugExposure(drugConceptSet),
                  primaryCriteriaLimit = "All",
                  qualifiedLimit = "All"),
    exit = exit(
      drugExit(drugConceptSet,
               persistenceWindow = persistenceWindow,
               surveillanceWindow = 0L, 
               daysSupplyOverride = daysSupplyOverride)
    )
  )
}


# repeat this for each cohort 
drugCohortTemplate(
  drugConceptSet = cs(descendants(19041065), name = "golimumab")
) |>
  writeCohort("golimumab.json")

You might find a better way to write the code, but I think the example above gives an idea of how to use Capr to write all these cohort JSON files with a small amount of code. You could, for example, loop over the parameters.
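
A rough sketch of what looping over the parameters could look like. It repeats the template so the snippet runs on its own. The names `drugIngredients` and `outDir` are made up for illustration, and only the two ingredient concept IDs that appear in this thread are listed; the remaining drugs would need their IDs looked up.

```r
library(Capr)

# Same template as above, repeated so this snippet is self-contained.
drugCohortTemplate <- function(drugConceptSet, persistenceWindow = 30, daysSupplyOverride = NULL) {
  cohort(
    entry = entry(drugExposure(drugConceptSet),
                  primaryCriteriaLimit = "All",
                  qualifiedLimit = "All"),
    exit = exit(
      drugExit(drugConceptSet,
               persistenceWindow = persistenceWindow,
               surveillanceWindow = 0L,
               daysSupplyOverride = daysSupplyOverride)
    )
  )
}

# Illustrative lookup: drug name -> ingredient concept ID (IDs taken from this thread).
drugIngredients <- c(infliximab = 937368, golimumab = 19041065)

# Illustrative output folder; replace with a project directory as needed.
outDir <- tempdir()

# Write one cohort JSON file per drug.
for (drugName in names(drugIngredients)) {
  conceptSet <- cs(descendants(drugIngredients[[drugName]]), name = drugName)
  drugCohortTemplate(drugConceptSet = conceptSet) |>
    writeCohort(file.path(outDir, paste0(drugName, ".json")))
}
```

Keeping the drug names and concept IDs in one named vector means that adding a cohort is a one-line change.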

Note that the json this Capr code produces is missing the details in the concept sets. If you want to fill those in you can use Capr::getConceptSetDetails() and pass in a connection to a CDM and the concept set object created with cs(descendants(19041065), name = "golimumab").

Vojtech_Huser · May 7, 2025, 4:30pm

Thank you all for the replies.

I am continuing to work with Capr for the study-a-thon of the industry WG.

There is related thread here: Phenotype Phebruary - Capr style!

and relevant repo: GitHub - ohdsi-studies/PhenotypePhebruaryCapr: Repo to store Capr implementations of all cohort definitions created during Phenotype Phebruary

I am finding that when I try to generate SQL using CirceR and feed it JSON from Capr, I get an error.

https://rpubs.com/vojtech_huser/1306691

There was also a recent discussion in the Technology Advisory Group about the role of JSON (schema); ask @Frank for more, or wait a few weeks.

katy-sadowski · May 7, 2025, 11:43pm

Hi @Vojtech_Huser - you missed as.character() for the Capr JSON.

Also, I think you can use Capr's as.json (documented under "Coerce Capr object to json" in the Capr reference) instead of the couple of lines in your example script.
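
To illustrate the as.character() point with jsonlite alone: jsonlite::toJSON() returns an object of class "json" rather than a plain character string, and as.character() drops that class. This is only a sketch. The list `toyExpression` is a made-up stand-in for what toCirce() would return, and I have not confirmed that the class difference is the exact cause of the error in the linked script.

```r
library(jsonlite)

# Toy stand-in for the list that Capr::toCirce() would return.
toyExpression <- list(Title = "infliximab", ConceptSets = list())

# toJSON() returns a classed object, not a bare string.
jsonObject <- jsonlite::toJSON(toyExpression, pretty = TRUE, auto_unbox = TRUE)
class(jsonObject)
#> [1] "json"

# as.character() converts it to a plain character string.
jsonString <- as.character(jsonObject)
class(jsonString)
#> [1] "character"
```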