
How to do quick feasibility counts from R given a public-server cohort definition URL

I would like to propose a semi-standard way to run a feasibility cohort count with 3 or 4 OHDSI partner sites (nothing perfect, just a code snippet perhaps).

The latest study (Raloxifene…) is split into runFeasibility and executeStudy, which is a very nice way to organize it.

If I have 4 cohort definitions, supplied as 4 IDs on the public server with cohort definitions (e.g., http://www.ohdsi.org/web/atlas/#/cohortdefinition/152651 ),

what is a good R code snippet that can

  1. Fetch the cohort definition from the server (using my connectionDetails dbms parameter to pick the right sql flavor)
  2. Populate the parameters of the query (e.g., use the default cohort table) (see code below)
  3. Produce counts for my 4 cohorts (or simply list all my cohorts at the moment) (my draft code does that)

What I have for steps 2 and 3 so far is below my signature, but with step 1 I need help. We almost had it working at the tutorial during the symposium (with help from Marc), but it never worked fully, and the routine is probably even more standardized now. @msuchard @schuemie
My code reads the cohort definition from a file on disk (it does not do step 1).

library(DatabaseConnector)
library(SqlRender)

# joseph is a connectionDetails object for the target CDM
connection <- connect(joseph)

# Read the cohort definition SQL from disk (this is where step 1 should go)
sql <- paste(readLines("c:/b/coh01.sql"), collapse = "\n")
cat(sql)

cdmDatabaseSchema    <- "OHDSI.dbo"
targetDatabaseSchema <- "OHDSI.dbo"
targetCohortTable    <- "cohort"
targetCohortId       <- "9904"

# Step 2: fill in the template parameters, translate to the right SQL
# dialect, and run the cohort definition
sql <- SqlRender::renderSql(sql,
                            cdm_database_schema = cdmDatabaseSchema,
                            target_database_schema = targetDatabaseSchema,
                            target_cohort_table = targetCohortTable,
                            target_cohort_id = targetCohortId)$sql
sql <- SqlRender::translateSql(sql, targetDialect = attr(connection, "dbms"))$sql
DatabaseConnector::executeSql(connection, sql)

# Execute the other cohort definitions here (one per cohort)

# Step 3: obtain counts per cohort
sql2 <- sprintf("SELECT cohort_definition_id, COUNT(*) AS count FROM %s.%s GROUP BY cohort_definition_id",
                targetDatabaseSchema, targetCohortTable)
# Skipping translation here (cutting a corner; may not be safe for all dialects)
counts <- DatabaseConnector::querySql(connection, sql2)
counts
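
If the corner should not be cut, the same count query can be routed through SqlRender as well; a small sketch reusing the objects defined above (person_count is used as the alias because count can be a reserved word in some dialects):

# Dialect-safe variant of the count query, reusing connection and the
# parameter values defined above
sql2 <- SqlRender::renderSql("SELECT cohort_definition_id, COUNT(*) AS person_count FROM @target_database_schema.@target_cohort_table GROUP BY cohort_definition_id",
                             target_database_schema = targetDatabaseSchema,
                             target_cohort_table = targetCohortTable)$sql
sql2 <- SqlRender::translateSql(sql2, targetDialect = attr(connection, "dbms"))$sql
counts <- DatabaseConnector::querySql(connection, sql2)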

Assume that the site does not have Atlas and WebAPI running, and for step 1 can simply use the public API (the OHDSI network infrastructure) to convert the JSON cohort definition to a SQL cohort definition.
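
For concreteness, here is a rough sketch of what I imagine step 1 could look like against the public WebAPI. The endpoint paths and response fields are my assumptions about how Atlas talks to WebAPI, so they need to be verified:

# Rough sketch of step 1 (ASSUMPTIONS: the public WebAPI serves the
# definition at /cohortdefinition/{id}, and POST /cohortdefinition/sql
# renders its JSON expression to template SQL in a templateSql field;
# verify both against the WebAPI version)
library(httr)

baseUrl <- "http://api.ohdsi.org/WebAPI"
cohortId <- 152651

# Fetch the cohort definition (the expression may come back as a JSON string)
definition <- httr::content(httr::GET(sprintf("%s/cohortdefinition/%d", baseUrl, cohortId)))

# Ask WebAPI to render the JSON expression into parameterized OHDSI SQL
response <- httr::POST(sprintf("%s/cohortdefinition/sql", baseUrl),
                       body = list(expression = definition$expression),
                       encode = "json")
sql <- httr::content(response)$templateSql
# sql can now go through steps 2 and 3 above (renderSql / translateSql)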

I’m working on streamlining this, but here is the process I currently have for generating cohort counts including statistics per inclusion rule (like ATLAS):

Step 1: Create a package

Step 2: Create a folder in the package called inst/settings

Step 3: Inside that folder, create a CSV file (e.g., CohortsToCreate.csv) with a format like the example below. The cohortId is the ID that you may want to use inside your study, atlasId is the ID on ATLAS (in your example 152651), and name is used to generate the file name (so preferably no spaces etc.).
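
For example (hypothetical values; 152651 is the ATLAS ID from your post, the second row is made up):

atlasId,cohortId,name
152651,1,main_exposure_cohort
152652,2,comparator_cohort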

Step 4: Run this code (note: I like to keep this code with the package):

OhdsiRTools::insertCohortDefinitionSetInPackage(fileName = "CohortsToCreate.csv",
                                                baseUrl = "http://api.ohdsi.org/WebAPI",
                                                insertTableSql = TRUE,
                                                insertCohortCreationR = TRUE,
                                                generateStats = TRUE,
                                                packageName = "Test")

where fileName is the name of your CSV file, and packageName is the name of your package. Set generateStats to FALSE if you have no inclusion rules in your cohort definitions, to avoid an error later.

This command will automatically create the following artifacts inside your package:

  • A folder inst/cohorts with the JSON definitions for each cohort
  • A folder inst/sql/sql_server with SQL for creating a cohort table and populating it with the cohorts
  • A file called R/CreateCohorts.R containing a function called .createCohorts().

You can call .createCohorts() inside your own code to generate all cohorts. The counts will automatically be generated and written to the output folder you specify.
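
As a sketch, a call could look like the following. The exact signature lives in the generated R/CreateCohorts.R, so the argument names below are assumptions to check against your generated file:

# Illustrative call (ASSUMPTION: argument names as typically generated by
# OhdsiRTools -- check the generated R/CreateCohorts.R for the real signature)
connection <- DatabaseConnector::connect(connectionDetails)  # your connectionDetails
.createCohorts(connection = connection,
               cdmDatabaseSchema = "cdm_schema",
               cohortDatabaseSchema = "scratch_schema",
               cohortTable = "cohort",
               outputFolder = "output")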

Alternatively, you can use insertCohortDefinitionInPackage to insert the cohort definitions into your package one by one (set baseUrl to http://api.ohdsi.org/WebAPI).
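
For the one-by-one route, a call might look like this (the argument names are an assumption; see the OhdsiRTools documentation for the exact signature):

# Illustrative one-by-one insert (ASSUMPTION: argument names; check
# ?OhdsiRTools::insertCohortDefinitionInPackage)
OhdsiRTools::insertCohortDefinitionInPackage(definitionId = 152651,
                                             name = "main_exposure_cohort",
                                             baseUrl = "http://api.ohdsi.org/WebAPI")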


@schuemie, this is great guidance and great supporting R infrastructure.
I was able to use it on a draft package.
If there are updates (changes, improvements) to this routine, please post them here or share them via email.

We have been working on solving a similar problem as part of the ARACHNE platform, focusing on creating a pattern for how we can consistently exchange analysis code between multiple collaborating sites.

In the last few weeks, the Odysseus and Janssen teams (@Chris_Knoll, @pavgra, @Franka, @anthonysena and Co.) worked on creating a separate shared component called Circe-be that allows us, among other things, to translate a JSON cohort definition into an OHDSI SQL template, which you can then translate into a database-specific SQL dialect using SqlRender.

One of the reasons for this functionality is that we also wanted to exchange cohorts as JSON but then do the translation directly in an execution engine component installed on the data provider side, including using it from an R package.

Coincidentally, we also had a great initial discussion at our last OHDSI Architecture Workgroup call about what exactly the use cases are in which we use cohorts, and how what we do with a cohort changes depending on the use case, e.g., serializing a cohort only for the duration of a use-case-level “transaction”, using cohorts to perform feasibility and returning counts, etc.

I think it would be fantastic if we all brought these discussions together, documented the cohort usage use cases, and produced re-usable components that enable us to use cohorts for those. I would be happy to organize and facilitate this call, including providing an initial set of possible use cases that we can use as a start.

Btw, we have also made great progress on being able to create a distributable, self-contained R package using packrat (@schuemie), and I would be happy to share this as well. I think these ideas go hand in hand and are very complementary.


Can you elaborate more on the packrat work? What Martijn posted about table 1 also seems relevant for our cancer cohort experiments.

The idea is to create a set of reusable, self-contained, parametrized analysis components for each of the popular OHDSI methods, which would take analysis parameters as arguments rather than having them coded into the R script as is done today.

Being “self-contained” is key to avoiding dynamic references and version conflicts, and to ensuring reproducible analysis execution across different sites. This is where packrat comes in: it allows us to package parametrized analysis components including each and every code dependency. This package can then be shared and is guaranteed to run on every site.
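
As a minimal sketch of that workflow with packrat (these are standard packrat functions; the paths are illustrative):

# Snapshot the project's exact dependency versions and bundle everything
# into one self-contained archive that can be shipped to a site
packrat::init()
packrat::snapshot()
bundlePath <- packrat::bundle()

# At the receiving site, unpack and restore the identical library
packrat::unbundle(bundlePath, where = "~/studies")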

Longer term, I am envisioning a better (or complementary) model where sites/systems can simply exchange analysis parameters within a standard message (HL7? JSON?), and the correct pre-packaged components can then be picked up from a global repository and executed by feeding in those parameters.
