
CohortMethod user group

The OHDSI CohortMethod package allows one to perform new-user cohort studies in observational data using large-scale propensity score models and equally large-scale outcome models. It is currently our most complex (operational) effect estimation method, and here users of the package can discuss their experiences and thoughts.


Hi @schuemie, I've been experimenting with CohortMethod for a comparative analysis of anticoagulants. I noticed that you include YEAR as a demographic variable, and I'm not sure I agree with that. The year in which a drug was prescribed does not seem to belong in the same category as sex, age, and ethnicity. Because I am comparing a drug that only recently became popular with an old standby, including YEAR as a demographic makes the propensity score model fit to year instead of to more meaningful variables, such as prior conditions.

Just curious what your thoughts are.

Hi Nick, in cases where exposure is (almost) separable by a given covariate (year in this case), you'll probably want to throw it out before fitting a propensity score model (and certainly note the separability / lack of balance in the cohorts). I'll dig into the model-fitting R code and identify an easy way to force covariates out.
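Before dropping a covariate such as index year, it can help to look at how separable exposure actually is across its levels. The helper below is a hypothetical base-R sketch, not part of CohortMethod: it assumes you already have one value of the covariate and one treatment indicator (1 = target, 0 = comparator) per subject, and flags levels where nearly everyone is in one arm. The 0.95 threshold is an arbitrary illustrative choice.

```r
# Hypothetical helper (not part of CohortMethod): for each level of a covariate,
# e.g. index year, report how many subjects fall in it and what share of them
# are in the target arm (treatment == 1). Shares near 0 or 1 suggest that
# exposure is (almost) separable by that covariate.
checkSeparability <- function(covariate, treatment, threshold = 0.95) {
  shareTreated <- tapply(treatment, covariate, mean)
  data.frame(
    level = names(shareTreated),
    n = as.vector(table(covariate)),
    shareTreated = as.vector(shareTreated),
    nearlySeparable = as.vector(shareTreated >= threshold |
                                shareTreated <= 1 - threshold),
    stringsAsFactors = FALSE
  )
}

# Toy data (made up): the comparator dominates 2010, the target dominates 2012
indexYear <- c(2010, 2010, 2011, 2011, 2012, 2012)
treatment <- c(0, 0, 0, 1, 1, 1)
checkSeparability(indexYear, treatment)
```

Levels flagged here are candidates for exclusion from the propensity model, and also worth reporting as a lack of balance between the cohorts.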

Yes, exactly. For now I am excluding all of the demographics from the PS model. I'm looking forward to an update there. Thanks, Marc.

Here is how to exclude all age-related covariates, using the data from the vignette:

  cohortData <- loadCohortData("coxibVsNonselVsGiBleed")

  # Build default PS model
  ps_original <- createPs(cohortData, outcomeConceptId = 3)
  propensityModel_original <- getPsModel(ps_original, cohortData)

  which_age <- grepl("^Age", propensityModel_original$covariateName)
  propensityModel_original$covariateName[which_age] # Fitted PS model includes age-related covariates

  # Build PS model a priori excluding age-related covariates.
  # covariateRef is an ff data frame; indexing the column with [1:length(...)]
  # pulls the names into memory so that grepl() can search them.
  which_out <- grepl("^Age",
       cohortData$covariateRef$covariateName[1:length(cohortData$covariateRef$covariateName)])
  excludeIds <- cohortData$covariateRef$covariateId[which_out]

  ps_new <- createPs(cohortData, outcomeConceptId = 3,
                     excludeCovariateIds = excludeIds) # This argument is the relevant change
  propensityModel_new <- getPsModel(ps_new, cohortData)
  which_age <- grepl("^Age", propensityModel_new$covariateName)
  sum(which_age) # Age was excluded.

Here is how to find the index year covariates so they can be excluded from the propensity model:

  # Find index year covariates
  which_out <- grepl("^Index year", 
       cohortData$covariateRef$covariateName[1:length(cohortData$covariateRef$covariateName)])
  # List names to screen to confirm
  cohortData$covariateRef$covariateName[which_out]
  # Collect IDs to pass to createPs()
  excludeIds <- cohortData$covariateRef$covariateId[which_out]

Here is a (maybe) useful function that I just wrote to grep the covariate names in cohortData:

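  # Requires the ff and ffbase packages (covariateRef is an ff data frame)
  library(ff)
  library(ffbase)

  # Return the rows of covariateRef whose covariateName matches a regular expression
  grepCovariateNames <- function(pattern, cohortData) {
    as.ram(with(cohortData,
                covariateRef[ffwhich(covariateRef,
                                     grepl(pattern, covariateName)),]))
  }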
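If covariateRef is available as an ordinary in-memory data.frame (an assumption; in the version discussed here it is an ff object, which is why as.ram() and ffwhich() are needed), the same lookup reduces to plain base R. The function name grepCovariateIds and the toy covariate names and IDs below are hypothetical, chosen only for illustration.

```r
# Hypothetical in-memory counterpart of grepCovariateNames(): assumes covariateRef
# is a plain data.frame with covariateId and covariateName columns.
grepCovariateIds <- function(pattern, covariateRef) {
  hits <- grepl(pattern, covariateRef$covariateName)
  covariateRef[hits, c("covariateId", "covariateName"), drop = FALSE]
}

# Usage with a toy covariateRef (made-up IDs and names)
covariateRef <- data.frame(
  covariateId = c(1001, 1002, 2001, 3001),
  covariateName = c("Age group: 40-44", "Age group: 45-49",
                    "Gender = FEMALE", "Index year: 2012"),
  stringsAsFactors = FALSE
)
# IDs that could then be passed on as excludeCovariateIds
excludeIds <- grepCovariateIds("^Index year", covariateRef)$covariateId
```

Printing the matching rows before passing their IDs on makes it easy to confirm that the pattern did not catch unintended covariates.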

I tried to use the getDbCohortData function in the CohortMethod package, but none of my attempts succeeded. The closest I got produced the following error message:

Error in UseMethod(ā€œopenā€) :
no applicable method for ā€˜openā€™ applied to an object of class ā€œdata.frameā€

This happens at the comparatorDrugConceptId step of getDbCohortData, so no data can be generated with this function. Has anyone else tried this function in CohortMethod?

Thanks,

Jun,

Using the settings in "Single studies using the CohortMethod package", getDbCohortData works, although it takes a while to complete on our database. One constraint is that each list parameter passed to the function must have fewer than 1000 elements.

Thanks,

Zuoyi
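If that limit comes from the SQL `IN (...)` list the IDs end up in (an assumption; Oracle, for example, rejects IN lists with more than 1,000 expressions), one workaround is to split a long ID vector into chunks and process one chunk per query. The helper below is a hypothetical base-R sketch; it only builds the chunks and the IN clauses, not the queries themselves.

```r
# Hypothetical helper: split a vector of IDs into consecutive chunks of at most
# maxSize elements (999 keeps each list below 1000).
chunkIds <- function(ids, maxSize = 999) {
  split(ids, ceiling(seq_along(ids) / maxSize))
}

# Hypothetical usage: one "IN (...)" clause per chunk
chunks <- chunkIds(1:2500)
inClauses <- vapply(chunks,
                    function(chunk) sprintf("IN (%s)", paste(chunk, collapse = ",")),
                    character(1))
```

Results from the separate queries would then have to be combined afterwards, which is why the chunks preserve the original order of the IDs.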

Hi Jun,

Can you provide the script you're using to call getDbCohortData?

Hi Martijn,

Here is the code:

  Dt <- getDbCohortData(connectionDetails,
                        cdmDatabaseSchema = cdmDatabaseSchema,
                        oracleTempSchema = resultsDatabaseSchema,
                        targetDrugConceptId = 1,
                        comparatorDrugConceptId = 1,
                        useCovariateDemographics = FALSE)

The error message is:
Error in UseMethod("open") :
no applicable method for 'open' applied to an object of class "data.frame"
In addition: Warning messages:
1: In DatabaseConnector::dbGetQuery.ffdf(conn, cohortSql) :
Data has zero rows, returning an empty data frame
2: In DatabaseConnector::dbGetQuery.ffdf(conn, covariateSql) :
Data has zero rows, returning an empty data frame
3: In getDbCovariates(connection = conn, oracleTempSchema = oracleTempSchema, :
No data found

Thanks.

Hi Jun,

The error message is caused by the fact that there is no data: there are no people on your target or comparator drug. Since you're using concept ID = 1 for both, and haven't specified a custom cohort table where these cohorts might live, it is looking for concept ID 1 in the drug_era table. And 1 isn't a valid concept ID for drugs.

Sorry for the mysterious error messages! It's on my todo list to make these messages more meaningful.

Please take a look at the vignette for an example of how to use CohortMethod.
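Until the package reports this more clearly, a script can fail early with a readable message when a fetched table is empty, instead of crashing later inside ff with the "no applicable method for 'open'" error. The helper below is a hypothetical sketch for your own scripts; it assumes you have the fetched result as a data.frame and is not part of CohortMethod.

```r
# Hypothetical guard: stop with an explicit message when a query result has no rows.
assertHasRows <- function(df, what) {
  if (is.null(df) || nrow(df) == 0) {
    stop(sprintf(paste0("No rows found for %s. Check that the concept or cohort IDs ",
                        "exist in the table being queried."), what),
         call. = FALSE)
  }
  invisible(df)
}

# Hypothetical usage on an empty result
msg <- tryCatch(assertHasRows(data.frame(), "target cohort"),
                error = conditionMessage)
```

Here `msg` holds the explanatory message rather than the opaque ff error.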

When I tried to extract data using the getDbCohortData example in the single-study vignette (p. 5), it repeatedly reported that some columns were not available. I was told this is because CohortMethod currently only works on CDM v4, which is why some columns are not available in our CDM v5.

I am also trying to understand how CohortMethod can handle millions of covariates. What algorithm is behind this package?

Thanks

Indeed, CohortMethod currently only works for CDM v4. I'll add support for v5 shortly.

CohortMethod uses our Cyclops package, which has efficient implementations of a wide range of regularized regressions. For fitting the propensity model we use regularized logistic regression with a Laplace prior.
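In standard textbook form (a sketch; the exact parameterization inside Cyclops may differ), with y_i the treatment indicator and x_i the covariate vector of subject i, placing an independent Laplace prior with rate λ on each coefficient makes the maximum a posteriori fit an L1-penalized logistic regression:

```latex
\ell(\beta) = \sum_{i=1}^{n} \left[ y_i\, x_i^\top \beta - \log\left(1 + e^{x_i^\top \beta}\right) \right],
\qquad
p(\beta_j) = \frac{\lambda}{2}\, e^{-\lambda \lvert \beta_j \rvert}

\hat{\beta}
  = \arg\max_{\beta} \Big[ \ell(\beta) + \sum_{j} \log p(\beta_j) \Big]
  = \arg\max_{\beta} \Big[ \ell(\beta) - \lambda \sum_{j} \lvert \beta_j \rvert \Big]
```

The constant log(λ/2) terms drop out of the maximization. The Laplace distribution with rate λ has variance 2/λ², so a prior specified by its variance σ² corresponds to λ = √(2/σ²). Because the penalty is an absolute value, many coefficients end up exactly zero, which keeps a model fitted over a very large covariate set sparse.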

Thanks Martijn,

I am just wondering which specific algorithm you adopted that so greatly increases computational efficiency. In our experience we can't get quick results when fitting thousands of covariates, and the resource requirements for handling tens of thousands of covariates or more are too high to be practical.

Hi Jun,

Cyclops uses cyclic coordinate descent for optimizing the likelihood function. Also, Cyclops makes extensive use of the fact that the data is sparse, both for storage (no need to store zeroes) and for computation of the likelihood (no need to multiply betas by zeroes).

It also helps that Marc implemented the algorithm in C++ in a very efficient way.

A more detailed description can be found in this article.
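The two ideas above, updating one coefficient at a time and touching only non-zero entries, can be sketched in base R. This is a hypothetical illustration, not the Cyclops implementation: it assumes binary covariates stored as a list of the row indices where each covariate equals 1, an unpenalized intercept, and a soft-thresholded Newton step per coordinate (a common lasso-style update) in place of whatever update rule Cyclops uses internally.

```r
# Hypothetical sketch: L1-regularized (Laplace prior) logistic regression fitted
# by cyclic coordinate descent over sparse binary covariates. Not Cyclops.
softThreshold <- function(z, lambda) sign(z) * max(abs(z) - lambda, 0)

fitSparseL1Logistic <- function(y, rowsByCovariate, lambda,
                                maxIter = 100, tol = 1e-6) {
  n <- length(y)
  nCov <- length(rowsByCovariate)
  intercept <- 0
  beta <- numeric(nCov)
  eta <- rep(0, n)  # linear predictor, updated incrementally
  for (iter in seq_len(maxIter)) {
    # Unpenalized intercept: one Newton step on the log-likelihood
    p <- 1 / (1 + exp(-eta))
    step <- sum(y - p) / max(sum(p * (1 - p)), 1e-12)
    intercept <- intercept + step
    eta <- eta + step
    maxChange <- abs(step)
    for (j in seq_len(nCov)) {
      rows <- rowsByCovariate[[j]]  # only rows where covariate j equals 1
      if (length(rows) == 0) next
      pj <- 1 / (1 + exp(-eta[rows]))
      grad <- sum(pj - y[rows])     # gradient of the negative log-likelihood
      hess <- max(sum(pj * (1 - pj)), 1e-12)
      # Minimize the local quadratic approximation plus lambda * |beta_j|
      newBeta <- softThreshold(beta[j] * hess - grad, lambda) / hess
      delta <- newBeta - beta[j]
      if (delta != 0) {
        beta[j] <- newBeta
        eta[rows] <- eta[rows] + delta  # no work for rows where x_ij is zero
        maxChange <- max(maxChange, abs(delta))
      }
    }
    if (maxChange < tol) break
  }
  list(intercept = intercept, beta = beta, iterations = iter)
}
```

Each coordinate update reads and modifies only the rows in which that covariate is present, so the cost of one sweep scales with the number of non-zero entries rather than with patients × covariates.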

Hi all,

We have just released version 1.1.0 of CohortMethod. Here is the description:

This version adds support for running large-scale studies including many target-comparator combinations, outcomes, and analyses. This functionality is documented in a new vignette.

Also included are many bug fixes, as well as significant performance improvements in data fetching, matching, and covariate balance computation. (Recent changes in Cyclops have already improved performance in propensity model fitting and outcome model fitting). This new version also supports the Common Data Model version 5.
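To get a feel for how quickly such a study grows: every target-comparator pair is combined with every outcome and every analysis. The base-R sketch below uses made-up IDs and plain data frames; the package has its own objects for specifying these combinations, which are not shown here.

```r
# Made-up IDs, for illustration only
targetComparators <- data.frame(targetId = c(1, 1), comparatorId = c(2, 3))
outcomesAndAnalyses <- expand.grid(outcomeId = c(10, 11, 12), analysisId = c(1, 2))

# merge() with no shared columns returns the Cartesian product:
# one row per target-comparator-outcome-analysis combination
studyGrid <- merge(targetComparators, outcomesAndAnalyses)
nrow(studyGrid)  # 2 pairs x 3 outcomes x 2 analyses = 12
```

Each row corresponds to one effect estimate, so adding a single outcome or analysis multiplies the amount of work rather than adding to it.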

Current users of CohortMethod should be aware of these changes in the user interface:

  • CohortData is now called CohortMethodData everywhere
  • targetDrugConceptId, comparatorDrugConceptId, and outcomeConceptId are now called targetId, comparatorId, and outcomeId, respectively.
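For scripts that keep their settings in a named list (for example to call a function via do.call()), the argument renames above can be applied mechanically. The helper below is hypothetical and only covers the three argument names listed; the CohortData to CohortMethodData change still has to be made by hand.

```r
# Hypothetical migration helper: rename pre-1.1.0 argument names in a named
# list of settings; all other names are left untouched.
renameLegacyArgs <- function(args) {
  renames <- c(targetDrugConceptId = "targetId",
               comparatorDrugConceptId = "comparatorId",
               outcomeConceptId = "outcomeId")
  hit <- names(args) %in% names(renames)
  names(args)[hit] <- renames[names(args)[hit]]
  args
}

oldArgs <- list(targetDrugConceptId = 1, comparatorDrugConceptId = 2,
                useCovariateDemographics = FALSE)
newArgs <- renameLegacyArgs(oldArgs)
```

Values are left unchanged, so the renamed list can be passed on exactly as before.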

I'm getting unexpected results from running CohortMethod and was wondering if you could point me in the right direction.

Here is what I have done:

I created three cohorts using Circe. Here they are on ohdsi.org if helpful:

Target - http://www.ohdsi.org/web/circe/#/814
Comparator - http://www.ohdsi.org/web/circe/#/815
Outcome - http://www.ohdsi.org/web/circe/#/816

I confirmed that all three cohorts generated patients from my local CDM. I then ran some manual SQL joins to ensure that there is overlap in patients between my target / comparator cohorts and outcome cohort.
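A slightly stricter version of that overlap check can also be done in R. The sketch assumes the cohort table has been loaded into a data.frame with the standard OHDSI cohort columns (cohort_definition_id, subject_id, cohort_start_date); the function name and the IDs in the usage example are hypothetical. Besides counting subjects present in both cohorts, it counts those whose outcome starts on or after their exposure index date, since shared subjects alone do not show that any outcome falls after index.

```r
# Hypothetical check on a cohort table held in memory as a data.frame
countOutcomeOverlap <- function(cohort, exposureId, outcomeId) {
  exposure <- cohort[cohort$cohort_definition_id == exposureId,
                     c("subject_id", "cohort_start_date")]
  outcome <- cohort[cohort$cohort_definition_id == outcomeId,
                    c("subject_id", "cohort_start_date")]
  names(outcome)[names(outcome) == "cohort_start_date"] <- "outcome_date"
  joined <- merge(exposure, outcome, by = "subject_id")
  afterIndex <- joined$outcome_date >= joined$cohort_start_date
  c(subjectsInBoth = length(unique(joined$subject_id)),
    subjectsWithOutcomeOnOrAfterIndex = length(unique(joined$subject_id[afterIndex])))
}

# Toy cohort table (made up): exposure cohort 1, outcome cohort 3
cohort <- data.frame(
  cohort_definition_id = c(1, 1, 3, 3),
  subject_id = c(101, 102, 101, 102),
  cohort_start_date = as.Date(c("2015-01-01", "2015-06-01",
                                "2015-03-01", "2015-01-01"))
)
overlap <- countOutcomeOverlap(cohort, exposureId = 1, outcomeId = 3)
```

In the toy data both subjects appear in both cohorts, but only one has the outcome after index.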

I moved these three cohorts to a dedicated table and set up the CM configuration as described in the SingleStudy vignette, using the appropriate cohort_definition_ids for my targetId, comparatorId, and outcomeId.

Things run swimmingly for a while, then I get what appears to be an empty result set (i.e., no outcomes found).

Constructing treatment and comparator cohorts
Executing multiple queries. This could take a while
  |==================================================================================================================================| 100%
Analysis took 0.822 secs
Fetching data from server
Loading took 2.56 secs
Constructing default covariates
  |==================================================================================================================================| 100%
Analysis took 9.72 hours
Done
Fetching data from server
Loading took 4.11 mins
Removing redundant covariates
Normalizing covariates

Constructing outcomes
Executing multiple queries. This could take a while
  |==================================================================================================================================| 100%
Analysis took 6.27 secs
Done
Fetching data from server
Loading took 0.306 secs

Error in UseMethod("open") : 
  no applicable method for 'open' applied to an object of class "data.frame"
In addition: Warning message:
In lowLevelQuerySql.ffdf(connection, sql) :
  Data has zero rows, returning an empty data frame

What puzzles me is why the result set would be empty when I have confirmed that patients do indeed appear in the outcome group. I assume I have done something incorrectly in my configuration. Has anyone else had this problem?

I am happy to post my configuration but didn't want to bomb people's inboxes.

Thanks

Jon

Hi Jon,

I think this thread was actually just fine in the issue tracker (it keeps the issue close to the offending package). I'll reopen it there.

Hi all,

Please be aware that a major new version of CohortMethod has been released. See here for details.

In the statistics tutorial, the speakers referenced several useful statistics textbooks, but I did not note them down at the time.

To dig deeper into the statistics related to CohortMethod, which textbooks would you recommend as good references?

I think one of the books mentioned was this one by Rosenbaum (although it is from 2010, which seems not that recent).
