CohortMethod: Error in data frame size when Creating cohortMethodData objects

I’m running the CohortMethod code for a PLE study generated by Atlas. It looks like there’s a mismatch in the size of the cohortMethodData objects:

Running CohortMethod analyses
*** Creating cohortMethodData objects ***
| | 0%Thread 1 returns error: “numbers of columns of arguments do not match” when using argument(s): list(args = list(connectionDetails = list(dbms = “redshift”, user = “user”, password = “password”, server = “[redacted]”, oracleDriver = “thin”), cdmDatabaseSchema = “full_201903_omop_v5”, oracleTempSchema = NULL, exposureDatabaseSchema = “study_reference”, exposureTable = “arthro”, outcomeDatabaseSchema = “study_reference”, outcomeTable = “arthro”, cdmVersion = 5, outcomeIds = c(882, 26662, 30234, 30753, 31317, 72715, n72737, 73575, 74725, 75860, 76161, 77139, 78162, 78474, 79864, 80824, 81151, 133729, 136057, 136184, 136661, 136773, 137351, 137951, 138384, 138825, 139099, 140641, 141375, 141932, 194686, 195562, 195603, 196456, 197006, 197381, 197607, 197610, 197911, 198101, 198400, 198803, 199866, 200528, 201061, 201072, 201254, 201322, 201418, 201620, 201826, 257004, 261325, 312950, 313459, 316127, 318736, 318800, 321596, 373478, 374375, 375806, 376065, 376382, 377575, 377910, 379805, 381292, 381295, 381877, n432730, 432738, 432798, 432883, 433163, 433316, 433440, 433753, 433811, 434004, 434005, 434610, 434613, 435243, 435510, 435511, 435515, 435796, 436070, 436118, 436246, 436375, 436940, 436962, 437246, 437264, 437833, 437851, 438120, 438409, 439082, 439147, 440129, 441417, 443211, 443597, 443731, 444367, 4033837, 4084229, 4088920, 4098604, 4103703, 4110815, 4168217, 4172432, 4193704, 4209423, 4211231, 4305080, 4308509, 4311591, 40483107, 45768910), targetId = 880, comparatorId = 881, restrictToCommonPeriod = FALSE, n removeDuplicateSubjects = “remove all”, maxCohortSize = 100000, excludeDrugsFromCovariates = FALSE, covariateSettings = list(list(VisitCountMediumTerm = FALSE, ObservationShortTerm = TRUE, shortTermStartDays = -30, MeasurementRangeGroupShortTerm = FALSE, ConditionOccurrenceLongTerm = FALSE, DrugEraStartLongTerm = FALSE, VisitCountShortTerm = FALSE, Chads2Vasc = TRUE, ConditionGroupEraStartLongTerm = FALSE, ConditionEraShortTerm = FALSE, Dcsi = 
TRUE, DrugGroupEraLongTerm = TRUE, DrugGroupEraShortTerm = TRUE, n ConditionEraStartLongTerm = FALSE, temporal = FALSE, DemographicsIndexMonth = TRUE, ConditionOccurrencePrimaryInpatientLongTerm = FALSE, ConditionEraAnyTimePrior = FALSE, addDescendantsToInclude = FALSE, ConditionGroupEraStartMediumTerm = FALSE, ProcedureOccurrenceLongTerm = TRUE, DrugExposureLongTerm = FALSE, DrugEraStartShortTerm = FALSE, DistinctIngredientCountMediumTerm = FALSE, DistinctMeasurementCountShortTerm = FALSE, MeasurementRangeGroupLongTerm = TRUE, ConditionGroupEraOverlapping = FALSE, n MeasurementRangeGroupMediumTerm = FALSE, DrugGroupEraStartMediumTerm = FALSE, MeasurementAnyTimePrior = FALSE, MeasurementMediumTerm = FALSE, includedCovariateIds = list(), ConditionOccurrenceAnyTimePrior = FALSE, DistinctConditionCountLongTerm = FALSE, MeasurementValueLongTerm = FALSE, DrugEraShortTerm = FALSE, DrugGroupEraAnyTimePrior = FALSE, DrugEraOverlapping = FALSE, ConditionOccurrencePrimaryInpatientAnyTimePrior = FALSE, ConditionEraMediumTerm = FALSE, ConditionEraOverlapping = FALSE, n ConditionEraStartShortTerm = FALSE, ObservationAnyTimePrior = FALSE, VisitConceptCountShortTerm = FALSE, DemographicsEthnicity = TRUE, DistinctIngredientCountLongTerm = FALSE, ConditionOccurrencePrimaryInpatientShortTerm = FALSE, DemographicsAgeGroup = TRUE, DistinctProcedureCountShortTerm = FALSE, DistinctObservationCountMediumTerm = FALSE, includedCovariateConceptIds = list(), DrugGroupEraStartShortTerm = FALSE, addDescendantsToExclude = TRUE, DrugEraLongTerm = FALSE, DistinctConditionCountShortTerm = FALSE, n ConditionGroupEraShortTerm = TRUE, ConditionEraStartMediumTerm = FALSE, VisitCountLongTerm = FALSE, DemographicsRace = TRUE, ProcedureOccurrenceAnyTimePrior = FALSE, DistinctObservationCountLongTerm = FALSE, ProcedureOccurrenceMediumTerm = FALSE, CharlsonIndex = TRUE, DemographicsPriorObservationTime = FALSE, MeasurementShortTerm = TRUE, DistinctProcedureCountMediumTerm = FALSE, ConditionEraLongTerm = 
FALSE, DrugGroupEraStartLongTerm = FALSE, DemographicsGender = TRUE, DeviceExposureAnyTimePrior = FALSE, n ObservationLongTerm = TRUE, DemographicsIndexYearMonth = FALSE, ConditionOccurrenceMediumTerm = FALSE, longTermStartDays = -365, DemographicsAge = FALSE, DrugGroupEraOverlapping = TRUE, DistinctMeasurementCountLongTerm = FALSE, MeasurementRangeGroupAnyTimePrior = FALSE, DistinctConditionCountMediumTerm = FALSE, DrugGroupEraMediumTerm = FALSE, ProcedureOccurrenceShortTerm = TRUE, ObservationMediumTerm = FALSE, ConditionGroupEraAnyTimePrior = FALSE, Chads2 = TRUE, DrugExposureAnyTimePrior = FALSE, n DeviceExposureLongTerm = TRUE, DemographicsTimeInCohort = FALSE, DistinctMeasurementCountMediumTerm = FALSE, MeasurementValueShortTerm = FALSE, DeviceExposureMediumTerm = FALSE, ConditionGroupEraStartShortTerm = FALSE, ConditionOccurrencePrimaryInpatientMediumTerm = FALSE, MeasurementLongTerm = TRUE, DemographicsIndexYear = TRUE, MeasurementValueMediumTerm = FALSE, DrugEraStartMediumTerm = FALSE, MeasurementValueAnyTimePrior = FALSE, DistinctObservationCountShortTerm = FALSE, DrugEraMediumTerm = FALSE, n ConditionGroupEraLongTerm = TRUE, DrugExposureShortTerm = FALSE, DistinctIngredientCountShortTerm = FALSE, DeviceExposureShortTerm = TRUE, mediumTermStartDays = -180, DemographicsPostObservationTime = FALSE, VisitConceptCountLongTerm = FALSE, VisitConceptCountMediumTerm = FALSE, excludedCovariateConceptIds = c(19058978, 4001652, 4149198, 2211436, 44514807, 4248555, 320827, 43531648, 4136345, 4138870, 4003142, 40756852, 4326255, 2000025, 44514800, 2005904, 2211717, 2211716, 4253808, 199073, n 37017417, 2105103, 75039, 4062247, 200761, 4076181, 4176868, 4138872, 2006200, 4001650, 4079266, 2771224, 4035487, 2773425, 2773451, 4140294, 2775846, 4076882, 4177835, 35625764, 46272777, 2773435, 44809076, 35625794, 37115753, 2771205, 2773426, 4300754, 4203771, 35610630, 2771691, 2104836, 763946, 2773449, 2771701, 2771684, 44790484, 2773442, 2771220, 4150990, 2005944, 2771225, 
44790486, 2771697, 2773453, 4205526, 4144432, 44515517, 2771705, 4079401, 2771215, 42872820, 2104837, 4137703, n 2773424, 4330505, 44515513, 4142076, 4034666, 2771694, 44514793, 4078258, 4142923, 2103716, 4196649, 4077286, 4143687, 2771703, 44790478, 2771702, 46272778, 4076609, 4079260, 2771707, 2771223, 2775859, 2771696, 2771222, 44514777, 44515510, 2771221, 2773443, 2771700, 2771706, 2771219, 44515504, 44514784, 46272775, 2771217, 37111559, 2771218, 2773444, 35625480, 44791175, 4138868, 2771226, 2771211, 4083672, 44790443, 2771698, 4034665, 4079259, 44783151, 44790803, 2771704, 2771203, 4175642, 2771213, n 4152063, 2773452, 2771699, 4079258, 42538476), ConditionGroupEraMediumTerm = FALSE, DrugExposureMediumTerm = FALSE, DistinctProcedureCountLongTerm = FALSE, DrugEraAnyTimePrior = FALSE, endDays = 0, ConditionOccurrenceShortTerm = FALSE)), studyStartDate = 20060101, washoutPeriod = 365, firstExposureOnly = TRUE, studyEndDate = NULL), compressCohortMethodData = FALSE, cohortMethodDataFolder = “./columbia/ADA/cmOutput/CmData”)
|================================================ | 50%Thread 2 returns error: “numbers of columns of arguments do not match” when using argument(s): list(args = list(connectionDetails = list(dbms = “redshift”, user = “user”, password = “password”, server = “[redacted]”, oracleDriver = “thin”), cdmDatabaseSchema = “full_201903_omop_v5”, oracleTempSchema = NULL, exposureDatabaseSchema = “study_reference”, exposureTable = “arthro”, outcomeDatabaseSchema = “study_reference”, outcomeTable = “arthro”, cdmVersion = 5, outcomeIds = c(885, 26662, 30234, 30753, 31317, 72715, n72737, 73575, 74725, 75860, 76161, 77139, 78162, 78474, 79864, 80824, 81151, 133729, 136057, 136184, 136661, 136773, 137351, 137951, 138384, 138825, 139099, 140641, 141375, 141932, 194686, 195562, 195603, 196456, 197006, 197381, 197607, 197610, 197911, 198101, 198400, 198803, 199866, 200528, 201061, 201072, 201254, 201322, 201418, 201620, 201826, 257004, 261325, 312950, 313459, 316127, 318736, 318800, 321596, 373478, 374375, 375806, 376065, 376382, 377575, 377910, 379805, 381292, 381295, 381877, n432730, 432738, 432798, 432883, 433163, 433316, 433440, 433753, 433811, 434004, 434005, 434610, 434613, 435243, 435510, 435511, 435515, 435796, 436070, 436118, 436246, 436375, 436940, 436962, 437246, 437264, 437833, 437851, 438120, 438409, 439082, 439147, 440129, 441417, 443211, 443597, 443731, 444367, 4033837, 4084229, 4088920, 4098604, 4103703, 4110815, 4168217, 4172432, 4193704, 4209423, 4211231, 4305080, 4308509, 4311591, 40483107, 45768910), targetId = 883, comparatorId = 884, restrictToCommonPeriod = FALSE, n removeDuplicateSubjects = “remove all”, maxCohortSize = 100000, excludeDrugsFromCovariates = FALSE, covariateSettings = list(list(VisitCountMediumTerm = FALSE, ObservationShortTerm = TRUE, shortTermStartDays = -30, MeasurementRangeGroupShortTerm = FALSE, ConditionOccurrenceLongTerm = FALSE, DrugEraStartLongTerm = FALSE, VisitCountShortTerm = FALSE, Chads2Vasc = TRUE, ConditionGroupEraStartLongTerm = 
FALSE, ConditionEraShortTerm = FALSE, Dcsi = TRUE, DrugGroupEraLongTerm = TRUE, DrugGroupEraShortTerm = TRUE, n ConditionEraStartLongTerm = FALSE, temporal = FALSE, DemographicsIndexMonth = TRUE, ConditionOccurrencePrimaryInpatientLongTerm = FALSE, ConditionEraAnyTimePrior = FALSE, addDescendantsToInclude = FALSE, ConditionGroupEraStartMediumTerm = FALSE, ProcedureOccurrenceLongTerm = TRUE, DrugExposureLongTerm = FALSE, DrugEraStartShortTerm = FALSE, DistinctIngredientCountMediumTerm = FALSE, DistinctMeasurementCountShortTerm = FALSE, MeasurementRangeGroupLongTerm = TRUE, ConditionGroupEraOverlapping = FALSE, n MeasurementRangeGroupMediumTerm = FALSE, DrugGroupEraStartMediumTerm = FALSE, MeasurementAnyTimePrior = FALSE, MeasurementMediumTerm = FALSE, includedCovariateIds = list(), ConditionOccurrenceAnyTimePrior = FALSE, DistinctConditionCountLongTerm = FALSE, MeasurementValueLongTerm = FALSE, DrugEraShortTerm = FALSE, DrugGroupEraAnyTimePrior = FALSE, DrugEraOverlapping = FALSE, ConditionOccurrencePrimaryInpatientAnyTimePrior = FALSE, ConditionEraMediumTerm = FALSE, ConditionEraOverlapping = FALSE, n ConditionEraStartShortTerm = FALSE, ObservationAnyTimePrior = FALSE, VisitConceptCountShortTerm = FALSE, DemographicsEthnicity = TRUE, DistinctIngredientCountLongTerm = FALSE, ConditionOccurrencePrimaryInpatientShortTerm = FALSE, DemographicsAgeGroup = TRUE, DistinctProcedureCountShortTerm = FALSE, DistinctObservationCountMediumTerm = FALSE, includedCovariateConceptIds = list(), DrugGroupEraStartShortTerm = FALSE, addDescendantsToExclude = TRUE, DrugEraLongTerm = FALSE, DistinctConditionCountShortTerm = FALSE, n ConditionGroupEraShortTerm = TRUE, ConditionEraStartMediumTerm = FALSE, VisitCountLongTerm = FALSE, DemographicsRace = TRUE, ProcedureOccurrenceAnyTimePrior = FALSE, DistinctObservationCountLongTerm = FALSE, ProcedureOccurrenceMediumTerm = FALSE, CharlsonIndex = TRUE, DemographicsPriorObservationTime = FALSE, MeasurementShortTerm = TRUE, 
DistinctProcedureCountMediumTerm = FALSE, ConditionEraLongTerm = FALSE, DrugGroupEraStartLongTerm = FALSE, DemographicsGender = TRUE, DeviceExposureAnyTimePrior = FALSE, n ObservationLongTerm = TRUE, DemographicsIndexYearMonth = FALSE, ConditionOccurrenceMediumTerm = FALSE, longTermStartDays = -365, DemographicsAge = FALSE, DrugGroupEraOverlapping = TRUE, DistinctMeasurementCountLongTerm = FALSE, MeasurementRangeGroupAnyTimePrior = FALSE, DistinctConditionCountMediumTerm = FALSE, DrugGroupEraMediumTerm = FALSE, ProcedureOccurrenceShortTerm = TRUE, ObservationMediumTerm = FALSE, ConditionGroupEraAnyTimePrior = FALSE, Chads2 = TRUE, DrugExposureAnyTimePrior = FALSE, n DeviceExposureLongTerm = TRUE, DemographicsTimeInCohort = FALSE, DistinctMeasurementCountMediumTerm = FALSE, MeasurementValueShortTerm = FALSE, DeviceExposureMediumTerm = FALSE, ConditionGroupEraStartShortTerm = FALSE, ConditionOccurrencePrimaryInpatientMediumTerm = FALSE, MeasurementLongTerm = TRUE, DemographicsIndexYear = TRUE, MeasurementValueMediumTerm = FALSE, DrugEraStartMediumTerm = FALSE, MeasurementValueAnyTimePrior = FALSE, DistinctObservationCountShortTerm = FALSE, DrugEraMediumTerm = FALSE, n ConditionGroupEraLongTerm = TRUE, DrugExposureShortTerm = FALSE, DistinctIngredientCountShortTerm = FALSE, DeviceExposureShortTerm = TRUE, mediumTermStartDays = -180, DemographicsPostObservationTime = FALSE, VisitConceptCountLongTerm = FALSE, VisitConceptCountMediumTerm = FALSE, excludedCovariateConceptIds = c(4001650, 4142923, 2775859, 44790478, 2006200, 44514793, 4079260, 4143687, 4142076, 2104836, 320827, 4150990, 4136345, 4248555, 4203771, 763946, 44515517, 42872820, 2211436, 19058978, n 4001652, 44790486, 4144432, 4149198, 44515513, 4137703, 2104837, 75039, 4035487, 44809076, 2211717, 2211716, 4253808, 199073, 2775846, 35625764, 4140294, 35625794, 4177835, 4152063, 4175642, 4079259, 4079258, 4062247, 200761, 4176868, 4138868, 44791175, 2000025, 44514784, 40756852, 44515510, 44514777, 44515504, 
44790443, 4003142, 35625480, 4326255, 46272775, 2105103, 2771224, 2773425, 2773451, 4076882, 2005904, 44514800, 46272777, 2773435, 37115753, 2771205, 37017417, 2773426, 4138870, 4300754, n 35610630, 2771691, 43531648, 2773449, 2771701, 2771684, 44790484, 2773442, 2771220, 2005944, 2771225, 2771697, 2773453, 4205526, 2771705, 4079401, 2771215, 2773424, 4330505, 44514807, 4034666, 2771694, 4078258, 2103716, 4196649, 4077286, 2771703, 2771702, 46272778, 4076609, 4079266, 2771707, 2771223, 2771696, 2771222, 2771221, 2773443, 2771700, 2771706, 2771219, 2771217, 37111559, 2771218, 2773444, 2771226, 2771211, 4083672, 4138872, 2771698, 4034665, 44783151, 44790803, 2771704, 2771203, n 2771213, 2773452, 4076181, 2771699, 42538476), ConditionGroupEraMediumTerm = FALSE, DrugExposureMediumTerm = FALSE, DistinctProcedureCountLongTerm = FALSE, DrugEraAnyTimePrior = FALSE, endDays = 0, ConditionOccurrenceShortTerm = FALSE)), studyStartDate = 20060101, washoutPeriod = 365, firstExposureOnly = TRUE, studyEndDate = NULL), compressCohortMethodData = FALSE, cohortMethodDataFolder = “./columbia/ADA/cmOutput/CmData”)
|===============================================================================================| 100%
Error in ParallelLogger::clusterApply(cluster, objectsToCreate, createCmDataObject) :
Error(s) when calling function ‘fun’, see earlier messages for details

I’m trying to find where rbind is called in the code, but it’s a little buried. I know there are some references in CohortMethod. @schuemie or @msuchard, any suggestions on where you’d go to debug this? My guess is that I may need to change rbind to rbind.fill because some of the data frames have NAs. I’m open to other suggestions on how to deal with the size mismatch.
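For reference, here is a minimal base R sketch of how this error message arises and what a fill-style bind would do. It is only an illustration: `rbindFill` is a hypothetical helper written for this post, not a CohortMethod function, and the two small data frames are made up.

```r
# Two data frames whose column counts differ, as assumed in the question
a <- data.frame(x = 1:2, y = c("a", "b"))
b <- data.frame(x = 3L)

# Plain rbind() on data frames fails when the column counts differ
err <- tryCatch(rbind(a, b), error = function(e) conditionMessage(e))
print(err)

# Hypothetical helper: pad each data frame with NA columns so that all
# inputs share the same columns, then row-bind them
rbindFill <- function(...) {
  frames <- list(...)
  allCols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(allCols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[, allCols, drop = FALSE]
  })
  do.call(rbind, padded)
}

res <- rbindFill(a, b)
print(res)
```

Padding with NA only hides the mismatch, so it is still worth finding out why the columns differ in the first place.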

(cc: @jweave17 and @SCYou :slight_smile: )

@krfeeney
When I see this error, I usually think ‘there’s something wrong in my study design’.
(If I were you, I wouldn’t try to debug the code itself. Trust CohortMethod and ATLAS.)

My recommendations:

  1. Check the number of cohorts (you can see this in the output folder).
  2. Check the time-at-risk and the outcome (make sure that enough (?) outcomes occurred in the target and comparator cohorts with your time-at-risk setting).
  3. Run the study again without any matching or stratification and see whether that design works.
  4. Reduce the number of covariates used for matching (the error can result from an incompatible data architecture; the OMOP CDM has been evolving fast).

Hope this is helpful :slight_smile:

Thank you @SCYou! I inherited this study package from a network collaborator. We appreciate your suggestions. I’ll go through these and see how we can fix the design! Appreciate your help!

You can get more specific error messaging by turning multi-threading off (maxCores = 1) while debugging. Also, maybe there’s some more information in the log file?
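A rough base R sketch of why running one task at a time helps: each failure can be caught and reported next to the task that caused it, instead of being summarized per thread. The names `createObject`, `runSerially`, and the toy tasks are hypothetical and only stand in for the real worker function and its arguments; in the actual study package you would set maxCores = 1 as described above.

```r
# Hypothetical stand-in for the function each thread would run; here it
# simply row-binds the two data frames supplied in the task
createObject <- function(task) {
  rbind(task$left, task$right)
}

# Toy tasks: the second one has mismatched column counts on purpose
tasks <- list(
  list(id = "task-1", left = data.frame(a = 1, b = 2), right = data.frame(a = 3, b = 4)),
  list(id = "task-2", left = data.frame(a = 1, b = 2), right = data.frame(a = 3))
)

# Run every task in the main session, reporting which task failed and why
runSerially <- function(tasks, fun) {
  lapply(tasks, function(task) {
    tryCatch(
      fun(task),
      error = function(e) {
        message("Task '", task$id, "' failed: ", conditionMessage(e))
        e  # return the condition so the caller can inspect it later
      }
    )
  })
}

results <- runSerially(tasks, createObject)

# Flag the tasks whose result is an error condition
failed <- vapply(results, inherits, logical(1), what = "error")
print(failed)
```

Because everything runs in the main R session, you can also call `traceback()` or set a breakpoint inside the worker, which is not practical when the error occurs on a worker thread.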

@krfeeney, @schuemie - we seem to have narrowed it down to something specific in PLE and Hydra and created a new issue here

Our team was unable to reproduce it on SynPUF, but it was quite easy to reproduce on Truven and some other data sets. Martijn, any chance you could try running this on your side to see if you can reproduce this issue as well?

Just a quick update: it looks like @schuemie was able to address the issue reported here (see details at the GitHub link above).
