Team:
On today’s call, we’ll discuss the current state of, and potential future directions for, the COHORT and COHORT_DEFINITION tables (and, by extension, COHORT_ATTRIBUTE and COHORT_ATTRIBUTE_DEFINITION).
There’s been a lot of exciting progress from the CIRCE working group and the HERACLES working group, both of which rely heavily on a standardized data structure to represent patients who satisfy a set of inclusion criteria for a duration of time. Our COHORT table, as defined in the OMOP CDM v4 and v5 specifications, is that standardized data structure, and is efficiently represented by four fields: COHORT_ID, SUBJECT_ID, COHORT_START_DATE and COHORT_END_DATE. CIRCE (latest UI available at: http://ohdsi.org/web/circe) is a standardized tool that provides a user interface, a standardized syntax for presenting cohort definitions, and a compiler that produces platform-independent SQL to instantiate the cohort. HERACLES takes a cohort as input and produces standardized summary statistics about it, in much the same way that ACHILLES produces standardized summary statistics about the entire database.
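For anyone who hasn't worked with the table yet, here is a rough Python sketch of the four-field structure. The `CohortRow` class, the `in_cohort_on` helper, the Python types, the inclusive end date, and the sample rows are all assumptions made for illustration; none of them come from the CDM specification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortRow:
    """One row of a COHORT-style table (field types are assumed)."""
    cohort_id: int
    subject_id: int
    cohort_start_date: date
    cohort_end_date: date


def in_cohort_on(rows, cohort_id, subject_id, day):
    """Return True if any row for this cohort and subject covers `day`.

    Assumption: both the start and end dates are treated as inclusive.
    """
    return any(
        r.cohort_id == cohort_id
        and r.subject_id == subject_id
        and r.cohort_start_date <= day <= r.cohort_end_date
        for r in rows
    )


# Hypothetical sample data: one subject with two separate qualifying periods.
rows = [
    CohortRow(1, 1001, date(2010, 1, 1), date(2010, 6, 30)),
    CohortRow(1, 1001, date(2012, 3, 1), date(2012, 12, 31)),
]
```

In this sketch a subject can appear in the same cohort more than once, with each row describing one period during which the inclusion criteria hold.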
Beyond the standardized applications, we know several of you in the community are building your own cohorts for other purposes. @amatcho provided a good example within CPRD of how to standardize HES data, for which only a subset of people qualify, and only for a shorter period of time than their overall observation period. Our colleagues at Erasmus have been considering how to apply a new cohort to define the subset of time for which they have confidence in the research-readiness of the data. Others have been thinking about how to pre-define cohorts as part of their ETLs for diseases that we all know we need to use over and over again, such as diabetes.
A question that has cropped up is where these data structures should reside, and how our analyses should accommodate those locations. On the one side, storing everything in the CDM keeps patient-level data consolidated. On the flip side, some don’t have write permissions to the CDM, and given the more dynamic nature in which these data structures may be used, it could be worthwhile to store a copy of the objects in the OHDSI application schema. Given that this could have material consequences for future standardized application development as well as ETL conventions, I think it’d be a useful conversation to raise with the group to hear everyone’s perspectives.
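To make the trade-off concrete, here is one possible way an analysis could stay agnostic about where the cohort table lives: take the schema as a parameter. This is only a sketch under stated assumptions. The schema names `cdm` and `ohdsi`, the `count_subjects` helper, the column types, and the use of SQLite attached databases as stand-ins for schemas are all hypothetical, and not how any OHDSI application is known to do it.

```python
import sqlite3

# Hypothetical schema names; a real deployment would choose its own.
ALLOWED_SCHEMAS = ("cdm", "ohdsi")

COHORT_DDL = """
CREATE TABLE {schema}.cohort (
    cohort_id         INTEGER NOT NULL,
    subject_id        INTEGER NOT NULL,
    cohort_start_date TEXT    NOT NULL,
    cohort_end_date   TEXT    NOT NULL
)
"""


def count_subjects(conn, cohort_schema, cohort_id):
    """Count distinct subjects in a cohort, reading from the given schema."""
    # Identifiers cannot be bound as parameters, so check against a whitelist.
    if cohort_schema not in ALLOWED_SCHEMAS:
        raise ValueError(f"unknown schema: {cohort_schema}")
    sql = (
        f"SELECT COUNT(DISTINCT subject_id) FROM {cohort_schema}.cohort "
        "WHERE cohort_id = ?"
    )
    return conn.execute(sql, (cohort_id,)).fetchone()[0]


# Simulate two schemas with attached in-memory SQLite databases.
conn = sqlite3.connect(":memory:")
for schema in ALLOWED_SCHEMAS:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(COHORT_DDL.format(schema=schema))

# Hypothetical rows written to the application schema rather than the CDM.
conn.execute("INSERT INTO ohdsi.cohort VALUES (1, 1001, '2010-01-01', '2010-06-30')")
conn.execute("INSERT INTO ohdsi.cohort VALUES (1, 1002, '2011-01-01', '2011-06-30')")
```

With this shape, the same query runs against either location, e.g. `count_subjects(conn, "ohdsi", 1)` or `count_subjects(conn, "cdm", 1)`, so where the cohorts are stored becomes a configuration choice rather than something hard-coded in each analysis.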
As time permits, other topics with ongoing activity:
- Registration is now closed for the OHDSI F2F meeting at Stanford. We should have a productive session with the ~25 folks who will be in attendance; it’ll be good to roll up our sleeves and dive into some hard topics.
- AMIA Annual Conference submissions: there are threads about a Systems Demonstration and a paper on the OHDSI infrastructure. A couple of weeks are left before that deadline.
Cheers,
Patrick