When we originally developed the COHORT table for the CDM, the goal was a standard data structure for storing cohorts of individuals so they could be reused across analytic applications. This structure has proven quite useful for me. As our phenotyping tutorial demonstrated, standardizing the definition of ‘cohort’ as the ‘set of persons who satisfy one or more inclusion criteria for a duration of time’ is a valuable unifying construct: it helps both with extracting patient-level data from the CDM into a cohort and with using a cohort as input to subsequent analyses. The ATLAS cohort design and execution engine has successfully used this structure, and it has become the foundation for most of our work.
So, here’s my question: has anyone in the community actually used the COHORT table to store anything other than PERSON_IDs? When the table was first brainstormed, the thought was that the community might want to store cohorts of other entities, like a ‘cohort of providers’, a ‘cohort of visits’, or a ‘cohort of care sites’. That is why the identifier field in COHORT is named SUBJECT_ID. However, I’ve yet to see anyone actually use the COHORT table for any of these use cases. I’ve now created thousands of cohorts of PERSON_IDs, but zero cohorts of any other type. For many of these cohorts I’m writing analysis queries, and I often find myself annoyed that I have to join COHORT.SUBJECT_ID to DOMAIN_TABLE.PERSON_ID. If the field were named COHORT.PERSON_ID instead of COHORT.SUBJECT_ID, the queries would be a lot cleaner for 100% of my use cases. But before I formally recommend this change to the OMOP CDM workgroup for a future CDM release, I want to check in with the community to see whether others have use cases that I’m just not aware of.
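
To make the join concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It assumes a cut-down COHORT table and CONDITION_OCCURRENCE as the domain table; the toy rows, the concept ID values, and the helper names `build_toy_db` and `conditions_in_cohort` are all hypothetical and only for illustration. It is not how ATLAS or any OHDSI tool issues its queries.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with toy COHORT and CONDITION_OCCURRENCE rows.

    All rows are made up for illustration; dates are stored as ISO-8601 text.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cohort (
            cohort_definition_id INTEGER,
            subject_id           INTEGER,
            cohort_start_date    TEXT,
            cohort_end_date      TEXT
        );
        CREATE TABLE condition_occurrence (
            person_id            INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
        INSERT INTO cohort VALUES
            (1, 101, '2020-01-01', '2020-12-31'),
            (1, 102, '2020-06-01', '2020-06-30'),
            (2, 103, '2020-01-01', '2020-12-31');
        INSERT INTO condition_occurrence VALUES
            (101, 201826, '2020-03-15'),
            (101, 316866, '2021-02-01'),  -- falls outside the cohort period
            (102, 201826, '2020-06-10'),
            (103, 201826, '2020-05-05');
        """
    )
    return conn


def conditions_in_cohort(conn, cohort_definition_id):
    """Return conditions recorded while each subject was in the given cohort."""
    query = """
        SELECT c.subject_id, co.condition_concept_id, co.condition_start_date
        FROM cohort c
        JOIN condition_occurrence co
          ON co.person_id = c.subject_id  -- the mismatched column names in question
         AND co.condition_start_date
             BETWEEN c.cohort_start_date AND c.cohort_end_date
        WHERE c.cohort_definition_id = ?
        ORDER BY c.subject_id, co.condition_start_date
    """
    return conn.execute(query, (cohort_definition_id,)).fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    for row in conditions_in_cohort(conn, 1):
        print(row)
```

With a COHORT.PERSON_ID column, the join condition would read `co.person_id = c.person_id`, and in SQL dialects that support it the join could be written with `USING (person_id)`.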