Is anyone else using COHORT to store anything other than PERSON_ID?

When we originally developed the COHORT table for the CDM, it was with an eye toward having a standard data structure that could be used to store cohorts of individuals, for reuse in various analytic applications. This structure has proven quite useful for me, and as our phenotyping tutorial demonstrated, standardizing a definition of ‘cohort’ around the ‘set of persons who satisfy one or more inclusion criteria for a duration of time’ is a valuable unifying construct. It helps both with extracting patient-level data from the CDM into a cohort and with using a cohort as input to subsequent analyses. The ATLAS cohort design and execution engine has successfully used this structure and has become the foundation for most of our work.

So, here’s my question: has anyone in the community actually used the COHORT table to store anything other than PERSON_ID? The thought when this was initially brainstormed was that the community may want to store cohorts of other entities, like ‘cohort of providers’, or ‘cohort of visits’, or ‘cohort of care sites’. This was the reason that the COHORT field for identifier is SUBJECT_ID. However, I’ve yet to see anyone actually use the COHORT table for any of these use cases. I’ve now created thousands of cohorts of PERSON_IDs, but 0 cohorts of any other type. And for many of these cohorts, I’m writing analysis queries, and often, I’m finding myself annoyed that I have to join COHORT.SUBJECT_ID on DOMAIN_TABLE.PERSON_ID. If the field were named COHORT.PERSON_ID instead of COHORT.SUBJECT_ID it would be a lot cleaner for 100% of my use cases. But, before I make this formal recommendation of a change for a future CDM release to the OMOP CDM workgroup, I want to check in with the community to see if others have different use cases that I’m just not aware of.
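To make the naming mismatch concrete, here is a minimal sketch of the kind of analysis query described above, run against an in-memory SQLite database. The COHORT columns follow the CDM; the CONDITION_OCCURRENCE table is cut down to a few columns, and all IDs, concept IDs, and dates are made-up toy values, not real data.

```python
import sqlite3

# Toy in-memory database; table contents are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort (
    cohort_definition_id INTEGER,
    subject_id           INTEGER,
    cohort_start_date    TEXT,
    cohort_end_date      TEXT
);
-- Reduced subset of the real CONDITION_OCCURRENCE columns.
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER,
    person_id               INTEGER,
    condition_concept_id    INTEGER,
    condition_start_date    TEXT
);
INSERT INTO cohort VALUES
    (1, 101, '2015-01-01', '2015-12-31'),
    (1, 102, '2015-03-01', '2015-06-30'),
    (2, 103, '2015-01-01', '2015-12-31');
INSERT INTO condition_occurrence VALUES
    (1, 101, 1001, '2015-02-10'),
    (2, 101, 1002, '2016-01-05'),
    (3, 102, 1001, '2015-04-01'),
    (4, 103, 1001, '2015-05-01');
""")

# The join in question: COHORT.SUBJECT_ID has to be matched to
# DOMAIN_TABLE.PERSON_ID, here CONDITION_OCCURRENCE.PERSON_ID.
rows = conn.execute("""
    SELECT c.subject_id, co.condition_concept_id, co.condition_start_date
    FROM cohort c
    JOIN condition_occurrence co
      ON co.person_id = c.subject_id
    WHERE c.cohort_definition_id = 1
      AND co.condition_start_date BETWEEN c.cohort_start_date
                                      AND c.cohort_end_date
    ORDER BY c.subject_id
""").fetchall()

for row in rows:
    print(row)
```

If the column were named PERSON_ID, the join condition would read `co.person_id = c.person_id`; the logic stays the same and only the naming becomes uniform.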


Looking forward to the proposal, but just one caveat: many data sets come with an explicit restriction against testing providers or institutions for the outcomes of their interventions. But folks don’t really like to discuss this fact broadly in public. There are many good and not so good reasons for this reticence, even though personally I believe we should move beyond that. As a result, you will not see much of this research activity in the public domain. In proprietary research, the situation is slightly different. Therefore, I would suggest keeping the door open.

I welcome discussion about tweaking the cohort table.

My vote would be to keep the flexibility of the COHORT table so that it can hold providers too.

One convention I would like to add is a set of standard cohorts. Implementing the OMOP CDM would then also mean populating some standard phenotypes.

E.g., patients with at least one visit would be one standard cohort. A cohort ID for it would be part of the OMOP specs (or an OMOP implementation guide), and analysts could rely on it.

So to report counts, I would not take the person table but this standard cohort.

And we could have a set of 10 or so “standardized cohorts”.

The idea is a “minimal patient” definition, for the purpose of a more reasonable denominator.
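As a rough sketch of the “at least one visit” standard cohort suggested above, again on an in-memory SQLite database: the cohort ID 9001 is purely hypothetical (no such ID is defined in any OMOP spec), the choice of first visit start and last visit end as the cohort dates is my assumption, and the PERSON and VISIT_OCCURRENCE tables are reduced to a few columns with toy data.

```python
import sqlite3

# Hypothetical ID for the "at least one visit" cohort; not an OMOP standard.
AT_LEAST_ONE_VISIT_COHORT_ID = 9001

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER);
-- Reduced subset of the real VISIT_OCCURRENCE columns.
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER,
    person_id           INTEGER,
    visit_start_date    TEXT,
    visit_end_date      TEXT
);
CREATE TABLE cohort (
    cohort_definition_id INTEGER,
    subject_id           INTEGER,
    cohort_start_date    TEXT,
    cohort_end_date      TEXT
);
INSERT INTO person VALUES (1), (2), (3);
INSERT INTO visit_occurrence VALUES
    (10, 1, '2014-01-05', '2014-01-06'),
    (11, 1, '2015-07-01', '2015-07-03'),
    (12, 2, '2015-02-01', '2015-02-01');
""")

# One cohort row per person with any visit. Assumed date convention:
# first visit start through last visit end.
conn.execute("""
    INSERT INTO cohort
    SELECT ?, person_id, MIN(visit_start_date), MAX(visit_end_date)
    FROM visit_occurrence
    GROUP BY person_id
""", (AT_LEAST_ONE_VISIT_COHORT_ID,))

# Compare the two candidate denominators.
person_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
cohort_count = conn.execute(
    "SELECT COUNT(DISTINCT subject_id) FROM cohort "
    "WHERE cohort_definition_id = ?",
    (AT_LEAST_ONE_VISIT_COHORT_ID,),
).fetchone()[0]

print("persons in PERSON table:", person_count)
print("persons with at least one visit:", cohort_count)
```

In this toy data, person 3 has a PERSON row but no visits, so that person drops out of the cohort-based denominator.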

I have been toying with the idea of a cohort of providers, or of hospitals/care sites (institutions).

@patrick_ryan @christian_reich

Cohort types are to be described here: https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/StandardizedVocabularies/COHORT_DEFINITION.md