
Cohort vs Cohort_attribute -- redundant?

Good day:

The tables cohort and cohort_attribute seem to be redundant. Cohort seems to have a subset of columns in cohort_attribute and non-contributing.

Is there a reason to populate data in cohort table or is it being retained for backward compatibility?

Thank you

No, not redundant. A cohort is a set of persons who satisfy one or more
inclusion criteria for a duration of time. So, a record in the cohort
table indicates a period of time during which a particular person is a
member of a cohort. The cohort table is actively used throughout OHDSI
tools, including when you generate a cohort definition in ATLAS.

In contrast, cohort_attribute is a construct to store information about the
members of a cohort. So one member of a cohort could have zero or more
records in the cohort_attribute table. The table is there as a
standardized structure to support derived elements (e.g., perhaps you want
to calculate a predictive probability or a risk score like Charlson for a
person and want to store it rather than compute it on the fly each time).
I feel good about the structure in general, but in practice, I haven't seen
much use of it in the community yet (though I expect that will change).
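
To make the "zero or more records" point concrete, here is a minimal sketch using an in-memory SQLite database. The column names are the ones mentioned in this thread; the ids (cohort 1, subjects 101 and 102, attribute 900 standing in for some risk score) and the column types are made up for illustration and are not taken from the CDM specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort (
    cohort_definition_id INTEGER,
    subject_id           INTEGER,
    cohort_start_date    TEXT,
    cohort_end_date      TEXT
);
CREATE TABLE cohort_attribute (
    cohort_definition_id    INTEGER,
    subject_id              INTEGER,
    cohort_start_date       TEXT,
    cohort_end_date         TEXT,
    attribute_definition_id INTEGER,
    value_as_number         REAL,
    value_as_concept_id     INTEGER
);
""")

# Two members of a hypothetical cohort 1.
conn.executemany(
    "INSERT INTO cohort VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2015-01-01", "2015-12-31"),
        (1, 102, "2015-03-01", "2015-09-30"),
    ],
)

# Only subject 101 has a stored attribute (hypothetical attribute 900).
conn.execute(
    "INSERT INTO cohort_attribute VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, 101, "2015-01-01", "2015-12-31", 900, 3.0, 0),
)

# Every cohort row survives the LEFT JOIN, with or without attributes.
rows = conn.execute("""
    SELECT c.subject_id, COUNT(a.attribute_definition_id) AS n_attributes
    FROM cohort c
    LEFT JOIN cohort_attribute a
      ON  a.cohort_definition_id = c.cohort_definition_id
      AND a.subject_id           = c.subject_id
      AND a.cohort_start_date    = c.cohort_start_date
      AND a.cohort_end_date      = c.cohort_end_date
    GROUP BY c.subject_id
    ORDER BY c.subject_id
""").fetchall()
print(rows)  # [(101, 1), (102, 0)]
```

Subject 102 is still a cohort member even though nothing is stored about them in cohort_attribute, which is why the two tables are not 1:1.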

Hope that helps to clarify. Happy hacking…

Thank you @Patrick_Ryan - I think I understand. I will try to reflect my understanding here, and maybe you could validate it. btw - happy morning!

My original post was based on the assumption that there is a 1:1 correspondence between the records of the Cohort and Cohort_attribute tables, i.e. for every row in Cohort there is a row in Cohort_attribute and vice versa. If that assumption were true, the Cohort table would hold just a subset of the fields in Cohort_attribute: ‘cohort_definition_id, subject_id, cohort_start_date, cohort_end_date’.

Based on your clarification - this assumption is not true (I think):

So my new understanding is: a record in the Cohort_attribute table is only populated if there is a need to store information about an attribute of a cohort member, i.e. a derived computation identified by an attribute_definition_id. So - not redundant.

Splitting hairs here, but an option to consider: if there is not much adoption of the cohort_attribute table, we could eliminate it and move its fields to the cohort table, i.e. move attribute_definition_id, value_as_number, and value_as_concept_id to the cohort table. We would add a new concept for attribute_definition_id that semantically represents something like ‘base - cohort definition’. A row carrying this concept would represent the base cohort membership. This would reduce the table count of the OMOP model by one.

Personally, I prefer to keep both structures. In many use cases (certainly
90%), simply generating the cohort and knowing the members is all you need
to perform your analysis (this covers all the quality measures you are
interested in). But as we evolve toward large-scale modeling and look to
reuse expensive computations, I can imagine deriving hundreds or thousands
or hundreds of thousands of attributes for each member, and we would want
to handle that data quite differently.


In the COHORT_ATTRIBUTE table we have cohort_start_date and cohort_end_date. I think we should either rename them to attribute_start_date and attribute_end_date, or add those as new columns (for backward compatibility).

For example, if we want to know a subject-level attribute 1 year before the subject became eligible for the cohort, or in the first month of being in the cohort, or the max value of an attribute during the entire duration of the cohort, or 1 year after exiting the cohort, we need a start date and an end date.

One use case may be tracking blood pressure during the course of a 1-year treatment with a blood pressure medication. We collect a systolic blood pressure attribute on a weekly basis during treatment. This attribute has an attribute_definition_id, attribute_start_date, attribute_end_date (start and end may be the same date), and value_as_number. Now we are tracking treatment impact over time for each member of the cohort.


Regarding the comment above - are there any pre-defined attribute_type_concept_id values in the OMOP vocabulary? (I could not find any.) Examples may be: Risk score, Demographics, Conditions, Measurement, etc.

That’s interesting, I’m not exactly sure the right way to model that. I
would think the time dimension there is a characteristic of the attribute
itself, so should be embedded within the ATTRIBUTE definition, and not
captured within each instantiation of the attribute.

But the modeling choices are: ATTRIBUTE_DEFINITION = ‘blood pressure
value’, and you use your new ATTRIBUTE_START_DATE to capture which week it
was…or you have ATTRIBUTE_DEFINITION = ‘blood pressure value at XX
weeks before/after index’, so you’d have different ATTRIBUTE_DEFINITION
for each time period. Probably other ideas too…
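
As a rough sketch of how the two choices relate, the snippet below starts from rows in the first shape (one definition, time stored on each row) and derives the second shape (one definition per time window). Note that attribute_start_date is the column proposed earlier in this thread, not an existing one, and the id scheme in `definition_for_week` is purely hypothetical.

```python
from datetime import date

cohort_start = date(2020, 1, 6)  # hypothetical index date for one member

# Choice 1: a single ATTRIBUTE_DEFINITION ('blood pressure value');
# each row carries its own attribute_start_date (proposed column).
BP_VALUE = 1  # hypothetical attribute_definition_id
choice_1 = [
    # (attribute_definition_id, attribute_start_date, value_as_number)
    (BP_VALUE, date(2020, 1, 6), 150.0),
    (BP_VALUE, date(2020, 1, 13), 144.0),
    (BP_VALUE, date(2020, 1, 20), 138.0),
]


def definition_for_week(week: int) -> int:
    """Hypothetical id scheme for 'blood pressure value at N weeks after index'."""
    return 1000 + week


# Choice 2: one ATTRIBUTE_DEFINITION per time window; the time dimension
# lives in the definition, so the rows need no date of their own.
choice_2 = [
    # (attribute_definition_id, value_as_number)
    (definition_for_week((start - cohort_start).days // 7), value)
    for _, start, value in choice_1
]
print(choice_2)  # [(1000, 150.0), (1001, 144.0), (1002, 138.0)]
```

Both shapes carry the same information here; the trade-off is one extra date column per row versus many more attribute definitions.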

In FeatureExtraction, we define covariateId to provide the full definition,
including concepts and time windows, and that's worked effectively thus far,
so if I were forced to do something, I'd choose a similar model. But I'm
open to hearing others' thoughts.

The reason cohort start and end date need to remain is that one person
could belong to a cohort in multiple intervals, so you need those dates as
part of the key to link records.
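
A small sketch of why the dates matter, again using an in-memory SQLite database with made-up ids and only the columns needed for the illustration: subject 101 is in cohort 1 during two separate intervals, with one stored attribute value per interval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort (
    cohort_definition_id INTEGER,
    subject_id           INTEGER,
    cohort_start_date    TEXT,
    cohort_end_date      TEXT
);
CREATE TABLE cohort_attribute (
    cohort_definition_id    INTEGER,
    subject_id              INTEGER,
    cohort_start_date       TEXT,
    cohort_end_date         TEXT,
    attribute_definition_id INTEGER,
    value_as_number         REAL
);
""")

# The same person belongs to the same cohort in two intervals.
intervals = [
    (1, 101, "2015-01-01", "2015-06-30"),
    (1, 101, "2016-01-01", "2016-06-30"),
]
conn.executemany("INSERT INTO cohort VALUES (?, ?, ?, ?)", intervals)

# One value of hypothetical attribute 900 per interval.
conn.executemany(
    "INSERT INTO cohort_attribute VALUES (?, ?, ?, ?, ?, ?)",
    [intervals[0] + (900, 2.0), intervals[1] + (900, 4.0)],
)

# Joining on cohort and subject only pairs every interval with every value.
without_dates = conn.execute("""
    SELECT c.cohort_start_date, a.value_as_number
    FROM cohort c
    JOIN cohort_attribute a
      ON  a.cohort_definition_id = c.cohort_definition_id
      AND a.subject_id           = c.subject_id
""").fetchall()

# Adding the dates to the key links each value to its own interval.
with_dates = conn.execute("""
    SELECT c.cohort_start_date, a.value_as_number
    FROM cohort c
    JOIN cohort_attribute a
      ON  a.cohort_definition_id = c.cohort_definition_id
      AND a.subject_id           = c.subject_id
      AND a.cohort_start_date    = c.cohort_start_date
      AND a.cohort_end_date      = c.cohort_end_date
    ORDER BY c.cohort_start_date
""").fetchall()

print(len(without_dates))  # 4
print(with_dates)          # [('2015-01-01', 2.0), ('2016-01-01', 4.0)]
```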

Is there a place in the CDM for that?

No, that's done at analysis time and doesn't need to be persisted.

Maybe it’s time to get rid of COHORT_ATTRIBUTE?

Was the Cohort_attribute table deprecated? What would be the best way to store subject attributes related to their membership in a cohort without that table in v5.4?