
Cohort names in CDM v5

Hi all, just wanted to clarify with @Chris_Knoll, @Christian_Reich, and others the convention for Cohort names and definitions in CDM v5.

So we have the Cohort Attribute table, which links us over to the Attribute Definition table. And the Cohort Attributes seem to operate at the Cohort-Subject level. So two questions:

  1. Is the plan to write the name of the cohort in the Attribute Definition table? What concept_id is Cohort Name in the vocabulary?

  2. As subject_id is a required field for a Cohort Attribute row, are we to assign a name and cohort definition for every person in the cohort?

My previous understanding was that CDM v5 enabled clear standalone Cohort names and definition syntax that live independently (i.e., not tied to any persons). I recall the discussion of the attribute table and how it would enable more flexibility with cohort-related stuff, but it seems like the two basic attributes of a cohort definition (name and definition syntax) are now obscured.

Thoughts much appreciated.

Jon

To make the discussion more concrete, what CDM v5 SQL statement would return the data for the prototype Cohort search field below (ignoring the creator name part)?

Jon:

I am confused. Are you talking about Cohorts or Cohort Attributes? The items you give below (Breast Cancer s/p Mastectomy, Cervical Cancer in-situ) are all Cohorts. Cohort Attributes would be things like Age, Charlson Index, BMI, Top Co-morbidity, etc.

So, let’s talk Cohort. In the COHORT table, you have no names or anything. Just the Cohort Definition ID (referring to your Breast Cancer s/p Mastectomy, etc.), the Subject IDs (person_id values in your case) and the dates. The COHORT_DEFINITION table contains all the detail of how the Cohort is defined (what constitutes a Breast Cancer, a Mastectomy, etc.), as well as the definition syntax (text, SQL or FORTRAN), what kind of subjects the cohort is built on (subject_concept_id) and the date you ran the definition and created records in the COHORT table (cohort_initiation_date).

So, to create a drop-down list of all Cohorts in your CDM instance, you would say:


SELECT cohort_definition_name FROM cohort_definition;

The fact that these were written by Patrick on 11/14/2013 is not part of the CDM.

(BTW!!! Please use international date conventions in your applications. 11/14/2013 is not legible in Korea, because there is no 14th month in the year. Use 2013-11-14 or 14-Nov-2013.)

I fixed the explanation a bit. Hopefully, that will make it clearer: http://ohdsi.org/web/wiki/doku.php?id=documentation:cdm:cohort_attribute.
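As a rough illustration of the COHORT / COHORT_DEFINITION split described above, here is a minimal sketch using Python's built-in sqlite3 module. The tables are trimmed to a few columns, the subject rows and syntax strings are invented for the demo, and the helper names `cohort_names` and `cohort_sizes` are hypothetical, not part of any OHDSI tool.

```python
import sqlite3

# In-memory stand-in for a CDM instance; only a subset of columns is modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort_definition (
    cohort_definition_id     INTEGER PRIMARY KEY,
    cohort_definition_name   TEXT NOT NULL,
    cohort_definition_syntax TEXT
);
CREATE TABLE cohort (
    cohort_definition_id INTEGER NOT NULL,
    subject_id           INTEGER NOT NULL,
    cohort_start_date    TEXT NOT NULL,
    cohort_end_date      TEXT
);
""")

# Names come from the thread; the syntax strings are placeholders.
conn.executemany(
    "INSERT INTO cohort_definition VALUES (?, ?, ?)",
    [
        (1, "Breast Cancer s/p Mastectomy", "placeholder syntax"),
        (2, "Cervical Cancer in-situ", "placeholder syntax"),
    ],
)

# Invented subjects: the COHORT table holds only ids and dates, no names.
conn.executemany(
    "INSERT INTO cohort VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2012-03-01", "2013-03-01"),
        (1, 102, "2012-06-15", None),
    ],
)


def cohort_names(connection):
    """Return every cohort name, i.e. the contents of the drop-down."""
    rows = connection.execute(
        "SELECT cohort_definition_name FROM cohort_definition "
        "ORDER BY cohort_definition_name"
    )
    return [row[0] for row in rows]


def cohort_sizes(connection):
    """Map each cohort name to its number of rows in the COHORT table."""
    rows = connection.execute("""
        SELECT d.cohort_definition_name, COUNT(c.subject_id)
        FROM cohort_definition AS d
        LEFT JOIN cohort AS c
               ON c.cohort_definition_id = d.cohort_definition_id
        GROUP BY d.cohort_definition_id, d.cohort_definition_name
    """)
    return dict(rows.fetchall())


print(cohort_names(conn))
print(cohort_sizes(conn))
```

The point of the sketch is that a definition exists on its own row whether or not any subjects have been loaded for it, so the name never has to be repeated per person.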

@Christian_Reich, you’re right! I conflated the COHORT_ATTRIBUTE and COHORT_DEFINITION tables. Absolutely my mistake, I knew we wouldn’t leave it out!

And yes, the dates and creators are things that do not live in the CDM, nor should they. But they should be storable by an OHDSI application in a separate schema (which we are currently calling the RESULTS_SCHEMA, but I’m advocating for a broader name).

On this note, to give fair advance warning, I will be gradually pushing for everything user-created (i.e., not raw patient data and not the Vocabulary from on high) to eventually live in a schema separate from the CDM. At least in a production-oriented, DBA-security-loving place like Regenstrief, things like Cohorts and Cohort definitions, which will be frequently created, updated, deleted, etc. by regular users, need a different set of permissions from the CDM itself, which is a read-only place for most of our users. So a separate schema for everything Cohort-related would make sense in our world, but perhaps not for others. Anyway, as CIRCE and HERACLES begin to see the light of day, we can begin sorting through these things more concretely.
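To sketch what keeping creator and date outside the CDM might look like, the example below attaches a second in-memory SQLite database under the name `results` and joins it to the definition table. The table `cohort_definition_meta`, its columns, and the helper `search_field_rows` are hypothetical names chosen for illustration. SQLite has no per-schema permissions, so this shows only the separation and the join, not the access control.

```python
import sqlite3
from datetime import date

# "main" plays the role of the read-only CDM schema; "results" is a
# separate, application-owned schema (attached as its own database here).
db = sqlite3.connect(":memory:")
db.execute("ATTACH DATABASE ':memory:' AS results")

db.executescript("""
CREATE TABLE main.cohort_definition (
    cohort_definition_id   INTEGER PRIMARY KEY,
    cohort_definition_name TEXT NOT NULL
);
-- Hypothetical application table: not part of the CDM.
CREATE TABLE results.cohort_definition_meta (
    cohort_definition_id INTEGER PRIMARY KEY,
    created_by           TEXT NOT NULL,
    created_on           TEXT NOT NULL  -- ISO 8601 date, e.g. 2013-11-14
);
""")

db.executemany(
    "INSERT INTO main.cohort_definition VALUES (?, ?)",
    [(1, "Breast Cancer s/p Mastectomy"), (2, "Cervical Cancer in-situ")],
)

# Creator and date taken from the example in the thread, stored as ISO 8601.
created = date(2013, 11, 14).isoformat()
db.executemany(
    "INSERT INTO results.cohort_definition_meta VALUES (?, ?, ?)",
    [(1, "Patrick", created), (2, "Patrick", created)],
)


def search_field_rows(connection):
    """Return (name, creator, created_on) rows for a cohort search field."""
    rows = connection.execute("""
        SELECT d.cohort_definition_name, m.created_by, m.created_on
        FROM main.cohort_definition AS d
        LEFT JOIN results.cohort_definition_meta AS m
               ON m.cohort_definition_id = d.cohort_definition_id
        ORDER BY d.cohort_definition_name
    """)
    return rows.fetchall()


for row in search_field_rows(db):
    print(row)
```

On a server database the same idea would use two real schemas, with the application account granted write access only on the second one.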

Thanks again (and I’ll internationalize the dates!)

Jon
