
cohort_definition in ohdsi schema different to public schema

I have been trying to understand what Atlas does to the cohort and cohort definition tables to manage cohorts. I am managing cohort data manually at present but I want to future proof for when I use Atlas to manage it directly. During my investigations I noticed that the COHORT_DEFINITION table in the ohdsi schema (in the VM I downloaded prior to the conference) is different to the COHORT_DEFINITION table in the “public” schema.

The ohdsi definition contains the following columns:
id integer,
name (255),
description (1000),
expression_type (50),
created_by,
created_date,
modified_by,
modified_date

The public definition contains a slightly different set:
cohort_definition_id integer,
cohort_definition_name (255),
cohort_definition_description text,
definition_type_concept_id,
cohort_definition_syntax text,
subject_concept_id,
cohort_initiation_date
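
To make the difference easier to see, here is a minimal Python sketch that compares the two column lists above. The column names are copied from this post; the idea of matching them by stripping a `cohort_definition_` prefix is only my assumption about naming, and a matching name does not mean the columns hold the same thing.

```python
# Column names copied from the two lists in this post; types omitted.
OHDSI_COLUMNS = [
    "id", "name", "description", "expression_type",
    "created_by", "created_date", "modified_by", "modified_date",
]
PUBLIC_COLUMNS = [
    "cohort_definition_id", "cohort_definition_name",
    "cohort_definition_description", "definition_type_concept_id",
    "cohort_definition_syntax", "subject_concept_id",
    "cohort_initiation_date",
]

PREFIX = "cohort_definition_"


def strip_prefix(column: str) -> str:
    """Drop the 'cohort_definition_' prefix if present (a naming guess only)."""
    return column[len(PREFIX):] if column.startswith(PREFIX) else column


def compare(ohdsi, public):
    """Return (shared, only_ohdsi, only_public), matching by prefix-stripped name."""
    public_short = {strip_prefix(c): c for c in public}
    shared = [(c, public_short[c]) for c in ohdsi if c in public_short]
    only_ohdsi = [c for c in ohdsi if c not in public_short]
    matched = {p for _, p in shared}
    only_public = [c for c in public if c not in matched]
    return shared, only_ohdsi, only_public


shared, only_ohdsi, only_public = compare(OHDSI_COLUMNS, PUBLIC_COLUMNS)
print("shared by name:", shared)
print("only in ohdsi:", only_ohdsi)
print("only in public:", only_public)
```

Only the id, name and description columns line up by name; the audit columns and expression_type exist only in the ohdsi table, and the concept, syntax and initiation date columns exist only in the public one.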

The ohdsi schema seems to be where Atlas stores its cohort definitions, but it is the public schema definition that matches the OHDSI documentation. Can anyone help me understand how these tables are used and why there are two separate sets with different structures?

WebAPI doesn’t use the CDM’s cohort tables. The cohort generation tables were discussed here:

One of the reasons we don’t use the CDM’s tables is that the cohort definition table doesn’t include the level of detail that we’d need for authoring cohort definitions. Another is that people usually lock down their CDM tables so that they are read-only. So we introduced a ‘results’ schema, which is for read/write access, and you’ll find a ‘cohort’ table in that schema.

The topic of the cohort and cohort_definition table has caused too much confusion, and I’m wondering if those tables used by webapi should be renamed.

Or whether the COHORT table is removed from the CDM entirely.


It definitely needs a little cleanup. I started this thread to keep track of all the tables here. There is a link to a spreadsheet.

If possible - please contribute to the documentation there. Hopefully we can move it to a wiki-page with documentation at some point.

@Chris_Knoll, thanks for that information. That explains why the CDM tables are not being used. Just one thing to note from your explanation: it appears the ohdsi schema, rather than the results schema, is used by Atlas to store cohort, cohort_definition and cohort_definition_detail data. In fact there is no reference to any cohort table in the results schema. In the VM I downloaded there is just one table in the results schema, called achilles_result_bak. When I create a cohort in Atlas, the cohort and its details appear in the ohdsi schema.

I will work on the assumption that the cohort tables in the CDM are no longer a component of the CDM and that the cohort tables are purely there to support Atlas functionality. Is that correct?

Hi, @ColinOrr2006,
the cohort_definition and cohort_definition_detail tables are used by WebAPI (Atlas is the UI) to store cohort definitions. There is a cohort table, but it is used as an example table to create in the CDM results schema.

I’m not sure about the VM you downloaded, but it is possible that it was configured to use the same schema for the ohdsi schema (that WebAPI uses), and the cdm results schema. This is a valid configuration for ‘single cdm’ setups, but the recommended configuration is to have 3 separate schemas: ohdsi (for WebAPI configuration), results (for storing analysis query results, like cohort generation), and the cdm schema (for storing the patient level data). For each additional CDM, you’d set up an additional results schema and a cdm schema.
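
As a rough illustration of that layout, here is a minimal Python sketch of one WebAPI schema plus a cdm/results schema pair per CDM. The `CdmSource` class, the source keys and every schema name other than ‘ohdsi’ are made up for the example; this is not WebAPI’s actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdmSource:
    """One CDM plus the results schema paired with it (names are made up)."""
    key: str
    cdm_schema: str      # patient-level data, typically read-only
    results_schema: str  # analysis output such as cohort generation


# Schema holding WebAPI's own tables (e.g. cohort_definition).
WEBAPI_SCHEMA = "ohdsi"

# Hypothetical sources: each additional CDM brings its own results schema.
SOURCES = [
    CdmSource("cdm_a", cdm_schema="cdm_a", results_schema="results_a"),
    CdmSource("cdm_b", cdm_schema="cdm_b", results_schema="results_b"),
]


def all_schemas(sources):
    """One shared WebAPI schema plus a cdm/results pair per source."""
    schemas = [WEBAPI_SCHEMA]
    for s in sources:
        schemas += [s.cdm_schema, s.results_schema]
    return schemas


def qualified(schema: str, table: str) -> str:
    """Build a schema-qualified table name."""
    return f"{schema}.{table}"


print(all_schemas(SOURCES))
print(qualified(WEBAPI_SCHEMA, "cohort_definition"))
print(qualified(SOURCES[0].results_schema, "cohort"))
```

In a ‘single cdm’ setup where the schemas are shared, `WEBAPI_SCHEMA` and `results_schema` would simply be the same name, which may be what the VM does.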

@ColinOrr2006, @Chris_Knoll, @frank_defalco

Where are we with the proposal to (i) add the idea of a schema to the CDM and (ii) kick out the cohorts (and other writeable tables) from the main CDM?

Let’s not make that a Webapi thing, but a generic thing, so applications that don’t use the API will use the same conventions and can exchange Cohort information.

@Christian_Reich,
We just need to work out how to have different applications write to this ‘generic thing’ without clobbering each other’s results. For example, if WebAPI writes to the cohort results table, and then some other R process does the same thing, who maintains the generation status of the cohort?

@Chris_Knoll:

Can we parallelize?

@Christian_Reich, I’m not sure what you mean by ‘parallelize’.

Parallelize the conversations about the CDM change and the rules of engagement for applications. The CDM will not cover the latter.

BTW: Where are we putting conventions like the latter?

I believe there’s a dependency between the two. For example, if we were to handle the ‘rules of engagement’ by saying you should tag a record with an ‘application id’, then the definition of the CDM table will have to account for this to support this ‘rule of engagement’. However, we could also say that the CDM structure is what the structure is (the cohort table just has a cohort_id, but nothing else to denote ‘ownership’) then some external thing must exist to coordinate these cohort_ids between those systems. Could be cumbersome if there’s lots of different systems working against the same table, since they’d all have to come to agreement on identification.
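
To make the ‘application id’ option concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table is a simplified, hypothetical cohort table: the `application_id` column and the application names are assumptions for illustration only, not part of the CDM or of WebAPI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical cohort table; application_id is NOT a CDM column.
conn.execute(
    """
    CREATE TABLE cohort (
        application_id TEXT    NOT NULL,
        cohort_id      INTEGER NOT NULL,
        subject_id     INTEGER NOT NULL,
        PRIMARY KEY (application_id, cohort_id, subject_id)
    )
    """
)


def regenerate(app_id, cohort_id, subject_ids):
    """Replace one application's rows for a cohort, leaving other apps' rows alone."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM cohort WHERE application_id = ? AND cohort_id = ?",
            (app_id, cohort_id),
        )
        conn.executemany(
            "INSERT INTO cohort VALUES (?, ?, ?)",
            [(app_id, cohort_id, s) for s in subject_ids],
        )


def subjects(app_id, cohort_id):
    """List the subjects one application has stored for a cohort."""
    rows = conn.execute(
        "SELECT subject_id FROM cohort "
        "WHERE application_id = ? AND cohort_id = ? ORDER BY subject_id",
        (app_id, cohort_id),
    )
    return [r[0] for r in rows]


regenerate("webapi", 1, [101, 102])
regenerate("r_script", 1, [201])   # same cohort_id, different owner
regenerate("webapi", 1, [103])     # WebAPI regenerates its own cohort 1

print(subjects("webapi", 1))
print(subjects("r_script", 1))
```

Because every delete and insert is scoped by `application_id`, the second WebAPI run replaces only WebAPI’s rows. The cost is the one noted above: the table definition itself has to carry the ownership column.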

Alternatively, if each system creates its own tables in its own schema, then the systems don’t run the risk of clobbering each other, and they have the flexibility of creating the writable tables they need without the overhead of a CDM working group. This is the approach WebAPI took, and it perhaps ostentatiously called the writable schema the ‘ohdsi results schema’. That may be a bit of an overreach, since it’s plausible that other applications could be written to read the contents of a CDM and write their statistical results to their own results schema, and why wouldn’t they have the right to call theirs the ‘results schema’? But since WebAPI was there first, it claimed that prize :)

If the ‘latter’ is the results schema: it is completely driven by the needs of WebAPI to store the results of the analytical queries invoked by WebAPI. I would have preferred a schema that accommodates different participants in writing to this schema, but the way it has evolved is based on the functional requirements of WebAPI. For example: cohort feature extraction was something new that we wanted to generate via WebAPI, so in order to persist those results, we created cohort_feature_* tables in the results schema to support that capability.

@Chris_Knoll:

Hm. I am not worried about that. If an application wants to do some private thing - fine. But two or more analytics applications accessing data and cohorts in the same way makes a lot of sense to me. It will support a whole software ecosystem. At IQVIA (IMS), we’ve had that situation that there is more than one application for building cohorts, but they can’t exchange them or write them to the same repository.

WebAPI: Is there a DDL table which defines the schema with documentation?
