All,
Inspired by some issues with BI/analysis tools using the cohort table, and by the new cost table proposal…
I’m not trying to comment specifically on the cost approach, but rather on OHDSI standards for managing relationships within the CDM. My concern is with the “connect to any identifier” approach (cohort, fact_relationship, proposed cost). This approach allows for flexible connectivity between rows and easier ETL, but:
- reduces referential integrity in the CDM
- leads to more complex SQL to support variable joins
- makes use of some BI / analytics tools more difficult
- requires analysts to understand more of the inner workings of the CDM
- removes support for leveraging Object Relational Mapping tools for presentation of data to users
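To make the first two points concrete, here is a minimal sketch using SQLite from Python. The table definitions are heavily simplified assumptions and not the actual CDM DDL, and the domain_id column on cohort is hypothetical (it is the discriminator asked about below). It shows that the database accepts a subject_id that points at nothing, and that resolving a row needs one join per possible target table.

```python
import sqlite3

# Simplified, assumed tables for illustration only; not the real CDM DDL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is on, but there is no FK to enforce
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE provider (provider_id INTEGER PRIMARY KEY);
-- subject_id may refer to any table, so no FOREIGN KEY can be declared on it.
-- domain_id is a hypothetical discriminator column.
CREATE TABLE cohort (
    cohort_definition_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    domain_id TEXT NOT NULL
);
INSERT INTO person VALUES (1), (2);
INSERT INTO provider VALUES (1);
-- The last row points at a person that does not exist; nothing rejects it.
INSERT INTO cohort VALUES (10, 1, 'Person'), (10, 1, 'Provider'), (10, 99, 'Person');
""")

# The join target varies per row, so each consumer writes one join per domain.
rows = conn.execute("""
SELECT c.subject_id,
       c.domain_id,
       CASE c.domain_id
           WHEN 'Person' THEN p.person_id
           WHEN 'Provider' THEN pr.provider_id
       END AS resolved_id
FROM cohort c
LEFT JOIN person p
       ON c.domain_id = 'Person' AND p.person_id = c.subject_id
LEFT JOIN provider pr
       ON c.domain_id = 'Provider' AND pr.provider_id = c.subject_id
ORDER BY c.subject_id, c.domain_id
""").fetchall()

for row in rows:
    print(row)  # the orphaned row resolves to None
```

A BI tool that discovers relationships from declared foreign keys has nothing to work with here, because the cohort table declares none.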
Cohort:
I can load a big data set and create condition-based cohorts of persons. When I connect to the DB using Tableau (as our customer does), it leverages foreign keys to automatically represent the relationships between data, which is great. However, this feature is pointless for Cohort, where no foreign key relationship can be specified. The end user then has to define the mapping, which is unreliable given that subject_id can be any identifier and neither cohort nor cohort_definition definitively defines a discriminator domain_id.
Our solution was to restrict cohorts to represent groups of persons and implement an additional foreign key for this relationship. I’m glad to see that CIRCE also appears to support the person-centric view of cohorts.
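As a rough illustration of the person-only restriction, the sketch below uses SQLite from Python with simplified, assumed table definitions. It is not our production schema or the official CDM DDL. Once subject_id is declared as a foreign key to person, the database rejects orphaned rows, and the relationship becomes discoverable from the catalog, which is the kind of metadata that tools relying on declared foreign keys look for.

```python
import sqlite3

# Simplified, assumed tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
-- Cohorts are restricted to persons, so subject_id can carry a real FK.
CREATE TABLE cohort (
    cohort_definition_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL REFERENCES person (person_id)
);
INSERT INTO person VALUES (1), (2);
INSERT INTO cohort VALUES (10, 1), (10, 2);
""")

# A subject_id with no matching person is now refused by the database.
try:
    conn.execute("INSERT INTO cohort VALUES (10, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The relationship is declared in the catalog, so tools can read it.
fk_list = conn.execute("PRAGMA foreign_key_list(cohort)").fetchall()
for fk in fk_list:
    # Columns 2..4 are: referenced table, local column, referenced column.
    print(fk[2], fk[3], fk[4])

print("orphan rejected:", orphan_rejected)
```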
Questions:
- Can we add a domain_id onto Cohort for a future version of the CDM?
- Have we considered array data types to store lists of IDs?
Fact relationship:
We decided to skip this table because we cannot build a logical model on top of it to support ORM-based web services.
Bill