We need your help, especially folks involved in building tools and methods. I am tagging @Chris_Knoll, @anthonysena, @Frank, @gregk, @pavgra, @schuemie, but there are a whole lot more folks who should have an opinion:
With CDM V6 we have a proposal to fix one of the shortcomings and turn it into a full information model. This requires that it can do self-reference to any CDM structure dynamically in the data. In particular, we want to replace the rather unspecific Domains with concepts for all table and fields. As a consequence, we can link records between any tables together, like in FACT_RELATIONSHIP or in the upcoming EPISODE table. Episodes will be like Cohorts and Eras, but have Fact Relationship-type links to the events that belong to it. These links will work through the combination of a table_concept_id and and event_id (the foreign key in that table). Alternatively, it could be a field_concept_id and event_id.
But there is another use case we can cover: Tracking CDM versions for OHDSI tools. An application like ATLAS, or any of the methods, is built with a certain version in mind. When we have the METADATA table properly implemented it could check for the version number and refuse to start up in the wrong one. However, that may not be smart, since older and newer versions might be compatible enough for what the tool is trying to do. But how does the tool know when it can safely use the data?
Of course, the developers could now go through all the DDLs and figure out and test and test, but that is not realistic. Here is a potential solution:
- Each table and each concept is represented by a concept as proposed. When the table or field definition changes a new concept gets made.
- There is another concept for each CDM version. All table and field concepts that are active for that version have a record in CONCEPT_RELATIONSHIP linking them to the CDM concept (relationship_id = ‘belongs to’ or something like that). If a field concept doesn’t change between versions it will have a link to both.
So, when a tool gets released, it should capture all concepts for the tables and fields it touches. Let’s say person.person_id, person.year_of_birth, condition_occurrence.condition_concept_id and drug_exposure.drug_concept_id. To check whether it’s safe to run on a given CDM, all it has to do is to pull the concepts linked to the CDM version concept mentioned in METADATA and check whether those 4 are all in that list. If yes, the CDM is compatible and the tool can launch. If not, it’s not safe.
Does that make sense?
If yes, we need answers to the following questions:
What constitutes “change of a table” and “change of a field”? Is going from integer to big_int a change? Nullable to not nullable? Change of field name?
Do we even need a table concept? Aren’t we always referring to a certain field?
Which database flavor should we use? Or should we have table and field concepts for each flavor, and their life cycle could be different?