
OMOP version 6 - put on ice?

Friends,

Quite often, customers ask why they have not been upgraded to OMOP version 6, and the same question comes up regularly among OHDSI newcomers. A lot of companies without previous OMOP knowledge and background make the honest mistake of converting their data to v6 instead of the supported v5.4, which then results in the additional effort of downgrading.

Question - why do we keep confusing people/community and ourselves? Why can’t we recognize that OMOP v6 - at least in its current design, with the famous mandatory datetime stamp that no one really needs or likes - is not really adoptable, and officially put it on ice? Great experiment and grand ideas, but it didn’t work. Sure, many of these ideas can be brought into the CDM in a slightly different form; many actually already are in v5.4.

Can we please consider removing OMOP CDM v6 pages from our WIKI to stop confusing people who still think that OMOP CDM v6 is about to be released (5 years after it was actually released)? Or move it to another location aka OHDSI museum :).

Your thoughts on this are welcome

Greg

@clairblacketer @Christian_Reich @Patrick_Ryan @aostropolets @Andrew

5 Likes

Yes please, I went down this road a couple-three years ago when there was at least active talk about it on the forum.

1 Like

I think this is why we need a Technology Advisory Board composed of all the various OHDSI stakeholders who can help guide the community. I’ve been personally confused about whether I should support CDM v6 in the tools I work on. I would love a clear decision one way or the other. It’s been >4 years since v6 was introduced and I think it’s fair for the community to expect a clear path forward from the OHDSI leadership.

For new collaborators to OHDSI, please transform your data to CDM v5.4 until such time that the v6 series of the CDM is ready for mainstream use.
reference

This implies that v6 will eventually be supported and implicitly encourages tool developers to implement for v6. Clarity would be really helpful here.

Does anyone disagree that OMOP v6 should be scrapped? If there’s a debate to be had then let’s have it! :slight_smile: I can certainly see the issue with casting all datetimes to dates in our tools and analysis scripts. (Dates are the most common level of granularity that I’ve seen).

@Paul_Nagy @lee_evans

2 Likes

Changes in v6.0 already implemented in v5.4

  • Latitude and Longitude added to LOCATION
  • The name of ADMISSION_SOURCE_CONCEPT_ID was changed to ADMITTED_FROM_CONCEPT_ID in VISIT_OCCURRENCE and VISIT_DETAIL

Non-breaking changes in v6.0 to consider implementing in v5.5

  • Add contract owner field to PAYER_PLAN_PERIOD
  • Change all primary keys to BIGINT
  • Make concept_ids mandatory, using 0 if there is no value. The only exceptions would be UNIT_CONCEPT_ID, VALUE_AS_CONCEPT_ID, and OPERATOR_CONCEPT_ID.
  • Make DATETIME fields mandatory

Breaking changes in v6.0 to consider implementing in v5.5 (or v7)

  • Remove the DEATH table, add a DEATH_DATETIME field to the PERSON table, and store cause of death in CONDITION_OCCURRENCE
  • Make DATE fields optional

I would suggest taking the following two changes off the table as it seems like a lot of work for little benefit.

  • Remove the DEATH table, add a DEATH_DATETIME field to the PERSON table, and store cause of death in CONDITION_OCCURRENCE
  • Make DATE fields optional (note we can treat making datetimes mandatory and making dates optional as two separate proposals)

I think that would leave just four changes to debate if I’m not mistaken:

  1. Add contract owner concept fields to PAYER_PLAN_PERIOD
  2. Change all primary keys to BIGINT
  3. Make nearly all concept IDs mandatory using 0 if there is no value.
  4. Make all DATETIME fields mandatory

(edit: updated breaking vs non-breaking distinction. All changes break ETL code. A “breaking change” is one that would cause analytic code, like Atlas or Hades, to fail.)

I second the idea of reversing the decision to remove Death. Death is a clinical context that can have special fields (such as a reference to the condition_occurrence table for the cause, and anything else we want to attribute to a Death event).

2 Likes

Same. I was under the impression it was ‘coming soon’

1 Like

Now that I think about it the “breaking change” vs “non-breaking change” label probably requires context (i.e. what exactly is “breaking”). I don’t think any of the four changes above would break analytic code (e.g. Atlas, Hades, DQD). They would however require updates to ETL code but so would pretty much any change to the CDM specification. Is that right?

@gregk, @Chris_Knoll would you say these four changes are “non-breaking”?

Friends:

The top paragraph in OMOP CDM v6.0 is indeed written quite cautiously with regard to the future of this version, as was appropriate at the time when we decided to put it on ice. This should change. In my opinion, v6.0 will never come back, and instead the next major version should be 6.1, for the simple reason that we had to back out of the datetime solution for good.

You could ask why that was even put forward if it is such a bad idea, but the answer is easy. There were folks with a use case (PLE and PLP for acute interventions, like fast-acting substances, e.g. in the ER, or surgeries). It was debated quite a while and nobody disliked it. The opposition only woke up afterwards, when folks realized that the datetime thing changes almost EVERY script, plus it makes time calculations ultra slow. In addition, nobody felt the urgency to adopt it, despite the use cases. Really nobody. So, we decided to abandon the version. It was done very publicly at the OHDSI Symposium. It was a painful experience, with me having to eat tons of oversweet cake.

In order to avoid such a debacle ever again, we now have a conservative stance: We need strong use cases to put any significant (= not backwards compatible) change forward, and we actively run these changes by the communities and Work Groups mostly affected by them (Atlas, Hades and the like), asking “you may not have noticed, but we are planning this change, is that ok?”

So: for making a new version we needn’t regurgitate the old changes to v6.0. We know what they are, and besides, there are tons more proposals. What we need are use cases and people clamoring for them, because they cannot do their job with the current version.

If anybody has those please step forward.

1 Like

It comes down to whether it is backward compatible - whether an analytical method or application continues to work after the change. Adding a field or changing int to bigint is non-breaking; replacing date by datetime is breaking. ETLs don’t count; they almost always have to change with a CDM change.
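The date vs. datetime point can be made concrete with a toy example (a hypothetical sketch in Python/SQLite; the table, column, and values are made up for illustration, not real CDM data): a v5-style exact-date filter silently drops rows once the same column starts carrying timestamps.

```python
import sqlite3

# Minimal sketch: one date-typed column that suddenly holds datetimes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE condition_occurrence (condition_start_date TEXT)")

# v5-style value: a plain date.
con.execute("INSERT INTO condition_occurrence VALUES ('2020-03-01')")
# v6-style value in the same column: a datetime for the same day.
con.execute("INSERT INTO condition_occurrence VALUES ('2020-03-01 14:35:00')")

# A typical v5 analytic filter: exact-date equality.
hits = con.execute(
    "SELECT COUNT(*) FROM condition_occurrence"
    " WHERE condition_start_date = '2020-03-01'"
).fetchone()[0]
print(hits)  # only the plain-date row matches; the datetime row silently drops out
```

The failure mode is the nasty kind: the script still runs, it just returns fewer rows, which is exactly why date-to-datetime counts as breaking for analytic code.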

1 Like

Guys, I am glad so many people replied.

So, going back to my original question/proposal to remove the OMOP CDM v6 pages from our WIKI to stop confusing people into thinking “it is coming soon”. Would you support that?

Then, I can see another active debate on what needs to be carried over into the next version and what needs to be scrapped from v6 - a great discussion, probably a good one for the OMOP WG to pick up. Once we have an officially released new version, we can then publish it back into the WIKI.

1 Like

So we tried to make it clear that v6.0 is not currently viable, but I guess the language on the website and GitHub was not strong enough. The latest release is listed as v5.4 in the repo and there is a large note on the website saying that v6.0 is not supported by the tools.

We also developed scripts to help people who started in v6.0 to move to v5.4. Perhaps there is a better place for it instead of where the documentation currently sits but I would like to keep it somewhere as reference because certain elements of it are quite useful. The COST table, for example, is much better in v6.0 than v5.4 and some groups create an amalgam using that table from v6 but everything else from v5.4.

I agree with Christian that v6.0 will not be used, but we have not come to a consensus that the v6 series will be scrapped completely. I also don’t think we are ready to begin thinking about the next CDM version. Like Christian said, we are deliberately moving slowly on introducing anything new to the model because the community has grown so large and any change creates long-term ripple effects, so we are putting any suggestions through their paces to make sure of the use cases and proper design.

My suggestion for now is that we move the v6.0 documentation elsewhere on the website so that it does not sit next to the other viable versions. We have some time in our agenda next Tuesday where we can bring this up and we will go over what we want to do with v6.0 in the future and how we want to handle it.

2 Likes

All good comments. One small question: Do we use semantic versioning for the CDM?

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes
  2. MINOR version when you add functionality in a backwards compatible manner
  3. PATCH version when you make backwards compatible bug fixes

That’s how it works, @Adam_Black. Except the model is logical, not physical, so you cannot do this exactly.

So I’m thinking the next major version should be 7.0 then.
Because if 6.0 requires datetimes but 6.1 does not require datetimes that would result in a major breaking change from 6.0 to 6.1

1 Like

7 makes more sense. We have skipped a number before, I think.

3 Likes

We skipped only version 1 (there was one, but it was never published). But please come to the CDM meeting and we can discuss it.

I wish I could make the CDM meeting but I usually have conflicts. I’m curious how many CDM datasets out in the wild are currently in v6. I’m sure there are some organizations that have mapped their data to v6.

My hunch is zero or close to zero. The tools don’t work.

3 is breaking in that joins that used to return no rows (because joins on NULL columns fail) would now return records with the unknown concept (0).

4 is breaking in that making non-mandatory fields mandatory will break existing processes that populate the tables, since they may not be setting those column values.

Here’s my definition of breaking:

When a former input to a process results in either a failure or a different result, then it is a breaking change. An exception to this is a bugfix, but only because the original result was not the correct one. But, in fact, there are examples where something that is ‘buggy’ (i.e. not standards compliant) becomes a depended-on feature, and in that case the bugfix is breaking.
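The point above about mandatory concept IDs changing join results can be sketched with a toy schema (a hypothetical sketch in Python/SQLite; the table shapes and the aspirin row are illustrative, not the full CDM DDL):

```python
import sqlite3

# Toy concept and drug_exposure tables to show how NULL vs. 0 changes a join.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)")
cur.execute("INSERT INTO concept VALUES (0, 'No matching concept')")
cur.execute("INSERT INTO concept VALUES (1112807, 'aspirin')")
cur.execute("CREATE TABLE drug_exposure (drug_exposure_id INTEGER, drug_concept_id INTEGER)")

# Old-style row: unmapped drug recorded as NULL.
cur.execute("INSERT INTO drug_exposure VALUES (1, NULL)")
# New-style row: unmapped drug recorded as concept 0.
cur.execute("INSERT INTO drug_exposure VALUES (2, 0)")

rows = cur.execute("""
    SELECT de.drug_exposure_id, c.concept_name
    FROM drug_exposure de
    JOIN concept c ON c.concept_id = de.drug_concept_id
""").fetchall()
print(rows)  # NULL fails the equality join and drops out; 0 joins to 'No matching concept'
```

Same query, same schema shape, but the result set changes once unmapped records carry 0 instead of NULL - which is why this counts as breaking for analytic code even though nothing errors out.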

1 Like

Not exactly. 5.4 had several breaking changes. I’ll give a couple of examples, not to criticize, but to illustrate and also to propose approaches to ensure backwards compatibility.

  1. Renamed column: ADMISSION_SOURCE_CONCEPT_ID was changed to ADMITTED_FROM_CONCEPT_ID. Renaming a column amounts to removing a column and adding a column. And, obviously, removing a column is a breaking change. The non-breaking approach is to put both columns in the table, and indicate the one that will be removed in the next MAJOR version.

  2. Added condition_status column: By itself, that’s not a problem, but they removed concepts from being assigned to condition_type and put those into condition_status, thus breaking existing cohort definitions that may have been coded to look for those condition_status concepts in the condition_type column. This is an example where a logical change does break compatibility and should be made in MAJOR version changes only. The non-breaking approach to this is to keep both concept types in condition_type, put the new status-specific concepts in condition_status, and indicate the logical change that will be made in the next MAJOR version.
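The “keep both column names during a deprecation window” idea from example 1 could look something like this (a hypothetical sketch in Python/SQLite; the view-based alias and the concept value 8870 are illustrative, not an official CDM mechanism):

```python
import sqlite3

# Sketch: new column is the single source of truth; the deprecated name is
# exposed as an alias via a view so old analytic code keeps working.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE visit_occurrence_tbl (
        visit_occurrence_id INTEGER PRIMARY KEY,
        admitted_from_concept_id INTEGER  -- new name, the real column
    )
""")
con.execute("""
    CREATE VIEW visit_occurrence AS
    SELECT visit_occurrence_id,
           admitted_from_concept_id,
           admitted_from_concept_id AS admission_source_concept_id  -- deprecated alias
    FROM visit_occurrence_tbl
""")
con.execute("INSERT INTO visit_occurrence_tbl VALUES (1, 8870)")

old = con.execute("SELECT admission_source_concept_id FROM visit_occurrence").fetchone()
new = con.execute("SELECT admitted_from_concept_id FROM visit_occurrence").fetchone()
print(old, new)  # both the old-name and new-name queries return the same value
```

Old queries keep working for a MINOR release cycle, and the alias is dropped at the next MAJOR version, which is exactly the semver contract being argued for here.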

I’m hoping that I can convince everyone that we should be taking major, minor, hotfix version changes extremely seriously, be fully aware of when a breaking change has happened, and know how to avoid it in the future.

1 Like