OMOP version 6 - put on ice?

roger.carlson · April 28, 2023, 2:58pm

7 makes more sense.

Just don’t name it “Vista”

MPhilofsky · April 28, 2023, 3:05pm

I have fond memories of Vista from my youth

Adam_Black · April 28, 2023, 4:32pm

Agreed! Semantic versioning for the win. Thanks for the definition of “breaking change”. It’s nice to have clear, easy-to-find definitions.

Windows has had a few misfires. It happens.

I don’t really understand the distinction between a physical and logical model in the context of versioning.

Mark · April 28, 2023, 8:19pm

Non-technical answer:
Think of how much of the logical model we use is unfinished, and each DBMS has its own implementation details for that model, i.e., the physical model. The logic of how the data is organized (logical) is the same across all DBMSs (in theory), but the implementation (physical) is not always the same.

cce · April 28, 2023, 6:36pm

For v7, we should fix the GENDER column. It’s not gender, it’s biological SEX (male, female, intersex).

Adam_Black · April 28, 2023, 9:38pm

I think the process for making proposals is to simply open an issue on the CommonDataModel github page and use [Proposal] in the name. I added @cce’s proposal here.

Also thanks for the explanation @Mark. I think that makes sense.

Sanjay_Udoshi · April 29, 2023, 2:55am

Mark · May 1, 2023, 1:24pm

98 SE was the last good OS in Microsoft’s line. Vista was an upgrade from XP.

First time anyone on this forum has made me feel old. Is there an OMOP code for that?

rookie_crewkie · May 1, 2023, 2:19pm

No, but there is one that describes the OS itself.

MPhilofsky · May 1, 2023, 2:33pm

Very well played, @rookie_crewkie

clairblacketer · May 1, 2023, 5:20pm

We try to stick to semantic versioning but, as Christian says, the model is not a physical piece of software. It is a logical system for how observational data should be organized. The renaming of a column is technically a breaking change, but at the time we developed v5.4, ADMISSION_SOURCE_CONCEPT_ID was not being used across the community at large, so we felt comfortable not labelling this as a breaking change. CONDITION_STATUS_CONCEPT_ID was introduced in v5.2, and adding that column was not a breaking change when it was first introduced. The vocabulary movement from type concepts to status concepts took place over the course of a few years. I would argue that this was a joint vocabulary change + CDM change, which is not well covered by the semantic versioning of just one of those items.

We in the CDM WG have worked hard to update our processes to make sure we are both as transparent as possible with the affected groups and as stringent as possible when deciding which proposals become part of a new version. For CDM v5.4 we completely overhauled our way of doing things by actively reaching out to every group we thought would be affected and getting their sign-off. It has been stable for a few years now and I personally think it should remain the current stable version for at least another year or two. We are still moving forward in our working group by hearing proposals, but we have made the approval process more robust by requiring use cases and full testing in multiple environments and with real data.

What is being underestimated here is the toll that new versions take on the community. I am uncomfortable talking about v7 or anything to that effect yet, because we do not fully understand which versions the community is actively using. Moving to a major version release is now a huge lift, not only from the CDM perspective (it takes at least a year to draft and pull together proposals) but also for many of our collaborators and our developers.

I do not think we are in a good position to begin moving to a new version of the CDM until we have a better handle and decision on backwards compatibility, which versions of the CDM will continue to be supported by the tools, how long developers have to support a new version, etc. There is also a lot of work being done to update the vocabulary and I feel that updating the CDM at the same time would cause too much confusion and upheaval.

I agree that v6.0 should be handled differently in the documentation, and that change will be made this week. I also agree that we can and should continue to hear proposals on additions/changes/updates to the model, but I do not think we are in a position to begin labelling a new version until some other decisions have been made.

Chris_Knoll · May 1, 2023, 5:38pm

Thanks, Clair. Very interesting read.

I’d like to request that the CDM team get uncomfortable with that reasoning. Semantic versioning doesn’t (by design) account for the use of a piece of software in a broader community. SemVer is only concerned with functionality between versions (logical or physical): a query written for 5.1 should return the same result on 5.2, because SemVer says it should. That is not guaranteed between 5.1 and 6.0.

We shouldn’t be afraid to bump a major version. What we should be afraid of is backwards compatibility concerns, which is almost a ‘transient fear’ of bumping major versions. As much of a concern as it is to tell the community ‘Here comes 7.0, get ready for some serious changes!’, it is much worse to say ‘Here comes 5.4 with breaking changes that violate SemVer principles.’
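The test described above (does a query written for 5.1 still return the same result on the next release?) can be checked mechanically. Below is a minimal Python sketch using an in-memory SQLite table. The two-column PERSON table, the query, and the helper names are hypothetical, simplified stand-ins, not the real CDM DDL or any OHDSI tool.

```python
import sqlite3

# Hypothetical query shipped with a tool or study package written for "5.1".
QUERY_V51 = "SELECT person_id, year_of_birth FROM person ORDER BY person_id"

# Simplified, hypothetical PERSON tables as (DDL, rows); not the real CDM DDL.
BASELINE = ("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)",
            [(1, 1970), (2, 1985)])
ADDED_COLUMN = ("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER,"
                " race_concept_id INTEGER)",
                [(1, 1970, 0), (2, 1985, 0)])
RENAMED_COLUMN = ("CREATE TABLE person (personid INTEGER, year_of_birth INTEGER)",
                  [(1, 1970), (2, 1985)])


def run_query(ddl, rows, query):
    """Load rows into an in-memory PERSON table and run query.

    Returns the result rows, or None if the query fails against the schema.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute(ddl)
        placeholders = ",".join("?" * len(rows[0]))
        con.executemany(f"INSERT INTO person VALUES ({placeholders})", rows)
        return con.execute(query).fetchall()
    except sqlite3.OperationalError:
        return None  # e.g. "no such column: person_id"
    finally:
        con.close()


def bump_for_query(baseline, candidate, query=QUERY_V51):
    """'minor' if query returns the same result on both schemas, else 'major'."""
    before = run_query(*baseline, query)
    after = run_query(*candidate, query)
    return "minor" if after == before else "major"


print(bump_for_query(BASELINE, ADDED_COLUMN))    # minor
print(bump_for_query(BASELINE, RENAMED_COLUMN))  # major
```

Adding a column leaves the old query's result untouched, while renaming `person_id` makes it fail outright. A check like this only certifies the queries you feed it; a fuller version would run a tool's whole query set.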

Adam_Black · May 1, 2023, 10:57pm

I agree with Chris. Also I don’t understand why you cannot use semantic versioning with a logical system. I’m not following the “logical vs physical” system thread.

Christian_Reich · May 2, 2023, 1:40pm

@Adam_Black:

Logical: establishes the structure of data elements and the relationships among them. Physical: It actually implements them. The difference is easy to spot. In the model, we have the “data types” integer, varchar(), date and datetime. They are placeholders, intentions. In fact, as far as I can tell there is no one SQL dialect that supports exactly these data types. And that is the whole point. We want to declare what we want to store and how. The DDLs establish these physically with a different set of instructions for each SQL flavor.

Would be nice, but unfortunately this is not an unambiguous rule. A change could be breaking in one database flavor, and fine in another. A good example is date and datetime. In Oracle, they are the same thing. The whole 6.0 situation would not have happened there.
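The logical/physical split, and why a date-to-datetime change can be a no-op on one platform and a real change on another, can be sketched as a lookup from logical types to per-dialect column types. The mapping below is an illustrative assumption, not the output of the official CommonDataModel DDL scripts; only the Oracle behavior (DATE also carries a time of day) comes from the point made above.

```python
from typing import Optional

# Illustrative logical-to-physical type map (assumed for this sketch; not the
# output of the official CommonDataModel DDL scripts).
PHYSICAL_TYPES = {
    "postgresql": {"integer": "INTEGER", "varchar": "VARCHAR({n})",
                   "date": "DATE", "datetime": "TIMESTAMP"},
    "sql server": {"integer": "INTEGER", "varchar": "VARCHAR({n})",
                   "date": "DATE", "datetime": "DATETIME2"},
    # Oracle's DATE type also stores a time of day, so both logical types
    # can map to the same physical type.
    "oracle": {"integer": "INTEGER", "varchar": "VARCHAR2({n})",
               "date": "DATE", "datetime": "DATE"},
}


def physical_type(dialect: str, logical: str, length: Optional[int] = None) -> str:
    """Render a logical CDM data type as a dialect-specific column type."""
    template = PHYSICAL_TYPES[dialect][logical]
    if "{n}" in template:
        if length is None:
            raise ValueError("varchar needs a length")
        return template.format(n=length)
    return template


def changes_physically(dialect: str, old_logical: str, new_logical: str,
                       length: Optional[int] = None) -> bool:
    """True if moving a column between logical types alters its physical DDL."""
    return (physical_type(dialect, old_logical, length)
            != physical_type(dialect, new_logical, length))


for dialect in PHYSICAL_TYPES:
    print(dialect, changes_physically(dialect, "date", "datetime"))
# postgresql True
# sql server True
# oracle False
```

Whether a change that is physically invisible on one platform still counts as "breaking" at the logical level is exactly what the rest of the thread debates.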

The other specialty we have is the Vocabularies: Some changes are in sync with the CDM, others are independent. Each release is actually a breaking change, since the queries come back differently - by design. If you consider CDM and Vocabulary as a unit we nominally keep breaking things all the time. And yet the system is still working.

Bottom line: We already said we support the Semantic Versioning, but in spirit. Therefore, when the time comes and we accumulated enough collective desire and energy to go to another version we will discuss what to call it without ideological fervor. If you look at the industry, they have done exactly that: Pentium, Vista, Snow Leopard.

Adam_Black · May 2, 2023, 2:13pm

Thanks for the explanation @Christian_Reich. What you describe reminds me of the situation we have in OHDSI-SQL. OHDSI-SQL is kind of like the logical query interface while the rendered and translated SQL is the implementation (physical).

Even with this distinction it seems like changing a column name in the logical model (e.g. person_id → personid) should be considered a “breaking change” and be released in a major version though.

Would you say it is possible to classify proposals as “breaking change” or “non-breaking change” or does this classification not even make sense?

Chris_Knoll · May 2, 2023, 2:35pm

In either case, semver is about functionality between releases. The different handling of datatypes is a bit of a red herring. Because we want to support 7+ database dialects, we have to adopt a one-size-fits-all approach to get the spec to work on all platforms. This has nothing to do with semver. When you change the capabilities of a version, you have to decide: does it prevent existing functions from working (breaking change/major version), or does current functionality remain and the change is just new functionality (non-breaking/minor release)?

It is an unambiguous rule: what I describe as 5.1 → 5.2 behavior vs. 5.1 → 6.0 behavior is exactly what semver means. I reiterate that the CDM team should not take a soft stance on this idea. You can make whatever change you want, but we’re asking that you be honest and faithful about describing it as a minor or major change. The different data handling on Oracle is a side issue, not related to functionality between versions. What you’re describing is ‘platform incompatibility’, which we should strive to avoid when defining the behavior/capability of the CDM.
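The rule above also gives a yes to the question of whether proposals can be classified as breaking or non-breaking, at least for schema changes. Here is a minimal sketch, assuming a schema is represented as a plain dict of table → {column: logical type}. The VISIT_OCCURRENCE example is simplified to one column and modeled on the v5.4 rename discussed earlier; the new EPISODE table is reduced to a single column. It ignores nullability and any semantic shifts that come from the vocabulary.

```python
from typing import Dict

# table name -> {column name: logical type}
Schema = Dict[str, Dict[str, str]]


def classify_change(old: Schema, new: Schema) -> str:
    """Classify a schema diff as a semver 'major', 'minor', or 'patch' bump."""
    breaking = additive = False
    for table, old_cols in old.items():
        new_cols = new.get(table)
        if new_cols is None:
            breaking = True  # dropped table: existing queries fail
            continue
        for column, logical_type in old_cols.items():
            # A missing column (including a rename) or a changed type breaks.
            if new_cols.get(column) != logical_type:
                breaking = True
        if new_cols.keys() - old_cols.keys():
            additive = True  # new columns: old queries still work
    if new.keys() - old.keys():
        additive = True  # new tables
    if breaking:
        return "major"
    return "minor" if additive else "patch"


V53 = {"visit_occurrence": {"visit_occurrence_id": "integer",
                            "admitting_source_concept_id": "integer"}}
V54_RENAME = {"visit_occurrence": {"visit_occurrence_id": "integer",
                                   "admitted_from_concept_id": "integer"}}
V53_PLUS_TABLE = {**V53, "episode": {"episode_id": "integer"}}

print(classify_change(V53, V54_RENAME))      # major
print(classify_change(V53, V53_PLUS_TABLE))  # minor
print(classify_change(V53, V53))             # patch
```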

When thinking about behavior and semver, you could look at vocabulary changes in the same way: Adding new concepts between releases is a minor change. Removing or marking expired concepts is breaking. CONCEPT_ANCESTOR that returns more concepts for a parent is minor. CONCEPT_ANCESTOR that removes concepts is major. Adding CONCEPT_RELATIONSHIP is minor, removing is major.

We could declare that it is a major change if we return different (i.e., added) children from a concept ancestor. It depends on what you consider ‘new’ behavior versus ‘changed’ behavior. I think that if adding a column to a table is not breaking, then adding more concepts to descendants is also not breaking… but I could see it either way.
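The vocabulary rules proposed above can be applied to two releases in the same way. This sketch assumes each release is reduced to a hypothetical snapshot: the set of valid concept IDs plus the set of (ancestor, descendant) pairs from CONCEPT_ANCESTOR. The concept IDs are made up. Because the thread leaves open whether added descendants are breaking, that reading is a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabSnapshot:
    """Hypothetical, pared-down view of one vocabulary release."""
    valid_concepts: frozenset  # concept_ids that are valid in this release
    ancestor_pairs: frozenset  # (ancestor_concept_id, descendant_concept_id)


def vocab_bump(old: VocabSnapshot, new: VocabSnapshot,
               added_descendants_breaking: bool = False) -> str:
    """Apply the major/minor rules proposed in the thread to two releases."""
    if old.valid_concepts - new.valid_concepts:
        return "major"  # concepts removed or expired
    if old.ancestor_pairs - new.ancestor_pairs:
        return "major"  # a parent lost descendants
    gained_pairs = new.ancestor_pairs - old.ancestor_pairs
    if gained_pairs and added_descendants_breaking:
        return "major"  # stricter reading: more descendants = changed results
    if gained_pairs or new.valid_concepts - old.valid_concepts:
        return "minor"  # only additions
    return "patch"


old = VocabSnapshot(frozenset({100, 101}), frozenset({(100, 101)}))
added = VocabSnapshot(frozenset({100, 101, 102}),
                      frozenset({(100, 101), (100, 102)}))
expired = VocabSnapshot(frozenset({100}), frozenset())

print(vocab_bump(old, added))                                   # minor
print(vocab_bump(old, added, added_descendants_breaking=True))  # major
print(vocab_bump(old, expired))                                 # major
```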

With all due respect, @Christian_Reich, Semantic Versioning is either followed or it is not. If we don’t follow it, then it’s better not to pretend that it is followed and instead adopt a version-labeling scheme based on dates or timestamps. But good luck trying to get tools to support behavior changes based on dates instead of discrete versions.
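What discrete versions buy a tool is a simple compatibility check. Here is a minimal sketch of how a hypothetical tool could gate on the CDM version it was built against, assuming strict semver: the same major version with an equal or higher minor version is compatible. A date-style label fails to parse because it carries no such signal.

```python
from typing import Tuple


def parse_version(label: str) -> Tuple[int, int, int]:
    """Parse 'v5.4' or '5.4.1' into (major, minor, patch).

    Raises ValueError for labels that are not dotted numbers, e.g. a release date.
    """
    parts = label.strip().lstrip("vV").split(".")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a semantic version: {label!r}")
    numbers = [int(p) for p in parts]  # ValueError on non-numeric parts
    return tuple(numbers + [0] * (3 - len(numbers)))


def is_supported(cdm_version: str, built_against: str) -> bool:
    """Under semver, a tool built against X.Y runs on any X.Z with Z >= Y."""
    have = parse_version(cdm_version)
    need = parse_version(built_against)
    return have[0] == need[0] and have[1] >= need[1]


print(is_supported("5.4", "5.3"))  # True: minor bump, old behavior kept
print(is_supported("6.0", "5.4"))  # False: major bump, no guarantee
```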

We could also treat CDM versions separately from Vocabulary versions, but we’ve already run into several challenges related to Vocabulary version changes. We can’t look at one release date of a Vocab, compare it to a different release date, and understand anything about the functional changes. So we have 2 options: trust that the vocabulary is compatible for the given study context (risky, and probably something you don’t want to do for safety studies) or force the same vocabulary version across data sources (hard to do in a network). But from a software perspective, we never put anything in code that would depend on a specific version of the vocabulary (and this has led to issues in the drill-down reports in Atlas, where the concept_hierarchy table no longer builds correctly).
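Of the two options above, forcing one vocabulary version across sources is at least checkable. This sketch assumes the common OMOP convention that the VOCABULARY row with `vocabulary_id = 'None'` stores the overall release label in `vocabulary_version`. The in-memory databases, site names, and release labels are hypothetical stand-ins for real sites.

```python
import sqlite3
from typing import Dict, Set


def vocabulary_release(con: sqlite3.Connection) -> str:
    """Read the vocabulary release label from an OMOP CDM database.

    Assumes the convention that the VOCABULARY row with
    vocabulary_id = 'None' stores the overall release in vocabulary_version.
    """
    row = con.execute(
        "SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'None'"
    ).fetchone()
    if row is None:
        raise LookupError("no vocabulary release row found")
    return row[0]


def group_by_release(sources: Dict[str, sqlite3.Connection]) -> Dict[str, Set[str]]:
    """Map each release label to the sources using it; >1 key means a mismatch."""
    groups: Dict[str, Set[str]] = {}
    for name, con in sources.items():
        groups.setdefault(vocabulary_release(con), set()).add(name)
    return groups


def make_demo_source(release: str) -> sqlite3.Connection:
    """Hypothetical stand-in for a site's CDM: a one-row VOCABULARY table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE vocabulary (vocabulary_id TEXT, vocabulary_version TEXT)")
    con.execute("INSERT INTO vocabulary VALUES ('None', ?)", (release,))
    return con


sources = {
    "site_a": make_demo_source("v5.0 01-MAR-23"),
    "site_b": make_demo_source("v5.0 01-MAR-23"),
    "site_c": make_demo_source("v5.0 30-AUG-23"),
}
groups = group_by_release(sources)
print(len(groups) > 1)  # True: the network is not on one release
```

Detecting a mismatch this way says nothing about how results will differ between the two releases, which is the gap described above.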

Mark · May 2, 2023, 5:35pm

Hard to do when certain big players refuse to play by standards… and I don’t mean Oracle.

Then there is AOU, which tends to mix and match vocabulary and CDM versions; I am not debating your argument, just pointing out the complexities.

Chris_Knoll · May 2, 2023, 8:13pm

Not familiar with that one, and google wasn’t able to help.

Mark · May 2, 2023, 8:50pm

All of Us Research Program, put on by the NIH. Several here are tied to the program, in one form or another.

Edit: put in links

Pulver · May 4, 2023, 9:38pm

Beyond it not being a currently supported production version, I am unclear on the current status of v6.0. Is consideration being given to resuming its development, applying to it some improvements made to v5.4, perhaps in a parallel fork? Is there a consensus on roughly when a decision will be made? Or is the route by which decisions/deadlocks should be settled, itself a contentious matter?

My group, which has been using v6.0 for several years, might offer suggestions for changes, but if development of 6.0 is frozen, such contributions appear to lack a path for meaningful consideration. As we are building a large, federally funded, publicly available database, we prefer to stay as close as possible to the “standard”.

Is the consensus that we erred in adopting 6.0 before it was fully blessed and penance in the form of conversion to 5.4 is requisite for communion?