Release v20240229 Issue

During our update to the latest version, we’ve noticed some significant changes. Variables from the condition domain are now in the observation domain, and standard concepts have been converted to non-standard. To consistently stay up to date, changes within the CDM and mappings are necessary.
Do you provide migration scripts and/or what is the suggested process for migrating existing databases to the new vocabulary version?

Welcome to the community, Christian!

There have been quite a few changes indeed, as outlined in the release notes. You can also see the planned work for the next release in the roadmap.

Changes are to be expected. Source vocabularies are getting shifted and improvement work is being done. If you want to learn more, I’d encourage you to sign up for the Vocab WG meetings, where we spend some of the time discussing future changes; you can also bring questions like this one there to discuss them in more detail.

Migration would very much depend on how you set up your ETL, which makes producing standardized scripts hard. For example, did you use one common look-up table to distribute the rows (events) to the corresponding OMOP CDM tables based on the domains of the standard concepts, or do you have those movements hardcoded? If the former, you are in luck; if the latter, you would need to modify the scripts according to the new domains.
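To illustrate the look-up approach, here is a minimal Python sketch, not a reference implementation. The names `DOMAIN_TO_TABLE` and `route_events`, the event layout, and the concept IDs are all made up for illustration; your ETL will look different. The point is only that the target table is derived from the concept's current domain rather than hardcoded per source field.

```python
# Hypothetical mapping from a standard concept's domain to a CDM table.
# Extend it to whatever domains your ETL actually handles.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Observation": "observation",
    "Drug": "drug_exposure",
    "Procedure": "procedure_occurrence",
    "Measurement": "measurement",
}


def route_events(events, concept_domains):
    """Group events by target table, using the domain of each event's
    standard concept as found in the currently loaded vocabulary."""
    routed = {}
    for event in events:
        domain = concept_domains.get(event["concept_id"])
        # Events with an unknown or unhandled domain are set aside for review.
        table = DOMAIN_TO_TABLE.get(domain, "unrouted")
        routed.setdefault(table, []).append(event)
    return routed


# Illustrative concept IDs, not real OMOP concepts.
events = [
    {"person_id": 1, "concept_id": 1001},
    {"person_id": 2, "concept_id": 1002},
]
old_domains = {1001: "Condition", 1002: "Drug"}
new_domains = {1001: "Observation", 1002: "Drug"}

routed_before = route_events(events, old_domains)
routed_after = route_events(events, new_domains)
```

With this design, a concept that moves from Condition to Observation in a new release is rerouted simply by rerunning the ETL against the new vocabulary; no script changes are needed.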

Changes are to be expected. Source vocabularies are getting shifted and improvement work is being done

I understand that changes are inevitable and often beneficial. However, adopting semantic versioning for your releases could significantly enhance user experience. Semantic versioning provides clear guidelines on when and how changes might affect users, making it easier for them to adapt without surprises. This practice is widely recognized for its benefits in software development and user support, as outlined in an informative article on Medium about versioning principles (Major.Minor.Patch. An illustrated guide to semantic… | by omrilotan | Fiverr Tech | Medium). Incorporating such a system could mitigate potential issues related to adapting to new versions.

Migration would very much depend on how you set up your ETL […]

The requirement to adjust the ETL process with every new release is indeed challenging. While understanding the necessity of some changes, this process can become quite cumbersome over time. Have you considered the possibility of developing a migration script that automatically updates databases between versions? Such a script could offer a smoother transition by minimizing manual adjustments and reducing the risk of errors. This approach could significantly alleviate the burden on users during upgrades, ensuring consistency and efficiency in adapting to new releases.

Incorporating such a system could mitigate potential issues related to adapting to new versions.

For products that are developed linearly, Year.Month.Day is no different from Major.Minor.Patch. LibreOffice, for example, recently switched in this direction, and Ubuntu has done it since 2004. And since Athena releases are now tied to calendar dates, it only makes sense to keep them calendar-labeled. Many important source vocabularies (like the ICD family, RxNorm and SNOMED) are updated regularly on a set calendar cycle, so having the Athena release date on the label is important for users to know what state of the included vocabularies to expect.

Labeling changes as breaking could be good, but it is often impossible to know which changes are actually breaking for users, as it is all very data-specific. Anything can be a breaking change for a small subset of users while being benign to most others. The RxNorm source once silently moved ado-trastuzumab emtansine to be effectively a subtype of trastuzumab, and we just routinely pulled this change. But on a clinical level, the two are actually considered very different drugs, and it broke a then-ongoing study.

The requirement to adjust the ETL process with every new release is indeed challenging. While understanding the necessity of some changes, this process can become quite cumbersome over time.

Up until this release, Athena vocabularies were following a rolling release model. The vocabulary team would update the most requested (or the most outdated) vocabulary, update its dependencies and publish the delta about once every two months. This meant that all changes above content level were by necessity small and atomic, and easy to adapt to. Or they would have been, if deployed instances had indeed downloaded every version and incorporated small changes on the fly. In reality, vocabularies were downloaded once a year, and small changes (documentation for which was spread over multiple pages of small release notes) accumulated, compounding into bigger problems. So people would often not update vocabularies at all.

In addition, we are moving away from thinking about individual vocabularies as individual atomic products that people may or may not need updated. Vocabularies are gradually turning into a single comprehensive ontology, and should be updated, tested and shipped as a package.

Have you considered the possibility of developing a migration script that automatically updates databases between versions? Such a script could offer a smoother transition by minimizing manual adjustments and reducing the risk of errors.

For the rolling release model, this was never a consideration. But even now, this script would be very hard to design. Every OMOP CDM instance uses a different technology stack, and different users may be updating from different earlier versions of the vocabularies; the combinatorial explosion of just these two factors makes testing infeasible.

In addition, updating just the existing OMOP-converted data may not even be enough: there is no canonical ETL pipeline that we could target and adjust with the script. Stacking patches on converted data is also not an option: we would only be able to target the converted data, with no reference to the source. For example, if SNOMED deprecates a single key concept as ambiguous, only the source data can help with disambiguating from replacement options. So people would still need to adjust their ETL pipelines according to release notes, and we are back at square one.
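What can be done locally is to narrow down which changes touch your own data before reworking the ETL. Below is a rough Python sketch under stated assumptions: the function name `find_affected_concepts` is hypothetical, the concept IDs are invented, and the two dictionaries stand in for the `concept_id`, `domain_id` and `standard_concept` columns of the CONCEPT table from the old and the new vocabulary download. It only flags concepts; deciding what to do with them still needs the source data and the release notes.

```python
def find_affected_concepts(old_concepts, new_concepts, used_concept_ids):
    """Return (concept_id, reason) pairs for concepts used in the converted
    data whose domain or standard status differs between two versions."""
    affected = []
    for cid in sorted(used_concept_ids):
        old = old_concepts.get(cid)
        new = new_concepts.get(cid)
        if old is None:
            # Not present in the old vocabulary snapshot; nothing to compare.
            continue
        if new is None:
            affected.append((cid, "missing in new version"))
            continue
        if old["domain_id"] != new["domain_id"]:
            affected.append((cid, f"domain {old['domain_id']} -> {new['domain_id']}"))
        if old["standard_concept"] == "S" and new["standard_concept"] != "S":
            affected.append((cid, "no longer standard"))
    return affected


# Illustrative rows only; real input would come from the CONCEPT tables.
old_concepts = {
    1001: {"domain_id": "Condition", "standard_concept": "S"},
    1002: {"domain_id": "Drug", "standard_concept": "S"},
}
new_concepts = {
    1001: {"domain_id": "Observation", "standard_concept": "S"},
    1002: {"domain_id": "Drug", "standard_concept": None},
}
report = find_affected_concepts(old_concepts, new_concepts, {1001, 1002})
```

Rows flagged as "no longer standard" are exactly the cases where the converted data alone cannot tell you which replacement concept is correct.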
