
SNOMED flavors: versioning and standardization in OHDSI vocabulary

I’m in the Vocab workgroup meeting at OHDSI EU 2022 right now. Per @mik’s suggestion, I’m posting a forum thread to initiate a discussion about how to handle SNOMED and what may be possible in the future.

Currently (@Dymshyts @Christian_Reich, correct me if I got it wrong), we ingest SNOMED-US, SNOMED-UK, and SNOMED-International, and then we create one composite SNOMED version based on the de-duplicated union of those concepts. Since each of those 3 vocabularies has its own release cadence, it is a challenge to decide when to create the SNOMED composite for use in the OHDSI vocabulary. This challenge limits the number of SNOMED releases and can also produce apparent inconsistencies between versions.
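To make the “de-duplicated union” concrete, here is a minimal Python sketch. It assumes each edition has already been loaded as a list of rows keyed by SCTID, and it uses a made-up tie-breaking rule (keep the row with the latest effective time; on a tie, keep the edition passed first). `ConceptRow`, `composite`, and the IDs are hypothetical; this is not the Vocabulary team’s actual build code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptRow:
    sctid: str           # SNOMED CT identifier (placeholder values below)
    effective_time: str  # RF2-style YYYYMMDD string, so string order is date order
    term: str
    edition: str         # label of the edition file the row came from


def composite(*editions: list[ConceptRow]) -> dict[str, ConceptRow]:
    """De-duplicated union of several editions, keyed on SCTID.

    Hypothetical rule: keep the row with the latest effective_time; on a tie,
    keep the row from the edition passed first.
    """
    merged: dict[str, ConceptRow] = {}
    for rows in editions:
        for row in rows:
            current = merged.get(row.sctid)
            # strict ">" means an equal effective_time never replaces an earlier edition
            if current is None or row.effective_time > current.effective_time:
                merged[row.sctid] = row
    return merged


# Placeholder IDs and terms, not real SNOMED content.
intl = [ConceptRow("100", "20220131", "Concept A", "INT")]
uk = [ConceptRow("100", "20220131", "Concept A", "UK"),
      ConceptRow("200", "20220401", "UK-only concept B", "UK")]
us = [ConceptRow("100", "20220301", "Concept A (revised)", "US")]

merged = composite(intl, uk, us)
print(sorted((k, v.edition) for k, v in merged.items()))  # [('100', 'US'), ('200', 'UK')]
```

Note that a build like this can only run once all three inputs exist, which is where the timing problem comes from.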

A question raised during the workgroup discussion: could we treat the 3 vocabularies as separate vocabs, rather than create a composite? Or could we decide to focus on only one SNOMED vocab (International?) as our primary standard? Or are there other ideas for maintaining SNOMED (and its associated source code vocabulary mappings) that would minimize the delay new versions cause to the OHDSI integrated release?

Perhaps we could get perspectives from our friends at SNOMED to establish recommended best practice.


I for one love this idea! :smiling_face_with_three_hearts:
I think Jim Case (jca@snomed.org) is our man on this one. I believe he’s back from vacation the first week in July.

Seems like the international edition is the easiest option. But I also understand the value of national codes.

For a more informed decision, we could first do a descriptive study of how many country-specific codes ever appear in data.
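Such a descriptive study could start from something as simple as this Python sketch, which reports both the share of distinct codes that are national and the share of recorded events that use them (the two numbers can differ a lot). It assumes you already have one code per recorded event and a lookup from code to edition; `national_code_usage` and the "INT" label are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def national_code_usage(events: Iterable[str], edition_of: Mapping[str, str]) -> dict[str, float]:
    """Share of distinct codes, and of recorded events, that come from a national edition.

    `events` holds one code per recorded clinical event; `edition_of` maps every code
    to an edition label, with "INT" (hypothetical label) meaning the International edition.
    """
    counts = Counter(events)
    if not counts:
        raise ValueError("no events to summarise")
    national = [code for code in counts if edition_of[code] != "INT"]
    national_events = sum(counts[code] for code in national)
    return {
        "distinct_codes": len(counts),
        "national_codes": len(national),
        "share_of_codes": len(national) / len(counts),
        "share_of_events": national_events / sum(counts.values()),
    }


# Toy data: one national code that is used heavily.
usage = national_code_usage(["a", "a", "a", "b", "c"], {"a": "UK", "b": "INT", "c": "INT"})
print(usage["share_of_codes"], usage["share_of_events"])  # 0.3333333333333333 0.6
```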

If we try to run a “portable study” (one R package runs on multiple datasets), it seems to call for “portable terminology”.

I agree, let’s see what our SNOMED colleagues suggest.

We could make the national codes non-standard and map them to the international SNOMED codes. But then those of us with EHR data would potentially have a lot of “jumps” from one terminology to another, increasing the risk of information loss if the international SNOMED isn’t treated as a parent code system by SNOMED. We definitely need to find out SNOMED’s future roadmap for their vocabularies. Also, I remember a discussion about extending SNOMED. Has the community moved forward with the SNOMED extension vocabulary?

The Vocabularies, their interactions with source vocabularies, the mapping from non-standard to standard concepts, and the way each Vocabulary is handled by its creator are all very complex. I’m interested to hear others’ opinions, especially insight from the Vocab team and the folks at SNOMED.

I would be extremely reluctant to make national concepts non-standard. We would lose a lot of valuable information if we were to map these to the nearest parent concept(s) in the international edition.

If we truly want to be able to perform analyses using international data sets, I suspect we will eventually need to incorporate the national editions of all OMOP’d countries to create the ‘global’ superset of SNOMED concepts.

Just my thoughts…

Not exactly (@zhuk @Eduard_Korchmar please join the discussion). This is what SNOMED does themselves. In the end, we get a fully controlled and de-duplicated set of concepts where the property of “version/extension” just tells us that the code is created to cover the local needs and not (yet) promoted for global use.

It could be a choice if we wanted to limit the granularity or size of the OMOP standard vocabulary.
There are not many duplicates between the SNOMED extensions (besides several flaws) because when one creates a concept within a local SNOMED extension, it’s all based on the relevant content of the International version of SNOMED. So each concept adds some more information. Why would we want to drop this information?

It doesn’t really limit the number of releases we can produce. All the extensions we use are released at least twice a year (as the International SNOMED is). But it creates a natural time gap, because the Extensions are based on a specific version of the International SNOMED, which has to be released first and then used as a basis for the Extensions. For me, an additional delay of 2-4 months doesn’t sound too painful, because SNOMED is almost never used as the “source” vocabulary, so we basically don’t lose codes in the ETL this way. For emerging things like COVID, we can support ETLs with OMOP Extension concepts.

At the same time, each SNOMED release triggers quite a lot of changes to the vocabulary mappings and ETL. From the perspective of the minor/major release concept, I’d even intentionally limit SNOMED releases in OMOP to once a year.

Sounds like a really good option, and we already know some examples: the READ vocabulary has so many mappings to the UK version of SNOMED that datasets such as CPRD would suffer a lot.

100% agree. Because the modeling rules for the content are the same (or almost the same), we basically get more well-controlled high-quality content, so why would we limit it to the International version of SNOMED?

And that is what we did, but by the time we make it available in Athena, the earliest component (International) is roughly one year old.

Are there any use cases when the newest SNOMED is needed?
Can we agree that the one year gap between SNOMED refresh and its availability in Athena is acceptable?

Oh, I wish. Unfortunately, we do indeed get a mishmash from 3 editions, two of which (US and UK) are local editions that are not obligated to cross-coordinate to eliminate mutual duplicates. Worse, they may not be based on the same version of International SNOMED: the UK version only updates its International base bi-weekly. This de-duplication effort is on our part; here’s some code. There are also manual changes not reflected in the code.

I agree. Mixture of 3 different editions is unviable to be updated more often. Focusing on regularly updating just universal international edition, however…

I would push for a different approach: ditch everything but SNOMED International. Local differences in each national edition are usually introduced to accommodate mapping targets for local terminologies. Such concepts are overly specific for the general case and are not expected to change the hierarchy of International concepts.

Allow SNOMED OMOP extensions to be built locally, ingest the differences into the 2-billion concept_id space, and only allow International concepts in concept set definitions at the Atlas stage.

A toolset to build a SNOMED vocabulary from any SNOMED RF2 source is more feasible to support than any combination of local editions, and we already have to do this, even if only the vocabulary team uses it now. In the current Vocabularies it is done for UK and US, but there are few technological differences for any other edition.

I agree. In the CPRD Aurum UK primary care database there are over 23K national SNOMED codes in use. These national codes account for 12% of all SNOMED codes in our local vocabulary, but around 44% of all clinical events in the database. A lot of input and review from UK NHS clinicians went into creating these national codes, and I believe attempts were made to use core concepts whenever possible. Where a local code is used there may well be a good reason for it. Mapping these to international (core) SNOMED concepts would be a logistical challenge, but it perhaps also risks losing some important local context.
Retaining national codes at the expense of less frequent Vocabulary updates seems like a worthwhile tradeoff to me.


@Alexdavv We already limit the granularity and size of the OMOP standard vocabularies. Individual ICD codes are more granular than individual SNOMED concepts, and mapping one ICD code to many SNOMED concepts is a good, but not perfect, semantic match. The OHDSI community has produced reliable evidence for years using this method. What makes the mapping of UK or US SNOMED to International SNOMED not a good semantic match for research? I’m not familiar with the UK version and have only dabbled in the US version.

Will our OHDSI resources for Vocabulary management be able to keep pace with the continued harmonization of the 3 SNOMEDs? What if more national SNOMEDs are added? Is our current system scalable? We need to know what is on the roadmap at SNOMED. Do they have plans for one “global” terminology? Or plans to map local editions to the international edition? Or do they plan to continue their current process? How do they suggest we go about harmonization for the OHDSI use case?


Yes. Mental health is one area that I expect will be very volatile (in terms of content changes in SNOMED) for the next couple of years. I know there are other clinical domains in which substantial work is being done in SNOMED.

One year seems like a very long time when we consider what a foundational role high-quality (and up-to-date) terminology plays in knowledge discovery.


The international edition is the base content for all editions. The national editions just extend the international edition to allow for country-specific content. If a concept in one national edition is requested or needed by another country, the concept will be moved to the international edition.

So… there is no “mapping” from local editions to international… any harmonization I can imagine would just mean replacing the national edition concept with the international parent(s). We would lose specificity for any non-international concept.

Or am I misunderstanding what you mean @MPhilofsky by “mapping” to the international edition of SNOMED?


Hello @piper,

Yes, let me clarify. In my above post where I ask Alexander “What makes the mapping of UK or US SNOMED to International SNOMED not a good semantic match for the research?”, ‘mapping’ refers to the OHDSI Vocabulary team’s process of making codes from a vocabulary non-standard concept_ids and then mapping them to standard concept_id/s. There is loss of semantic meaning when mapping from one code system to another. I’m curious if this loss will negatively impact OHDSI’s mission to produce reliable evidence. The UK SNOMED codes/concept_ids will still be available in the source concept_id field of every clinical event table. This will allow the “off-label” use of the OMOP CDM for local/national use cases.

I have concerns about the feasibility of semantically harmonizing the hierarchies of the US, UK, International, and an unknown number of future national SNOMED terminologies. This process already takes a year, as Mik stated. Then Eduard points out the challenges associated with the process. And you further expand on SNOMED’s current terminology development, which will only increase the size of SNOMED and the associated hierarchical mapping work. From a data management and terminology harmonization point of view, this doesn’t seem scalable for our open source community. I also wonder (and would love others’ input) how much we as a community should do with regard to managing the SNOMED vocabulary, and what we should ask SNOMED to do.

Note, I am not part of the Vocabulary team, nor do I have insight into OHDSI’s finances. So, I might be way off. Maybe we do have sufficient funds to expand the Vocabulary team’s resources :slight_smile:

Ah, you’re talking about mapping within the broader OHDSI vocabularies, not mapping from a national SNOMED edition to the international edition. That makes more sense.

Agree, the logistics are the hardest part of all of this.

I wonder if there’s a way to more tightly couple the OHDSI vocabulary maintenance workflows with the workflows of some of the SDOs (especially the big ones like SNOMED)? For example, instead of loading and processing SNOMED releases as they’re released to the public, could we get more frequent updates and tackle content in more manageable chunks?

Not sure what this would look like… but might be worth discussing with SNOMED folks?



I agree with what was said by many. There is no way we can second-guess the relationships between the national SNOMEDs and the international one. We need to rely on what SNOMED does, and they are doing this job, even though the editions are not well synchronized with each other and SNOMED sometimes makes somewhat arbitrary choices. But again, we are a canoe; we cannot play tugboat to the SNOMED supertanker.

The other question is whether the national SNOMED concepts will cease having vocabulary_id=SNOMED. If we split that up and create a mixed bag, like we do with RxNorm and RxNorm Extension, anybody wishing to do so could always roll up local SNOMED concepts to the next available SNOMED International concepts and work with those. SNOMED International would stay SNOMED, and phenotypes would no longer bank on Condition concepts that are really local.
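Rolling local concepts up could look roughly like this Python sketch: climb the is-a hierarchy from a national concept and stop at the first International concepts reached on each path. It assumes a parent lookup and per-concept vocabulary_id values following the proposed split ("SNOMED" for International, something like "SNOMED UK" for national concepts); those labels, the IDs, and the function name are hypothetical. The result can include an International concept that is itself an ancestor of another returned concept; pruning that is left out for brevity.

```python
from collections import deque
from typing import Mapping, Sequence

INTERNATIONAL = "SNOMED"  # hypothetical: vocabulary_id kept for International concepts after a split


def roll_up_to_international(concept: str,
                             parents: Mapping[str, Sequence[str]],
                             vocabulary_of: Mapping[str, str]) -> set[str]:
    """International concepts first reached on each upward (is-a) path from `concept`."""
    if vocabulary_of[concept] == INTERNATIONAL:
        return {concept}
    found: set[str] = set()
    seen = {concept}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent in seen:
                continue
            seen.add(parent)
            if vocabulary_of[parent] == INTERNATIONAL:
                found.add(parent)      # stop: do not climb past an International concept
            else:
                queue.append(parent)   # still national, keep climbing
    return found


# Toy hierarchy with placeholder IDs: uk1 -> {uk0, int2}, uk0 -> int1, int1/int2 -> int0
parents = {"uk1": ["uk0", "int2"], "uk0": ["int1"], "int1": ["int0"], "int2": ["int0"]}
vocab = {"uk1": "SNOMED UK", "uk0": "SNOMED UK", "int0": "SNOMED", "int1": "SNOMED", "int2": "SNOMED"}
print(sorted(roll_up_to_international("uk1", parents, vocab)))  # ['int1', 'int2']
```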


Stating the obvious here: the lack of synchronization we see between editions is an artifact of the file publication process, not something inherent in the terminology. At any given point in time, the superset of concepts that make up the national and international editions is in sync.

The problem is that we’re getting a different snapshot of the international edition in each national edition we ingest: the snapshot of the international edition at the point in time when the national edition was released.

The ideal solution would be access to a set of files representing the global SNOMED vocabulary. I wonder - from a resource perspective - what it would take for SNOMED to be able to generate such files for analytics use cases like we have here?


A consequence of this approach is that countries who are defining and capturing more specific, granular clinical findings and conditions (thinking here about mental health in particular) will not be able to use these to define phenotypes.

What happens to the ability to do research on culture- (or race- or ethnicity-) specific findings and conditions?

I’m reluctant to lose the kind of granularity we will lose if we were to treat concepts in national SNOMED editions as somehow less relevant than those in the international edition.


No, not at all. Here is the deal:

The hierarchy would stay the same. But instead of having only concepts with the vocabulary_id=SNOMED you would have SNOMED, SNOMED UK, SNOMED US, etc. The local SNOMEDs are descendants of the SNOMEDs, like now, but it will be transparent. If you want to create a phenotype that works globally you better use a SNOMED in your criterion. If you know what you are doing you can use the others, but you won’t by accident pick a UK SNOMED thinking you have a nice well defined concept, and then wonder why it doesn’t find anything in the US. That kind of thing.
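Under such a split, a phenotype author (or a tool) could flag non-portable concepts before running a network study. A minimal Python sketch, assuming concept_ids are already resolved to their vocabulary_id and that only "SNOMED" counts as portable for the criterion in question; the function name, the IDs, and the "SNOMED UK" label are hypothetical, and nothing here describes an existing Atlas feature.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def non_portable_by_vocabulary(concept_set: Iterable[int],
                               vocabulary_of: Mapping[int, str],
                               portable: frozenset[str] = frozenset({"SNOMED"})) -> dict[str, list[int]]:
    """Group the concept_ids whose vocabulary_id is not in `portable`, by vocabulary_id."""
    flagged = defaultdict(list)
    for concept_id in sorted(concept_set):
        vocabulary_id = vocabulary_of[concept_id]
        if vocabulary_id not in portable:
            flagged[vocabulary_id].append(concept_id)
    return dict(flagged)


# Placeholder concept_ids; "SNOMED UK" is the hypothetical vocabulary_id from the proposal.
criterion = {101, 202, 203}
vocab = {101: "SNOMED", 202: "SNOMED UK", 203: "SNOMED UK"}
print(non_portable_by_vocabulary(criterion, vocab))  # {'SNOMED UK': [202, 203]}
```

Making `portable` a parameter lets a site that knowingly targets UK data widen the allowed set, which matches the “if you know what you are doing” case.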

But there is another issue: We might map ICD10CM (American) to a non-US SNOMED. Some folks might expect for that not to happen. Let’s see what we actually got.