
Mapping One-to-many ICD-10 to SNOMED Codes

Hi all,

We are mapping from non-standard ICD-10 to standard SNOMED for research purposes within the NHS. We understand that where an ICD-10 code maps to many SNOMED codes we should map to many rows and populate n rows in our database with the potential options. As researchers cannot rely on these SNOMED codes (because we cannot specify which best describes the ICD-10 in question) it seems this approach expands the number of rows in the DB without adding value for the researchers.

Has anyone taken an alternative approach to this? Is there any scope for the standard OHDSI approach to change so that we can keep in line with OHDSI standards without bloating the DB unnecessarily?


ICD-10 to SNOMED mappings are accurate. The researchers need to define the disease of interest using standard concept_ids. The OHDSI community has a lot of resources to help researchers. I would suggest pointing them to the Phenotype Working Group.

Do you have a few examples of 1:M mappings which don’t “add value”?

Not that I know. Eliminating rows would cause information loss and this is one of the biggest concerns for a health researcher when source data are mapped to the OMOP CDM.

Well, if a better approach is found and OHDSI makes it standard, then it would be in line with OHDSI standards. Do you have a suggestion?

As an FYI:
SNOMED is the chosen vocabulary due to its flexibility with post-coordination, its broad and granular coverage of conditions, and its applicability throughout our worldwide network of researchers.

The ICD family as a whole is not flexible. Most ICD code systems are not broad or granular, and only the WHO ICD code system has coverage throughout the world. Many countries have country-specific ICD code systems.

In order for US researchers to collaborate with our colleagues in the UK using ICD, we would both have to map all data to ICD-10 as the standard. The US uses the ICD10CM code system which has ~100,000 codes and ICD-10 has ~16,000. That’s a lot of information loss.

Like @MPhilofsky said, SNOMED is a standard for OMOP for a reason. There are multiple benefits to using ontological systems, including multiaxial hierarchies, arbitrary granularity and a semantic attribute-value system. OMOP CDM has use cases in network studies and in combining data stored in databases in multiple formats. If the research is limited to occurrences of specific ICD10 codes, rather than phenotype-based cohort definitions, it is likely that OMOP CDM is not a very useful tool for this use case.

Hello, I’m James Cockayne, I’m the lead developer for our migration tool (GitHub - answerdigital/oxford-omop-data-mapper). I work with @Joe_Asher.

I just wanted to chime in and potentially clarify Joe’s original question a little.

We as a team value SNOMED and we think it is the correct clinical coding system; we want to use it.

Our question centres on the more fundamental database design choice of inserting a record many times, once for each related item in the one-to-many relationship.

We think this could cause issues because

  • A researcher would have no means to group the rows into a single event, if they wanted to count events for example
  • If we needed to add a row to a table that has a foreign key to a table holding an “exploded” ICD10 code (many rows per event), we would have no clear row identity to refer to when forming the foreign key (or would these records get inserted many times too?).
  • The SNOMED codes that are inserted would be formed using a “snapshot” of the relationships between the ICD10 and SNOMED. If these relationships are improved the rows would need to be deleted and inserted again before the mappings could be available to researchers.
  • I’m unsure if this could happen, but if a table happens to have more than one concept field that had to be mapped from a one to many code system (eg ICD10/OPCS4) then a user would need to insert a record for every permutation of SNOMED codes.

A proposal I would make to avoid all of these problems would be to record the concept as the origin code (eg ICD10) and then use the built-in concept_relationship table to find these records with the proper SNOMED codes. This could be in the form of a database view, or as an extension to the query.

For example, the ICD10 code M13.86 Other specified arthritis, lower leg has two SNOMED codes 128137003 Disorder of lower leg and 3723001 Arthritis. If we wanted to search for the SNOMED code Disorder of lower leg we could use the following query

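select co.*
from cdm.condition_occurrence co
	inner join cdm.concept_relationship cr
		on co.condition_concept_id = cr.concept_id_1
	inner join cdm.concept c
		on cr.concept_id_2 = c.concept_id
where c.concept_code = '128137003' -- Disorder of lower leg (concept_code is a string column)
	and c.vocabulary_id = 'SNOMED' -- concept codes are only unique within a vocabulary
	and cr.relationship_id = 'Maps to' -- ignore other relationship types
	and cr.invalid_reason is null;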
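
The view form of this idea might look like the following minimal sketch. It uses Python's built-in sqlite3 module with a toy in-memory schema so that it can be run end to end. The view name condition_occurrence_standard, the concept_id values 1, 2 and 3, the person and row ids, and the reduced column sets are all illustrative assumptions; they are not real OMOP vocabulary ids or part of the CDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced toy versions of three OMOP tables (most columns omitted).
CREATE TABLE concept (
    concept_id     INTEGER PRIMARY KEY,
    concept_name   TEXT,
    vocabulary_id  TEXT,
    concept_code   TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT,
    invalid_reason  TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER,
    condition_concept_id    INTEGER,  -- holds the ICD10 concept in this proposal
    condition_start_date    TEXT
);

-- The concept_id values below are made up for illustration.
INSERT INTO concept VALUES
    (1, 'Other specified arthritis, lower leg', 'ICD10',  'M13.86'),
    (2, 'Disorder of lower leg',                'SNOMED', '128137003'),
    (3, 'Arthritis',                            'SNOMED', '3723001');
INSERT INTO concept_relationship VALUES
    (1, 2, 'Maps to', NULL),
    (1, 3, 'Maps to', NULL);
INSERT INTO condition_occurrence VALUES (100, 7, 1, '2024-01-15');

-- Hypothetical view: one stored row per event, expanded to SNOMED at query time.
CREATE VIEW condition_occurrence_standard AS
SELECT co.condition_occurrence_id,
       co.person_id,
       co.condition_start_date,
       c.concept_id   AS standard_concept_id,
       c.concept_code AS standard_concept_code
FROM condition_occurrence co
JOIN concept_relationship cr
  ON co.condition_concept_id = cr.concept_id_1
JOIN concept c
  ON cr.concept_id_2 = c.concept_id
WHERE cr.relationship_id = 'Maps to'
  AND cr.invalid_reason IS NULL;
""")

# The base table keeps a single row for the event.
stored_rows = conn.execute(
    "SELECT COUNT(*) FROM condition_occurrence"
).fetchone()[0]

# The view exposes one row per mapped SNOMED concept, sharing the same row id.
view_rows = conn.execute(
    "SELECT condition_occurrence_id, standard_concept_code "
    "FROM condition_occurrence_standard "
    "ORDER BY standard_concept_code"
).fetchall()
```

In this sketch the event is stored once, and both SNOMED codes appear only in the view, each carrying the same condition_occurrence_id. Because the mapping is resolved at query time, refreshing concept_relationship would change the view's output without rewriting event rows. The trade-off is that condition_concept_id then holds a non-standard concept, which departs from the usual OMOP convention for that field.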

Let me know what you think of this idea.



All good points. But still:

Yes. By day. You already have that in your data. Not all conditions etc. are in pre-coordinated concepts, and the patient has records of, say, diabetes, insomnia and ingrown toenail on the same day, rather than a “diabetes-insomnia-ingrown toenail” combination ICD10. The “count” doesn’t make any medical/scientific sense anyway. There is no such thing as one condition. It’s a complex world of pathogenetic events, complications and symptoms.

That is not OMOP. Why would you need a row number? Is there a meaning for such a number?

That is correct. That is why most people don’t do incremental ETLs, but do them in big bulks. The Vocabularies shift. But they do that anyway, one-to-many relationships or not. What used to be a valid concept a year ago might no longer be one today.

No such thing in OMOP. Every table has exactly one concept_id field for the standard concepts.

That is already there! You have the source_concept_id, where you store the incoming ICD10. And your researchers, who are addicted to them, can find them there. Wouldn’t be standard OMOP, but avoids having lengthy debates with your internal customers.
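
As a rough illustration of the source_concept_id point, the sketch below assumes the standard layout in which one ICD10 record has been expanded into two condition_occurrence rows. It uses Python's built-in sqlite3 module with a reduced toy table; the concept_id values 1, 2 and 3, the row ids and the person id are made up for the example and are not real OMOP vocabulary ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced toy condition_occurrence (most CDM columns omitted).
CREATE TABLE condition_occurrence (
    condition_occurrence_id     INTEGER PRIMARY KEY,
    person_id                   INTEGER,
    condition_concept_id        INTEGER,  -- standard SNOMED concept
    condition_start_date        TEXT,
    condition_source_value      TEXT,     -- code as received from the source
    condition_source_concept_id INTEGER   -- ICD10 concept for the source code
);

-- One M13.86 record expanded into two rows, one per mapped SNOMED concept.
-- Ids 1 (ICD10), 2 and 3 (SNOMED) are illustrative only.
INSERT INTO condition_occurrence VALUES
    (100, 7, 2, '2024-01-15', 'M13.86', 1),
    (101, 7, 3, '2024-01-15', 'M13.86', 1);
""")

# Researchers who think in ICD10 can filter on the source concept directly.
rows_for_icd10 = conn.execute(
    "SELECT condition_occurrence_id FROM condition_occurrence "
    "WHERE condition_source_concept_id = ? "
    "ORDER BY condition_occurrence_id",
    (1,),
).fetchall()

# Grouping by person, day and source concept collapses the expanded rows again.
distinct_source_events = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT person_id,
                        condition_start_date,
                        condition_source_concept_id
        FROM condition_occurrence
    )
""").fetchone()[0]
```

Here both rows are found through the single ICD10 source concept, and the per-day grouping yields one event. Note the limitation of this grouping: two genuinely separate recordings of the same code for the same person on the same day would also collapse into one.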

Hi @Christian_Reich,

Thanks for the clarifications. Yes this all sounds very reasonable with the current schema.