One of the OKRs for the Healthcare System Interest Group (HSIG) is to create a methodology for moving the data in the SOURCE_TO_CONCEPT_MAP (STCM) table into the CONCEPT and CONCEPT_RELATIONSHIP (C/CR) tables in the 2 billion concept_id range.
The STCM was originally created for OMOP v4 to allow mapping of local codes to OMOP Standard Concepts. With the advent of OMOP v5, that functionality was transferred to the C/CR tables, but for backward compatibility the STCM was not deprecated. The idea was (presumably) that folks would gradually migrate to the newer method.
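To make the target of such a conversion concrete, here is a minimal sketch (not a reference implementation) of what "moving" one STCM mapping into C/CR looks like. The local vocabulary_id, the 2-billion-range concept_id, the concept_class_id, and the target concept are all illustrative:

```sql
-- Minimal sketch: one local code registered as a non-standard concept in the
-- 2-billion range, plus its 'Maps to' relationship to a standard target.
-- 'MY_LOCAL_CODES', concept_id 2000000001 and the target id are illustrative;
-- the local vocabulary itself also needs a row in VOCABULARY (not shown), and
-- the reverse 'Mapped from' row is conventionally added as well.

INSERT INTO concept
    (concept_id, concept_name, domain_id, vocabulary_id, concept_class_id,
     standard_concept, concept_code, valid_start_date, valid_end_date, invalid_reason)
VALUES
    (2000000001, 'Local lab code: serum glucose', 'Measurement', 'MY_LOCAL_CODES',
     'Lab Test', NULL, 'GLU-SER', DATE '1970-01-01', DATE '2099-12-31', NULL);

INSERT INTO concept_relationship
    (concept_id_1, concept_id_2, relationship_id,
     valid_start_date, valid_end_date, invalid_reason)
VALUES
    (2000000001, 3004501, 'Maps to', DATE '1970-01-01', DATE '2099-12-31', NULL);
```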
Unfortunately, the STCM remains in wide use for several reasons:
TLDR: if you can do C/CR, do it (you'll need to learn some vocab rules so that your instance doesn't violate basic conformance checks like having 'Maps to' only to standard concepts).
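As an illustration of the kind of conformance check meant here (a sketch, not an official Data Quality Dashboard rule), a query like the following should return zero rows in a conformant instance:

```sql
-- Sketch of a basic conformance check: every valid 'Maps to' relationship
-- should point at a standard ('S'), non-invalidated target concept.
SELECT cr.concept_id_1, cr.concept_id_2
FROM concept_relationship cr
JOIN concept c2 ON c2.concept_id = cr.concept_id_2
WHERE cr.relationship_id = 'Maps to'
  AND cr.invalid_reason IS NULL
  AND (c2.standard_concept IS NULL
       OR c2.standard_concept <> 'S'
       OR c2.invalid_reason IS NOT NULL);
```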
While Melanie's paper is a useful basis for a STCM-to-C/CR conversion model, the actual process has never been formally described. And the QA SQL script, although once again very useful, is not a good basis for the conversion logic:
Neither its design decisions nor its implementation are explicitly documented;
It uses the concept_*_stage data model, which is not part of the OMOP CDM and is used by the Vocabulary Team only as an implementation detail for authoring;
It does not cover modern cases, such as domain consistency of 'Maps to' and 'Maps to value' targets ('Meas Value' for 'Measurement'); see the sketch after this list.
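For that last point, a hedged sketch of what such a domain-consistency check could look like. The allowed-domain rule is simplified here to the 'Meas Value' for 'Measurement' example; the real rules are more nuanced:

```sql
-- Simplified sketch: when a 'Measurement' source concept carries a
-- 'Maps to value' mapping, the value target is expected to sit in the
-- 'Meas Value' domain. This only shows the shape of the check.
SELECT c1.concept_id AS source_id, c2.concept_id AS value_id, c2.domain_id AS value_domain
FROM concept_relationship cr
JOIN concept c1 ON c1.concept_id = cr.concept_id_1
JOIN concept c2 ON c2.concept_id = cr.concept_id_2
WHERE cr.relationship_id = 'Maps to value'
  AND cr.invalid_reason IS NULL
  AND c1.domain_id = 'Measurement'
  AND c2.domain_id <> 'Meas Value';
```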
What was discussed on the HSIG call is a proposal to formally model the conversion process, in documentation and/or as a conversion script. There is institutional inertia that keeps people using the STCM, and if we want people to switch to C/CR, we need to answer some very crucial questions, not the least of which is:
The conversion is not a trivial task. I am familiar with OMOP, so I can imagine what the process would look like, but every OMOP instance has its own established solutions and conversion scripts, often built around the STCM for storing and updating mappings. Until we have a clear model in place, we cannot convincingly tell people how easy or difficult it is for any OMOPized dataset to make the jump to C/CR.
Absolutely agree, we do need a formal description/convention. @Eduard_Korchmar, given your expertise with the vocabularies and OMOP CDM, would you like to initiate and lead such an endeavor? That would be much appreciated. Or, if somebody already volunteered, it would be good to know their name to send help and questions their way.
I do intend to do that. As a part of a planned OHDSI Python package, I want to include a module for a framework for converting data between STCM and 2billion concept space; this will by necessity include writing design documentation for the reference implementation. Once this exists, both the documentation and the library can be iterated upon following results of pilot projects. I plan to present a development proposal for this on the upcoming Open-Source Workgroup call on April 19th.
I do not have a proper announcement to link until then, but there is a long form forum message:
I am also currently working on an approach. Mine is not as comprehensive as Eduard's, which I've seen and is quite impressive.
It seems to me, however, that there is no single best approach because there are multiple use cases.
New to OMOP: Network Study
An institution may be looking at building an OMOP instance only to contribute to a specific network study. They may never load the data into the standard OMOP tool stack at all. The Source_To_Concept_Map (STCM) is by far the simplest and fastest method. It is also the method detailed in the Book of OHDSI and works with USAGI.
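For comparison with the C/CR route, an STCM mapping is a single row per local code; the values below are illustrative and mirror the shape of a Usagi export:

```sql
-- One illustrative STCM row: a local code mapped straight to a standard concept.
-- source_concept_id = 0 is the common convention when no source concept exists.
INSERT INTO source_to_concept_map
    (source_code, source_concept_id, source_vocabulary_id, source_code_description,
     target_concept_id, target_vocabulary_id, valid_start_date, valid_end_date, invalid_reason)
VALUES
    ('GLU-SER', 0, 'MY_LOCAL_CODES', 'Local lab code: serum glucose',
     3004501, 'LOINC', DATE '1970-01-01', DATE '2099-12-31', NULL);
```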
New to OMOP: Enterprise-Wide Use
An institution may be interested in OMOP for enterprise-wide use, including implementing the full OMOP tool stack. These institutions will want data to display in Atlas, but may also want to use the STCM in early-phase development.
Expanding Single Purpose OMOP to Enterprise
An institution may have a limited OMOP instance with an already-built ETL that uses the STCM. They will need to use Atlas and the other OMOP tools, but re-writing all of their ETL may be a hurdle. These institutions may want to leave their ETL in the STCM, but also write 2-billion-range values to CONCEPT and CONCEPT_RELATIONSHIP (C/CR). There may or may not be ROI on re-writing the ETL.
Jackalope+ Users
If an institution is using Jackalope+ for mapping, their only choice of export format is C/CR.
There are no doubt others. The point is that using STCM may be all someone needs. Or STCM may be a good place to start and then convert to C/CR. Using both STCM and C/CR may make sense for some. And others may decide to go directly to C/CR.
So, when it comes down to the brass tacks of actually coding a solution, for the simple case of mapping local codes to standard codes, and ignoring reserved code blocks, I haven't been able to determine the following:
When a vocabulary/mapping is updated, is it permissible to delete the >= 2 billion concepts and re-assign new concept_ids
or
Do concept_ids need to be preserved and maintained?
The simplest method, by far, is to delete and re-write from either the USAGI csv files or the STCM. Would this cause a problem with Atlas? I don't think it would, but I'm not certain.
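For concreteness, the delete-and-rewrite refresh I have in mind is roughly the following, a sketch assuming all local content is confined to the 2-billion range, not a vetted implementation:

```sql
-- Wipe all local (2-billion-range) content...
DELETE FROM concept_relationship
WHERE concept_id_1 >= 2000000000 OR concept_id_2 >= 2000000000;

DELETE FROM concept
WHERE concept_id >= 2000000000;

-- ...then re-insert fresh CONCEPT rows (newly assigned concept_ids) and their
-- 'Maps to' / 'Maps to value' rows from the current Usagi export or STCM,
-- as in the conversion sketch earlier in the thread.
```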
This completely depends on whether local 2billion concepts are explicitly used to define cohorts or not. If cohorts are defined by non-standard 2billion *_source_concept_id instead of their standard mapping targets, or if 2billion concepts themselves are made standard and used in concept sets, then it may be required to 'lock' these concept_ids to preserve the integrity of cohort definitions between versions.
The best practice is to maintain the original 2billion concept_id when moving mappings from the STCM to the C/CR table. Since it may be hard to know all downstream uses of a concept_id, I would keep it the same.
If it was used in an Atlas concept set or cohort, it would break if you changed it.
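A hedged sketch of that id-preserving approach: key on (source_vocabulary_id, source_code), keep the concept_id already assigned to a known code, and only mint new 2-billion-range ids for codes not seen before. The staging table name and the concept_class_id below are hypothetical:

```sql
-- Mint new 2-billion-range ids only for local codes that do not yet have a
-- concept; existing codes keep their concept_id untouched, so Atlas concept
-- sets and cohorts that reference them keep working across refreshes.
-- 'stcm_staging' (a copy of the incoming STCM/Usagi export) is hypothetical.
INSERT INTO concept
    (concept_id, concept_name, domain_id, vocabulary_id, concept_class_id,
     standard_concept, concept_code, valid_start_date, valid_end_date, invalid_reason)
SELECT
    (SELECT COALESCE(MAX(concept_id), 2000000000) FROM concept
      WHERE concept_id >= 2000000000)
        + ROW_NUMBER() OVER (ORDER BY s.source_code),
    s.source_code_description,
    t.domain_id,                        -- derive domain from the standard target
    s.source_vocabulary_id,
    'Undefined',                        -- illustrative concept_class_id
    NULL,                               -- local concepts stay non-standard here
    s.source_code,
    CURRENT_DATE, DATE '2099-12-31', NULL
FROM stcm_staging s
JOIN concept t ON t.concept_id = s.target_concept_id
WHERE NOT EXISTS (
    SELECT 1 FROM concept c
    WHERE c.vocabulary_id = s.source_vocabulary_id
      AND c.concept_code  = s.source_code
);
-- The 'Maps to' rows in CONCEPT_RELATIONSHIP are then rebuilt against these
-- stable concept_ids rather than being re-minted on every refresh.
```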