One of the OKRs for the Healthcare System Interest Group (HSIG) is to create a methodology for moving the data in the SOURCE_TO_CONCEPT_MAP (STCM) table into the CONCEPT and CONCEPT_RELATIONSHIP (C/CR) tables in the 2 billion concept_id range.
The STCM was originally created for OMOP v4 to allow mapping of local codes to OMOP Standard Concepts. With the advent of OMOP v5, that functionality was transferred to the C/CR tables, but for backward compatibility the STCM was not deprecated. The idea was (presumably) that folks would gradually migrate to the newer method.
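To make the target of such a conversion concrete, here is a minimal sketch (not a reference implementation) of what "moving" one STCM mapping into C/CR looks like. The local vocabulary_id, the 2-billion-range concept_id, the concept_class_id, and the target concept are all illustrative:

```sql
-- Minimal sketch: one local code registered as a non-standard concept in the
-- 2-billion range, plus its 'Maps to' relationship to a standard target.
-- 'MY_LOCAL_CODES', concept_id 2000000001 and the target id are illustrative;
-- the local vocabulary itself also needs a row in VOCABULARY (not shown), and
-- the reverse 'Mapped from' row is conventionally added as well.

INSERT INTO concept
    (concept_id, concept_name, domain_id, vocabulary_id, concept_class_id,
     standard_concept, concept_code, valid_start_date, valid_end_date, invalid_reason)
VALUES
    (2000000001, 'Local lab code: serum glucose', 'Measurement', 'MY_LOCAL_CODES',
     'Lab Test', NULL, 'GLU-SER', DATE '1970-01-01', DATE '2099-12-31', NULL);

INSERT INTO concept_relationship
    (concept_id_1, concept_id_2, relationship_id,
     valid_start_date, valid_end_date, invalid_reason)
VALUES
    (2000000001, 3004501, 'Maps to', DATE '1970-01-01', DATE '2099-12-31', NULL);
```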
Unfortunately, the STCM remains in wide use for several reasons:
TLDR: if you can do C/CR, do it (you'll need to learn some vocab rules so that your instance doesn't violate basic conformance checks like having 'Maps to' only to standard concepts).
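As an illustration of the kind of conformance check meant here (a sketch, not an official Data Quality Dashboard rule), a query like the following should return zero rows in a conformant instance:

```sql
-- Sketch of a basic conformance check: every valid 'Maps to' relationship
-- should point at a standard ('S'), non-invalidated target concept.
SELECT cr.concept_id_1, cr.concept_id_2
FROM concept_relationship cr
JOIN concept c2 ON c2.concept_id = cr.concept_id_2
WHERE cr.relationship_id = 'Maps to'
  AND cr.invalid_reason IS NULL
  AND (c2.standard_concept IS NULL
       OR c2.standard_concept <> 'S'
       OR c2.invalid_reason IS NOT NULL);
```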
While Melanie's paper is a useful basis for a STCM-to-C/CR conversion model, the actual process has never been formally described. And the QA SQL script, although once again very useful, is not a good basis for the conversion logic:
Neither its design decisions nor its implementation are explicitly documented;
It uses the concept_*_stage data model, which is not part of the OMOP CDM and is used by the Vocabulary Team only as an implementation detail for authoring;
It does not cover modern cases, such as domain consistency of 'Maps to' and 'Maps to value' targets ('Meas Value' for 'Measurement'); see the sketch after this list.
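For that last point, a hedged sketch of what such a domain-consistency check could look like. The allowed-domain rule is simplified here to the 'Meas Value' for 'Measurement' example; the real rules are more nuanced:

```sql
-- Simplified sketch: when a 'Measurement' source concept carries a
-- 'Maps to value' mapping, the value target is expected to sit in the
-- 'Meas Value' domain. This only shows the shape of the check.
SELECT c1.concept_id AS source_id, c2.concept_id AS value_id, c2.domain_id AS value_domain
FROM concept_relationship cr
JOIN concept c1 ON c1.concept_id = cr.concept_id_1
JOIN concept c2 ON c2.concept_id = cr.concept_id_2
WHERE cr.relationship_id = 'Maps to value'
  AND cr.invalid_reason IS NULL
  AND c1.domain_id = 'Measurement'
  AND c2.domain_id <> 'Meas Value';
```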
What was discussed on the HSIG call is a proposal to formally model the conversion process, in documentation and/or as a conversion script. There is institutional inertia that keeps people using the STCM, and if we want people to switch to C/CR, we need to answer some very crucial questions, not the least of which is:
The conversion is not a trivial task. I am familiar with OMOP, so I can imagine what the process would look like, but every OMOP instance has its own established solutions and conversion scripts, often built around the STCM for storing and updating mappings. Until we have a clear model in place, we cannot convincingly tell people how easy or difficult it is for any OMOPized dataset to make the jump to C/CR.
Absolutely agree, we do need a formal description/convention. @Eduard_Korchmar, given your expertise with the vocabularies and OMOP CDM, would you like to initiate and lead such an endeavor? That would be much appreciated. Or, if somebody already volunteered, it would be good to know their name to send help and questions their way.
I do intend to do that. As a part of a planned OHDSI Python package, I want to include a module for a framework for converting data between STCM and 2billion concept space; this will by necessity include writing design documentation for the reference implementation. Once this exists, both the documentation and the library can be iterated upon following results of pilot projects. I plan to present a development proposal for this on the upcoming Open-Source Workgroup call on April 19th.
I do not have a proper announcement to link until then, but there is a long form forum message:
I am also currently working on an approach. Mine is not as comprehensive as Eduard's, which I've seen and is quite impressive.
It seems to me, however, that there is no single best approach because there are multiple use cases.
New to OMOP: Network Study
An institution may be looking at building an OMOP instance only to contribute to a specific network study. They may never load the data into the standard OMOP tool stack at all. The Source_To_Concept_Map (STCM) is by far the simplest and fastest method. It is also the method detailed in the Book of OHDSI and works with USAGI.
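For comparison with the C/CR route, an STCM mapping is a single row per local code; the values below are illustrative and mirror the shape of a Usagi export:

```sql
-- One illustrative STCM row: a local code mapped straight to a standard concept.
-- source_concept_id = 0 is the common convention when no source concept exists.
INSERT INTO source_to_concept_map
    (source_code, source_concept_id, source_vocabulary_id, source_code_description,
     target_concept_id, target_vocabulary_id, valid_start_date, valid_end_date, invalid_reason)
VALUES
    ('GLU-SER', 0, 'MY_LOCAL_CODES', 'Local lab code: serum glucose',
     3004501, 'LOINC', DATE '1970-01-01', DATE '2099-12-31', NULL);
```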
New to OMOP: Enterprise-Wide Use
An institution may be interested in OMOP for enterprise-wide use, including implementing the full OMOP tool stack. These institutions will want data to display in Atlas, but may also want to use the STCM in early-phase development.
Expanding Single Purpose OMOP to Enterprise
An institution may have a limited OMOP instance with an already-built ETL that uses the STCM. They will need to use Atlas and the other OMOP tools, but re-writing all of their ETL may be a hurdle. These institutions may want to leave their ETL in the STCM, but also write 2-billion-range values to CONCEPT and CONCEPT_RELATIONSHIP (C/CR). There may or may not be ROI on re-writing the ETL.
Jackalope+ Users
If an institution is using Jackalope+ for mapping, their only choice of export format is C/CR.
There are no doubt others. The point is that using STCM may be all someone needs. Or STCM may be a good place to start and then convert to C/CR. Using both STCM and C/CR may make sense for some. And others may decide to go directly to C/CR.
So, when it comes down to the brass tacks of actually coding a solution, for the simple case of mapping local codes to standard codes, and ignoring reserved code blocks, I haven't been able to determine the following:
When a vocabulary/mapping is updated, is it permissible to delete the >= 2 billion concepts and re-assign new concept_ids
or
Do concept_ids need to be preserved and maintained?
The simplest method, by far, is to delete and re-write from either the USAGI csv files or the STCM. Would this cause a problem with Atlas? I don't think it would, but I'm not certain.
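For concreteness, the delete-and-rewrite refresh I have in mind is roughly the following, a sketch assuming all local content is confined to the 2-billion range, not a vetted implementation:

```sql
-- Wipe all local (2-billion-range) content...
DELETE FROM concept_relationship
WHERE concept_id_1 >= 2000000000 OR concept_id_2 >= 2000000000;

DELETE FROM concept
WHERE concept_id >= 2000000000;

-- ...then re-insert fresh CONCEPT rows (newly assigned concept_ids) and their
-- 'Maps to' / 'Maps to value' rows from the current Usagi export or STCM,
-- as in the conversion sketch earlier in the thread.
```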
This completely depends on whether local 2billion concepts are explicitly used to define cohorts or not. If cohorts are defined by non-standard 2billion *_source_concept_id instead of their standard mapping targets, or if 2billion concepts themselves are made standard and used in concept sets, then it may be required to 'lock' these concept_ids to preserve the integrity of cohort definitions between versions.
The best practice is to maintain the original 2billion concept_id when moving mappings from the STCM to the C/CR table. Since it may be hard to know all downstream uses of a concept_id, I would keep it the same.
If it was used in an Atlas concept set or cohort, it would break if you changed it.
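A hedged sketch of that id-preserving approach: key on (source_vocabulary_id, source_code), keep the concept_id already assigned to a known code, and only mint new 2-billion-range ids for codes not seen before. The staging table name and the concept_class_id below are hypothetical:

```sql
-- Mint new 2-billion-range ids only for local codes that do not yet have a
-- concept; existing codes keep their concept_id untouched, so Atlas concept
-- sets and cohorts that reference them keep working across refreshes.
-- 'stcm_staging' (a copy of the incoming STCM/Usagi export) is hypothetical.
INSERT INTO concept
    (concept_id, concept_name, domain_id, vocabulary_id, concept_class_id,
     standard_concept, concept_code, valid_start_date, valid_end_date, invalid_reason)
SELECT
    (SELECT COALESCE(MAX(concept_id), 2000000000) FROM concept
      WHERE concept_id >= 2000000000)
        + ROW_NUMBER() OVER (ORDER BY s.source_code),
    s.source_code_description,
    t.domain_id,                        -- derive domain from the standard target
    s.source_vocabulary_id,
    'Undefined',                        -- illustrative concept_class_id
    NULL,                               -- local concepts stay non-standard here
    s.source_code,
    CURRENT_DATE, DATE '2099-12-31', NULL
FROM stcm_staging s
JOIN concept t ON t.concept_id = s.target_concept_id
WHERE NOT EXISTS (
    SELECT 1 FROM concept c
    WHERE c.vocabulary_id = s.source_vocabulary_id
      AND c.concept_code  = s.source_code
);
-- The 'Maps to' rows in CONCEPT_RELATIONSHIP are then rebuilt against these
-- stable concept_ids rather than being re-minted on every refresh.
```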