Jackalope
Hello! Our team at Sciforce has developed a new tool that allows users to create their own Standard concepts in the SNOMED hierarchy using SNOMED CT Compositional Grammar, without breaking compatibility for network studies. The tool is meant to be applied at the ETL stage to increase mapping coverage for important concepts that do not have a precise counterpart, and to guarantee internal compatibility of custom Standard concepts through automated evaluation against the SNOMED hierarchy. You cannot define concept sets with these concepts themselves, but they will be placed “correctly” in the CONCEPT_ANCESTOR table, meaning you can define concept sets as usual and have these new concepts included for you.
Jackalope was previously presented at the OHDSI APAC and Health Data Interests WG community calls, and as a poster at the OHDSI '22 Symposium.
The tool is, of course, open-source. We would like to make Jackalope, if not a standard, then at least a common and accepted solution for OMOP CDM ETL, and we are looking for community input.
How to participate:
The most important step in the development of Jackalope right now is to live-test it with actual data. This would achieve the following:
- You get better mapping coverage of your source data in OMOP CDM.
- We get to improve Jackalope, bringing it closer to real-world use.
- The entire OHDSI community benefits from the introduction of a brand new approach to standardization of data.
If you are not sure whether the Jackalope ETL approach is applicable to your data, you can help us a lot by providing any number of examples of concepts that you could not map to OMOP CDM precisely, where you had to accept a loss of precision or make other concessions.
Of course, contributions in the form of convention suggestions or commits to the Jackalope source code are also welcome.
If you want a better idea of how everything is implemented, see the FAQ below. The community call and symposium links also contain a lot of information.
FAQ:
Q: Why not just map everything to existing Standard concepts?
A: Mapping to Standard concepts is a reliable and correct way to represent source data in OMOP CDM in the vast majority of cases. That is why these concepts are Standard. However, in some cases (e.g. new technologies, domain-specific terminology or local specifics), mapping to existing Standard concepts may lead to a loss of precision.
Q: Which concepts can be standardized through Jackalope?
A: There are two answers to this question: a mathematically correct one and an honest one. In the mathematically correct and practically useless sense, every imaginable thing can be post-coordinated through an arbitrary number of SNOMED sub-expressions, all of which can be reliably evaluated by Jackalope to obtain a consistent result. Realistically, you would use Jackalope to obtain relatively simple refined versions of concepts that already exist in SNOMED subhierarchies (e.g. combining diabetes mellitus with a specific complication, or creating a missing medical imaging procedure by combining a specific modality, topography and other details). The full potential of Jackalope lies somewhere in between.
Q: How does it work, exactly?
A: SNOMED expressions are evaluated against the SNOMED RF2 source to find, if not an exact match, then a set of close semantic parents. Unless an exact match is found, a new Standard concept is created (with a VOCABULARY_ID of Jackalope and a DOMAIN_ID inherited from its parents) and placed in the existing SNOMED Vocabulary hierarchy. The expression itself is preserved as an entry in the CONCEPT_SYNONYM table for both the source and the Jackalope concept, for future re-evaluations.
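To illustrate what "placed in the existing hierarchy" means for CONCEPT_ANCESTOR, here is a minimal Python sketch. It is not Jackalope's actual code: the toy hierarchy, the concept labels and the `ancestor_levels` helper are all hypothetical, and it only assumes that the new concept's parents have already been determined.

```python
from collections import deque

# Toy is-a edges (child -> parents). Labels are hypothetical placeholders,
# not real SNOMED content.
IS_A = {
    "parent A": {"grandparent A"},
    "parent B": {"grandparent B"},
    "grandparent A": {"root"},
    "grandparent B": {"root"},
    "root": set(),
}

def ancestor_levels(new_concept, parents, is_a):
    """Map each ancestor of `new_concept` to its minimum levels of separation.

    Breadth-first search guarantees that the first time a concept is reached,
    it is reached by the shortest path.
    """
    levels = {new_concept: 0}  # a concept is its own ancestor at level 0
    queue = deque((parent, 1) for parent in parents)
    while queue:
        concept, depth = queue.popleft()
        if concept in levels:
            continue
        levels[concept] = depth
        queue.extend((gp, depth + 1) for gp in is_a.get(concept, ()))
    return levels

rows = ancestor_levels("new custom concept", {"parent A", "parent B"}, IS_A)
for ancestor, depth in sorted(rows.items(), key=lambda kv: (kv[1], kv[0])):
    print(ancestor, depth)
```

Because the new concept inherits every ancestor of its parents, a concept set defined on any of those ancestors with descendants included would pick it up automatically.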
Q: Can we use post-coordination, multiple "Maps to" or manually curated local standard concepts instead?
A: Yes, but Jackalope has several benefits over these approaches, mainly to do with compatibility:
- Any relationship built from SNOMED expressions can be re-evaluated with every SNOMED release. Jackalope will even find an exact Standard match if one gets added.
- An expression, once written, will not break "silently" the way a partially deprecated multiple mapping would.
- Synonymous expressions will be coordinated to the same entity and even assigned the same CONCEPT_CODE, making deduplication between different OMOP CDM instances trivial.
- SNOMED's compositional grammar has a maintained standard specification, whereas FACT_RELATIONSHIP entries or manually curated Standard concepts are unmaintainable "crutches".
Q: How to write expressions for Jackalope to evaluate?
A: Full disclosure: this is neither common knowledge nor an easily automatable task. There is a guide released and maintained by the SNOMED authoring organization (link hub), there are commercial authoring tools, and there is even published research into using NLP to automate the process. We are looking into developing a similar toolset as part of Jackalope, but it is lower on the priority list. For our testing, we wrote the expressions fully manually. For a person familiar with SNOMED CT internals, it may take up to 15 minutes to write an expression for a single concept.
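To give a feel for what such an expression looks like, here is a small Python sketch that renders a focus concept and its attribute refinements as a compositional grammar string. The `render_expression` helper is hypothetical and not part of Jackalope. The concept identifiers are recalled from the examples in the SNOMED CT Compositional Grammar specification and should be checked against your own SNOMED release.

```python
def render_expression(focus, refinements=()):
    """Render a (sctid, term) focus concept plus (attribute, value) pairs
    as a SNOMED CT Compositional Grammar string.

    Only simple, ungrouped refinements are handled in this sketch.
    """
    def ref(concept):
        sctid, term = concept
        return f"{sctid} |{term}|"

    if not refinements:
        return ref(focus)
    parts = ", ".join(f"{ref(attr)} = {ref(value)}" for attr, value in refinements)
    return f"{ref(focus)} : {parts}"

# Identifiers below are assumed from memory of the specification's examples.
expr = render_expression(
    ("83152002", "Oophorectomy"),
    [(("405815000", "Procedure device"), ("122456005", "Laser device"))],
)
print(expr)
```

The hard part is not the syntax but choosing the right focus concept and attributes, which is where the time estimate above comes from.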
Q: What are the specifications for a Jackalope implementation?
A: We tried our best to separate the interface from the implementation. We have separate documentation describing Jackalope as an ETL process (the shape of the input, the expected output and the process rules), and we would love to host it on the official OHDSI resources.
Q: How does deduplication work?
A: SNOMED's internal logic makes it possible to establish the semantic isomorphism of differently phrased expressions through the concept of a canonical normal form. The textual representation of this form is run through the BLAKE2b hashing algorithm to obtain a 25-byte digest, written as a 50-character hexadecimal value that is then placed in the CONCEPT_CODE field. Any two concepts generated with the same CONCEPT_CODE are synonyms with a probability of 1-16^(-50).
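The hashing step can be sketched with Python's standard `hashlib`. This is an illustration under assumptions, not Jackalope's actual code: the `concept_code` helper, the UTF-8 encoding and the absence of a key or salt are guesses, and the input strings are placeholders rather than real normal forms, so the resulting codes will not necessarily match Jackalope's.

```python
import hashlib

def concept_code(normal_form: str) -> str:
    """Hash a normal-form expression string to a 50-character hex code.

    digest_size=25 gives 25 bytes, i.e. 50 hexadecimal characters.
    Encoding and hashing parameters are assumptions of this sketch.
    """
    return hashlib.blake2b(normal_form.encode("utf-8"), digest_size=25).hexdigest()

# Placeholder strings standing in for already-normalized expressions.
code_a = concept_code("normal form of expression A")
code_a_again = concept_code("normal form of expression A")
code_b = concept_code("normal form of expression B")

print(len(code_a))             # number of hex characters in the code
print(code_a == code_a_again)  # identical normal forms give identical codes
print(code_a == code_b)        # different normal forms give different codes
```

Since synonymous expressions normalize to the same text before hashing, two OMOP CDM instances that generate the same concept independently end up with the same CONCEPT_CODE without any coordination.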
Tagging people who have shown interest or participated at various stages of development:
@Polina_Talapova @MPhilofsky @mvanzandt @Christian_Reich @cgchute @Agota_Meszaros @mikecjohn @willhalfpenny @Alexdavv @mari.kolesnyk