'Maps to' origin and quality metadata

The ‘Maps to’ relationship is a very convenient and important relationship that is always used during ETL. I was wondering though, is there some metadata on these mappings? e.g. the origin of the mapping and the correctness of the mapping (do the two concepts have the exact same meaning?).

The background for my question is this: besides the number of mapped and unmapped source concepts, we would also like to capture mapping quality. Many mappings exist, but they are not perfect. Even if 100% of the source concepts are mapped, that does not mean the translation is lossless. It would therefore be interesting to output a statistic like ‘74% are mapped to an exact match’, ‘20% are mapped to a higher hierarchy’ and ‘6% are mapped to a lower hierarchy’.
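As a rough sketch of the kind of statistic meant here: assuming each mapping already carried a match-type label (which the vocabulary does not provide today; the `labelled_mappings` list and its `exact`/`broader`/`narrower` labels are hypothetical), the breakdown could be computed like this.

```python
from collections import Counter

# Hypothetical per-mapping labels; the OMOP vocabulary does not ship these.
# Each tuple is (source_code, match_type).
labelled_mappings = [
    ("A", "exact"),
    ("B", "exact"),
    ("C", "broader"),
    ("D", "narrower"),
]


def mapping_quality_summary(mappings):
    """Return the percentage of mappings per match type."""
    counts = Counter(match_type for _, match_type in mappings)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


summary = mapping_quality_summary(labelled_mappings)
```

The hard part is obtaining the labels, not counting them.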

Thanks!

@MaximMoinat Thanks for bringing up this important idea! Did you come up with a convention for representing this? Can you describe the use cases where you are leveraging or would like to leverage this metadata? I’ll bring this topic to the attention of @Ajit_Londhe and the rest of the Metadata and Annotations WG. This is a basic issue and one that a straightforward solution should be able to address.

Friends:

We have been thinking about this for a long time in the vocab group. The origins are of three kinds:

  • “Stolen” from the source as is
  • Inferred, often cobbled together from a chain (e.g. a VA Class is connected to a VA Product, which is connected to an NDC, which is connected to an RxNorm concept)
  • Built de novo.
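
To illustrate the "inferred from a chain" case, here is a minimal sketch that composes several hops into one mapping. All codes and the three hop tables are made-up placeholders, and this is not how the vocabulary team actually builds mappings.

```python
# Toy hop tables; every code below is a placeholder, not real vocabulary content.
va_class_to_va_product = {"VA_CLASS_X": ["VA_PRODUCT_1"]}
va_product_to_ndc = {"VA_PRODUCT_1": ["NDC_1", "NDC_2"]}
ndc_to_rxnorm = {"NDC_1": ["RXNORM_A"], "NDC_2": ["RXNORM_A"]}

HOPS = [va_class_to_va_product, va_product_to_ndc, ndc_to_rxnorm]


def follow_chain(start, hops):
    """Follow each hop in turn and return the set of codes reached at the end."""
    frontier = {start}
    for hop in hops:
        frontier = {target for code in frontier for target in hop.get(code, [])}
    return frontier


inferred = follow_chain("VA_CLASS_X", HOPS)
```

Each hop can widen or narrow the meaning, which is one reason the origin alone says little about quality.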

We could make these more transparent, but then what’s the use case? It won’t help you to decide on the quality of the mapping.

That is easier said than done. For example, diabetes in SNOMED does not contain its complications (diabetic nephropathy, diabetic retinopathy). But in ICD it does. Now what? They have exactly the same description, and they mean the exact same disease. Or not? We are working on a publication delineating these, and @schuemie will give me the heat for it still not being ready, but we will.

@Christian_Reich I don’t mean to raise issues that are settled to the best extent they can be. My curiosity may be satisfied by the paper once it comes out.

Until then, can you help me understand why the basic distinction between a Dx with complications and Dx without complications lacks practical consequences?
Imagine that I am a clinician researcher who wants to do a study on diabetics with one of the complication subtypes you mention. And I know that my source data includes ICD codes that get me both. The process for defining my cohort using my local OMOP instance loses that distinction by mapping to SNOMED.

Isn’t the use case that I need to be advised to include concepts for the complication apart from the SNOMED code? I might not think to do this automatically because I’m a clinician who knows the data is there in the Dx code in my local data source. Capturing this loss of information so it can be surfaced to guide users to do appropriate cohort definition seems like an important use case.

@Andrew:

You nailed it. This is a problem, and right now we have no solution other than for you to check the hierarchy and to make sure you get the complications added. Of course, most researchers outside the OMOP system don’t use hierarchies at all, they do lengthy code lists.

We are working on a solution called “SNOMED Extension”. Essentially, we build concepts that mimic the source (ICD in this case) but behave like SNOMED concepts and have all the right attributes and hierarchical relationships. And we are talking to snomed.org for them to actually incorporate it properly.

Let’s see.

@Andrew: This is a good example of a common misperception about ‘information loss’ that comes from vocabulary mapping. @hripcsa’s recent JAMIA paper nicely summarizes this issue and demonstrates that it’s not nearly the problem people worry it is.

To the specific example in diabetes: let me illustrate the idea with ICD-9-CM 250.12 - “Diabetes with ketoacidosis, type II or unspecified type, uncontrolled”, which currently maps into 2 SNOMED standard concepts: 40482801 - Type II diabetes mellitus uncontrolled, and 443734 - Ketoacidosis in type 2 diabetes mellitus. Here, there is no information loss but the ICD9 concept has been disentangled into its two components. To your question, there could certainly be relevant clinical use cases for when a clinician only wants to study ‘patients with T2DM’ and other relevant clinical use cases when a clinician only wants to study ‘patients with diabetic ketoacidosis’. Both of these use cases are fully supported, whether you conduct your analysis off your raw data and create ICD9 codelists or whether you use the CDM and create standard conceptsets.
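
A minimal sketch of that lookup, using an in-memory SQLite stand-in for CONCEPT_RELATIONSHIP with only three of its columns. The source concept id `900001` is a hypothetical placeholder for ICD-9-CM 250.12; the two target ids are the ones quoted above.

```python
import sqlite3

SOURCE_CONCEPT_ID = 900001  # hypothetical placeholder for ICD-9-CM 250.12

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_relationship ("
    "concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
)
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    [
        (SOURCE_CONCEPT_ID, 40482801, "Maps to"),  # T2DM uncontrolled
        (SOURCE_CONCEPT_ID, 443734, "Maps to"),    # ketoacidosis in T2DM
    ],
)


def maps_to_targets(connection, source_concept_id):
    """Return all standard concept ids a source concept 'Maps to', sorted."""
    rows = connection.execute(
        "SELECT concept_id_2 FROM concept_relationship "
        "WHERE concept_id_1 = ? AND relationship_id = 'Maps to' "
        "ORDER BY concept_id_2",
        (source_concept_id,),
    ).fetchall()
    return [row[0] for row in rows]


targets = maps_to_targets(conn, SOURCE_CONCEPT_ID)
```

One source code yielding two targets is the "disentangling" described above: an ETL would write one record per target.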

Now another example: ICD-9-CM 362.0 - Diabetic retinopathy. This maps to only one SNOMED standard concept: 4174977 - Diabetic retinopathy. Again, no information loss. But importantly, if I’m a clinician interested in the first use case, finding ‘patients with T2DM’, then I should consider whether a patient with a diabetic complication is sufficient evidence that the patient has diabetes. That’s just a phenotype evaluation problem, and it is no different whether you do a source analysis defining diabetes as ‘250x’ (which would miss some complications) or create a standard conceptset of ‘Diabetes mellitus’ and all descendants (which, in SNOMED, will not ‘roll up’ all associated complications).
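
The conceptset point can be sketched against a toy CONCEPT_ANCESTOR table. The rows below are illustrative only and simply mirror the behaviour described above; they are not an extract of the real vocabulary, and the id `201820` for ‘Diabetes mellitus’ is an assumption that should be checked in Athena.

```python
import sqlite3

DIABETES_MELLITUS = 201820      # assumed concept id; verify in Athena
DIABETIC_RETINOPATHY = 4174977  # id quoted in the post above

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor ("
    "ancestor_concept_id INTEGER, descendant_concept_id INTEGER)"
)
# Toy rows: the complication is NOT listed under the diabetes ancestor.
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [
        (DIABETES_MELLITUS, DIABETES_MELLITUS),
        (DIABETES_MELLITUS, 40482801),
        (DIABETIC_RETINOPATHY, DIABETIC_RETINOPATHY),
    ],
)


def concept_set(connection, ancestor_ids):
    """Expand the given concepts to themselves plus all descendants."""
    marks = ",".join("?" for _ in ancestor_ids)
    rows = connection.execute(
        "SELECT DISTINCT descendant_concept_id FROM concept_ancestor "
        f"WHERE ancestor_concept_id IN ({marks})",
        list(ancestor_ids),
    ).fetchall()
    return {row[0] for row in rows}


diabetes_only = concept_set(conn, [DIABETES_MELLITUS])
with_complication = concept_set(conn, [DIABETES_MELLITUS, DIABETIC_RETINOPATHY])
```

In this toy setup the retinopathy concept only enters the set when it is added explicitly, which is the step the researcher has to remember.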

@Patrick_Ryan I didn’t mean the change in granularity of the Dx would not be remediable. So my use of the term “loss of information” was ill advised - only meant to refer to the amount of info contained in the code the clinician-researcher is used to dealing with.

To rephrase my point in terms of your clarification - there is value in capturing the process of disentangling concepts in the course of mapping so users can be informed of relevant specificity needs when defining cohorts.

Well, our ambition is to solve this problem, even though it is less pronounced than it may seem, as @Patrick_Ryan explained. By solve I mean create a system where the semantic meaning is preserved and does not change through harmonization and mapping. It is an interesting and important task, because no matter how much you explain and put into metadata, most folks are not going to dig deep into this; they will just assume. We will make it work. :)

I want to support Maxim’s proposal to capture exact-matchness (setting broad-to-narrow aside for the sake of a simple proposal).

In light of the use of RWD (real-world data) for serious decisions, it is crucial that we have metadata for a relationship.
Consider this ICD9Proc to SNOMED mapping: http://athena.ohdsi.org/search-terms/terms/2003964

I would like to extend Maxim’s proposal with the following:

  • Add a new convention that talks specifically about mapping relationships and their functional role in ETL. https://github.com/OHDSI/CommonDataModel/wiki/RELATIONSHIP

    • relationship_id=‘xxxx’ (non-standard to standard, maps to, other?) is used for data transformation of source data to standard concepts (or where is it formally described in the specs?)
  • Add a new convention 6 here: https://github.com/OHDSI/CommonDataModel/wiki/CONCEPT_RELATIONSHIP

    • For the subset of relationships in CONCEPT_RELATIONSHIP that the RELATIONSHIP table conventions define as supporting source-to-target data transformation, the exact-match nature of the mapping is clearly documented (and let’s talk about how: a new column, an extra table, FACT_RELATIONSHIP, or a local-only column)
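
To make the "extra table" option concrete, here is one possible shape, sketched in SQLite. The `mapping_metadata` table, its `match_type` column and all concept ids are hypothetical; nothing like this exists in the CDM today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT
);
-- Hypothetical side table; not part of the OMOP CDM.
CREATE TABLE mapping_metadata (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT,
    match_type TEXT  -- 'exact', 'broader' or 'narrower'
);
""")
# Placeholder ids only; mapping 3 -> 103 deliberately has no metadata row.
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    [(1, 101, "Maps to"), (2, 102, "Maps to"), (3, 103, "Maps to")],
)
conn.executemany(
    "INSERT INTO mapping_metadata VALUES (?, ?, ?, ?)",
    [(1, 101, "Maps to", "exact"), (2, 102, "Maps to", "broader")],
)


def match_types(connection):
    """Return {source_concept_id: match_type}, 'unknown' when unannotated."""
    rows = connection.execute("""
        SELECT cr.concept_id_1, COALESCE(mm.match_type, 'unknown')
        FROM concept_relationship cr
        LEFT JOIN mapping_metadata mm
          ON mm.concept_id_1 = cr.concept_id_1
         AND mm.concept_id_2 = cr.concept_id_2
         AND mm.relationship_id = cr.relationship_id
        WHERE cr.relationship_id = 'Maps to'
        ORDER BY cr.concept_id_1
    """).fetchall()
    return dict(rows)


annotated = match_types(conn)
```

A side table keeps CONCEPT_RELATIONSHIP itself unchanged, at the cost of an extra join; a new column would be the opposite trade-off.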

@Vojtech_Huser:

A couple of points:

  1. The Vocabulary isn’t metadata, it’s reference data. I.e. it’s completely independent from the actual clinical data.
  2. As said before: We want to clean this up. We don’t need any extra fields or relationships or anything. “Maps to” should mean semantic equivalence, and if we need to go uphill we create an extended concept so the “uphillness” becomes part of the ordinary hierarchy.
  3. This is not a problem of how to represent or “capture” this information, it is about having this information. Because it is not trivial, as described above.

Will roll out our plans in the WG soon.
