Translating source codes to standard concept_ids results in Conditions NOT of interest

MPhilofsky · February 8, 2021, 6:42pm

While completing a comparison between cohort #1 pulled directly from our source using ICD9CM/ICD10CM codes and cohort #2 from the CDM using standard concept_ids, we came across a large difference in the number of Persons between the two cohorts.

Since we were replicating a study from the EHR, we used the standard concept_ids for the ICD9CM/ICD10CM codes. One of the inclusion criteria is that the Person must not be pregnant. The list of codes used to define a pregnancy Condition includes ‘O29.021’, Pressure collapse of lung due to anesthesia during pregnancy, first trimester. This code maps to 3 standard concept_ids, including Atelectasis, which is not a disease found only in pregnancy. By including the Atelectasis concept_id in the concept set for that criterion (Persons not pregnant), we changed the cohort’s definition and eliminated many Persons from cohort #2.

When a researcher wants to use ICD9CM/ICD10CM to define their concept set, should we query the table/domain of the standard concept_id, but pull data based on the source_concept_id? What do others do? Are there any other source vocabularies which map a source code to > 1 standard concept_id and so change the definition of the cohort?

For Atlas use, I will direct researchers to define ICD9CM/ICD10CM as source concept_ids and not translate to standard concept_ids when doing non-network research.

QI_omop · February 9, 2021, 2:30am

This happens quite a bit and created a significant problem for my analysis. I have reported it here. The solution I was given is to always start from SNOMED and not from ICD codes, but it is hard to convince my client to go that route. Another solution is a SNOMED extension. I am still waiting for that one.

Vojtech_Huser · February 9, 2021, 9:40pm

I just replied to the forum post you are linking. I agree that mapping updates could be a problem. If OMOP claims superiority to other models because of its mappings, it hurts the OMOP model’s reputation if it has a few mapping bugs here and there.

Dymshyts · February 16, 2021, 9:21am

Well, it wasn’t a bug, but a wrong usage of a mapping.
If A maps to B,
A maps to C
then it means A maps to B and C,
but in the case above it’s treated as A maps to (B or C).

Yeah, we need to invest time in making 1-to-1 mappings. For this purpose we need to create our own target terminology (a SNOMED Extension, or we can name it differently, as the new terminology is not obliged to follow SNOMED rules strictly).

mgkahn · February 21, 2021, 5:03pm

The post by @Dymshyts above took a while to sink in. It has interesting implications for writing queries that require translation from a source code with the dreaded 1:N mappings.

Assume:
ICD1 --> SC1 (Standard Concept 1)
ICD2 --> SC2 & SC3 (a 1:N map)
ICD3 --> SC4

Researcher wants cohort with any one of {ICD1, ICD2, ICD3} because that’s what they know or that’s what has been published elsewhere that they need to match (I understand that the preference is to get the researcher to pick the desired standard SNOMED code(s) directly to prevent mapping).

Based on @Dymshyts’s post, the correct query for the above mappings should be:

SC1 OR (SC2 AND SC3) OR SC4
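
As a minimal sketch of that translation (not a description of how any OHDSI tool implements it), the Python below treats each source code as one AND-group of its mapped standard concepts and ORs the groups together. The mapping dictionary and the `build_criteria` and `person_qualifies` helpers are hypothetical names invented for illustration, and the check is done at the person level, ignoring dates and visits.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical mapping table mirroring the ICD1/ICD2/ICD3 example above.
SOURCE_TO_STANDARD: Dict[str, FrozenSet[str]] = {
    "ICD1": frozenset({"SC1"}),
    "ICD2": frozenset({"SC2", "SC3"}),  # the 1:N map
    "ICD3": frozenset({"SC4"}),
}


def build_criteria(
    source_codes: Iterable[str], mapping: Dict[str, FrozenSet[str]]
) -> List[FrozenSet[str]]:
    """Return one AND-group of standard concepts per source code."""
    return [mapping[code] for code in source_codes]


def person_qualifies(
    person_concepts: Set[str], criteria: List[FrozenSet[str]]
) -> bool:
    """True if the person has ALL concepts of at least one AND-group."""
    return any(group <= person_concepts for group in criteria)


if __name__ == "__main__":
    criteria = build_criteria(["ICD1", "ICD2", "ICD3"], SOURCE_TO_STANDARD)
    # SC2 alone is not enough: ICD2 requires SC2 AND SC3 together.
    print(person_qualifies({"SC2"}, criteria))
    print(person_qualifies({"SC2", "SC3"}, criteria))
```

A plain OR over the flattened concept set {SC1, SC2, SC3, SC4} would instead accept a person who has only SC2, which is the misuse described earlier in the thread.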

[[Interesting side conversation is if (SC2 and SC3) must be tied to the same visit_occurrence]]

If this is the correct translation for 1:N mappings in queries, do any of the OHDSI tools perform this logic?

Mark_Danese · February 21, 2021, 7:42pm

You also need to make sure that no other unwanted ICD codes are mapped to SC1, SC2, SC3, and SC4.
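
One way to run that check, sketched in Python under the assumption that the source-to-standard mappings are already loaded into a dictionary (in a real CDM they would come from the 'Maps to' rows of concept_relationship). The `unwanted_sources` helper and the extra code `ICD4` are hypothetical and exist only to illustrate the idea.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical mappings; ICD4 is an extra code the researcher did NOT ask for.
MAPPING: Dict[str, FrozenSet[str]] = {
    "ICD1": frozenset({"SC1"}),
    "ICD2": frozenset({"SC2", "SC3"}),
    "ICD3": frozenset({"SC4"}),
    "ICD4": frozenset({"SC3"}),
}


def unwanted_sources(
    wanted: Set[str], mapping: Dict[str, FrozenSet[str]]
) -> Dict[str, FrozenSet[str]]:
    """Find source codes outside `wanted` that share a standard concept
    with any wanted code, along with the concepts they share."""
    # All standard concepts reachable from the wanted source codes.
    targets: Set[str] = set().union(*(mapping[code] for code in wanted))
    overlaps: Dict[str, FrozenSet[str]] = {}
    for code, concepts in mapping.items():
        if code in wanted:
            continue
        shared = concepts & targets
        if shared:
            overlaps[code] = shared
    return overlaps


if __name__ == "__main__":
    # Reports ICD4, because it also maps to SC3.
    print(unwanted_sources({"ICD1", "ICD2", "ICD3"}, MAPPING))
```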

Mark_Danese · February 21, 2021, 8:28pm

I should also mention that, after a lot of internal discussion on this issue several years ago, we decided it was easier to query the source codes. The only wrinkle is that ICD2 (1:N map) will give two records in the condition occurrence table, so one needs to be careful not to count the source record twice if the count is important. Typically, we ignore multiple records for the same code on the same day, so it isn’t much of an issue in the real world. Obviously, this is not the way everything is set up with OHDSI tools, so it won’t necessarily work for everyone.
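
A rough illustration of that approach, using Python's built-in sqlite3 module and a toy condition_occurrence table. The column names follow the OMOP CDM, but the concept_id values (9001, 9002, 101, and so on) and the helper function names are made up for this sketch. It compares a raw row count against a count of distinct (person, source concept, start date) combinations.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory toy condition_occurrence table (made-up ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE condition_occurrence (
               person_id INTEGER,
               condition_concept_id INTEGER,
               condition_source_concept_id INTEGER,
               condition_start_date TEXT)"""
    )
    rows = [
        # Source concept 9002 has a 1:N map, so each source record
        # produces two rows (standard concepts 102 and 103).
        (1, 102, 9002, "2021-01-05"),
        (1, 103, 9002, "2021-01-05"),
        (1, 102, 9002, "2021-02-01"),
        (1, 103, 9002, "2021-02-01"),
        (2, 101, 9001, "2021-01-07"),
        # Unrelated source code that happens to share standard concept 103.
        (3, 103, 9999, "2021-01-09"),
    ]
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", rows)
    return conn


def count_rows(conn: sqlite3.Connection, source_ids: List[int]) -> int:
    """Raw row count for the source concepts (double counts 1:N maps)."""
    marks = ", ".join("?" for _ in source_ids)
    sql = (
        "SELECT COUNT(*) FROM condition_occurrence "
        f"WHERE condition_source_concept_id IN ({marks})"
    )
    return conn.execute(sql, source_ids).fetchone()[0]


def count_distinct_events(conn: sqlite3.Connection, source_ids: List[int]) -> int:
    """Count each person / source concept / start date combination once."""
    marks = ", ".join("?" for _ in source_ids)
    sql = (
        "SELECT COUNT(*) FROM ("
        "SELECT DISTINCT person_id, condition_source_concept_id, "
        "condition_start_date FROM condition_occurrence "
        f"WHERE condition_source_concept_id IN ({marks}))"
    )
    return conn.execute(sql, source_ids).fetchone()[0]


if __name__ == "__main__":
    conn = build_demo_db()
    wanted = [9001, 9002, 9003]
    print(count_rows(conn, wanted), count_distinct_events(conn, wanted))
```

Filtering on condition_source_concept_id also keeps person 3 out of the result, even though that record shares a standard concept with one of the wanted codes.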

mgkahn · February 21, 2021, 8:31pm

Yes – you are describing the N:1 challenge. I had a longer version of my post that included one of those examples but I left it off to keep my post focused only on the 1:N challenge.

And also yes to using source_concept_id queries. What we miss by doing this is the use of concept hierarchies, which only exist for standard concepts.

Chris_Knoll · February 22, 2021, 2:06am

Nested logic in cohort definition criteria is how you would do (A OR (B AND C) OR D), but written as: (A OR D OR (B AND C)).

Chris_Knoll · February 22, 2021, 2:07am

Yes, another feature in the Atlas UI for cohort definitions is that you can now count records on distinct start date, which also avoids the issue of double counting same-day records. You can also count distinct visits.