
Disentangling cerebral infarctions


(Patrick Ryan) #1

In the 06082018 vocabulary, I see several ICD9 and ICD10 source codes for concepts related to: “Persistent migraine aura with cerebral infarction”, which have two mappings to SNOMED standard concepts: 1) persistent migraine aura, and 2) cerebral infarction. On the surface, that seems reasonable to me based on the words alone. http://www.ohdsi.org/web/atlas/#/search/Persistent%20migraine%20aura%20with%20cerebral%20infarction

The issue I’m running into is that, for my analytical use case, I want to identify ischemic strokes, which would not include these migraine codes but would include other types of cerebral infarction. Any suggestions for how to manage this situation (without resorting to source code selection)?


(Eldar Allakhverdiiev) #2

Hi @Patrick_Ryan,
I see no way to capture this in concept sets.
Because of the mapping to 2 concept_ids, there will be 2 records occurring on the same day (and, I assume, with the same visit_occurrence_id, but that is a question for the ETL process of your data sets).
Based on this, you may use an additional criterion: cerebral infarction with 0 occurrences of persistent migraine aura on the same day (or within the same visit).
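A minimal sketch of that “0 occurrences on the same day (or visit)” criterion, written as plain Python over in-memory rows rather than as an ATLAS cohort definition. It assumes, as discussed here, that a two-way mapping produces two condition_occurrence rows with the same date and usually the same visit_occurrence_id. 443454 is one of the stroke root concepts cited later in this thread; the migraine aura concept ID is a hypothetical placeholder.

```python
from datetime import date
from typing import NamedTuple, Optional


class ConditionRecord(NamedTuple):
    """Minimal stand-in for a row of the OMOP CDM condition_occurrence table."""
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    visit_occurrence_id: Optional[int]


# 443454 comes from this thread; 900000001 is a HYPOTHETICAL placeholder
# for the persistent migraine aura concept.
CEREBRAL_INFARCTION = {443454}
PERSISTENT_MIGRAINE_AURA = {900000001}


def match_key(rec, same_visit):
    """Key used to pair records: the visit if requested and present, else the calendar day."""
    if same_visit and rec.visit_occurrence_id is not None:
        return (rec.person_id, "visit", rec.visit_occurrence_id)
    return (rec.person_id, "day", rec.condition_start_date)


def infarctions_without_migraine(records, same_visit=False):
    """Keep infarction records that have 0 migraine aura records on the same day (or visit)."""
    migraine_keys = {
        match_key(r, same_visit)
        for r in records
        if r.condition_concept_id in PERSISTENT_MIGRAINE_AURA
    }
    return [
        r
        for r in records
        if r.condition_concept_id in CEREBRAL_INFARCTION
        and match_key(r, same_visit) not in migraine_keys
    ]


records = [
    ConditionRecord(1, 443454, date(2018, 1, 1), 10),     # infarction half of a migraine code
    ConditionRecord(1, 900000001, date(2018, 1, 1), 10),  # its paired migraine aura record
    ConditionRecord(2, 443454, date(2018, 2, 1), 20),     # infarction with no migraine
]
print([r.person_id for r in infarctions_without_migraine(records)])  # [2]
```

Note that this criterion also drops a genuine stroke that happens to be recorded on the same day as an unrelated migraine aura code; that trade-off comes with the approach.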


(Chris Knoll) #3

I think it’s a safe assumption that if one source code maps to two occurrence records, they will use the same visit occurrence that the source record is based on…but it’s always worth confirming with your ETL logic.


(Christian Reich) #4

Why don’t you want the migraine one, @Patrick_Ryan ? It’s an infarction.


(Patrick Ryan) #5

It’s a fair question that I’m wrestling with. My intention is to find ischemic stroke. My clinical colleagues inform me that, while ‘persistent migraine with cerebral infarction’ may involve some vasoconstriction that could potentially lead to a stroke, on its own it is not a stroke in the more contemporary sense. The ‘other’ cerebral infarctions (including descendants) are consistent with clinical expectations and also in line with prior literature ‘validating’ the ischemic stroke definition. But to reconstruct the literature-based ICD9 definition, I would have to resort to a source-code-specific concept set, because the migraine codes map to ‘cerebral infarction’, and I don’t want them for my particular use case.


(Seng Chan You) #6

I cannot think of a solution to your problem, @Patrick_Ryan

Overall, the accuracy of identifying stroke in administrative data is not so good, but it can be improved by adding an ‘inpatient/ED diagnosis only’ criterion. How about adding this criterion?
(https://www.ncbi.nlm.nih.gov/pubmed/27426016, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3412674/)

Usually, if the migraine is related only to vasoconstriction and not caused by an actual stroke, the patient would not be admitted.
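The inpatient/ED restriction could be sketched as below, assuming each condition record carries a visit_occurrence_id that links to a visit_concept_id. The visit concept IDs are assumed to be the standard OMOP ones (9201 Inpatient Visit, 9203 Emergency Room Visit, 262 Emergency Room and Inpatient Visit); check them against your vocabulary version.

```python
from typing import NamedTuple, Optional


class ConditionRecord(NamedTuple):
    """Minimal stand-in for a condition_occurrence row."""
    person_id: int
    condition_concept_id: int
    visit_occurrence_id: Optional[int]


# Assumed standard OMOP visit concepts (verify against your vocabulary version):
# 9201 = Inpatient Visit, 9203 = Emergency Room Visit, 262 = Emergency Room and Inpatient Visit.
INPATIENT_OR_ED_VISITS = {9201, 9203, 262}


def inpatient_or_ed_only(records, visit_concept_by_visit_id):
    """Keep records whose linked visit has an inpatient or ED visit_concept_id."""
    return [
        r
        for r in records
        if visit_concept_by_visit_id.get(r.visit_occurrence_id) in INPATIENT_OR_ED_VISITS
    ]


# visit_occurrence_id -> visit_concept_id (9202 is assumed to be Outpatient Visit).
visits = {10: 9202, 20: 9201}
records = [
    ConditionRecord(1, 443454, 10),    # outpatient diagnosis: dropped
    ConditionRecord(2, 443454, 20),    # inpatient diagnosis: kept
    ConditionRecord(3, 443454, None),  # no linked visit: dropped
]
print([r.person_id for r in inpatient_or_ed_only(records, visits)])  # [2]
```

Records without a linked visit are dropped here; whether that is desirable depends on how completely your ETL populates visit_occurrence_id.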


(Christian Reich) #7

And why is excluding the migraine not a solution, as @Eldar suggests?


(Seng Chan You) #8

@Christian_Reich, excluding the migraine can be a solution.
But if one wants to write a paper, excluding migraine in stroke patients may seem a little weird to reviewers or other readers.

As I said, I don’t have a good solution for this, and this is why phenotyping is becoming more important in OHDSI.


(Michael Kahn) #9

Patrick’s conundrum of using SNOMED mappings from ICD9/10 criteria is one that has been raised before. Many of our investigators come to us with a prior sentinel publication that has very specific ICD codes, which they want to match EXACTLY so that their study can be linked to the published definitions. As we know, mappings between ICD and SNOMED are not one-to-one. That is, “round trip” mappings from ICD -> SNOMED -> ICD do not always return the original set of ICD codes. In the past, others (unnamed!) on this thread have argued that the additional ICD codes are often also relevant and accepted by investigators, have very low counts, or really represent local coding practices rather than being clinically meaningful. But our investigators push back on the addition of ANY ICD codes that are not present in the sentinel publication. This is when we fall back to using source_concept_ids to assuage the investigators’ need to be able to say that their cohort exactly matches the previous publication.


(Patrick Ryan) #10

Hahaha @mgkahn, yes, I certainly subscribe to using standards, and I recognize the consequences of that position. The legacy of using previously published ICD9 codelists as justification for future work is an inertia that we as a community will have to keep pushing hard to overcome. My real question here is more clinically motivated: if there are different types of cerebral infarction (e.g. some infarcts are part of migraines, and some are full-blown strokes), then can or should we have a way to differentiate these types?

As a more specific illustration of the dilemma, I can currently select the descendants of ‘cerebral infarction’, and that finds me concepts with greater specificity (in the ICD9 world, the ‘stroke-related infarctions’ all map to these more specific concepts, so if all I cared about was replicating an ICD9-based publication, I could do it without issue). But source data may contain codes that do not provide added specificity (e.g. in the ICD10 world, there’s a nonbillable code, I63, which is just ‘cerebral infarction’), and depending on the circumstance, I may want these non-specific data to be included. Perhaps the answer is: ‘all non-specific cerebral infarctions should be considered equivalently’, in which case the mapping question becomes: ‘is the cerebral infarction listed in the migraine source codes REALLY a cerebral infarction?’
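To make the descendant distinction concrete: the OMOP concept_ancestor table stores the full transitive closure, including a self-row for each concept, so “descendants of X, with or without X itself” is a simple filter. In this sketch only 443454 comes from the thread; the child concept IDs are hypothetical placeholders.

```python
# (ancestor_concept_id, descendant_concept_id) rows, mimicking OMOP concept_ancestor.
# 443454 comes from this thread; the 9000001xx IDs are HYPOTHETICAL more specific concepts.
CONCEPT_ANCESTOR = [
    (443454, 443454),        # self-row: the non-specific parent itself
    (443454, 900000101),     # hypothetical specific child
    (443454, 900000102),     # hypothetical grandchild (closure row)
    (900000101, 900000101),
    (900000101, 900000102),
    (900000102, 900000102),
]


def descendants(concept_id, ancestor_rows, include_self=True):
    """All descendants of concept_id; the table already holds the transitive closure."""
    found = {d for a, d in ancestor_rows if a == concept_id}
    if not include_self:
        found.discard(concept_id)
    return found


# Specific concepts only: drops records mapped to the bare parent (e.g. a non-specific code).
specific_only = descendants(443454, CONCEPT_ANCESTOR, include_self=False)
# Specific concepts plus the non-specific parent.
with_parent = descendants(443454, CONCEPT_ANCESTOR)
print(sorted(specific_only), sorted(with_parent))
```

The catch described above is that the migraine codes map to the parent concept itself, so any concept set that keeps the parent in order to capture non-specific codes also keeps them.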


(Gowtham Rao) #11

Transient ischemic attack (TIA) vs stroke. The latter has infarction; the former has ischemia.


(Christian Reich) #13

@qiongwang: Want to help?


(Seng Chan You) #14

I realize how annoying this situation is.

I found that no one has persistent migraine aura with cerebral infarction (ICD10 G43.6x) in the Korean NHIS-NSC database.

@Patrick_Ryan, @Christian_Reich, Could you count how many people have this diagnosis code in your databases?
(ICD-9CM: 346.6x, ICD-10CM: G43.6x)

After reviewing a bunch of papers, I concluded that 433.x1 and 434.x1 in ICD-9-CM and I63x in ICD-10-CM are the most validated diagnosis codes in previous papers.
In OMOP, the descendants of concept_ids 443454 and 4043731 include all the OMOP concept_ids that these ICD codes map to.
However, as @Patrick_Ryan said, they also include unwanted ‘maps from’ ICD codes such as G43.6x, I97.81x, G46.5, G46.6, G46.7, I97.8x, 346.6x, and 997.02.
I hope to exclude ‘G43.6x’ and ‘346.6x’ because these codes have never been validated in previous studies.
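A sketch of that exclusion, assuming condition_source_value holds the raw ICD-9-CM or ICD-10-CM code (with or without dots). Because it filters on source values, it only works in databases coded in those vocabularies. Only the two root concept IDs are listed; in practice the set would first be expanded with their descendants. The sample records are illustrative.

```python
from typing import NamedTuple


class ConditionRecord(NamedTuple):
    """Minimal stand-in for a condition_occurrence row."""
    person_id: int
    condition_concept_id: int
    condition_source_value: str


# Roots from this thread; a real concept set would include their descendants.
STROKE_CONCEPTS = {443454, 4043731}
# Source-code prefixes to drop, dots removed: ICD-10-CM G43.6x and ICD-9-CM 346.6x.
EXCLUDED_PREFIXES = ("G436", "3466")


def normalize(code):
    """Strip dots and whitespace so 'G43.609' and 'G43609' compare equal."""
    return code.replace(".", "").strip().upper()


def stroke_records(records):
    """Keep stroke-concept records whose source code is not an excluded migraine code."""
    return [
        r
        for r in records
        if r.condition_concept_id in STROKE_CONCEPTS
        and not normalize(r.condition_source_value).startswith(EXCLUDED_PREFIXES)
    ]


records = [
    ConditionRecord(1, 443454, "I63.9"),
    ConditionRecord(2, 443454, "G43.609"),   # excluded migraine code
    ConditionRecord(3, 4043731, "434.91"),
    ConditionRecord(4, 443454, "346.61"),    # excluded migraine code
]
print([r.person_id for r in stroke_records(records)])  # [1, 3]
```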

So we have three options:

  1. We can validate concept_ids 443454 and 4043731 in OHDSI (recently, @jswerdel proposed PheValuator, and I can validate this using the Ajou University DB) - the best option
  2. We can make a stroke cohort definition with excluding terms for same-day migraine
  3. We can count how many people actually have these diagnoses in multiple databases. If no database has this condition, then I would be relieved.

(Patrick Ryan) #15

Hi @SCYou, just so it’s documented:

The options that are currently available to the OHDSI community for creating cohort definitions are:

  1. Create a cohort definition that uses a concept set that contains standard concepts. Pros: it will be a cohort definition that can run across the OHDSI network, independent of the source coding scheme. Cons: the standard concepts may have mappings from certain source codes that others may prefer not to see in their phenotype definition. In this example, when defining ‘cerebral infarction’, one group could reasonably argue that they should include ICD10 G43.6x ‘persistent migraine aura with cerebral infarction’ (because it explicitly suggests the presence of cerebral infarction), while other researchers could reasonably argue that this isn’t the same clinical construct of cerebral infarction that they are looking for.

  2. Create a cohort definition that uses a concept set that contains source codes. Pros: you can tailor your list of codes to whatever level of precision you want (e.g. cherry-picking the ICD9 and ICD10 codes without regard to how they map up to SNOMED). Cons: this cohort definition will only be applicable to databases that use the same source codes.

My general preference is to support global research across the diverse community of researchers and databases throughout OHDSI, so I advocate for #1 whenever possible, but there are certainly instances when #2 is necessary and ‘good enough’ if you know you are only doing your study in your own data. In either case, both of these alternatives are fully supported in the OMOP CDM and also fully supported using ATLAS as OHDSI’s standard platform for defining and instantiating cohorts.

Now, to the specific question of ‘what did I do for stroke when we were designing the protocol for LEGEND?’: I did the same investigation that you did, @SCYou, and empirically evaluated how often the questionable ICD9/10 codes for ‘migraine with cerebral infarction’ actually arose across my databases. It wasn’t never, but it was very uncommon: less than 1% of the total number of strokes across all the databases. The other thing I noticed is that, in a good chunk of the cases, a person who had the ‘migraine with cerebral infarction’ code also had a ‘cerebral infarction’ code, meaning they’d be picked up by our phenotype definition whether or not we included that code. So, in the interest of enabling global research, I went with approach #1, keeping in this code that some would argue to keep in and some would argue to keep out, recognizing that it actually doesn’t have any practical impact one way or the other.
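The empirical check described above can be sketched as follows: among persons already captured by the standard-concept definition, count those who would drop out if the migraine-with-infarction source codes were removed. Records are (person_id, condition_source_value) pairs, and the data are illustrative, not taken from any database mentioned in this thread.

```python
def is_migraine_infarction_code(source_value):
    """ICD-10-CM G43.6x or ICD-9-CM 346.6x, ignoring dots."""
    code = source_value.replace(".", "").strip().upper()
    return code.startswith(("G436", "3466"))


def migraine_code_impact(records):
    """Return (persons in cohort, persons captured only via migraine codes, share)."""
    all_persons = {pid for pid, _ in records}
    via_other_codes = {pid for pid, src in records if not is_migraine_infarction_code(src)}
    only_via_migraine = all_persons - via_other_codes
    share = len(only_via_migraine) / len(all_persons) if all_persons else 0.0
    return len(all_persons), len(only_via_migraine), share


records = [
    (1, "I63.9"),
    (2, "G43.609"), (2, "I63.50"),  # migraine code plus a separate infarction code
    (3, "434.91"),
    (4, "346.61"),                  # captured only through the migraine code
]
print(migraine_code_impact(records))  # (4, 1, 0.25)
```

Person 2 illustrates the redundancy point: they stay in the cohort whether or not the migraine code is included. When the share is tiny (under 1% of strokes in the databases described above), the include/exclude decision has little practical effect.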

In general, I think the approach we should take is not to argue subjectively for or against a given code, but to use the data to determine whether that code makes a difference in the prevalence and composition of a given phenotype. And I agree we should be validating our phenotypes more systematically. One of the compelling aspects of @jswerdel’s PheValuator approach is that it provides an objective basis for comparing the operating characteristics of alternative definitions. So, in this example, if we were worried about the impact of including or excluding the ‘migraine with cerebral infarction’ code, we could create 2 phenotypes and then run PheValuator to see the impact on sensitivity, specificity, and positive predictive value. And since the independent prevalence of the code in question is so low, we’d see that it doesn’t make a difference, so it is probably not where we should be spending our time wringing our hands.
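As a simplified illustration of comparing two definitions, the sketch below computes sensitivity, specificity, and PPV for two cohorts against a reference set of known true cases. This is not PheValuator’s API: PheValuator estimates these metrics from a probabilistic reference built with a diagnostic prediction model rather than assuming known true cases. All person IDs here are made up.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float


def ratio(num, den):
    """Safe division: NaN when the denominator is 0."""
    return num / den if den else float("nan")


def evaluate(cohort, true_cases, population):
    """Compare a cohort (set of person_ids) against known true cases within a population."""
    tp = len(cohort & true_cases)
    fp = len(cohort - true_cases)
    fn = len(true_cases - cohort)
    tn = len(population - cohort - true_cases)
    return Metrics(ratio(tp, tp + fn), ratio(tn, tn + fp), ratio(tp, tp + fp))


population = set(range(1, 101))
true_cases = set(range(1, 11))           # persons 1-10 truly had an ischemic stroke
without_migraine = set(range(1, 10))     # definition that excludes the migraine codes
with_migraine = without_migraine | {50}  # adds one person via a migraine code only

print(evaluate(without_migraine, true_cases, population))
print(evaluate(with_migraine, true_cases, population))
```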


(George Hripcsak) #16

See also this paper on the effect of vocabulary mapping:

On 9 phenotypes, if you were thoughtful, the error in the cohort due to the mapping was at most 0.0026%. Part of the good performance came from redundancy, as Patrick points out (patients have several different codes). So it points to going with Patrick’s option #1.

The truth is that other measurement error far outweighs the mapping error, so if confirming the diagnosis is important, the time is better spent using alternate coding sources to verify it than on perfecting the codes.


(Seng Chan You) #17

I appreciate your great contributions to validating the OHDSI system again, @Patrick_Ryan @hripcsa.
I discussed this with @schuemie yesterday.

As @Patrick_Ryan suggested, I will generate various cardiovascular cohorts with diverse specifiers (+/- inpatient visit, +/- primary diagnosis, +/- imaging study, +/- including or excluding certain conditions) and then evaluate the PPV by manual chart review.

I’d be happy if other institutions could join this work.

Then the sensitivity and specificity can be validated again with @jswerdel’s PheValuator.

