
Phenotype Submission - Transverse myelitis

Clinical Description

Authoritative Source:

https://www.ncbi.nlm.nih.gov/books/NBK559302/

Abstracted from authoritative source:

Overview:

  • Rapid onset of weakness, sensory deficits, and bowel/bladder dysfunction

  • 50% of patients are completely paraplegic, with virtually all patients having some degree of bladder/bowel dysfunction

  • Infections leading to TM include, but are not limited to, enteroviruses, West Nile virus, herpes viruses, HIV, human T-cell leukemia virus type 1 (HTLV-1), Zika virus, neuroborreliosis (Lyme), Mycoplasma, and Treponema pallidum.

  • Urinary retention may be the first sign of myelitis

  • There is a bimodal peak between ages 10 to 19 and ages 30 to 39.

  • The incidence of transverse myelitis is approximately 1 to 8 new cases per 1 million people per year.

Presentation:

  • The onset of transverse myelitis is acute to subacute.
  • Neurologic symptoms are prominent. Symptoms include motor, sensory, and/or autonomic dysfunction.

Assessment:
To diagnose transverse myelitis, a compressive cord lesion must be excluded first. Exclusion is usually performed by magnetic resonance imaging (MRI). This is followed by a confirmation of inflammation either by a gadolinium-enhanced MRI or lumbar puncture (LP).

Plan:
The standard of care and the first-line therapy for the treatment of transverse myelitis is intravenous glucocorticoids. High-dose intravenous glucocorticoids should be initiated as soon as possible. There should not be a delay in treatment while waiting for test results. There are few contraindications to glucocorticoid therapy. Potential regimens would include methylprednisolone or dexamethasone for 3 to 5 days.

Prognosis:
Most patients with idiopathic transverse myelitis should at least have a partial recovery. This recovery should begin within 1 to 3 months and should continue to progress with exercise and rehabilitation therapy. Recovery may take years, and some degree of persistent debilitation may exist.

Differential diagnosis:
A differential diagnosis for transverse myelitis should include any disease causing myelopathy, for example compressive myelopathy from herniated discs, vertebral body compression fractures, epidural abscesses/masses, and spondylitis.

Regular Expression:
weak|deficit|dysfunction|parapleg|enteroviru|nile|hiv|htlv|zika|neuroborreli|lyme|mycoplasma|treponema|compressive|mri|magnetic|CT.(brain|head)|MRI.(head|brain)|gadolinium|lumbar|glucocorticoid|predniso|dexametha
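
As an illustration (not part of the original submission), the pattern above could be compiled case-insensitively and run over concept names or note text during profile review; the record strings below are made up:

import re

# Pattern copied from the submission above, compiled case-insensitively so that
# fragments such as "weak" also match "Weakness".
PATTERN = re.compile(
    r"weak|deficit|dysfunction|parapleg|enteroviru|nile|hiv|htlv|zika|"
    r"neuroborreli|lyme|mycoplasma|treponema|compressive|mri|magnetic|"
    r"CT.(brain|head)|MRI.(head|brain)|gadolinium|lumbar|glucocorticoid|"
    r"predniso|dexametha",
    re.IGNORECASE,
)

# Hypothetical entries as they might appear on a patient timeline.
records = ["Muscle weakness", "MRI of spinal canal with gadolinium", "Annual wellness visit"]
for record in records:
    if PATTERN.search(record):
        print("highlight:", record)  # the first two records match, the third does not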


Phenotype Development:

Logic description: events indexed on a diagnosis of transverse myelitis, a related spinal disease, or symptoms of transverse myelitis, followed by a diagnosis of transverse myelitis within 30 days. Events have a 365-day washout period and persist for 1 day. Symptoms of transverse myelitis include asthenia, muscle weakness, myelitis, and paresthesia.
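
A rough, illustrative sketch of that entry logic (this is not the ATLAS/Circe definition itself; the helper below and its inputs are hypothetical):

from datetime import date, timedelta

def cohort_entries(index_events, tm_diagnoses):
    """index_events: dates of transverse myelitis diagnoses, related spinal disease,
    or symptoms (asthenia, muscle weakness, myelitis, paresthesia);
    tm_diagnoses: dates of transverse myelitis diagnoses."""
    entries = []
    events = sorted(index_events)
    for i, d in enumerate(events):
        # 365-day washout: skip the event if another index event occurred in the prior 365 days
        if i > 0 and (d - events[i - 1]).days < 365:
            continue
        # require a transverse myelitis diagnosis within 30 days on or after the index event
        if any(0 <= (dx - d).days <= 30 for dx in tm_diagnoses):
            # cohort era persists 1 day after cohort start
            entries.append((d, d + timedelta(days=1)))
    return entries

entries = cohort_entries(
    index_events=[date(2015, 1, 10), date(2015, 1, 20), date(2016, 6, 1)],
    tm_diagnoses=[date(2015, 1, 15), date(2016, 6, 5)],
)
# two entries: 2015-01-10 and 2016-06-01; the 2015-01-20 event fails the washout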

Cohort Submission:

  • This cohort definition has cohort id # 63 in OHDSI Phenotype library (pending peer review).

Phenotype Evaluation:

Insights from Cohort Diagnostics

  • Was performed on 10 data sources, available at https://data.ohdsi.org/PhenotypeLibrary/ (see cohort id C63: [P] Transverse myelitis (or symptoms with transverse myelitis) 365dWO (1Ps, 0Era)), with the largest data sources having counts of roughly 10,000 persons.
  • Incidence rate: as expected, incidence is higher in the 20 to 50 year age range. The observed rate was about 0.001 to 0.01 per 1,000 per year; the upper end of that range exceeds the expected 1 to 8 new cases per 1 million per year (i.e., 0.001 to 0.008 per 1,000 per year), suggesting a specificity error. Another observation is an increase in 2015/2016 compared to previous years in most US data sources, which suggests the 2015 US coding change (ICD9CM to ICD10CM) had an impact on the persons being identified with this condition.
  • Index event breakdown: some data sources appear to index on the SNOMED concept 'Transverse myelopathy syndrome' while others index on the SNOMED concept 'Acute transverse myelitis'. The most common ICD9CM codes were 341.20 Acute (transverse) myelitis NOS and 341.22 Idiopathic transverse myelitis; the most common ICD10CM code was G37.3 Acute transverse myelitis in demyelinating disease of central nervous system. Note the change in semantic meaning from ICD9CM to ICD10CM, specifically the emphasis on 'in demyelinating disease of central nervous system', suggesting co-existence of demyelinating diseases such as multiple sclerosis. For example, Acute transverse myelitis (concept ID 139803) was the primary concept in the US and JMDC databases. In the US databases there were a significant number of index events with the concept Idiopathic transverse myelitis (concept ID 134330). In CPRD all subjects were included with an index event of Transverse myelopathy syndrome (concept ID 443904).
  • Visit context: the majority of persons (>80%) appear to be in the outpatient setting, i.e., a large number of persons receive the code during an outpatient visit. This is surprising considering the seriousness of the presenting symptoms (rapid-onset generalized weakness); I would have expected more urgent care, emergency room, or inpatient stays.
  • Characterization: about 10 to 20% of persons have multiple sclerosis. About 10% have muscle weakness. More than 20% had an MRI and at least 10% had a CT. The relatively high prevalence of multiple sclerosis in the 30-day and 31-365 day windows post-index may suggest a specificity error, but multiple sclerosis is NOT considered a disqualifier for acute transverse myelitis; in fact, acute transverse myelitis occurs at high frequency among persons with multiple sclerosis. Prednisone was started (drugEraStart) in about 10% of persons.

Limitations of this cohort definition (sources of error): this cohort definition has many errors, and it should be used with caution.

  • Compared to the clinical description, this cohort definition appears to do a decent job on sensitivity but may suffer from specificity issues.
    • Potential loss of sensitivity may come from not using the ICD10CM code for acute flaccid myelitis, but it is unclear whether this code represents transverse myelitis.
    • Potential loss of specificity is supported by the lower prevalence of strengthening concepts (symptoms, signs, tests, treatments) in the temporal period around the cohort start date. We did observe about 20% MRI utilization and 10% steroid utilization; however, considering the seriousness of this illness and its expected relatively rapid progression, we would expect higher utilization of emergency room and inpatient stays. Most persons identified appear to have some form of neurological disorder, but one that may be related to, yet not actually be, acute transverse myelitis. Further, the incidence rate appears higher than anticipated.
  • It is possible that the shift from ICD9CM to ICD10CM and the absence of an equivalent code may explain the pattern seen in the incidence rate plot. We did not evaluate whether incorporating G04.82 Acute flaccid myelitis would fix the specificity errors.
  • The cohort end date is currently fixed at 1 day after the cohort start date. This could be improved: persons with transverse myelitis are expected to have the disease and its sequelae for months if not years; it does not resolve in a day.

Operating Characteristics

PheValuator: not done

Patient profile review: I performed a review of a random sample of 20 individual cases on two data sources to understand the characteristics of the persons identified. This was not conclusive.

Great overview; I will take on this peer review.


Transverse Myelitis review

Recommendation: Accept; I agree with the recommendation of caution. This definition aligns with that of the FDA and merits inclusion in the phenotype library given its widespread use.

I reviewed the clinical description and sources, and have offered iterations and comments here. The diff here highlights the initial suggested edits.

I examined 4 databases in cohort diagnostics.

I reviewed 20 cases in Truven CCAE using CohortExplorer. I used the regex below, which I adapted in part from Gowtham’s.

enteroviru|nile|hiv|htlv|zika|neuroborreli|lyme|mycoplasma|treponema|scleros|lupus|connective|scleroderma|rheumatoid|antiphospholipid|ankylosing|sjogren

Weak|deficit|parapleg|dysfunction|urinary|retention|Magnetic|MRI|gado|Computed|lumbar|myel|inpatient|hospital

glucocorticoid|methylpred|predniso|hydrocort|dexameth|globulin|ritux

paral|plegia|rehab|constip|inconti

cerebro|stroke

As an aside, I found it useful to break the regex into categories pertaining to risk factors/causes, diagnostic evaluation, alternative diagnoses, treatment, and prognosis. I suspect this lands on a similar cognitive framework to what @aostropolets is working on in her cohort review tool, but applies it to the CohortExplorer timeline.

This evolved as the review proceeded. With future reviews, I'll be more explicit in designing the regex at the outset (for instance, compiling it while reviewing the clinical description, possibly with reference to the vocabulary, maybe even PHOEBE).

It might be useful to have multiple regex boxes available in CohortExplorer, with a ‘check mark’ to activate or deactivate the search based on the box contents.
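
For example, the category idea could be sketched as a dictionary of independently toggled patterns (the patterns are the ones quoted above; the category mapping and the toggle mechanism are my own approximation):

import re

CATEGORIES = {
    "risk factors / causes": r"enteroviru|nile|hiv|htlv|zika|neuroborreli|lyme|mycoplasma|treponema|scleros|lupus|connective|scleroderma|rheumatoid|antiphospholipid|ankylosing|sjogren",
    "diagnostic evaluation": r"Weak|deficit|parapleg|dysfunction|urinary|retention|Magnetic|MRI|gado|Computed|lumbar|myel|inpatient|hospital",
    "treatment": r"glucocorticoid|methylpred|predniso|hydrocort|dexameth|globulin|ritux",
    "prognosis": r"paral|plegia|rehab|constip|inconti",
    "alternative diagnoses": r"cerebro|stroke",
}

# 'Check marks': activate or deactivate a category's search, as one might with
# multiple regex boxes in CohortExplorer.
active = {"risk factors / causes", "diagnostic evaluation", "treatment"}

def matching_categories(text):
    return [name for name, pattern in CATEGORIES.items()
            if name in active and re.search(pattern, text, re.IGNORECASE)]

print(matching_categories("MRI spinal canal with gadolinium"))  # ['diagnostic evaluation']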

I maintained columns in a spreadsheet to note the presence of:

RiskFactors
Dx: Condition count >3
Dx: MRI
Dx: inpt
Dx: alt Dx
Treatment
Prognosis

That structure contributed to the adjudication of each case. Admittedly, I didn't have hard rules for applying them; adjudication was ultimately about the preponderance of evidence in each case, which those columns helped organize. I also created a column to track my level of confidence.

Of those 20 cases:

14 positive (6 with high confidence, 5 low confidence, 3 moderate).

1 unclear ( very sparse record)

Depending on allocation of the unclear case, this would translate to a PPV of 0.75-0.8.

13/14 true positive cases have MR of spinal canal.

In cohort diagnostics, at least 30% have an MR spine performed in CCAE, and the proportion is lower in other data sources. It's hard to know the 'cumulative' MRI rate across the different covariate names corresponding to an MRI procedure occurrence, and they aren't additive as there will be overlap across them.

But it is interesting that the impression of the MR rate in cohort diagnostics is lower than the 13/20 observed here. This makes a case for characterizing using other cohorts (e.g., MR spine as a cohort).
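
A minimal sketch of that 'cumulative' rate idea: count each person once if they have any of the MRI-related concepts around index, rather than summing per-concept proportions (the concept IDs, person IDs, and input structure below are made up):

# Hypothetical per-concept covariates: concept_id -> set of person_ids with that
# procedure in the window around cohort start.
mri_concepts = {
    1001: {"p1", "p2", "p3"},   # e.g. MRI of spinal canal
    1002: {"p2", "p4"},         # e.g. MRI of spine with contrast
}
cohort_persons = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"}

# Per-concept proportions are not additive because persons overlap across concepts.
persons_with_any_mri = set().union(*mri_concepts.values())
cumulative_rate = len(persons_with_any_mri & cohort_persons) / len(cohort_persons)
print(cumulative_rate)  # 0.4 here, even though the per-concept rates sum to 0.5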

All positive cases have condition counts >3.

Only 4 cases appeared to receive treatment appropriate to incident transverse myelitis. This seems very low for a condition where you would almost certainly seek expert guidance and institute any therapy that might help. Overall, I suspect that we aren't capturing all the drug exposures in some inpatient stays.

This caused me to revise my initial thoughts that we should require evidence of therapy with glucocorticoids - there would clearly be a big drop in sensitivity.

11 of the 'true positives' appear to have inpatient records that corroborate the diagnosis. The 4 that are outpatient-based look more like prevalent disease, not incident disease.

5 negative cases (2 moderate, 3 low confidence).

4 have alternative diagnoses (spondylosis, CVA, hereditary neuropathy, a chiropractic/acupuncture-predominant record). I checked cohort diagnostics across those conditions; the only one with significant prevalence in CCAE is spondylosis with myelopathy (approx. 10%). One could consider excluding that, and possibly CVA (given its higher incidence), around the time of index.

Possible iterations:

These recommendations should be tempered by the fact that case review was performed on a single data source with a small sample size, so they may overfit the definition to that data source and sample (a rough sketch of these restrictions follows the list).

  1. In the 'real world', incident transverse myelitis would almost certainly precipitate an inpatient stay. Restricting to an inpatient stay may cause sensitivity issues if you are looking to pick up prevalent transverse myelitis, but may be more specific to incident transverse myelitis. This also depends on your confidence in / understanding of the visit table mapping in your data source.
  2. Consider restricting to patients with an MRI of the spinal canal within 60 days of incident disease, and indexing on the MRI if it comes before the condition, to reduce it as a possible source of index date misclassification.
  3. Exclude spondylosis with myelopathy, which appears to be a source of specificity error in CCAE, and consider excluding CVA around the time of index.
  4. Increase the number of condition occurrences required to >=3; this may be relevant to xspec.
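
A rough sketch of how iterations 1-4 might be layered onto a candidate index event (all helper names, field names, and windows here are illustrative, not the library definition):

def qualifies(index_date, visits, procedures, conditions, tm_condition_count):
    """Illustrative filters corresponding to the four suggested iterations."""
    # 1. require an inpatient visit overlapping the index date
    inpatient = any(v["type"] == "inpatient" and v["start"] <= index_date <= v["end"]
                    for v in visits)
    # 2. require an MRI of the spinal canal within 60 days of index...
    mri_dates = [p["date"] for p in procedures
                 if p["name"] == "MR spinal canal" and abs((p["date"] - index_date).days) <= 60]
    # ...and index on the MRI if it comes before the condition
    adjusted_index = min(mri_dates + [index_date])
    # 3. exclude spondylosis with myelopathy (and possibly CVA) around the time of index
    excluded = any(c["name"] == "spondylosis with myelopathy"
                   and abs((c["date"] - index_date).days) <= 30
                   for c in conditions)
    # 4. require at least 3 transverse myelitis condition occurrences
    enough_codes = tm_condition_count >= 3
    return (inpatient and bool(mri_dates) and not excluded and enough_codes), adjusted_index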

Thank you @Evan_Minty for this review. This phenotype is now accepted into the OHDSI Phenotype Library.

This is cool. Very thorough. I would have a lot of trust in this phenotype.

Of course, my main critique stands: the phenotype should declare what it optimizes for (if anything) and how. The original @Gowtham_Rao definition trusts the codes (does nothing for specificity or sensitivity) but improves the timing of the onset (by using the symptoms as index). @Evan_Minty’s iterations optimize specificity by adding inpatient and MRI as inclusion criteria, adding spondylosis as an exclusion criterion, and demanding repeat diagnosis codes. Each criterion must be annotated for its purpose, except of course the Condition concept itself.

You are correct, @Christian_Reich. We can think of this as an optimization problem: which error do we want to trade? Simplified: do we want to trade sensitivity to get higher specificity, or trade specificity to get higher sensitivity?

For the accepted cohort atlas-phenotype.ohdsi.org/#/cohortdefinition/63, based on our understanding of the disease, how patients experience it, and how care is delivered for it, we anticipated specificity issues (i.e., false positives). From @Evan_Minty’s review of the cohort using CohortExplorer, he anticipates a PPV of 0.75 to 0.8. From the CohortDiagnostics insights available here, we also expected false positives, as supported by the higher-than-anticipated incidence rate and high use of Transverse myelopathy syndrome.

What is still missing is numeric quantification of such errors using PheValuator. @jswerdel has agreed to help me with this, but he is running behind and so this post does not have PheValuator output (yet).

But this actually gives me an idea. What do you two think of adding additional flavors of cohort definitions for the same clinical idea, transverse myelitis, all based on @Evan_Minty’s feedback above?

a) Suggestion 1 from @Evan_Minty: in the ‘real world’, incident transverse myelitis would almost certainly precipitate an inpatient stay, and restricting to inpatient stays may cause sensitivity issues. So I propose we build such a cohort and see its performance characteristics using PheValuator.

b) Suggestion 2: restricting to patients with an MRI of the spinal canal within 60 days of incident disease.

c) Suggestion 3: exclude spondylosis with myelopathy.

d) We can also do various combinations of 1, 2, and 3.

Regarding xSpec: @Evan_Minty and @jswerdel, maybe we can build it together, and @Evan_Minty, if you are willing and able, you could use CohortExplorer on the xSpec cohort?

What do you both think of this proposal to further enhance this work? @Christian_Reich would that be an approach to solve the optimization problem you brought up?

Agreed that the sens/spec tradeoff exists more generally. Another, in conditions that have an associated subacute/chronic component, is the tradeoff implicit in focusing on incident disease only, or on incident and prevalent disease.

Some iterations I suggest will tailor the definition to incident disease (e.g. MRI of spinal canal).

Others may improve specificity in both incident and prevalent disease (e.g. increasing the condition concept count, excluding spondylosis with myelopathy).

Yes, absolutely.

Given the importance of the statistical model produced by PheValuator in the subsequent adjudication of cohort definitions, my feeling is that a profile review of xSpec might be one of the more important activities to integrate into our phenotype development process. Maybe more important than the profile review of the final cohort definition.

The goal of the xSpec review is not to nitpick occasional false positives/negatives, which are handled in the noisy labelling experiment, but to surface larger ‘class-based’ classification errors that would otherwise be integrated into the model-based silver standard.

Transverse myelitis is more acute and dramatic in onset, not subacute or chronic (unlike DM or HTN, which are indolent/smoldering for a long time before clinical diagnosis). Plus, considering that the median disease period is about 3 months, I think we can treat this as a short-term disease, versus DM or HTN, which may be lifelong. Considering these two points, I would argue that we should try to model the true date of disease start (represented by cohort_start_date) and disease end (represented by cohort_end_date). If we have precise cohort_start_date and cohort_end_date estimates, then cohort_start_date is the incident date (i.e., usable for incidence estimates), and the interval from cohort_start_date to cohort_end_date is useful for prevalence estimates.
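
To make that concrete, a toy sketch of how the two dates would feed incidence versus prevalence estimates (the episodes and denominators below are made up):

from datetime import date

# Hypothetical cohort episodes: (cohort_start_date, cohort_end_date)
episodes = [
    (date(2016, 3, 1), date(2016, 6, 1)),
    (date(2016, 9, 15), date(2016, 12, 20)),
]
persons_at_risk = 1_000_000          # denominator for incidence (person-years, simplified)
population_on_july_1 = 1_000_000     # denominator for point prevalence

# Incidence uses cohort_start_date as the incident date.
incident_cases_2016 = sum(start.year == 2016 for start, _ in episodes)
incidence_per_million = incident_cases_2016 / persons_at_risk * 1_000_000

# Point prevalence uses the full start-to-end interval.
index_day = date(2016, 7, 1)
prevalent_cases = sum(start <= index_day <= end for start, end in episodes)
prevalence_per_million = prevalent_cases / population_on_july_1 * 1_000_000

print(incidence_per_million, prevalence_per_million)  # 2.0 0.0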

So @Evan_Minty, I will go ahead and build the cohorts based on the restrictions discussed above.

@Evan_Minty (and hopefully @Christian_Reich), would you be interested in exploring cohorts as feature covariates? This is experimental functionality that @schuemie has developed for FeatureExtraction here: https://github.com/OHDSI/FeatureExtraction/pull/167. I think @Evan_Minty you also brought this up: instead of looking at code-level covariates, we may get higher characteristic proportions for MRI utilization if we had an MRI cohort.

What do you think of proposing/building such cohorts (symptoms, signs, or procedures related to transverse myelitis), using them as cohort covariates, and seeing whether that is more informative?
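
Conceptually (and without reference to the actual FeatureExtraction API), a cohort covariate collapses many granular concepts into one person-level flag; a toy sketch with made-up names:

from datetime import date

# The feature cohort (e.g., 'MRI of spine') is itself a list of (person_id, cohort_start_date).
mri_cohort = [("p1", date(2016, 3, 2)), ("p7", date(2015, 11, 30))]
target_cohort = [("p1", date(2016, 3, 1)), ("p2", date(2016, 5, 1))]

def cohort_covariate(target, feature_cohort, days_before=30, days_after=30):
    """True if the person is in the feature cohort within a window of the target index date."""
    feature_by_person = {}
    for person, start in feature_cohort:
        feature_by_person.setdefault(person, []).append(start)
    return {
        (person, index_date): any(
            -days_before <= (d - index_date).days <= days_after
            for d in feature_by_person.get(person, []))
        for person, index_date in target
    }

print(cohort_covariate(target_cohort, mri_cohort))
# p1 -> True, p2 -> False: one covariate instead of many concept-level MRI covariates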

I’d agree with respect to acuity in onset, but in terms of persistence, from your clinical description:

I’d hope to see evidence of that trajectory in the observational record, inasmuch as there’s resource use implied in that rehab and disability.

In some of the reviewed cases, it appeared that patients ‘dropped in’ to the record with established TM, and in some, ongoing disability. They still wind up qualifying based on our current definition. I’d consider that ‘prevalent’ TM, or prevalent disability from TM.

Agreed - as a ‘minimum viable experiment’ aggregating across concepts that represent an MR of the spinal canal.

@Evan_Minty, I received the PheValuator results from @jswerdel:

cdm | sensitivity (95% CI) | PPV (95% CI) | specificity (95% CI) | NPV (95% CI)
cdm_optum_ehr_v1892 | 0.608 (0.530 - 0.683) | 0.601 (0.523 - 0.676) | 1.000 (1.000 - 1.000) | 1.000 (1.000 - 1.000)
cdm_optum_extended_ses_v2013 | 0.814 (0.768 - 0.855) | 0.649 (0.601 - 0.695) | 1.000 (1.000 - 1.000) | 1.000 (1.000 - 1.000)
cdm_truven_ccae_v2008 | 0.738 (0.678 - 0.793) | 0.665 (0.605 - 0.722) | 1.000 (1.000 - 1.000) | 1.000 (1.000 - 1.000)
cdm_truven_mdcd_v1978 | 0.823 (0.767 - 0.870) | 0.662 (0.604 - 0.717) | 1.000 (1.000 - 1.000) | 1.000 (1.000 - 1.000)

These are promising results and look similar to what was estimated using the combination of CohortDiagnostics and CohortExplorer. @Evan_Minty, notice the PPV upper bound was around 0.7, which is about what you had anticipated. Note: the specificity here is not informative because this is a rare disease.

So the challenge is to improve sensitivity and positive predictive value. To improve PPV we have to reduce false positives, and we can use the ideas you shared above (restricting to inpatient stays, requiring MRI, excluding spondylosis, etc.), but they may reduce our sensitivity.
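
For reference, the quantities being traded off, with made-up counts that loosely echo the CCAE estimates above; they also show why specificity stays near 1 for a rare outcome, since true negatives dominate:

def sensitivity(tp, fn):
    return tp / (tp + fn)

def ppv(tp, fp):
    return tp / (tp + fp)

def specificity(tn, fp):
    return tn / (tn + fp)

# Made-up counts for a rare outcome in a database of 1,000,000 persons:
tp, fp, fn = 660, 340, 250
tn = 1_000_000 - tp - fp - fn

print(round(sensitivity(tp, fn), 2))   # 0.73
print(round(ppv(tp, fp), 2))           # 0.66
print(round(specificity(tn, fp), 4))   # ~0.9997, close to 1 even with 340 false positives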

As much as I love the idea, unfortunately this is just a piece of illustrative evidence. Reason: To do its job, PheValuator relies on its ability to detect the ground truth with the help of a probabilistic model using the other data. But we don’t know how well that works. It may do that very well, or it may miss entirely. In addition, if PheValuator had the secret sauce to detect ground truth we would use just that, instead of cobbling together a deterministic phenotype performing equally well at best, but probably much worse.

Each criterion optimizes for something but also degrades another metric, because it, too, is not perfect. Therefore, I am skeptical about throwing a ton of criteria at a phenotype, because we may just swing the resulting classifier back and forth. We need to use criteria sparingly.

Yes, that’s another flavor. Optimize for the underlying prevalent condition (longitudinal, precise index not important, exit should follow the prognosis), or for the flare-ups (short, precise index important, exit dates only to avoid the onsets bleeding into each other). The former you need for good prevalence calculations, the latter for incidence rates.

But thinking about the whole debate about the library: We should probably get off these fixed gold-standard type phenotypes. Because depending on the study and the underlying data you need to optimize different aspects.

You have a bunch of use cases (flavors in @Daniel_Prieto’s lingo): Study settings (what population are we working in), unique outcomes, outcome rates, exclusion criteria for other cohorts, indications, covariates (though I wouldn’t know what to optimize this for, the ways of a probabilistic model are mysterious), counterfactual predictions, etc. They translate into different optimizations.

The library might be a nice Lego set with a core criterion, and a table of optimization criteria, each to be applied for a certain purpose, and mixed as little as possible (so they don’t neutralize each other):

Optimization for     | Criterion
Sensitivity, acute   | Conceptsets + logic
Sensitivity, chronic | Conceptsets + logic
Specificity, acute   | Conceptsets + logic
Specificity, chronic | Conceptsets + logic
Precise acute onset  | Conceptsets + logic
Precise acute exit   | Conceptsets + logic
Inpatient setting    | Conceptsets + logic
Outpatient setting   | Conceptsets + logic

The provision of those conceptsets would be a huge win already, but only if you can use them for a purpose. PPVs and the like - forget it. With all the permutations of criteria you would have way too many.
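
A minimal sketch of what one such ‘Lego kit’ entry could look like as data (the structure and field names are my own invention, not an existing library format):

lego_kit = {
    "clinical_idea": "Transverse myelitis",
    "core": {"concept_sets": ["transverse myelitis"], "logic": "first diagnosis, 365d washout"},
    "optimizations": {
        "sensitivity, acute":  {"concept_sets": ["TM symptoms"], "logic": "index on symptoms"},
        "specificity, acute":  {"concept_sets": ["MR spinal canal", "inpatient visit"],
                                "logic": "require within 60d of index"},
        "precise acute onset": {"concept_sets": ["MR spinal canal"],
                                "logic": "shift index to earliest MRI"},
        "inpatient setting":   {"concept_sets": ["inpatient visit"],
                                "logic": "index during inpatient stay"},
    },
}

# A user picks the bricks they need for their study and assembles the final definition:
study_definition = [lego_kit["core"], lego_kit["optimizations"]["specificity, acute"]]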

@Christian_Reich, could you please elaborate on this?

Sure. I should not use undefined terms.

Right now, the idea is that each phenotype is the best, most validated, most informed set of criteria. There is one such set for each entity. A monolith. A Lego castle.

But we need the building blocks.

I haven't heard that from anyone in the OHDSI Phenotype Development and Evaluation workgroup.

Take, for example, this thread modeling transverse myelitis (as a cohort definition). We now know that the accepted cohort id # 63 has problems with sensitivity and positive predictive value. We have qualitative evidence from CohortDiagnostics and CohortExplorer, and quantitative evidence from PheValuator.

Now, you can propose cohort ids 74, 78, 88 (numbers made up) that have different operating characteristics, as seen through the qualitative and quantitative methods used above. All of those can go into the OHDSI Phenotype Library, as long as their operating characteristics are described and documented.

These operating characteristics are what can also be called flavors. Some operating characteristics may be better for one use case/study, while others are better for another.

If you want a definition that has high sensitivity but low PPV, that's fine…
If you want a definition that has high PPV but low sensitivity, that's fine…
Or you can take something that has moderate PPV and moderate sensitivity…

That's why it's called a library :slight_smile:

Exactly.

The difference is: you propose a number of castles (ready-to-go cohort definitions) per condition, one for each flavor. I propose to have a Lego kit for each condition with different bricks. The user will pick which bricks (second column in the table above) by flavor (first column) and finish the castle themselves.

Either one could work.

Are you proposing something other than cohorts? Something that may be used to build cohorts?

So there are two parts

  1. Development
  2. Evaluation

I think the evaluation is important. To evaluate, we have to apply the cohort definition to one or more data sources and see its performance characteristics. Above, we are using CohortDiagnostics, CohortExplorer, and PheValuator applied to about ten data sources. So it is empirical.

Are you thinking in terms of these two parts, i.e., including running on many data sources and checking how the operating characteristics are doing?
