Phenotype Phebruary Day 12 - Parkinson's disease & parkinsonism

I’d like to introduce thoughts from our joint team (at UCLA and Northwestern University) on Parkinson’s disease and parkinsonism. Thanks to @Patrick_Ryan for the encouragement to participate even though we have yet to set up our own OMOP-CDM instance.

Goal: to develop and evaluate algorithms (and impact of criteria used within them) that assess incidence and prevalence of

  • Parkinson’s disease (PD) and
  • neurodegenerative parkinsonism disorders (PD included)

Clinical description

Parkinson’s disease (PD) is a neurodegenerative disease that has been estimated to be increasing in prevalence based on large scale epidemiologic work. It is the most frequent form of neurodegenerative parkinsonism, itself a subset of parkinsonism syndromes. More details below.

PD is considered the condition in which specific degeneration of the substantia nigra dopaminergic (DA) neurons over time produces the disease. It is estimated that well over 50% of DA cells will degenerate before clinical symptoms appear. This is the classic and core definition; however, there is increasing appreciation that PD itself is a multisystem disorder with involvement of other neurotransmitters and other systems within and outside the nervous system (often noted as non-motor symptoms – constipation, cognitive impairment, sleep disorders, autonomic dysfunction, depression, anxiety, etc.).

PD is the most common form of neurodegenerative parkinsonism (80-85%).
Neurodegenerative parkinsonisms (other than PD) are defined by having parkinsonism plus other neurologic systems that are affected by degeneration besides the specific PD degeneration described above. These will be described in more detail below and include progressive supranuclear palsy (PSP), multiple system atrophy (MSA), corticobasal degeneration (CBD) and others. Frequently, patients with neurodegenerative parkinsonism are diagnosed with PD early on, before the other neurodegenerative features become sufficiently prominent for clinical diagnosis. If patients have a clear diagnosis of PSP or MSA (for example), those are not considered PD, but are within neurodegenerative parkinsonism.

A good reference and overview of PD as a clinical syndrome, its early diagnosis, and its evolution over time is Armstrong and Okun (2020), “Diagnosis and Treatment of Parkinson Disease: A Review” (PubMed).

Clinical diagnosis of Parkinson’s disease (PD)

Parkinson’s disease is a clinical diagnosis during life with no lab/imaging studies that definitively establish the diagnosis. Parkinson’s disease is the most common form of parkinsonism.

Parkinsonism is defined as the syndrome of

  • rigidity
  • rest tremor
  • bradykinesia (slowness of movement)
  • postural instability

Each of the above can be recorded in a variety of different ways and (except for rest tremor) tends to be highly nonspecific. The well-established, practical UK Brain Bank criterion for parkinsonism is that 2 of the above 4 findings are present, with at least 1 of the two being either rigidity or bradykinesia. Note, tremor is not required. The current Movement Disorder Society 2015 consensus criteria define parkinsonism as bradykinesia in combination with rest tremor, rigidity, or both, and of course the findings must not be attributable to other causes (e.g. slowness from depression, drugs, illness, catatonia, lack of motivation, other neurologic disorders). Many of these criteria are often considered impractical for routine use outside a movement disorders specialty evaluation/setting.
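
To make the "2 of 4" rule concrete, here is a minimal Python sketch of it exactly as paraphrased above. The `MotorFindings` class and the function name are hypothetical names made up for illustration; this is not a validated clinical tool, and it ignores the requirement that findings not be attributable to other causes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorFindings:
    """Hypothetical container for the four cardinal findings (illustration only)."""
    rigidity: bool = False
    rest_tremor: bool = False
    bradykinesia: bool = False
    postural_instability: bool = False


def meets_two_of_four_rule(f: MotorFindings) -> bool:
    """At least 2 of the 4 findings, one of which is rigidity or bradykinesia."""
    n_present = sum([f.rigidity, f.rest_tremor, f.bradykinesia, f.postural_instability])
    return n_present >= 2 and (f.rigidity or f.bradykinesia)
```

Note that rest tremor plus postural instability alone does not qualify under this rule, because neither rigidity nor bradykinesia is present.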

Parkinson’s disease is clinically diagnosed when a patient has parkinsonism and

  • has supportive criteria without significant red flags or confounding findings
  • has no secondary cause for the parkinsonism

Supportive criteria generally are accepted as

  • clear and dramatic beneficial response to dopaminergic medication – see below
  • presence of levodopa-induced dyskinesia
  • gradual progression over time

Red flags and confounding findings include

  • early (typically within 3 years of symptom onset) severe orthostatic hypotension
  • early (within 3 years…) severe urinary incontinence/retention, not otherwise explained
  • early recurrent falls (within first 3 years of symptoms)
  • early severe dementia (controversial) – some criteria consider PD occurring independent of dementia; others will exclude early dementia or certain types of dementia
  • complete absence of parkinsonism progression after 5 years
  • neurologic findings such as ataxia, spasticity, aphasia, apraxia, supranuclear gaze palsy
  • normal DATscan (dopamine transporter scan)

Secondary causes of parkinsonism that should be absent:

  • drug-induced parkinsonism (typically the patient is exposed to the medication at the time parkinsonism is being diagnosed or considered, and the parkinsonism improves or does not progress if the medication is reduced or discontinued)
  • multi-infarct vascular parkinsonism
  • normal pressure hydrocephalus
  • recurrent head trauma, severe traumatic brain injury or concussion with parkinsonism
  • history of encephalitis with subsequent parkinsonism

The beneficial response to dopaminergic medication is considered highly specific, but a failure of response is difficult to prove, as most criteria recommend that 1000 mg/day of levodopa be tried without response before a patient is definitively considered a non-responder. Note that this makes a prescription for a PD medication non-specific for the diagnosis, because such a prescription is not infrequently used as a “test” to see whether the patient will respond.

The progression of symptoms is slow and it may take several years before a clinically probable PD case is diagnosed. Quality measures for assessing quality care of neurologists recommend annual re-review of diagnosis for the first 5 years after symptoms.

Conversely, many of the features that define neurodegenerative (non-PD) parkinsonism early in the course of disease (to be described below) become common in late-stage typical PD. One must be cautious not to overdiagnose neurodegenerative (non-PD) parkinsonism when the patient clearly has typical PD and has simply developed the other neurologic features as part of advanced, late-stage disease.

Clinical diagnosis of neurodegenerative parkinsonisms (PD is one of them)

Non-neurodegenerative parkinsonism (secondary parkinsonism):
These should be excluded from cohorts of PD and neurodegenerative parkinsonism.
Distinguishing neurodegenerative parkinsonism (of interest) from non-neurodegenerative parkinsonism can be challenging, especially since the two categories may co-exist, or both may be considered/coded early on; over time, clinical clarity will hopefully emerge.

Typically, if a clinical diagnosis of a clearly established non-neurodegenerative parkinsonism is made, that diagnosis correlates with the expected clinical course for that particular etiology (which varies by etiology), and it fully explains the neurologic presentation, then a neurodegenerative parkinsonism is considered less likely.

Having a non-neurodegenerative parkinsonism (NNP) of course does not exclude PD (or other neurodegenerative parkinsonism) as the etiology may have unmasked an incipient PD which would be detected by abnormal clinical course that deviates from the NNP etiology over time.

These non-neurodegenerative parkinsonisms (by etiology) are often referred to as “secondary parkinsonisms” and serve, to some degree, as warning/exclusionary criteria. Common ones are vascular (multistroke) parkinsonism and drug-induced parkinsonism. Others are described below.

Neurodegenerative parkinsonism (non-PD):
Among neurodegenerative parkinsonisms, the clinical guidelines above help distinguish PD from non-PD neurodegenerative parkinsonism.

The non-PD neurodegenerative parkinsonisms are defined by degeneration of other neurologic systems besides the specific PD pathology. Neurodegenerative parkinsonisms tend to progress over time, so time again becomes a distinguishing element: the features below emerge gradually and can be nonspecific or not routinely assessed, which makes them difficult to recognize in an EHR.

Common neurodegenerative parkinsonisms (besides PD) and their associated symptoms / neurologic system involved:

  • PSP – progressive supranuclear palsy – parkinsonism + early falls & supranuclear gaze palsy

  • CBD – corticobasal degeneration – park-ism + asymmetric dystonia, apraxia, alien-limb

  • MSA – multiple system atrophy – park-ism + autonomic dysfunction (bladder/orthostasis)

  • MSA-c – multiple system atrophy – MSA + cerebellar ataxia

  • MSA-p – multiple system atrophy – MSA + parkinsonism predominant subtype

  • SDS – Shy Drager Syndrome – MSA + predominant autonomic dysfunction

  • SND – striato-nigral degeneration – older term, not much use - considered parkinsonism without response to levodopa without other features above

  • LBD/DLB – parkinsonism + early dementia with fluctuating mental status, hallucinations and sensitivity to side effects of dopaminergic medications

  • LBD – Lewy body dementia, or DLB (dementia with Lewy bodies), is considered part of the spectrum of parkinsonism – this is controversial, though for phenotyping neurodegenerative parkinsonism it is reasonable to include it within the “neurodegenerative parkinsonism” group.

Often “atypical parkinsonism” or “parkinsonism” is used by neurologists to mean neurodegenerative parkinsonism, though the specificity of the terms themselves is often in question.

  • “parkinsonism” is often used in documentation (and in coding) to identify the syndrome without committing oneself to the diagnosis of PD (either due to lack of findings, progression, and sometimes due to lack of experience or confidence in making the diagnosis). And unfortunately many EHR systems map “parkinsonism, not specified” to G20 (ICD10: Parkinson’s disease, primary parkinsonism), reducing confidence in G20 being as specific to PD as it could be.

Here’s a nice simple overview (1 page!) provided by the Parkinson’s Foundation for parkinsonism and/vs Parkinson’s disease:

A basic overview of these conditions is in the same PD reference cited above:
Armstrong and Okun (2020), “Diagnosis and Treatment of Parkinson Disease: A Review” (PubMed)

Phenotype development

Here’s the practical “clinician” oriented summary of the clinical diagnosis considerations above:
This hierarchy is not matched perfectly to SNOMED/OMOP-CDM hierarchy which follows.

  1. Parkinsonism syndrome
    1.1. neurodegenerative parkinsonism (this is of interest #2)
      1.1.1. Parkinson’s disease (this is of interest #1)
      1.1.2. non-PD neurodegenerative parkinsonisms (aka “atypical parkinsonism”)
        1.1.2.1+ list of these: PSP, CBD, MSA (MSA-c, MSA-p, SDS), LBD, DLB, SND….
    1.2. non-neurodegenerative parkinsonism (not of interest, but often in the differential diagnosis under active consideration and, if definitive and the sole condition, can be exclusionary); also known as “secondary parkinsonism”
      1.2.1+ examples: secondary parkinsonism – drug-induced parkinsonism, vascular/multiinfarct parkinsonism, normal pressure hydrocephalus, postconcussive parkinsonism, postencephalitic parkinsonism, etc.

Translating this to vocabulary considerations:
• Parkinson’s disease (specific)
  • PD code alone: ICD10 code: G20
  • OMOP standard: PD condition: 381270
• Parkinsonism codes (neurodegenerative parkinsonism)
  • ICD10 codes: G20, G90.3, G23.1, G23.2, G31.83, G31.85, (G23.3)
  • OMOP standard concepts: PD, MSA, MSA-P, MSA-C, SDS, SND, parkinsonism w/orthostatic hypoTN, CBD, PSP, LBD
• Non-neurodegenerative parkinsonisms
  • secondary parkinsonism: ICD10 codes: G21.*
  • vascular parkinsonism:
  • normal pressure hydrocephalus:
  • drug-induced parkinsonism

And here’s the hierarchy of OMOP-CDM vocabulary codes mapped to ICD10 codes available

Literature on algorithms:
Review of many algorithms comes down to extraction of common elements with highly variable applications in several different combinations.

A good summary of 18 studies looking at algorithms for PD and parkinsonism is Harding et al. (2019), “Identifying Parkinson's disease and parkinsonism cases using routinely collected healthcare data: A systematic review” (PubMed).
You can see a wide variety of PPV and sensitivity measures, likely due to the differing nature of the cohorts and, as we would suspect, differing criteria for defining cases (PD or parkinsonism). Several of the broader ones also include secondary parkinsonism, mostly out of a desire to broadly capture early cases of PD, when it is often misdiagnosed as or confused with secondary parkinsonism. Many used chart review as the gold standard classification, with some relying on neurology examination (how to standardize validation chart reviews across sites is another topic that our group is working on).

The CDC has created the National Neurological Conditions Surveillance System (NNCSS), with PD as one of the first use cases (see the CDC NNCSS page). In discussions with CDC (not published or available), a comprehensive review of algorithms seeking a case definition for PD revealed more than 120 published algorithms. Leveraging the OMOP-CDM community to start comparing algorithms in a uniform, scalable way would be an important contribution. Our group is helping advise CDC on this process.

The elements that are most often used for PD / parkinsonism algorithms are as follows:
• Diagnosis conditions that support dx:
  o Specific PD diagnosis
  o Broader parkinsonism diagnoses
• Diagnosis conditions that represent competing or alternative diagnoses that, when confirmed, would exclude PD/neurodegenerative parkinsonism, but may co-exist or be an incorrect diagnosis early in the course
• Diagnosis position (primary diagnosis or not for that encounter)
• Encounters (ambulatory only or ambulatory/inpatient/ED inclusive)
• Specialty coding the diagnosis condition
  o Neurologist coding diagnosis conveys more confidence in diagnosis code
  o Movement Disorder neurology subspecialty (not in OMOP-CDM vocab) conveys even further confidence in diagnosis coding
• Medications that support dx
  o Specific PD dopaminergic medications (levodopa ingredient and dopamine agonists)
  o Other PD medications fairly specific to PD (amantadine, COMT-inhibitors, MAO-B inhibitors, a few others)

All of the above feature a look-back time (typically 1-5 years).
For diagnosis codes, the threshold is typically the # of coding events within the lookback time.
For medications, either

  • duration of meds must be sustained or
  • # of prescriptions of meds over lookback time must meet a threshold
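
As a rough illustration of the look-back idea for diagnosis codes, here is a small Python sketch. The function names, the choice to count distinct dates, and the default window and threshold are all assumptions for illustration; the published algorithms vary on each of these.

```python
from datetime import date, timedelta
from typing import Iterable


def count_code_dates_in_lookback(code_dates: Iterable[date], anchor: date,
                                 lookback_days: int) -> int:
    """Count distinct dates with a qualifying code in [anchor - lookback_days, anchor]."""
    window_start = anchor - timedelta(days=lookback_days)
    return sum(1 for d in set(code_dates) if window_start <= d <= anchor)


def meets_code_threshold(code_dates: Iterable[date], anchor: date,
                         lookback_days: int = 365, min_events: int = 2) -> bool:
    """True if the number of coded dates in the look-back window reaches the threshold."""
    return count_code_dates_in_lookback(code_dates, anchor, lookback_days) >= min_events
```

Counting distinct dates rather than raw records is one way to avoid crediting several codes recorded during the same encounter.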

After manually reviewing all of the criteria above in our local dataset, our group presented an abstract proposing a tiered approach to assess the effect of each criterion: Folie et al., abstract presented at Movement Disorders Society 2021.

Recommended approach for OMOP Phenotyping exercise

Focus on a tiered approach for a systematic assessment toward specific PD while keeping a broad tier for parkinsonism (neurodegenerative).

Using the above abstract as the framework, we note that it focused on PD rather than neurodegenerative parkinsonism, so it excluded the specific neurodegenerative parkinsonism syndromes.
There is therefore a need to create a separate set of tiers that includes a broader range of parkinsonism.
We propose investigating the following tiers, which we would recommend for Phenotype Phebruary consideration:

algorithm cascade omop.xlsx (9.6 KB)

Inclusion diagnoses:
ICD10/9 code for PD: ‘G20’, ‘332.0’
ICD codes related to neurodegenerative parkinsonism (ICD10/9) include ‘G23.1’, ‘333.0’, ‘G31.83’, ‘331.82’, ‘G23.9’, ‘331.6’, ‘G90.3’, ‘G23.2’, ‘G23.3’, ‘G31.85’

The ICD codes related to red flags (confounder codes) includes ‘F20’, ‘F20.0’, ‘F20.1’, ‘F20.2’, ‘F20.3’, ‘F20.5’, ‘F20.8’, ‘F20.81’, ‘F20.89’, ‘F20.9’, ‘295’, ‘295.0’, ‘295.00’, ‘295.01’, ‘295.02’, ‘295.03’, ‘295.04’, ‘295.05’, ‘295.1’, ‘295.10’, ‘295.11’, ‘295.12’, ‘295.13’, ‘295.14’, ‘295.15’, ‘295.2’, ‘295.20’, ‘295.21’, ‘295.22’, ‘295.23’, ‘295.24’, ‘295.25’, ‘295.3’, ‘295.30’, ‘295.31’, ‘295.32’, ‘295.33’, ‘295.34’, ‘295.35’, ‘295.4’, ‘295.40’, ‘295.41’,‘295.42’, ‘295.43’, ‘295.44’, ‘295.45’, ‘295.5’, ‘295.50’, ‘295.51’, ‘295.52’, ‘295.53’, ‘295.54’, ‘295.55’ , ‘295.6’, ‘295.60’, ‘295.61’, ‘295.62’, ‘295.63’, ‘295.64’, ‘295.65’, ‘295.8’, ‘295.80’, ‘295.81’, ‘295.82’, ‘295.83’, ‘295.84’, ‘295.85’, ‘295.9’, ‘295.90’, ‘295.91’, ‘295.92’, ‘295.93’, ‘295.94’, ‘295.95’, ‘F25’, ‘F25.0’, ‘F25.1’, ‘F25.8’, ‘F25.9’, ‘295.7’, ‘295.70’, ‘295.71’, ‘295.72’, ‘295.73’, ‘295.74’, ‘295.75’, ‘F21’, ‘301.22’, ‘G24.1’, ‘G24.2’, ‘G24.3’, ‘G24.4’, ‘G24.5’, ‘333.6’
Further confounder codes, related to secondary parkinsonism, include ‘A52.19’, ‘094.82’, ‘G21’, ‘G21.0’, ‘G21.11’, ‘G21.19’, ‘G21.2’, ‘G21.3’, ‘G21.4’, ‘G21.8’, ‘G21.9’, ‘332.1’, ‘G91.2’, ‘331.5’
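
For anyone working directly with source codes rather than concept sets, the lists above can be turned into a simple lookup. This is only a sketch: the group labels and the `classify_code` function are hypothetical names, and the long red-flag list is omitted here for brevity (it would be handled the same way).

```python
# Code lists copied from the post; the red-flag (psychiatric/dystonia) list is omitted for brevity.
PD_CODES = {"G20", "332.0"}
NEURODEGENERATIVE_PARKINSONISM_CODES = {
    "G23.1", "333.0", "G31.83", "331.82", "G23.9",
    "331.6", "G90.3", "G23.2", "G23.3", "G31.85",
}
SECONDARY_PARKINSONISM_CODES = {
    "A52.19", "094.82", "G21", "G21.0", "G21.11", "G21.19", "G21.2",
    "G21.3", "G21.4", "G21.8", "G21.9", "332.1", "G91.2", "331.5",
}


def classify_code(code: str) -> str:
    """Map an ICD9/ICD10 source code to one of the groups above (labels are illustrative)."""
    normalized = code.strip().upper()
    if normalized in PD_CODES:
        return "pd"
    if normalized in NEURODEGENERATIVE_PARKINSONISM_CODES:
        return "neurodegenerative_parkinsonism"
    if normalized in SECONDARY_PARKINSONISM_CODES:
        return "secondary_parkinsonism"
    return "other"
```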

Proposed medication list for PD: any med with levodopa ingredient (typically forms of carbidopa/levodopa); pramipexole, ropinirole, rasagiline, rotigotine, apomorphine, entacapone

We likely will need to consider a separate cohort specification for developing a phenotype for specific non-PD neurodegenerative parkinsonisms, but we can start with above.

I would welcome input from this amazing community!

@allanwu Thank you so much for your leadership in driving this conversation, for your courage to contribute, and for the extremely clear and informative clinical description and current landscape assessment. I learned a lot by reading this. There’s clearly a lot of subtlety here with differential diagnoses and non-specific symptoms that make it a fun phenotyping challenge. Tagging @aostropolets and @callahantiff because we were just talking yesterday about ideas for how we can leverage external knowledge on symptoms and differential diagnoses as part of the phenotype process, and this is a tremendous exemplar to allow us to sink our teeth into.

Thank you also for providing a very clear rubric to follow to consider Possible / Probable / Definite cases of PD. It was very easy for me to follow your instructions to create a standard ATLAS cohort definition that does what you are looking for.

I’ll use this thread to illustrate the cohort definition design, because it provided a few opportunities to bust out some less-commonly-used tricks that others may enjoy learning about:

We start with the entry event, which is any condition occurrence of the Parkinsonism conditions (Parkinson’s disease, Progressive supranuclear palsy, Dementia with Lewy Bodies, Corticobasal degeneration, and multiple system atrophy):

You’ll note that I created a separate conceptset for each of the Parkinsonism subtypes. That way we can reuse them as individual components, rather than having them all combined into one composite conceptset.

Those conceptsets were pretty straightforward to create based on the ICD codes that @allanwu provided. I’ll show the conceptset expressions just to highlight how concise they are (even though there can be many source codes that roll up to standards):

and for completeness, here’s the Parkinson’s drug conceptset:

and the ‘Parkinsonism confounder conditions’ conceptset we’ll use later

Ok, so now let’s walk through the inclusion criteria.

#1. has at least 1 Parkinson’s specific code:

Note above, we are able to ‘re-use’ that Parkinson’s conceptset that was in the entry event and now is used again here in this criteria. Since the entry was the earliest event of any of the Parkinsonism codes, we know that PD-specific code must be on or anytime after the index date.

#2. has at least 2 encounters with Parkinson’s specific code

Note here, to require two separate dates with PD codes, I changed the first component to be ‘with at least 2’ (changed from 1), then clicked the button that said ‘using all’ and changed it to ‘using distinct’, then selected ‘Start Date’ from the dropdown.
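
Outside of ATLAS, the ‘at least 2 using distinct Start Date’ idea boils down to counting unique dates. A minimal Python sketch, with a hypothetical function name and assuming the PD condition start dates for one person have already been pulled:

```python
from datetime import date
from typing import Iterable


def has_n_distinct_start_dates(start_dates: Iterable[date], n: int = 2) -> bool:
    """True if the person has PD codes on at least n different start dates."""
    return len(set(start_dates)) >= n
```

Two PD records on the same day count once here, which is the point of switching from ‘using all’ to ‘using distinct’.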

#3. has 2 encounters with PD code that are at least 365d apart

Note here, I use the trick of a ‘nested criteria’ (which you find by clicking the ‘Add attribute…’ button and selecting the last item in the list). This allows us to ‘reset’ the index date for the criteria, and in this case it means we say, ‘must have at least 1 PD code which itself has at least 1 PD code that falls 365d after the first PD code’. This logic requires 2 codes with at least a 365d gap between them.
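
For intuition, the nested rule ‘a PD code that itself has another PD code at least 365d later’ is satisfied exactly when the earliest and latest PD code dates are at least 365 days apart. A small Python sketch of that equivalence (the function name is hypothetical, and it assumes one person's PD code dates as input):

```python
from datetime import date
from typing import Iterable


def has_two_codes_at_least_n_days_apart(code_dates: Iterable[date],
                                        min_gap_days: int = 365) -> bool:
    """True if some pair of code dates is separated by at least min_gap_days."""
    dates = list(code_dates)
    if len(dates) < 2:
        return False
    # If any pair is far enough apart, the earliest/latest pair is too.
    return (max(dates) - min(dates)).days >= min_gap_days
```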

#4. has Parkinson’s medication

Two notes from above: 1) I have chosen to use ‘Drug Era’ but I could have used ‘Drug Exposure’ as the domain (you’ll see why I used Era in the next criteria). 2) I’m looking for drug exposure after the entry event. It is possible that a person may have exposure PRIOR to their first observed diagnosis, particularly if we have prevalent cases of PD in our dataset. However, given Allan’s description and insight that drug may be used to ‘test for’ disease before diagnosis, I thought it better to focus on post-index exposures here.

#5. has Parkinson’s medication with duration >6 months

Note this is the same criteria as #4, except that I added an extra attribute: ‘Add Era Length Criteria’, and set this to be ‘Greater Than’ 183 days. This is why I used the DRUG_ERA table, to take advantage of the pre-processing we do in the OMOP CDM to create episodes of continuous exposure. If we didn’t have that already in place, implementing this seemingly-simple logic ‘duration > 6 months’ would be extremely painful.
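
To give a feel for what the DRUG_ERA pre-processing saves us from, here is a simplified Python sketch of collapsing individual exposures into eras and then checking the era length. It assumes each exposure is a (start, end) date pair for a single ingredient and that exposures separated by no more than a 30-day persistence window are merged; the function names are hypothetical, and the real era-building logic handles more edge cases than this.

```python
from datetime import date, timedelta
from typing import List, Tuple

Interval = Tuple[date, date]


def build_eras(exposures: List[Interval], gap_days: int = 30) -> List[Interval]:
    """Merge exposures into eras when the gap between them is at most gap_days."""
    eras: List[Interval] = []
    for start, end in sorted(exposures):
        if eras and start <= eras[-1][1] + timedelta(days=gap_days):
            # Close enough to the previous era: extend it.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras


def has_era_longer_than(exposures: List[Interval], min_days: int = 183,
                        gap_days: int = 30) -> bool:
    """True if any merged era lasts more than min_days (the 'Greater Than 183 days' rule)."""
    return any((end - start).days > min_days for start, end in build_eras(exposures, gap_days))
```

Three back-to-back 90-day fills with short gaps merge into a single long era, while two fills separated by several months stay as two short eras and do not qualify.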

#6. has encounters with PD code across at least 4 years

This is probably the most complicated of the inclusion criteria. You’ll see I use the ‘nested criteria’ again, but I nest the nesting criteria 3 times. Basically, I’m saying: ‘you must have a PD code for which at least 365d later you have another PD code, for which >365d later there’s another PD code, for which >365d later, there’s another PD’. This guarantees that there’s at least 4 years with a PD code appearing.
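
The triple nesting can be read as ‘find a chain of 4 PD codes, each at least 365d after the previous one’. A greedy pass over the sorted dates finds the longest such chain. The sketch below uses a hypothetical function name and treats every step as ‘at least 365d’; note that a chain of 4 codes spans a minimum of 3 × 365 days from the first code to the last.

```python
from datetime import date
from typing import Iterable


def longest_spaced_chain(code_dates: Iterable[date], min_gap_days: int = 365) -> int:
    """Length of the longest chain of dates, each >= min_gap_days after the previous one."""
    chain_length = 0
    last_picked = None
    # Greedy: always take the earliest date that is far enough from the last one picked.
    for d in sorted(set(code_dates)):
        if last_picked is None or (d - last_picked).days >= min_gap_days:
            chain_length += 1
            last_picked = d
    return chain_length


def meets_rule_6(code_dates: Iterable[date]) -> bool:
    """Equivalent of the nested criterion: a chain of at least 4 spaced PD codes."""
    return longest_spaced_chain(code_dates) >= 4
```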

#7. has no Parkinsonism confounder conditions

Note, we look for the absence of any confounder conditions and we use all time pre- and post-index.

There were two criteria that @allanwu proposed that I did not implement, and I’ll discuss both briefly:

“(7) with at least one code coming from neurologist (neurology / movement disorders visit)” - if we want to implement this, we need to define how to recognize neurologist participation in care. On an admittedly quick scan, I couldn’t find any particular CPT/ICD10PCS codes that specifically ensure a neurologist’s involvement (even though there are many codes that involve neurological evaluation). As a community, we are still harmonizing on provider specialty, and I know many databases either do not provide this information or provide it in a particularly noisy form (including not being mapped into the visit occurrence table). Many of the NUCC concepts aren’t specific to Neurology (they combine Psychiatry and Neurology), so I’m just out of my depth to implement this reliably. (Just as a demonstration, one can use ATLAS to require a visit with Provider Specialty like this:)

“(8) + ratio of PD to other Parkinsonism confounder code is >2:1” - this is not a function currently supported in ATLAS. But, personal opinion, I find these types of heuristics hard to defend on any sort of clinical grounds; usually they are serving as a crude proxy for some other idea that one has in mind that may be able to be modeled more appropriately. In this particular case, we also look for ‘no parkinsonism confounder codes’ and we see that doesn’t impact the cohort substantially (see below), so I don’t think we need to try to get too fancy with this rule.

And, here’s the results from the MarketScan CCAE database. I’m showing here the attrition table from ATLAS (go to Generation tab, Generate the cohort on your database, then click ‘View Reports’ button, you can then toggle between ‘Intersect view’ and ‘attrition view’). Intersect view helps understand the independent impact of each criteria. Attrition view lets you see the inclusion criteria applied in sequential order. Given the framework Allan provided, I’ll show the attrition view here:

We can see that we start with 105,243 patients with a Parkinsonism code, of which 83.17% (87,532) have at least one Parkinson’s specific code. Of those, 63k have at least 2 encounters with a PD code, and 33,757 have 2 codes that are more than 365d apart. 87% (29k/33k) of these patients had at least one Parkinson’s disease medication, and 84% (25k/29k) of those with a drug had at least one duration of exposure greater than 6 mo. The criterion that had the largest proportional impact on the cohort was requiring encounters with a PD code across at least 4 years: only 9,238 met this criterion. Note, this can be partly a reflection of the database, which contains privately insured patients, for whom there is both an issue with not having long continuous observation and an issue with persons > 65 transitioning to Medicare. The last criterion, “no Parkinsonism confounder conditions”, did not have a major impact, with >91% of those patients remaining for the final cohort count of 8,480.
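
For readers who want to check the arithmetic, here is a small Python sketch that recomputes percentages from the counts quoted above. Only the steps with exact counts in the post are included (the ‘63k’, ‘29k’ and ‘25k’ steps are rounded, so they are left out); the helper names are made up for illustration.

```python
from typing import List, Tuple


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


ENTRY_COUNT = 105_243  # patients with any Parkinsonism code (from the post)

# (rule label, persons remaining) for the steps reported with exact counts
EXACT_STEPS: List[Tuple[str, int]] = [
    ("has at least 1 Parkinson's specific code", 87_532),
    ("has 2 PD codes at least 365d apart", 33_757),
    ("has encounters with PD code across at least 4 years", 9_238),
    ("has no Parkinsonism confounder conditions", 8_480),
]


def attrition_rows(entry: int, steps: List[Tuple[str, int]]) -> List[Tuple[str, int, float]]:
    """Return (label, remaining, percent of entry cohort remaining) for each step."""
    return [(label, remaining, pct(remaining, entry)) for label, remaining in steps]


if __name__ == "__main__":
    for label, remaining, percent in attrition_rows(ENTRY_COUNT, EXACT_STEPS):
        print(f"{remaining:>8,}  {percent:6.2f}%  {label}")
```

The first row reproduces the 83.17% quoted above, and `pct(8_480, 9_238)` gives the share retained by the last rule, consistent with the ‘>91%’ statement.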

Note, now that we have this one cohort definition that implements almost all of the Wu criteria (modulo Neurologist visit and majority rules confounders), it is easy to create a definition that relaxes the criteria to find all ‘Probable’ or ‘Possible’ cases. You just have to delete the criteria that are not relevant. For illustration purposes, I’ve provided 3 cohorts on ATLAS-phenotype, and I’ll ask @Gowtham_Rao if he could be so kind as to run CohortDiagnostics on these 3 definitions, so that we can see how the patient characteristics may vary across these variants:

I’d be really curious to see if PheValuator could give us insights about the sensitivity/specificity tradeoffs between the definite/probable/possible classification that @allanwu has proposed. @jswerdel this may be a fun demonstration case where the cohorts are more akin to PheValuator 1.0 framing of ‘prevalent chronic disease’ but for which PheValuator 2.0 should still be applicable (just with using longer feature windows that extend beyond the acute post-visit window)

Fun stuff! I’m eager to hear from others about where we go from here…

Many thanks for this proof-of-principle example of how to take parkinsonism/PD ideas and translate them into OMOP-CDM cohort tools.
The phenotyping approach I proposed was designed as an opportunity to see how Atlas cohorting could be used and to show its potential to my team.

This post will respond to the phenotyping comments @Patrick_Ryan posted.
A separate post will discuss our use cases (which we do have in mind) and next steps. Suffice to say, this is just the starting point and we think that this work can be an essential contribution for comparing different proposed definitions (and individual criteria) developed/published by different teams (and organized by purpose).

Again, as a proof of principle, and given the degree of practical complexity, there are further tweaks that can be made (and/or tested) even in the conceptsets that are being used (for example, the PSP conceptset should include concept code 192976002; the Lewy body dementia conceptset should also include concept code 312991009 - senile dementia of Lewy body type), always guided by clinical consensus, as has been promoted.

Specialty coding for conditions:
The idea that codes for parkinsonism/PD that are dropped by neurologists are more specific than those from other clinicians is supported by the literature, e.g. Optimizing Algorithms to Identify Parkinson’s Disease Cases Within an Administrative Database (nih.gov),
and neurologist involvement is thought to shorten the time between symptom emergence and clinically probable diagnosis.

I agree with Patrick that linking Provider ID specialty to Visit Occurrences is what is supported by the CDM and, just as other discussions suggest, good clinical judgment is needed to include Neurology and remove the child concepts that are not relevant to the category of “neurologists who are assumed to be making an observation of Parkinsonism if it was or was not present”. Our board certification is the American Board of Psychiatry and Neurology (it’s the same organ :) ), which creates these sorts of challenges. Movement Disorders neurology is a commonly accepted specialty, but not an ACGME Fellowship supported one, so it is basically not capturable within OMOP-CDM.

As to the heuristic of a 2:1 ratio of supporting conditions vs confounding conditions, it is loosely adapted from the most current diagnostic criteria for PD (2015 MDS clinical diagnostic criteria for Parkinson's disease, PubMed), which explicitly state a criterion of “presence of red flags counterbalanced by supportive criteria” and go further, stating that 1 red flag needs 1+ supportive criteria, 2 red flags need 2+ supportive criteria, and a patient cannot have 3 red flags. As I noted before, most of these criteria are designed for movement experts to apply and are not generally documented clearly. The OMOP analysis clearly shows that this criterion did not have much impact on the attrition plot, which helps support dropping it. I love this empirical approach to assessing these proposed criteria.
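
The counterbalancing rule as I paraphrased it can be written down in a few lines. This Python sketch covers only that one piece, using a hypothetical function name; it is not an implementation of the full MDS criteria, and how one would count ‘red flags’ and ‘supportive criteria’ from coded data is left open.

```python
def red_flags_counterbalanced(n_red_flags: int, n_supportive: int) -> bool:
    """Counterbalance rule as paraphrased in this thread.

    0 red flags: passes; 1 red flag needs >= 1 supportive criterion;
    2 red flags need >= 2 supportive criteria; 3 or more red flags fail.
    """
    if n_red_flags < 0 or n_supportive < 0:
        raise ValueError("counts must be non-negative")
    if n_red_flags > 2:
        return False
    return n_supportive >= n_red_flags
```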

Attrition view is very interesting and exactly what we were looking for as a way to compare and contrast relative effects of criteria with empirical data (across datasets if needed) for all those algorithms that have been published (and continue to be).

The way attrition view is constructed is dependent on linear application of criteria laid out. Does the intersect view allow more of an absolute view of how much each criteria is contributing to the whole cohort? @Patrick_Ryan could we see that?

I look forward to reviewing the test case “definite, probable, possible” Atlas definitions.

Again, many thanks to this community. I will post again later about our use cases as to why we are doing this (preview - working toward methods to test how a surveillance registry for PD would or could function).

Thanks @allanwu , excited to continue this discussion with you, your team, and the broader community.

A couple comments:

  1. The cohort definitions that I posted should definitely be seen as the START of this development journey, not the end. The conceptsets I created were based purely on the ICD codes @allanwu provided, but I did NOT go through our usual recommended practices, including PHOEBE, to assess whether any of the ideas could be expanded or revised. That certainly would be a good next step. As a nice illustrative example, take CONCEPT_ID 40391011, ‘Progressive supranuclear palsy’, which Allan suggests should be in the PSP definition (and I would 100% agree): it is useful to note that, at least among the OHDSI partners who contributed Concept Prevalence results (which underlie PHOEBE), only one database has ever used that code, and it’s only 1077 records, so we can learn quickly that it isn’t expected to make a big difference. But this type of investigation can and should be done for each of the conceptsets to make sure we’re complete (and I’ll use this as another opportunity to reinforce that we really need the OHDSI community to come together and share Concept Prevalence results, otherwise investigations like this won’t generalize to the broader network of databases).

  2. ATLAS / Cohort Definitions / Intersection view. Yes, I should have posted both. Here’s the Intersection view for the Definite cohort.

We have the same number of persons as before, but now, when you look at the inclusion rules, what is being reported is the independent evaluation of each rule. (Basically, you see what % of the entry events satisfying each rule). So, for example, of the 105,243 total events, 83.17% satisfied rule 1: “has at least 1 Parkinson’s specific code”, and 47.71% satisfied rule 4 “has Parkinson’s medication”. This view lets us know which inclusion rule had the biggest impact overall (not in sequential order), and we can see that it is rule 6 “has encounters with PD code across at least 4 years”, with only 9.66% satisfying this criteria. In contrast, rule 7 “has no Parkinsonism confounder conditions” is the least restrictive rule, with 94.30% of entry events satisfying this requirement.
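
To make the difference between the two views concrete, here is a toy Python sketch. It assumes each person is represented by a tuple of booleans, one per inclusion rule, and the function names are hypothetical; this is not how ATLAS computes its reports, just the same idea in miniature.

```python
from typing import List, Sequence


def intersect_view(rule_flags: Sequence[Sequence[bool]]) -> List[float]:
    """Fraction of all entry events satisfying each rule, evaluated independently."""
    n_people = len(rule_flags)
    n_rules = len(rule_flags[0])
    return [sum(person[i] for person in rule_flags) / n_people for i in range(n_rules)]


def attrition_view(rule_flags: Sequence[Sequence[bool]]) -> List[int]:
    """Number of people remaining after applying the rules one after another, in order."""
    remaining = list(rule_flags)
    counts = []
    for i in range(len(rule_flags[0])):
        remaining = [person for person in remaining if person[i]]
        counts.append(len(remaining))
    return counts


# Toy data: 4 people, 2 rules
people = [(True, True), (True, False), (False, True), (False, True)]
```

In the toy data, rule 2 is satisfied by 3 of 4 people when looked at independently, yet only 1 person remains after applying rule 1 and then rule 2 in sequence. That is why a rule can look mild in one view and severe in the other.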

  3. I recognize the value and potential utility of using provider specialty in a phenotype, and there can be many reasons why a diagnosis isn’t considered ‘verified’ until a specialist has confirmed it. So I am by no means dismissing the consideration to include such a rule in a definition. Our challenge is really just about how this information is represented in source systems, and how that translates into the OMOP common data model. In the OMOP CDM, we do have a PROVIDER table, which allows a clinician identifier to have an associated provider specialty. And PROVIDER_ID can be associated with any VISIT_OCCURRENCE or domain-specific table (e.g. CONDITION_OCCURRENCE, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, etc.). However, the extent to which the source data themselves contain provider specialty data, or to which this information has been properly ETLed into the CDM structure, is an area that we have not fully reconciled. One can create a definition that depends on this information, but they’d need to go into the analysis eyes wide open, knowing that they would likely need to review how provider specialties made their way into the CDM on a case-by-case basis. Purely for illustrative purposes, I show below what happens when I add an extra criterion requiring a visit record with a neurology-related provider specialty in the CCAE database.

Here’s the extra rule I added (and I’m sure @allanwu or others would argue against some of my selections of specialty concepts):

And here’s the intersect view of results:

and the corresponding attrition view of the results:

Bottom line, by imposing this restriction we lose another ~1000 patients from the ~8,500 patients that originally qualified, even though those patients satisfy all the other criteria.
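For readers who want to see the shape of such a provider-specialty rule outside of ATLAS, here is a rough sketch against a toy in-memory database. The table and column names (provider, visit_occurrence, provider_id, specialty_concept_id) follow the OMOP CDM, but the specialty concept IDs are hypothetical placeholders, not real vocabulary concepts, and the toy rows are invented. Note how persons whose visits have no provider, or whose provider has no specialty, silently fail the rule; that is exactly the ETL caveat described above.

```python
import sqlite3

# Hypothetical placeholder IDs standing in for neurology-related specialty concepts.
NEURO_SPECIALTY_IDS = (1001, 1002)


def build_toy_cdm() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for two OMOP CDM tables (invented rows)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE provider (
            provider_id INTEGER PRIMARY KEY,
            specialty_concept_id INTEGER
        );
        CREATE TABLE visit_occurrence (
            visit_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            provider_id INTEGER
        );
        INSERT INTO provider VALUES (1, 1001), (2, 2000), (3, NULL);
        INSERT INTO visit_occurrence VALUES
            (10, 100, 1), (11, 100, 2), (12, 101, 2), (13, 102, 3), (14, 103, NULL);
        """
    )
    return con


def persons_with_neuro_visit(con, cohort_person_ids, specialty_ids):
    """Return cohort members with at least one visit to a provider of a listed specialty."""
    cohort_marks = ",".join("?" * len(cohort_person_ids))
    spec_marks = ",".join("?" * len(specialty_ids))
    sql = f"""
        SELECT DISTINCT v.person_id
        FROM visit_occurrence v
        JOIN provider p ON p.provider_id = v.provider_id
        WHERE v.person_id IN ({cohort_marks})
          AND p.specialty_concept_id IN ({spec_marks})
        ORDER BY v.person_id
    """
    rows = con.execute(sql, [*cohort_person_ids, *specialty_ids]).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    con = build_toy_cdm()
    print(persons_with_neuro_visit(con, [100, 101, 102, 103], NEURO_SPECIALTY_IDS))
```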

@Patrick_Ryan @allanwu I am awestruck by how much you have done in just a couple of days.
@Patrick_Ryan – amazing leadership. I am an OHDSI novice, but I have an interest in a few specific conditions, Parkinson’s and heart failure being two where personal and professional interest intersect, as well as diagnostic processes, and long covid in particular, from a purely professional point of view. Hope to follow progress and contribute if I can.

Here are the PheValuator results for the Parkinson’s disease (PD) cohorts:


I included a simple cohort defined by a single code for PD, the first time in a subject’s history. The simple algorithm had very high sensitivity and moderate PPV compared to the other 3 cohorts. The extra inclusion criteria for the 3 other cohorts produced a substantial increase in PPV compared to the simple cohort algorithm, especially in CCAE (~71% → ~98%) and Medicaid (~75% → ~97%). However, increasing the requirements from “possible” to “probable” to “definite” seemed to do little to the PPV estimates in comparison to each other - all were very high. The extra inclusion rules did come at the cost of a decrease in sensitivity from “possible” to “definite”. In CCAE, the “possible” algorithm captured about 60% of the subjects (PPV 98%) whereas the “definite” algorithm captured about 21% (PPV 98%).
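As a reading aid for the sensitivity and PPV figures, here is a small sketch of how such metrics can be computed against a probabilistic reference standard, where each subject carries a model-predicted probability of being a case rather than a hard yes/no label. Whether this matches PheValuator's internal calculation in every detail is an assumption on my part, and the function name and toy numbers are invented for illustration.

```python
from typing import Dict, Iterable, Tuple


def probabilistic_metrics(subjects: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """Compute metrics from (probability_of_case, in_cohort) pairs.

    Each subject contributes fractionally: a cohort member with probability p
    adds p to the true positives and (1 - p) to the false positives, and so on.
    """
    tp = fp = fn = tn = 0.0
    for p, in_cohort in subjects:
        if in_cohort:
            tp += p
            fp += 1.0 - p
        else:
            fn += p
            tn += 1.0 - p
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


# Invented toy data: three subjects selected by a cohort definition, four not selected.
TOY_SUBJECTS = [
    (1.0, True), (1.0, True), (0.5, True),
    (0.5, False), (0.0, False), (0.0, False), (0.0, False),
]

if __name__ == "__main__":
    print(probabilistic_metrics(TOY_SUBJECTS))
```

Adding stricter inclusion rules moves subjects from the "in cohort" group to the "not in cohort" group; if the removed subjects still have a high probability of disease, PPV barely changes while sensitivity drops, which is the pattern described above.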


Thank you @jswerdel for running this proof of principle and including the single code comparison. The results are extremely informative to my team as a demonstration of what these tools can do and we will be following up using Atlas demo site and standing up our own instance.

A few points and questions of clarification.
The findings are not surprising, since the Wu Possible cohort definition that @Patrick_Ryan created already included many of the specific criteria needed to identify PD patients, and these results suggest diminishing returns on PPV/specificity/sensitivity from the later additions of highly specific criteria: 4 years of codes (Probable) and lack of confounding codes (Definite).

Clarification: my reading is that PheValuator depends on the xSpec (and xSens) cohorts. To run these 4 test cohorts, what cohort definition was used for xSpec? Did you train xSpec on one of the CDM datasets, or was xSpec trained on each? And if the former, wouldn’t the PPV be even higher than it already is for the CDM dataset used for xSpec, or is the PPV just showing the normal variation between the training dataset and the test datasets?

Also, the training dataset of xSens typically uses a set of patients estimated from the prevalence of disorder, so did you use a 1:1 ratio or something else for the ratio of xSens pts vs xSpec pt counts in training the PheValuator model?

Thanks for helping us understand these tools!

Thanks for the PheValuator comments @allanwu. Here is the xSpec for PD. As we had plenty of subjects in each dataset, I trained the models on each of the datasets and applied each model to the evaluation subjects from that same dataset. It is interesting to see that the younger subjects (CCAE) had a lower PPV than the older subjects (Medicare) in the simple algorithm. This may be because the older subjects in Medicare, though we are calling this a prevalent cohort, may have had PD for a longer period prior to the first occurrence in that database and were well into treatment during the 2 years of observed data used to inform the model. So more symptoms, more treatments, etc.

PheValuator uses about a 1:4 case:non-case ratio when training the model. The model is then recalibrated for the population ratio of cases to non-cases based on the prevalence.

One bit of clarification: the xSens cohort (in this case, subjects with a single code in their record) is used to define possible PD subjects. The non-cases used in the modeling process are a random sample of subjects from the database who are not in the xSens cohort. The attempt here is to remove any possible PD subjects so that the cases (from the xSpec) are subjects with a high probability of having PD and the non-cases are subjects with a low probability of having PD.
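For anyone who wants to play with these two ideas (drawing non-cases from outside the xSens cohort at roughly 1:4, then adjusting predictions for the real prevalence), here is a toy sketch. It is not PheValuator's code: the function names are hypothetical, and the recalibration shown is the generic odds-based correction for a changed case fraction, which I am assuming is close in spirit to what the package does.

```python
import random
from typing import Iterable, List


def sample_non_cases(all_ids: Iterable[int], xsens_ids: Iterable[int],
                     n_cases: int, ratio: int = 4, seed: int = 0) -> List[int]:
    """Randomly draw non-cases from subjects NOT in the xSens cohort (about 1:ratio)."""
    candidates = sorted(set(all_ids) - set(xsens_ids))
    k = min(len(candidates), ratio * n_cases)
    return random.Random(seed).sample(candidates, k)


def recalibrate(p: float, train_fraction: float, prevalence: float) -> float:
    """Shift a predicted probability from the training case fraction to the population prevalence.

    Generic prior-shift correction: multiply the predicted odds by the ratio of
    population odds to training odds, then convert back to a probability.
    """
    if p >= 1.0:
        return 1.0
    odds = p / (1.0 - p)
    shift = (prevalence / (1.0 - prevalence)) / (train_fraction / (1.0 - train_fraction))
    adjusted = odds * shift
    return adjusted / (1.0 + adjusted)


if __name__ == "__main__":
    non_cases = sample_non_cases(range(100), range(10), n_cases=5)
    print(len(non_cases), min(non_cases))
    # With a 1:4 ratio, cases make up 20% of the training sample.
    print(recalibrate(0.9, train_fraction=0.2, prevalence=0.01))
```

A prediction equal to the training case fraction (0.2 here) maps back to the population prevalence, which is a quick sanity check that the correction is pointing the right way.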

Hope this helps. Please let me know if you want to discuss more.

@allanwu thank you so much for including this topic and for all of the wonderful and very detailed information that you provided. I especially enjoyed the papers that were referenced in your explanations. In particular, the 2020 JAMA review article. I was hoping that I could ask you a few follow-up questions given what I learned from that article? Disclosure – I am not a doctor, nor an expert in PD, but want to understand how knowledge of the mechanisms that cause the disease can be used to aid in its diagnosis.

  1. The article included procedures (and a few diagnoses/symptoms) that could be used to help diagnose and/or identify relevant patients in both the “Prodromal” and “Diagnostic” periods.

    • For the “Prodromal” periods, the article mentioned rapid eye movement sleep behavior disorder (439007), which is diagnosed (at least in part) via polysomnography (45890139). It also included genetic information, which I know may not always be available. Could including the concept Family history of Parkinson’s disease (4182334) be helpful? I am curious if there would be any value in including these concepts in a definition to identify patients in the “Prodromal” period? Perhaps it could be used in combination with the existing criteria and perhaps filtered such that it must occur prior to the earliest date of PD diagnosis (or associated PD-inclusionary event)?
    • For the “Diagnostic” period, I am curious as to whether or not the procedure concept for SPECT CT brain dopamine transporter study using ioflupane (123-I) (36685547), also with clever date-based filtering, would be helpful? Granted, I did glean from the article that this concept might be most useful when trying to determine if one has a Parkinsonism disorder – so perhaps it could be used early on or as part of a tier-based phenotyping system that you mentioned.
    • Although not in the “Diagnostic” period (I believe), is there any value in including, again with appropriate date filtering, concepts related to treating advanced PD like deep brain stimulation (4046932)?
  2. The approach that you described makes complete sense, but also seems to largely focus on those patients that the review article would consider to be in the “Diagnostic Period”, rather than trying to also identify people in the “Prodromal Period”. Am I correct to assume that this is largely due to how challenging it is to confidently identify patients with PD using EHR data?

  3. Somewhat related to the prior question: while my understanding and knowledge of the disease are very limited, I understood from the article (and others that I read) that, although preventative treatments are still largely unknown, if they were to become available it would be valuable to identify patients as early as possible (i.e., before the symptoms that constitute a formal diagnosis appear). Further, not only would it be important to be able to identify patients in the “Prodromal period”, but it would also be important to subtype them based on whatever knowledge is available (ideally using recognized molecular mechanisms and diagnostic knowledge)? I ask this question as a way to gather information for some questions I hope to ask you in a future post (related note below).
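To show what I mean by date-based filtering in the questions above, here is a rough sketch on invented records. The two prodromal concept IDs are the ones quoted in this post; the PD diagnosis concept ID is a hypothetical placeholder, and the record layout (person, concept, date) is a simplification rather than the actual CDM table structure.

```python
from datetime import date
from typing import Dict, List, Tuple

PRODROMAL_CONCEPTS = {439007, 4182334}  # IDs quoted in this post
PD_CONCEPTS = {9999001}                 # hypothetical placeholder, not a real concept ID

Record = Tuple[int, int, date]  # (person_id, concept_id, event_date)


def prodromal_before_pd(records: List[Record]) -> List[int]:
    """Return persons with a prodromal concept dated strictly before their first PD record."""
    first_pd: Dict[int, date] = {}
    for person, concept, when in records:
        if concept in PD_CONCEPTS:
            if person not in first_pd or when < first_pd[person]:
                first_pd[person] = when
    flagged = set()
    for person, concept, when in records:
        if concept in PRODROMAL_CONCEPTS and person in first_pd and when < first_pd[person]:
            flagged.add(person)
    return sorted(flagged)


# Invented toy records for three persons.
TOY_RECORDS: List[Record] = [
    (1, 439007, date(2015, 3, 1)),   # prodromal event before PD diagnosis
    (1, 9999001, date(2018, 6, 1)),
    (2, 9999001, date(2016, 1, 1)),
    (2, 4182334, date(2017, 1, 1)),  # recorded after PD diagnosis, so not flagged
    (3, 439007, date(2019, 1, 1)),   # no PD diagnosis at all, so not flagged
]

if __name__ == "__main__":
    print(prodromal_before_pd(TOY_RECORDS))
```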

I also hope to follow up and get your thoughts on some additional ideas I have for taking a more translational approach to phenotype PD patients.

Thank you so much in advance for your time and consideration in answering my questions!
