
Phenotype Phebruary 2023 - P1 - Acute Pancreatitis

Target Clinical Description

Summary: Acute pancreatitis (AP) is an acute inflammatory process of the pancreas. It is suspected in patients with severe acute upper abdominal pain, but biochemical (serum lipase > 3x the upper limit of normal) or radiologic evidence is required to establish the diagnosis.

Presentation: AP is categorized as mild (no organ failure or complications), moderately severe (transient organ failure/complication that resolves within 48 hours), or severe (persistent organ failure of ≥1 organ). AP is different from chronic pancreatitis (CP), which may be asymptomatic for long periods, interspersed with abdominal pain. There are no diagnostic criteria for CP; its diagnosis is a clinical judgment based on imaging studies, typical patient history, and absorption tests.

Management of AP is mostly done in an inpatient setting and includes fluid replacement, pain control, nutrition management, and intravenous hydration. If a cause for acute pancreatitis is found, e.g., gallstones, ERCP/cholecystectomy should be performed. If AP is associated with hypertriglyceridemia, an insulin drip may be administered.

Epidemiology: AP annual incidence proportion is 600-700 cases per 100,000 people in the US. Reported incidence is around 5 to 35 per 100,000 persons per year.

Prognosis: AP patients can fully recover. The minimum median duration of AP is 1-7 days, and the maximum median duration is 30 days. A new AP episode can independently recur in the same patient after recovery from a prior episode.

Disqualifiers: Persons with CP should be considered to be ineligible to develop AP, even though they may have flares that mimic AP. Hereditary/congenital pancreatitis: these conditions are considered distinct clinical entities.

Differential diagnoses: acute mesenteric ischemia, perforated viscus, intestinal obstruction, peptic ulcer disease, hepatitis, cholangitis, cholecystitis.

Strengtheners: gallstones, alcohol use, drugs associated with AP.
Complications resulting from AP include acute peripancreatic fluid collection, acute necrotic collections within 4 weeks of AP onset, portosplenomesenteric venous thrombosis, and systemic inflammatory response syndrome.

(Talley, Nicholas J., G. Richard Locke III, and Yuri A. Saito, eds. GI epidemiology. John Wiley & Sons, 2008.)
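The severity grading and the biochemical criterion in the clinical description can be written as explicit rules. The sketch below is only an illustration and is not part of cohort #251: the function names are hypothetical, organ failure lasting more than 48 hours is assumed to count as "persistent", and the lipase threshold is assumed to be 3x the upper limit of normal (ULN).

```python
from typing import Optional


def lipase_supports_ap(lipase: float, upper_limit_normal: float) -> bool:
    """Biochemical criterion: serum lipase above 3x the upper limit of normal (assumed)."""
    return lipase > 3 * upper_limit_normal


def classify_ap_severity(longest_organ_failure_hours: Optional[float],
                         has_complication: bool) -> str:
    """Grade AP severity following the description above (hypothetical helper).

    longest_organ_failure_hours: duration of the longest organ failure in any
        single organ, or None if no organ failed.
    has_complication: True if a local or systemic complication was recorded.
    """
    # Persistent organ failure (assumed: lasting beyond 48 hours) -> severe
    if longest_organ_failure_hours is not None and longest_organ_failure_hours > 48:
        return "severe"
    # Transient organ failure, or a complication without persistent failure
    if longest_organ_failure_hours is not None or has_complication:
        return "moderately severe"
    # No organ failure and no complications
    return "mild"


if __name__ == "__main__":
    print(classify_ap_severity(None, False))  # mild
    print(classify_ap_severity(24, False))    # moderately severe
    print(classify_ap_severity(72, True))     # severe
    print(lipase_supports_ap(400, 60))        # True, since 400 > 3 * 60
```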

Designated Medical Event - MedDRA PT terms: Autoimmune pancreatitis, Ischaemic pancreatitis, Oedematous pancreatitis, Pancreatitis, Pancreatitis acute

Phenotype Development: We made the following design choices in the cohort definition, for the reasons given.

  • Note: this phenotype was developed and evaluated as part of the OHDSI 2022 workshop led by Jamie Weaver @jweave17.
  • [Empirical Decision] Limit to inpatient and ER visits: During initial phenotype development, we observed individuals with new-onset acute pancreatitis during an outpatient visit (as defined by visit_concept_id). Clinicians in the room expressed angst with this finding, as they expressed that it is extremely unlikely that persons with true/suspected acute pancreatitis would be managed exclusively in an outpatient setting. PheValuator provided evidence that, despite the use of outpatient visits, we had good PPV estimates. After the OHDSI Symposium 2022 meeting, we reviewed the patient profiles of persons with an acute pancreatitis diagnosis managed in an outpatient setting and found that several persons, although their visit_concept_id was Outpatient Visit, had another event code for an emergency visit (CPT4 or revenue code), while persons who had an outpatient visit without a corresponding CPT4 or revenue code were receiving follow-up care. This was determined by patient profile review using the tool CohortExplorer https://github.com/ohdsi/cohortExplorer (not submitted). This review provided some empirical data and strengthened the clinician expectation that care should be in the ED or inpatient setting. So we modified the entry event: persons may enter if they are simultaneously in an inpatient or ER visit, or had a procedure code indicating that the care was emergent.
  • [Design Choice] Chronic Pancreatitis: All persons with a history of chronic pancreatitis at any time on or before acute pancreatitis are not eligible. This decision was made a priori as part of the clinical description.
  • [Design Choice] Hereditary Pancreatitis: persons with hereditary pancreatitis at any time are not eligible.
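The entry rule from the empirical decision above and the two exclusions can be sketched as plain functions. This is a minimal sketch under stated assumptions, not the ATLAS logic of cohort #251: the record structure and function names are hypothetical, visit types are shown as visit_concept_id labels, and the emergency procedure codes are illustrated with the CPT4 emergency department E/M codes 99281 to 99285 rather than the concept set actually used.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set

# Illustrative only: CPT4 emergency department E/M codes. The real cohort
# uses an OHDSI concept set (including revenue codes) that may differ.
ED_PROCEDURE_CODES = {"99281", "99282", "99283", "99284", "99285"}

# Visit types (labels of visit_concept_id) that count as inpatient or ER care
EMERGENT_VISIT_TYPES = {
    "Inpatient Visit",
    "Emergency Room Visit",
    "Emergency Room and Inpatient Visit",
}


@dataclass
class DiagnosisRecord:
    person_id: int
    diagnosis_date: date
    visit_type: str  # label of the visit_concept_id of the enclosing visit
    same_day_procedure_codes: Set[str] = field(default_factory=set)


def in_emergent_setting(rec: DiagnosisRecord) -> bool:
    """True if the diagnosis occurred in inpatient/ER care, or in an outpatient
    visit with a same-day emergency procedure code."""
    if rec.visit_type in EMERGENT_VISIT_TYPES:
        return True
    return (rec.visit_type == "Outpatient Visit"
            and bool(rec.same_day_procedure_codes & ED_PROCEDURE_CODES))


def is_eligible(rec: DiagnosisRecord,
                chronic_pancreatitis_dates: List[date],
                hereditary_pancreatitis_dates: List[date]) -> bool:
    # Chronic pancreatitis at any time on or before the AP record disqualifies
    if any(d <= rec.diagnosis_date for d in chronic_pancreatitis_dates):
        return False
    # Hereditary pancreatitis at any time disqualifies
    if hereditary_pancreatitis_dates:
        return False
    return in_emergent_setting(rec)
```

Note that an outpatient record without the ED code is rejected here, which mirrors the assumption that such records represent follow-up care.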


  • We submit the following cohort definition for peer review: cohort id #251, in peer-review-pending status in the OHDSI Phenotype Library ATLAS.
  • Cohort Definition Logic Description: We identify multiple events per person for acute pancreatitis, i.e., one person may have more than one record. All of these records should start (index) in an inpatient or emergency setting, and the person should have no history of chronic pancreatitis or of hereditary or congenital pancreatitis. Because the median expected duration of AP is 1-7 days, we define the end date of the phenotype as 7 days after the last date of continuous care. If a person subsequently has acute pancreatitis within 180 days, we assume that to be a continuation of the previous acute pancreatitis episode. We start with a broad entry event criteria based on a concept set called ‘Pancreatitis’, but we require the co-occurrence of the ‘Acute Pancreatitis’ concept set within 1 day. We believe this design improves specificity while also reducing index date misclassification (see below). For detailed logic, please read the human-readable text on data.ohdsi.org/PhenotypeLibrary.
  • This definition has been evaluated on 11 data sources. Cohort Diagnostics output is available on data.ohdsi.org/PhenotypeLibrary (see cohort id 251).
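The era logic in the description above (co-occurrence within 1 day, a 7-day exit, and collapsing episodes that are up to 180 days apart) can be sketched as follows. This is a minimal illustration, not the SQL that ATLAS generates: it assumes the co-occurrence window is ±1 day, that each qualifying record ends 7 days after its date, and that the 180-day gap is measured from the end of the previous era. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

COOCCURRENCE_WINDOW = timedelta(days=1)  # assumed: +/- 1 day
EXIT_OFFSET = timedelta(days=7)          # end date = record date + 7 days
ERA_GAP = timedelta(days=180)            # eras this close are collapsed


def qualifying_dates(pancreatitis_dates: Iterable[date],
                     acute_dates: Iterable[date]) -> List[date]:
    """Keep 'Pancreatitis' entry dates with an 'Acute Pancreatitis' record nearby."""
    acute = list(acute_dates)
    return sorted({d for d in pancreatitis_dates
                   if any(abs(d - a) <= COOCCURRENCE_WINDOW for a in acute)})


def build_eras(index_dates: Iterable[date]) -> List[Tuple[date, date]]:
    """Give each record a fixed 7-day exit, then merge eras separated by <= 180 days."""
    eras: List[Tuple[date, date]] = []
    for start in sorted(index_dates):
        end = start + EXIT_OFFSET
        if eras and start - eras[-1][1] <= ERA_GAP:
            # Treat as a continuation of the previous episode
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras


if __name__ == "__main__":
    entry = [date(2020, 1, 1), date(2020, 3, 1), date(2020, 6, 1), date(2021, 1, 1)]
    acute = [date(2020, 1, 1), date(2020, 3, 2), date(2021, 1, 1)]
    for era in build_eras(qualifying_dates(entry, acute)):
        print(era)
```

With these toy dates, the June 2020 record is dropped (no ‘Acute Pancreatitis’ record within 1 day), the January and March 2020 records collapse into one era ending 8 March 2020, and January 2021 starts a new era because it is more than 180 days after the previous era ends. The example also shows that a 7-day exit combined with a 180-day collapse can yield eras much longer than 7 days.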

Phenotype evaluation Acute Pancreatitis (251):

  • Impact of inpatient and ER restriction: This phenotype should only be studied in data sources that have good capture of care in inpatient or ER visits. Using this cohort definition on data sources with incomplete or poor capture of inpatient or ER visits is expected to lead to sensitivity errors. In the 11 data sources evaluated, we observed <5 counts in 5 data sources, as those data sources are not expected to have inpatient or emergency room visit data. In our patient profile review, persons whose diagnosis occurred outside an inpatient or ER setting (and without the emergency procedure code) appeared to be receiving follow-up care for acute pancreatitis (not submitted).
  • Impact of requiring the ‘Acute Pancreatitis’ concept set: As described above in the logic description, we allowed persons to enter based on ‘Pancreatitis’ and then restricted to those with ‘Acute Pancreatitis’. Removing this rule would increase the cohort size by about 2% to <5%.
  • Impact of removing persons with chronic pancreatitis: This rule removed about 5% to <10% of persons. In analyses not reported here, we studied the population-level characteristics of persons with chronic pancreatitis and observed that they had different baseline characteristics (higher rates of abdominal pain, nausea, vomiting, ER utilization, alcoholism, chronic liver disease, and diarrhea), suggesting they are a distinct phenotype. So we expect this rule to increase specificity with minimal to no loss in sensitivity.
  • Impact of removing persons with hereditary pancreatitis: The numerical impact of this rule on sensitivity is minimal (<0.1% loss). Persons with hereditary pancreatitis appear to be very young (not reported here), so we expect this rule may improve specificity.
  • Diagnostics – persons vs. events and time in cohort: We observe a rate of about 1.05 events per person. We don't know if this is high or low, but we find it to be consistent across the data sources evaluated. The time distribution diagnostic suggests that most persons who have a subsequent visit for acute pancreatitis have it within 180 days, as at least 90% of persons have less than 30 days of cohort era. This indicates that our cohort exit strategy is reasonable.
  • Diagnostics – incidence rate plot: We observe the incidence rate to increase with age decile, with similar rates when stratified by sex. We observed an incidence rate about 10 times higher than reported rates, with the highest rates in the 60 to 69 age decile.
  • Diagnostics – index event breakdown: The CohortDiagnostics index event breakdown diagnostic was not useful for this cohort, as it reports on the entry event concept set (which is of the visit domain).
  • Diagnostics – visit context: The CohortDiagnostics visit context diagnostic was not informative, as events are limited to inpatient and ER by design. However, we observe a large number of persons who had an outpatient visit starting simultaneously. These may be the persons who have the CPT4 codes for ER utilization.
  • Diagnostics – characterization: Overall, the population-level summary characteristics appeared consistent with persons with acute pancreatitis. Notably, ~50% had abdominal pain on day 0, ~15% nausea/vomiting, ~20% biliary calculus, ~7% alcoholism, ~8% dehydration, ~25% had a lipase measurement, and ~50% were classified as emergency. There was also evidence of end organ failure commonly associated with acute pancreatitis, such as acute renal failure. We observe high use
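The ‘persons vs. events’ rate, the time-in-cohort share, and the rule-impact percentages above are simple ratios. A minimal sketch, assuming the cohort has been exported as one row per event with a person id and an era length in days; the function names and the inputs in the example are hypothetical toy values, and none of these are CohortDiagnostics functions.

```python
from typing import Sequence


def events_per_person(person_ids: Sequence[int]) -> float:
    """Number of cohort events divided by number of distinct persons."""
    return len(person_ids) / len(set(person_ids))


def share_shorter_than(era_lengths_days: Sequence[int], threshold_days: int = 30) -> float:
    """Fraction of cohort eras shorter than threshold_days."""
    return sum(1 for n in era_lengths_days if n < threshold_days) / len(era_lengths_days)


def percent_change_without_rule(count_with_rule: int, count_without_rule: int) -> float:
    """How much the cohort grows (in %) when a rule is dropped."""
    return 100 * (count_without_rule - count_with_rule) / count_with_rule


if __name__ == "__main__":
    toy_ids = list(range(1, 21)) + [1]                  # toy: 21 events among 20 persons
    print(events_per_person(toy_ids))                   # 1.05
    print(share_shorter_than([7, 7, 14, 60]))           # 0.75
    print(percent_change_without_rule(1000, 1030))      # 3.0
```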

Summary of evidence on operating characteristics:
Evidence of sensitivity errors: We are observing the use of acute pancreatitis diagnosis codes in the outpatient setting without an inpatient or ER visit. Based on a limited review of patient profiles, we opine that such persons are more likely to be receiving outpatient follow-up of acute pancreatitis than to have new events. This opinion, if wrong, may indicate the presence of sensitivity errors.

Evidence of specificity errors: In characterization, I did not observe conditions that are differential diagnoses of acute pancreatitis, such as mesenteric ischemia or ischemic colitis.

Evidence of index date misclassification errors: In the –30d to –1d window, I observed about 3 to 5% of persons to have an acute pancreatitis diagnosis. These are persons who had acute pancreatitis diagnosed in an outpatient setting and were admitted in the next few days. There was also the presence of acute gastritis and epigastric pain, which may indicate subclinical acute pancreatitis that was misdiagnosed in its early stages.

Overall, we believe we have made sound design choices for this cohort definition. We expect this phenotype to have good operating characteristics with respect to sensitivity, specificity, and index date misclassification, and we recommend the use of this cohort definition in studies as indications (target/comparator) or outcomes.

Phenotype Phebruary 2023 - moderator comments:

The post above is considered to have satisfied submission requirements, i.e., it has the following components:

  1. Target clinical idea is described
  2. One or more cohort definitions have been developed using OHDSI tools.
  3. It has been instantiated on one or more data sources. CohortDiagnostics has been run and results submitted.
  4. Those results were reviewed by the submitter and an evaluation has been posted assessing for measurement errors (Sensitivity, Specificity and Index date misclassification)

It has now been assigned to a peer reviewer. The lead peer reviewer is @Evan_Minty .

We are using week 1 to discuss the value of peer review. A separate thread will be started to discuss peer review and its metrics. Please move generic comments to that thread.

Regarding discussion on this post: The purpose of this thread is to provide feedback to the submitter using a peer review frame of thinking, and for the peer reviewer to make a recommendation regarding the cohort definition.

Hi @Gowtham_Rao

Very nice, and an interesting one to kick things off.

Do you differentiate between ‘mild’ to ‘severe’ AP in terms of detecting more intense medical intervention, inclusive of endoscopic (surgical) intervention?

You removed CP patients, and this appears to have had minimal impact, although some AP could be viewed as acute-on-chronic pancreatitis; this may also be minimal. Your reference is from 2008, and there has since been significant progress in both the understanding and the management of AP/CP, with a number of international guidelines that could be reviewed and referenced, e.g., NICE-Guideline-Pancreatitis-September-2018.pdf (bsg.org.uk) or Acute Pancreatitis - StatPearls - NCBI Bookshelf (nih.gov); the latter in particular is nicely laid out.

Under Diagnostics - characterisation, your last sentence ends abruptly, ‘We observe high use…’ - of?



Thanks @Gowtham_Rao for the opportunity to be the inaugural ‘phenotype evaluation’ post for Phenotype Phebruary 2023. I have learned a lot from the process, and from your work on this so far.

By way of background for others, I’m a general internist, I work clinically at a large academic teaching hospital in Calgary, Canada. After my IM fellowship, I did an MSc out of Stanford in Biomedical Informatics. I’m onboarded as a research affiliate at Stanford, where I continue OHDSI work with Nigam Shah’s lab, and have been focusing some time this (sabbatical!) year in helping to advance the phenotype library with Gowtham, Azza, and others.

As we know from our debate topic this week, what constitutes a ‘phenotype peer review’ is not a settled issue. Prior reviews in this forum have taken the form of thread conversations that build on the submissions. While transparent (and while the threads themselves can be very insightful), they can get lengthy to parse.

So I’ll also use this post to propose some summary tables that try to condense that material into a different format. The idea is that we need to make it easier to parse ‘what’s been done’ in the development and evaluation process, with the supposition that we can invest more trust in phenotypes that have been more broadly characterized across a network of data sources, in different ways, by different investigators.

I won’t completely abandon the notion that we offer comments on the original post, as those discussions can bring a lot of value. And nobody needs to follow my lead on the tables either: how we do this, is the subject of our week 1 debate!

For Acute Pancreatitis, I was involved in the OHDSI symposium where we discussed this phenotype at length. In particular

Clinicians in the room expressed angst with this finding, as they expressed that it is extremely unlikely that persons with true/suspected acute pancreatitis would be managed exclusively in an outpatient setting

I had some of the angst :)

I was subsequently involved in the joint profile review where we noted the ‘outpatient + ED proc occurrence’ finding that informed the current version.

It did solidify, in my mind, the value of profile review. One of the broader points of this post is to suggest that combining some level of profile review (while time-consuming) with database-level characterization (CohortDiagnostics) can be a very synergistic activity. It may be that profile review takes a different form in the near future (those interested should check out @aostropolets’ work with KEEPER). And admittedly, different sites have varying abilities to execute on it, depending on data access policies.

To offer some comments regarding the submission post:

AP is categorized as mild (no organ failure or complications), moderately severe (transient organ failure/complication that resolves within 48 hours), or severe (persistent organ failure of ≥1 organ).

This addresses @nigehughes’s question regarding differentiation of mild and severe.

It’s noteworthy that the grading in acute pancreatitis reflects the degree to which your other organ systems are failing, not just the exocrine functions of the pancreas. In contrast, many other ‘severe’ organ system afflictions (pneumonia, kidney injury, etc.) can often manifest as single-system disease.

Anything beyond mild pancreatitis is more like systemic sepsis. This is at the heart of some discomfort in accepting purely outpatient trajectories with minimal evidence of additional monitoring or investigative effort. You do want to determine causes and address them if at all possible: it’s a serious disease.

Disqualifiers: Persons with CP should be considered to be ineligible to develop AP, even though they may have flares that mimic AP

An interesting point here is that there may indeed be a difference between disqualifiers (as a clinical notion) and disqualifiers (as a design choice). I’d agree that in this case it reflects a design choice, more than something that is mutually exclusive clinically.

This really demonstrated the value of case/timeline review. We’ve discussed the ETL implications with @Dymshyts, but I also wonder aloud to @clairblacketer if there’s a DQD test inherent to that observation, i.e., should an outpatient visit (as visit_concept_id) plus an ED visit (as procedure_concept_id) be treated as an ED visit?

It may impact other acute phenotypes (examples on data.ohdsi.org include appendicitis, or even the review of the ‘inpatient hospitalization’ phenotype). And this is not a dig at ETL or the CDM: the visit table in source data is usually a hot mess.
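The proposed data quality check (an outpatient visit_concept_id with a same-day emergency procedure) could be prototyped before being formalized. The sketch below is hypothetical: it is not an existing DQD check, the record types are simplified stand-ins for the visit_occurrence and procedure_occurrence tables, and the CPT4 codes 99281 to 99285 are only an illustrative ED code list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

# Illustrative ED evaluation and management codes (CPT4); a real check would
# use a curated concept set.
ED_PROCEDURE_CODES = {"99281", "99282", "99283", "99284", "99285"}


@dataclass(frozen=True)
class Visit:
    visit_id: int
    person_id: int
    visit_type: str  # label of visit_concept_id
    start_date: date


@dataclass(frozen=True)
class Procedure:
    person_id: int
    procedure_date: date
    code: str


def outpatient_visits_with_ed_procedure(visits: Iterable[Visit],
                                        procedures: Iterable[Procedure]) -> List[int]:
    """Ids of outpatient visits that have an ED procedure on the visit start date."""
    ed_days = {(p.person_id, p.procedure_date)
               for p in procedures if p.code in ED_PROCEDURE_CODES}
    return [v.visit_id for v in visits
            if v.visit_type == "Outpatient Visit"
            and (v.person_id, v.start_date) in ed_days]


def flagged_fraction(visits: List[Visit], procedures: List[Procedure]) -> float:
    """Share of outpatient visits that look like ED visits (0.0 if there are none)."""
    outpatient = [v for v in visits if v.visit_type == "Outpatient Visit"]
    if not outpatient:
        return 0.0
    return len(outpatient_visits_with_ed_procedure(outpatient, procedures)) / len(outpatient)
```

A site could report the flagged fraction per database; no particular threshold for "too many" is implied here.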

The impact on the original (pre-symposium) PheValuator run is also noteworthy. To frame it, let me state up front that I’m a huge PheValuator fan, and at the next OHDSI symposium, I’m going to ask @jswerdel to sign my printed copy of PheValuator 2.0, and possibly my chest.

And I’ll leave more on this topic for our week 4 debates. Suffice it to say that the important (and appropriate) role that PheValuator results can (and should) play in rule-based phenotype design choices can be heavily conditioned on the distribution learned through xSpec. So some kind of upfront review / human-in-the-loop may add value, to ensure it’s on track based on the real world data we have. The goal there wouldn’t be to catch occasional mislabeled cases (inherent to any noisy labelling experiment), but to catch ‘category level’ errors. A review of disjoint predictions between the rule-based phenotype and PheValuator might also add confidence to its determinations (these are not must-haves, but trust-building activities :))
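A review of disjoint predictions needs only the rule-based cohort membership and the predicted probabilities for the same persons. The sketch below assumes those probabilities have already been exported from a PheValuator run (it does not call PheValuator, which is an R package), and the 0.5 cut-off is an assumed threshold; the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def discordant_cases(rule_positive: Set[int],
                     predicted_prob: Dict[int, float],
                     threshold: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (rule-only, model-only) person ids among persons with a prediction."""
    evaluated = set(predicted_prob)
    model_positive = {pid for pid, p in predicted_prob.items() if p >= threshold}
    # In the rule-based cohort, but the model thinks unlikely: candidate false positives
    rule_only = sorted((rule_positive & evaluated) - model_positive)
    # Model thinks likely, but missed by the rules: candidate false negatives
    model_only = sorted(model_positive - rule_positive)
    return rule_only, model_only
```

Sampling a handful of ids from each list for profile review targets the disagreements directly, which is where ‘category level’ errors would be expected to surface.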

A lipase measurement rate of 25% seems low, but I see it also varies by database, which likely reflects the degree to which inpatient activity is captured (low in CCAE, higher in Optum EHR).

The visit issue is interesting. If I look at the visit context in CohortDiagnostics, I see

It’s easy to look at that and not decipher the degree to which this is an inpatient disease.

In the cohort characterization tab, thanks to the development of ‘inpatient hospitalization’ as a cohort, I can see this more clearly:

Cohorts as Features for. the. win. We need more of these.

That kind of admission rate aligns with what I’d expect based on gestalt: during the symposium, I unscientifically texted an ED MD friend, who felt a 4/5 admission rate was their number too. It also aligns with prior work by McNabb-Balter et al. in the National ED Sample database, which suggested ~75% are admitted (although I can’t comment on how clean that database is, and they used a single ICD-9 code to identify pancreatitis).

I agree here.

I think the biggest source of specificity error remains the ED encounter to ‘rule out’ pancreatitis. It’s reassuring that our hospitalization rate appears to reflect how we think the disease behaves.

I reviewed 15 cases in CCAE via CohortExplorer. I revisited the uncertain reviews twice. I found 12/15 were positives (PPV 0.8).
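With 15 reviewed profiles, the PPV estimate is quite uncertain. As an illustration (the interval is not part of the original review), a Wilson score interval, one standard choice for a binomial proportion, can be computed from the counts alone; the function name is hypothetical.

```python
import math
from typing import Tuple


def ppv_with_wilson_ci(true_positives: int, reviewed: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Point PPV and its Wilson score interval (z = 1.96 for roughly 95%)."""
    p = true_positives / reviewed
    denom = 1 + z ** 2 / reviewed
    center = (p + z ** 2 / (2 * reviewed)) / denom
    half_width = z * math.sqrt(p * (1 - p) / reviewed + z ** 2 / (4 * reviewed ** 2)) / denom
    return p, center - half_width, center + half_width


if __name__ == "__main__":
    ppv, low, high = ppv_with_wilson_ci(12, 15)
    print(f"PPV {ppv:.2f} (95% CI {low:.2f} to {high:.2f})")
```

For 12/15 this gives roughly 0.55 to 0.93, so a considerably larger review sample would be needed to distinguish, say, a PPV of 0.7 from one of 0.9.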

I struggled when cases also included cholecystitis, as they also went on to get their gallbladder removed, and a certain amount of gallstone pancreatitis can be caused by stone passage. Cholecystitis prevalence ranges from 5% to 15% across databases.

Of the 3 negatives, 2 looked like rule out situations, with no adjunct investigations (US, CT), or follow up care. I did observe 2 cases where patients that were not admitted did receive follow up care and investigations appropriate to pancreatitis.

This suggests to me that there’s a role to consider this entity of ‘ed only’ pancreatitis further, and perhaps iterating to require further evidence in those cases, given the lower prior we’d attach to them.

As part of our week 1 debate, we’re considering the role of ‘peer review’. I’m of the growing mindset that we should seek to be transparent as to what’s been done, what’s been learned, what might be left to do.

The issue I’ve raised above may matter depending on the use case (or not? see week 3 debates!). But I don’t see it as preventing its entry into the Phenotype Library.

We can consider what activities have been completed at a high level:

Activity: Original Submission (Gowtham Rao) / Review 1 (Evan Minty)
  • Clinical Description: Completed (GR) / Reviewed
  • Literature Review: Not systematic / Not systematic
  • Design Diagnostics (PHOEBE): (no entry)
  • Cohort Diagnostics: Reviewed in 11 DB / Reviewed in 4 DB
  • PheValuator: pending (in current version)
  • APHRODITE development: (no entry)
  • Profile Review (within sample): 15 pts in CCAE (Review 1)
Additional activities (pending):
  • Profile Review (xSpec): pending (in current version)
  • Discordant Case Review: pending (in current version)

So there are a few other trust building activities we could take on, including a pheValuator run of the current version of the phenotype. I’d be happy to review the xSpec in that planned run, and review some discordant case predictions.

We can also develop a more detailed view of these activities that seeks to align our insights from data to the clinical description of interest. For that, we can look to the week 1 debates :)


@Gowtham_Rao, @Evan_Minty:

You guys dug very deep and created an impressive piece of work (apart from the stupid singular “criteria”, which I will not stop complaining about, the word is CRITERION). But I still believe the blueprint we use to do this is flawed:

It appears, but is not made explicit, that the logic is this: you have a typical clinical case in mind (Evan even calls his friends to get the histories) and then model your criteria after it. In this case, the patient falls sick with all the symptoms mentioned, gets rushed to the hospital and no later than a week it is all over. You try to reproduce this narrative as a cohort definition. Then you check the patients falling into that for some other clinical characteristics not used in the definition. Not sure what happens to the counterfactual: patients missed by the approach.

I doubt this will work well. Not only is it not reproducible (unless Evan and his friends become a fixed part of the process), it also does not take into account that the capture systems of EHRs and claims simply do not work that way. They are a mixture of reporting and justifying reimbursement, the rules of which are opaque. I also doubt these textbook cases actually happen all the time. They are mental models. Acute pancreatitis patients are more often than not alcoholics, not known to be compliant with doctor’s advice. Finally, it also ignores the fact that each criterion has its own sensitivity, specificity and timing weaknesses, creating complex Boolean algebra when combined. As a result, these definitions become complex in ways that cannot be justified, and probably unnecessarily so.

In line with this, looking at your definition, you are making a ton of design decisions without justifying them:

  1. Your index criterion is an inpatient or inpatient/ER visit with “pancreatitis”. Apparently, you assume that “pancreatitis” is used as a diagnostic shortcut for acute pancreatitis if the setting is dramatic (ER). That needs to be tested, or at least made explicit.
  2. Your second index criterion is an outpatient visit followed by an ER visit “as a procedure”, which is a contradiction in itself. Are you putting in a workaround to a vocabulary issue?
  3. You conclude that an outpatient diagnosis must be a follow-up, unless the same day there is an ER visit. That could be tested.
  4. The first inclusion criterion requires another (!) diagnosis, this time pancreatitis minus chronic pancreatitis (you call that “acute pancreatitis”). Why another diagnosis, and why not do that in the first place?
  5. The second inclusion criterion excludes chronic pancreatitis, even though you say an acute can happen on top of a chronic. The diagnoses actually are not mutually exclusive.
  6. The third inclusion criterion kicks out the hereditary ones. That’s ok, except that they are so rare that the contribution is doubtful. Funnily, that exclusion is true “between All days Before and All days Before index start date”. Not sure whether the SQL would pick up what you probably meant, but again, false precision. My hunch is this criterion has no effect whatsoever.
  7. You talk about other differential diagnoses, but only use those three in your exclusions.
  8. Your exit is after a fixed 7 days (sounds like a good default for an acute condition), but then you combine them if they are up to 180 days apart!! So, is it acute, or chronic?

How do you evaluate this wool ball? If CohortDiagnostics finds something odd, what then? There is no way you can dissect the effects of each of these criteria and recommend an improvement.

IMHO, the approach should be (i) engineered bottom-up with a modular approach and (ii) parsimonious: a condition cohort should be defined by a condition concept set, with some standard exit criterion (fixed depending on whether it is acute or chronic, or through censoring events) and nothing else. Only if the characteristics are unsatisfactory, as seen in CohortDiagnostics or by reviewing patients, should we add criteria to fix what we are seeing. Each new criterion should be labeled as “Optimizing sensitivity”, “Optimizing specificity”, “Optimizing index date” or “Optimizing cohort exit”. We could then evaluate the contribution of each criterion separately. If it doesn’t achieve its goal, it should be removed. A library user with a different intention (e.g. needing a very broad definition) could easily decide to omit criteria or add others. We could have a catalog of standardized criteria, such as “If you want sensitivity consider including non-exclusive differential dx” or “If you want sensitivity consider re-diagnosing using Measurements and Procedures” or “If you want specificity consider requiring repetition of the diagnosis within a period of time” etc.
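The modular, labeled approach proposed above lends itself to a simple ablation: apply all criteria, then drop each one in turn and see how much the cohort changes. A minimal sketch, assuming every criterion is a restrictive filter over candidate records; the class and function names and the toy records are hypothetical, and sensitivity-oriented criteria that add entry events would need a union step not shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


class Goal(Enum):
    SENSITIVITY = "Optimizing sensitivity"
    SPECIFICITY = "Optimizing specificity"
    INDEX_DATE = "Optimizing index date"
    COHORT_EXIT = "Optimizing cohort exit"


@dataclass(frozen=True)
class Criterion:
    name: str
    goal: Goal
    keep: Callable[[Record], bool]  # True if the record passes this criterion


def apply_criteria(records: List[Record], criteria: List[Criterion]) -> List[Record]:
    return [r for r in records if all(c.keep(r) for c in criteria)]


def ablation(records: List[Record], criteria: List[Criterion]) -> Tuple[int, Dict[str, int]]:
    """Cohort size with all criteria, and records regained when each one is dropped."""
    full = len(apply_criteria(records, criteria))
    regained = {}
    for c in criteria:
        others = [o for o in criteria if o is not c]
        regained[c.name] = len(apply_criteria(records, others)) - full
    return full, regained


if __name__ == "__main__":
    toy = [
        {"setting": "ER", "chronic": False},
        {"setting": "OP", "chronic": False},
        {"setting": "IP", "chronic": True},
        {"setting": "IP", "chronic": False},
    ]
    criteria = [
        Criterion("emergent setting", Goal.SPECIFICITY, lambda r: r["setting"] in {"ER", "IP"}),
        Criterion("no chronic pancreatitis", Goal.SPECIFICITY, lambda r: not r["chronic"]),
    ]
    print(ablation(toy, criteria))  # (2, {'emergent setting': 1, 'no chronic pancreatitis': 1})
```

A criterion whose ablation count is near zero across databases would be a candidate for removal under this approach.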

Otherwise we will fall into the same trap and add yet another definition to the pile in the literature, with some arbitrary checks in a set of arbitrary databases. The untestable complexity of the definition will hide our anxiety about its possible weakness.


Thanks @Evan_Minty for doing our first peer review during Phenotype Phebruary 2023. I really appreciate your leadership and participation.

Just to keep to a short question that hopefully can yield a succinct answer: Since you are serving as the role of the peer reviewer, if I were to frame as most peer-review journals do: Has the submitted phenotype been 1) accepted for ‘publication’ (as a designated entry in our phenotype library), 2) rejected, or 3) are you recommending a ‘revise-and-resubmit’?

@Christian_Reich - much appreciate the feedback. It deserves a considered response that Gowtham and I were looking to coordinate but struggled to last week, given commitments. But we will look to get back to you.

@Patrick_Ryan , @Gowtham_Rao . I would suggest revise and resubmit with:

  1. PheValuator results from the current version, to estimate sensitivity and specificity
  2. Cohort diagnostics results of ‘ED only’ pancreatitis (i.e., the ‘second’ branch of pancreatitis):
    • with one condition occurrence (i.e. the current branch)
    • requiring a second condition occurrence within 2 weeks (a modification to increase specificity)
    Those can then be compared at a database level.
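The two ‘ED only’ variants differ only in whether a second diagnosis is required. A minimal sketch for one person's ‘ED only’ diagnosis dates, assuming the second occurrence must fall on a later day within 14 days of the first; the function name and the exact window semantics are assumptions, not the ATLAS definition.

```python
from datetime import date, timedelta
from typing import Iterable, List


def ed_only_index_dates(diagnosis_dates: Iterable[date],
                        require_repeat: bool = False,
                        window_days: int = 14) -> List[date]:
    """Index dates for one person under the single- or repeat-diagnosis variant."""
    dates = sorted(set(diagnosis_dates))  # same-day duplicates count once
    if not require_repeat:
        # Variant 1: any single condition occurrence qualifies (the current branch)
        return dates
    window = timedelta(days=window_days)
    # Variant 2: keep a date only if another diagnosis follows within the window
    return [d for i, d in enumerate(dates)
            if any(later - d <= window for later in dates[i + 1:])]
```

Running both variants on each database and comparing the resulting counts would show how much of the ‘ED only’ branch rests on a single code.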

That resubmission can also look to address many of @Christian_Reich 's questions, some of which relate to ‘proving out’ / clarifying decisions that were made in the development process.