Phenotype Phebruary Day 27 - Drug-induced Liver Injury

In this post, we will focus on drug-induced liver injury (DILI) and adjudicating potential cases.

In short:

  1. DILI differs from other conditions in that it a) is rare, b) is a condition of exclusion, and c) has a potential causal relationship already embedded in the phenotype

  2. In developing the phenotype, we would optimize NPV rather than PPV, focus on defining the conditions of exclusion, and perhaps create several phenotypes for use across the network, since it’s not clear that one phenotype would fit every data source

  3. Case adjudication for finding new associations needs some smart methods since it’s impossible to review thousands of patients

  4. We lean towards trusting notes (unstructured data) more than a code for DILI in the structured data, but in fact inferring patient status is hard either way and is subjective.

  5. Maybe, discontinuation of the drug after the liver injury can be a useful indicator but we need to figure out how to infer discontinuation reliably.

If that sounds interesting, read on.

DILI is a good example of a condition that fits observational research well. Because it is so rare, it is seldom captured in small clinical trials, which points to the need for post-marketing research.

It used to be the most frequently cited reason for drug withdrawal (up to 32% of drug withdrawals, recently decreased following FDA statements). We know about an increased risk of DILI for some drugs (e.g., acetaminophen, amoxicillin-clavulanate, antiepileptics). There are many outdated stats regarding DILI prevalence and its share of liver disorders, but what we know for sure is that it varies geographically and is more common in adults.

Presentation: can mimic both acute and chronic liver diseases, both hepatocellular and cholestatic. Can be symptomatic or only include asymptomatic liver test abnormalities. If symptoms are present, they include malaise, low-grade fever, anorexia, nausea, vomiting, right upper quadrant pain, jaundice, acholic stools, or dark urine.

Suspect when a patient has underlying liver conditions or has taken drugs that are metabolized in the liver.


  • No other conditions of exclusion (such as autoimmune hepatitis, Wilson disease, viral hepatitis, ischemic liver injury, Budd-Chiari syndrome)
  • Various hepatic lab tests (alkaline phosphatase >2 times the upper limit of normal + ALT/ALP ratio ≤2, bilirubin ≥2.5 mg/dL, etc.)
  • Biopsy (optional)

Treatment: mainly drug discontinuation, sometimes glucocorticoids; for some of the drugs – antidotes (like N-acetylcysteine for acetaminophen and L-carnitine for valproic acid toxicity).

Prognosis depends on the severity and form of injury. Severe cases can progress to hepatic decompensation, encephalopathy, etc.

Informatics approach:

Overall notes:

  • Drug-induced liver injury is a rare condition and is mostly a disorder of exclusion, meaning that all other causes have to be ruled out before the diagnosis is established.

  • Given that we want to use the phenotype in safety surveillance, we are interested in a rather specific phenotype. NPV should be prioritized over PPV.

  • It would be nice if we could just take SNOMED 4144765 Drug-induced disorder of liver and be done with it. Of course, it doesn’t get used very often.
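
To make the NPV versus PPV discussion concrete, here is a minimal Python sketch that computes both from a 2x2 validation table. The counts are made up for illustration and do not come from any data source discussed in this thread.

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return PPV, NPV, sensitivity and specificity from a 2x2 table."""
    return {
        "ppv": tp / (tp + fp),          # of flagged patients, share who are true cases
        "npv": tn / (tn + fn),          # of unflagged patients, share who are true non-cases
        "sensitivity": tp / (tp + fn),  # of true cases, share flagged
        "specificity": tn / (tn + fp),  # of true non-cases, share not flagged
    }

# Hypothetical rare outcome: 100 true cases among 100,000 patients.
metrics = predictive_values(tp=60, fp=40, fn=40, tn=99_860)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With an outcome this rare, NPV sits close to 1 even when 40% of true cases are missed, so any comparison between candidate phenotypes on NPV would probably have to look at small differences.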

Therefore, we need to create a phenotype, which will have three elements:

  • a liver injury (diagnosis, symptom or lab test defined)

  • a drug exposure preceding liver injury (would be cool to use external knowledge to select the drugs that are metabolized in the liver @callahantiff)

  • no other causes (may get tricky if a patient has an underlying liver disorder that can be both the risk factor for DILI and an alternative diagnosis)

We decided not to reinvent the wheel and take the existing phenotype that eMERGE has previously developed (led by @hripcsa and @chunhua). Others exist, but that one has the most discussion about phenotype development, details on implementation and the same data source at our disposal. They’ve done huge work on developing and implementing the phenotype, which as you can imagine took some time (@hripcsa said ~ 6 months – I’m really surprised it took ONLY 6 months).

Link: https://academic.oup.com/jamia/article/20/e2/e243/710321?login=true#supplementary-data

I’ll post a figure on phenotype development in eMERGE here as I find it quite interesting:

Basically, two institutions developed their algorithms which were then fused. Also, interesting how the gold standard was created there (intersection of the patients identified by the NLP algorithm and by the acute liver injury ICD-9 algorithm).

We borrowed a lot from their phenotypes (ATLAS):

    1. Patients aged 18 or older
    2. Index date: any of the liver injury codes in the original implementation mapped to SNOMED + PHOEBE-augmented. Here, we focused on the hepatocellular form of DILI.
    3. Any drug exposure within 90 days prior to the index date
    4. Laboratory values crossing the threshold for DILI within 90 days after the drug exposure (the original phenotype also has a criterion of normal lab values prior to DILI. We didn’t include it, since here the absence of a test is likely equivalent to a normal test, and only required the patients not to have abnormal tests before the index date).
    5. No condition of exclusion (chronic liver injury, organ transplantation or liver operation, alcohol abuse/liver damage/toxic effects, viral hepatitis, death, overdose)
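
The criteria above can be sketched in plain Python to show how they combine. This is not the ATLAS definition itself, only one reading of the list: the `Patient` fields are hypothetical, the age rule assumes "18 or older", and "abnormal lab" stands in for whatever thresholds the real concept sets encode.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List

WINDOW = timedelta(days=90)

@dataclass
class Patient:
    # Hypothetical, simplified record; real data would come from OMOP CDM tables.
    age_at_index: int
    index_date: date  # first qualifying liver injury code
    drug_exposure_dates: List[date] = field(default_factory=list)
    abnormal_lab_dates: List[date] = field(default_factory=list)
    has_exclusion_condition: bool = False

def in_cohort(p: Patient) -> bool:
    """Apply the five criteria in order; return True if the patient qualifies."""
    if p.age_at_index < 18:  # assumption: adults only
        return False
    if p.has_exclusion_condition:
        return False
    # Drug exposure within 90 days before (or on) the index date.
    prior_exposures = [d for d in p.drug_exposure_dates
                       if p.index_date - WINDOW <= d <= p.index_date]
    if not prior_exposures:
        return False
    # No abnormal liver tests before the index date.
    if any(lab < p.index_date for lab in p.abnormal_lab_dates):
        return False
    # At least one abnormal lab within 90 days after a qualifying exposure.
    return any(d <= lab <= d + WINDOW
               for d in prior_exposures for lab in p.abnormal_lab_dates)

example = Patient(age_at_index=50, index_date=date(2021, 3, 1),
                  drug_exposure_dates=[date(2021, 2, 1)],
                  abnormal_lab_dates=[date(2021, 3, 2)])
print(in_cohort(example))
```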

Informatics findings and discussion points:

    1. PHOEBE is useful in developing the concept set for acute liver injury, not only because of its recommendations but mostly because it allows reviewing the concept set and spotting commonly used yet inappropriate codes. For example, the ICD9CM code used in the original implementation, 573.9 ‘Other specified disorders of liver’, is mapped to the broad ‘Disease of liver’, which we may want as a concept on its own but without all of its descendants.
    2. Creating the list of conditions of exclusion isn’t trivial either. Here, we basically include all of the chronic hepatocellular disorders, even though they can be present in DILI.
    3. Since we ran the algorithm on one data source, it was easy to get the list of measurements. Otherwise, it probably would’ve been more challenging in terms of both lab tests and their units.

We iterated over the cohort to remove other obvious reasons for liver injury and finally got 5,259 patients.

The top concepts were Zosyn, acetaminophen, atorvastatin, aspirin, lisinopril, prednisone, lorazepam, amlodipine.

At this point, we wanted to look at individual records, but it is impossible to review them all. Given the low prevalence, reviewing the records at random will likely miss the cases.
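
A quick back-of-the-envelope calculation shows why random review is inefficient. The sketch below assumes records are sampled independently and that a fraction `p` of the cohort are true cases; the 1% figure is a made-up illustration, not an estimate from our data.

```python
def prob_at_least_one_case(p: float, n: int) -> float:
    """Probability that n independently sampled charts contain at least one true case."""
    return 1.0 - (1.0 - p) ** n

def expected_cases(p: float, n: int) -> float:
    """Expected number of true cases among n sampled charts."""
    return p * n

p_true = 0.01      # hypothetical share of true DILI cases in the cohort
n_reviewed = 100   # charts a reviewer can realistically read

prob = prob_at_least_one_case(p_true, n_reviewed)
print(f"P(at least one case) = {prob:.3f}")
print(f"Expected cases found = {expected_cases(p_true, n_reviewed):.1f}")
```

Under these assumptions, a reviewer reading 100 charts would expect to see about one case, and has better than a one-in-three chance of seeing none at all.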

So, let’s find the cases (SNOMED 4055224 Toxic liver disease) and try to figure out

  • a) If there are any features that can be used to refine the cohort
  • b) If structured data is enough to establish causality compared to unstructured data

As you can imagine, patients vary in age, comorbidities and prior observation, and in the time between drug exposure and liver injury.

A couple of examples:

Patient, F, 53yo: COVID-19, diabetes, acute respiratory failure with hypoxia, asthma, multiple drugs, transaminitis. In charts: Suspect DILI, possibly from clinical trial drug

Patient, M, 59yo: chemotherapy, multiple myeloma, non-infectious diarrhea and transaminitis, a dozen drugs that can cause DILI, resolved. In chart: DILI on melphalan

Patient, F, 68yo, glioblastoma, hypothyroidism, depression, osteoporosis, rash and transaminitis. On chemotherapy, antibiotics. In chart: potential DILI resolved.

Based on the examples, I would say that it’s really hard to establish DILI both in structured and unstructured data, especially for new drugs. It looks like we need better approaches for a) finding those potential associations and b) establishing a causal relationship between a drug and any rare adverse event.

I heard that some hospitals have procedures for reviewing and selecting charts with potential adverse events (done by humans), so that the final adjudication is done on a small sample. I wonder if and how we can scale and automate the process, so I would love to hear your thoughts (and read related papers).


@aostropolets - I would love to work with you on ways we could incorporate external knowledge to construct a candidate drug list. Would you be interested in all drugs metabolized by the liver or only those that are known to cause liver injury? Would drugs that have similar mechanisms of action to those that are known to cause liver injury, but do not currently have an established causal relationship, also be of interest?

Thank you @aostropolets , this is terrific. And great to showcase the work from our friends in eMERGE that’s available in PheKB. I know @judy Racoosin was very keen to see community activity around drug-induced liver injury, so I’m glad that you have started this discussion. And indeed, there’s lots to dig into here.

First, I have to comment that I find ‘drug-induced liver injury (DILI)’ an interesting case for phenotyping, in that it combines the condition (liver injury) with its attribution (drug), much in the same way that we see some folks looking for diabetes-induced complications, virus-induced outcomes, etc. In these cases, it seems the task for the phenotyper is to identify cases of the condition and potentially impose inclusion criteria that exclude cases that can’t have the appropriate attribution (in this case, liver injury in the absence of any drugs can’t be drug-induced). But then, for DILI, once ‘liver injury without other attribution’ cases are identified by the phenotyper, it seems the responsibility of the pharmacovigilante to perform the necessary characterization and estimation studies to determine if a drug exposure could have a causal effect. That is to say, I’m not sure DILI is only a phenotype problem, but clearly DILI can’t be analyzed in a drug safety context without strong phenotyping in place.

So, this reminds me of @agolozar 's discussion a day ago on phenotyping non-small cell lung cancer (NSCLC) and taking an approach of exclusion: if NSCLC = lung cancer (LC) - small cell lung cancer (SCLC), then the problem can be decomposed into: 1) how to make a good phenotype for LC (and estimate its measurement error), 2) how to make a good phenotype for SCLC (and estimate its measurement error), and 3) how to combine these two cohorts, using LC as the entry event and SCLC as an inclusion criterion, to arrive at NSCLC and ‘infer’ its measurement error from the errors in LC and SCLC. Here, we could start by phenotyping liver injury (LI) and then phenotype ‘non-drug-induced liver injury’ (NDILI), with the idea that DILI = LI - NDILI. The critical task, it seems, is to define the clinical description of ‘liver injury’ (others would need to weigh in, but I suspect that clinically this isn’t just all forms of hepatic impairment), and then define the specific clinical entities that would definitely fall within NDILI (@aostropolets provides examples here: autoimmune hepatitis, Wilson disease, viral hepatitis, ischemic liver injury, Budd-Chiari…I suspect there are more). As nicely pointed out here, we then have to recognize that excluding persons with NDILI means that any ‘drug-induced’ events that occur in patients with NDILI will not be captured, so there will be some sensitivity error in our approach.
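
The DILI = LI - NDILI idea is a set difference, and a toy example shows where the sensitivity error comes from. The patient IDs and the "truth" set below are invented purely for illustration.

```python
# Hypothetical cohorts, represented as sets of patient IDs.
liver_injury = {1, 2, 3, 4, 5, 6}   # LI: entry event
non_drug_induced = {2, 5, 9}        # NDILI: e.g. viral hepatitis, Wilson disease

# DILI candidates = LI minus NDILI.
dili_candidates = liver_injury - non_drug_induced

# Suppose (hypothetically) the true DILI cases are patients 1, 2 and 3.
# Patient 2 also has an NDILI condition, so the exclusion removes a true case.
true_dili = {1, 2, 3}
captured = dili_candidates & true_dili
sensitivity = len(captured) / len(true_dili)

print(sorted(dili_candidates))
print(f"sensitivity = {sensitivity:.2f}")
```

In this toy setup one of three true cases is lost to the exclusion, which is the kind of error the exclusion approach would need to quantify.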

I’m keen on other people’s thoughts on @aostropolets 's assertion that:

If our use case goal were to estimate potential public health impact by characterizing the incidence, then I could imagine one desiring a more sensitive definition. While if one was interested in population-level effect estimation to produce a relative risk for causal inference, then one may be more concerned with specificity and striving to minimize differential error between exposed and unexposed cases. In both cases, having proper methods to calibrate characterization and estimation results for the measurement error would be preferred to just settling on a definition and assuming no error as we customarily do (looking at you @jweave17 and @chenyong1203 ).
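
As one simple illustration of calibrating for measurement error, the classical Rogan-Gladen estimator corrects an observed prevalence using the phenotype's sensitivity and specificity. This is a textbook correction swapped in for illustration, not the method @jweave17 or @chenyong1203 are working on, and the input values are hypothetical.

```python
def rogan_gladen(apparent_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Correct an observed prevalence for known sensitivity and specificity.

    corrected = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is clipped to [0, 1].
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))

# Hypothetical: 2% of patients flagged by a phenotype with 60% sensitivity
# and 99% specificity.
corrected = rogan_gladen(apparent_prevalence=0.02, sensitivity=0.60, specificity=0.99)
print(f"corrected prevalence = {corrected:.4f}")
```

The correction is only as good as the sensitivity and specificity estimates fed into it, which is why estimating those errors for each phenotype matters.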

Re: lab values, I know Hy’s Law is commonly used as a rule to follow for identifying candidate cases in clinical trials (ALT or AST > 3x upper limit of normal AND bilirubin > 2x upper limit of normal), and indeed such a rule can be implemented in ATLAS to identify cases in datasets that contain these laboratory measures. I suspect others in the community are much more knowledgeable about how else lab values can be applied in objective criteria: for example, it seems there can be other reasons why aminotransferases would be high, which makes the labs not necessarily specific (even if data were available). Is it possible to have DILI cases in the absence of ALT/AST/bilirubin elevations, or would observing liver labs in the normal range be a suitable exclusion criterion to consider when phenotyping?
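
The lab part of Hy’s Law as stated above can be written as a small predicate. This sketch covers only the two thresholds quoted here; the fuller clinical statement also involves ruling out cholestasis and other causes, which is not modeled. The upper limits of normal (ULN) used in the example are illustrative placeholders, since real ULNs are laboratory-specific.

```python
def meets_hys_law_labs(alt: float, ast: float, bilirubin: float,
                       alt_uln: float, ast_uln: float, bilirubin_uln: float) -> bool:
    """True if (ALT or AST > 3x ULN) AND (bilirubin > 2x ULN)."""
    aminotransferase_elevated = alt > 3 * alt_uln or ast > 3 * ast_uln
    bilirubin_elevated = bilirubin > 2 * bilirubin_uln
    return aminotransferase_elevated and bilirubin_elevated

# Illustrative ULNs only (ALT and AST in U/L, bilirubin in mg/dL).
ULN = {"alt_uln": 40.0, "ast_uln": 40.0, "bilirubin_uln": 1.2}

print(meets_hys_law_labs(alt=150, ast=30, bilirubin=3.0, **ULN))  # both thresholds crossed
print(meets_hys_law_labs(alt=150, ast=30, bilirubin=2.0, **ULN))  # bilirubin below 2x ULN
```

In practice the values and ULNs would need to be in matching units per measurement, which is one of the harder parts of running such a rule across data sources.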

Right, I think we (as a research community) are mostly interested in newly marketed drugs and whether those cause DILI. Already known drug toxicity is more of a tool for estimating the performance of a method.

I’m very much interested in calibration work by James and Yong, so would love to hear more! Just as you said - we acknowledge that there is a measurement error but do not really adjust our patient characterization for it. One can argue that if such methods exist there would be no bad phenotypes - only those whose measurement error we don’t know.
I would think that normal liver tests would at the very minimum indicate that the observed toxicity is not severe enough for us to care about.