Phenotype Phebruary Day 22- Human Immunodeficiency Virus

Another beautiful day in Phenotype Phebruary…2-22-22! Today’s phenotype will be HIV, or human immunodeficiency virus. The virus itself is relatively “new” when we think about some of the phenotypes that we have already explored. This virus has a history within the United States (and other developed nations) of carrying social and cultural constructs that have shaped the lives of many from the 1980s to now. While this disease was once life-threatening, it has become more manageable with advances in modern medication. A shoutout to @stephenfortin for helping develop this phenotype with me.
As always let’s start with the clinical definition…
Clinical Definition: According to the CDC, human immunodeficiency virus (HIV) is defined as a virus spread by contact of certain body fluids, which attacks the body’s immune system, specifically CD4 cells. The virus is most commonly spread through unprotected sex (e.g., without a condom or preventative HIV medicines), or sharing of injection drug equipment. HIV reduces the number of CD4 cells in the body thereby weakening an individual’s immune system and rendering them more susceptible to opportunistic infection and cancer. Left untreated, HIV can lead to acquired immunodeficiency syndrome (AIDS).

Diagnosis: The only method to determine HIV status is through screening tests. HIV tests are available at many medical clinics, substance abuse programs, community health centers, and hospitals. Home testing kits are also available at many pharmacies or online. Several types of HIV tests exist, including nucleic acid tests (NAT), antigen/antibody tests, and antibody tests.

Prognosis and Treatment: Without treatment, the average survival of individuals with AIDS is approximately 3 years; however, the occurrence of opportunistic infection decreases life expectancy without treatment to 1 year. That being said, taking HIV medicine (i.e., antiretroviral therapy) may enable individuals with HIV to live long and healthy lives, and prevent the transmission of HIV to their sexual partners. In addition, certain measures can decrease the risk of getting HIV through sex or injection drug equipment, including pre-exposure prophylaxis (PrEP) and post-exposure prophylaxis (PEP).

As we read this definition, there are some key aspects to consider in developing this phenotype. This is an incurable disease BUT it can be suppressed (sometimes all the way to being undetectable by laboratory tests). The only way to know if someone has the disease is by performing a test, and the disease must be managed by drugs. These are important factors to consider because some of these items (laboratory measurements, diagnoses, and drugs) can vary across many observational databases.

There are many studies conducted on patients with HIV, but most use laboratory measurements as confirmation of the disease. This is helpful when the databases available to us all have well-captured lab data (but they don’t!). Here is a summary of some helpful papers.

| Author | Year | Database | Algorithm | Sensitivity | Specificity | PMID |
|---|---|---|---|---|---|---|
| Paul et al. | 2018 | EHR | Algorithm 1: lab + medications. Algorithm 2: ICD-9 codes, medications, lab tests | Algorithm 1 and 2: 78% and 77%, respectively | Algorithm 1 and 2: 99% and 100%, respectively | 28645207 |
| Antoniou et al. | 2011 | Claims | 48 phenotypes tested. Combinations of physician billing codes, hospitalizations, ED visits, prescription claims, over various time frames | See Table 1. Information available only for select algorithms. Sensitivity increases as observation period increases. | Specificity >99% for all definitions except single physician claim. | 21738786 |

In Paul et al., the authors found positive lab values and HIV medications (algorithm 1) or positive lab values and HIV medications and ICD-9 diagnosis code for HIV (algorithm 2) to have a sensitivity of 78% and 77%, respectively, and a specificity of 99% and 100%, respectively. It is important to note that the authors had direct access to patient records including labs. Meanwhile, Antoniou et al. tested a total of 48 phenotype algorithms including varying combinations of physician billing codes, hospitalizations, ED visits and prescription claims associated with HIV occurring over varying time periods. Among other findings, the authors concluded that a combination of 2+ physician claims and/or HIV medications occurring within a 2-year period achieved a high sensitivity (e.g., >90%) and specificity (e.g., >99%).
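
For anyone less familiar with the metrics quoted from these papers, here is a minimal Python sketch of how sensitivity, specificity, and PPV fall out of a 2x2 comparison against a reference standard. The class name, function names, and counts are made up for illustration; the counts are not taken from Paul et al. or Antoniou et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """2x2 counts from comparing an algorithm against a reference standard."""
    true_pos: int   # algorithm positive, reference positive
    false_pos: int  # algorithm positive, reference negative
    false_neg: int  # algorithm negative, reference positive
    true_neg: int   # algorithm negative, reference negative


def sensitivity(c: ValidationCounts) -> float:
    """Share of true cases that the algorithm finds."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ValidationCounts) -> float:
    """Share of non-cases that the algorithm correctly leaves out."""
    return c.true_neg / (c.true_neg + c.false_pos)


def ppv(c: ValidationCounts) -> float:
    """Share of algorithm-positive patients who are true cases."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = ValidationCounts(true_pos=78, false_pos=10, false_neg=22, true_neg=990)
print(f"sensitivity={sensitivity(example):.2f}")
print(f"specificity={specificity(example):.2f}")
print(f"ppv={ppv(example):.2f}")
```

With these invented counts the sensitivity is 78/100 and the specificity is 990/1000. PPV depends on how common the disease is in the tested population, which is one reason the same algorithm can look different across databases.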

From the scope of literature reviewed we know that labs, drugs, and diagnosis codes will play a role. So we will start with these cohorts:

• HIV diagnosis only 5442
• HIV diagnosis or laboratory measure 5452
• HIV diagnosis or laboratory measure AND treatment 5451
• HIV diagnosis or laboratory measure AND treatment OR 2nd diagnosis 5445
• HIV diagnosis AND laboratory test OR treatment 5441

Our suspicion here is that the diagnosis or laboratory measure gives us the largest pool of patients, and maybe we see a small drop when we add treatments (the rationale being that we lose some people who were maybe rule-outs, etc.). The series of cohorts addresses all possible avenues by which a patient could be identified, with the assumption that some databases won’t have good laboratory capture, or, in EHRs, that a patient may only ever get the diagnosis once.
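
To make the five cohort names concrete, here is a rough Python sketch of each one as a yes/no rule over a patient's evidence. This is not the ATLAS definition itself. The C1 to C5 labels (following the list order), the field names, and especially the grouping of AND/OR in the fourth and fifth cohorts are assumptions made for illustration, and time windows are ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientEvidence:
    n_diagnoses: int     # number of HIV diagnosis records
    has_lab: bool        # any HIV laboratory measurement
    has_treatment: bool  # any HIV treatment exposure


def c1_diagnosis_only(p: PatientEvidence) -> bool:
    return p.n_diagnoses >= 1


def c2_diagnosis_or_lab(p: PatientEvidence) -> bool:
    return p.n_diagnoses >= 1 or p.has_lab


def c3_dx_or_lab_and_treatment(p: PatientEvidence) -> bool:
    return c2_diagnosis_or_lab(p) and p.has_treatment


def c4_dx_or_lab_and_treatment_or_second_dx(p: PatientEvidence) -> bool:
    # Assumed grouping: (diagnosis OR lab) AND (treatment OR 2nd diagnosis).
    return c2_diagnosis_or_lab(p) and (p.has_treatment or p.n_diagnoses >= 2)


def c5_dx_and_lab_or_treatment(p: PatientEvidence) -> bool:
    # Assumed grouping: diagnosis AND (lab OR treatment).
    return p.n_diagnoses >= 1 and (p.has_lab or p.has_treatment)


# A hypothetical rule-out: one diagnosis code, no lab, no treatment.
rule_out = PatientEvidence(n_diagnoses=1, has_lab=False, has_treatment=False)
print(c1_diagnosis_only(rule_out), c3_dx_or_lab_and_treatment(rule_out))
```

Under this reading, a patient with a single diagnosis code and nothing else lands in the first two cohorts and drops out of the other three, which is the "small drop" we suspect when treatment is required.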

Here we can see right away that the databases with laboratory measurements show differences between definitions C2 and C5, though these differences are quite small in our US databases. This tells us that not all labs are captured, and we must rely heavily on the diagnosis codes and medications.

Incidence rates:

The incidence rates are higher in males, which is expected, and quite a bit higher in MDCD compared to our other US claims/EHR databases. The incidence rates are telling us at face value what we know about the disease, but we aren’t really at our phenotype yet. Much of the literature and clinical practice rely heavily on laboratory measurements but we may not have that, so what else can we learn here?

So what if we want to understand why some people don’t have medications when they are required to maintain low viral loads? We can take a peek at temporal characterization for those with the diagnosis…

We can see that many people, about 80%, have the diagnosis on index. Diving deeper, we see laboratory measurements for about 40% on index, and on days 31 to 365 we see more diagnoses and labs.

When we look deeper at the cohort for medications, we see antiretrovirals, which occur mainly after index and in some cases before (likely due to other diseases).

So this short tour of the phenotype raises some questions for our fellow community members to think about: laboratory measurements and diagnosis codes? Tell me what you think…start the conversation, and I hope to add more color in the coming days…

Thanks @rmakadia , this was a nice summary of HIV. And generally a good example of how labs and treatments can be used for phenotyping, and diagnosis codes can be used as a proxy in their absence. I’d be very eager to see how these definitions play out in European datasets as well as other EHRs where labs may be more prevalent (and also where 2+ diagnoses may not be appropriate). Given the global impact of HIV, this could be a really nice area for OHDSI network collaboration to provide a more holistic picture of disease natural history and treatment pathways over time. If I remember correctly, @julie_kohler expressed interest in this a couple years ago, and perhaps others would be interested in joining in this effort.

I’m particularly interested in your very last comment about drugs, that you think anti-retroviral drug use pre-index may be due to other diseases. I don’t know these products well or what their other indications are, so my immediate read of that temporal characterization table was that we may be observing cohort entry date misclassification, and that we may want to allow for initial treatment to be a qualifying event (requiring lab or diagnosis some time in the future). But, if the treatments could be used for another disease, then this could cause a different form of index date misspecification (setting the date too early).

Half-baked idea, but if we wanted to correct for index date misclassification but avoid attributing treatments to the wrong indication, could we create an entry event that was ‘treatment with exactly 0 prior diagnosis of alternative indications’, and that way we’d only shift back the index for those that don’t have an obvious explanation? So, for example, since tenofovir could be used for chronic hepatitis B, we’d say entry event could be ‘lab OR diagnosis OR (tenofovir with (nesting criteria: 0 condition records of Hepatitis B before tenofovir exposure)) OR HIV-only meds’. Then, for a person with hepB and HIV, we may not be exactly sure we got the index date correct if we see tenofovir prior to diagnosis, but for other cases, we’d feel better that we got the date of clinical recognition reasonably assigned.
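
To show what that entry event might look like in practice, here is a small Python sketch, equally half-baked. The record layout, the `dual_use` mapping, and the placeholder drug name are all hypothetical. It assumes `drug_exposures` already contains only antiretrovirals, and it reads "before" as strictly earlier than the exposure date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PersonRecords:
    hiv_lab_dates: list[date] = field(default_factory=list)
    hiv_diagnosis_dates: list[date] = field(default_factory=list)
    # (ingredient, exposure start date); assumed to be antiretrovirals only
    drug_exposures: list[tuple[str, date]] = field(default_factory=list)
    # alternative indication name -> dates of its condition records
    other_condition_dates: dict[str, list[date]] = field(default_factory=dict)


def index_date(person: PersonRecords, dual_use: dict[str, str]) -> Optional[date]:
    """Earliest qualifying entry event, or None if the person never qualifies."""
    candidates = list(person.hiv_lab_dates) + list(person.hiv_diagnosis_dates)
    for ingredient, start in person.drug_exposures:
        alternative = dual_use.get(ingredient)
        if alternative is None:
            # Treated as an HIV-only med: the exposure qualifies by itself.
            candidates.append(start)
            continue
        # Dual-use drug: qualifies only with zero prior records of the
        # alternative indication (strictly before the exposure date).
        dates = person.other_condition_dates.get(alternative, [])
        if not any(d < start for d in dates):
            candidates.append(start)
    return min(candidates) if candidates else None


# Hypothetical mapping; the tenofovir / hepatitis B pair comes from the example above.
dual_use = {"tenofovir": "hepatitis B"}
person = PersonRecords(
    hiv_diagnosis_dates=[date(2020, 3, 1)],
    drug_exposures=[("tenofovir", date(2020, 1, 1))],
)
print(index_date(person, dual_use))
```

For this person the index shifts back to the tenofovir start. Add a hepatitis B record before that exposure and the index stays at the diagnosis date, which is the behaviour the nesting criteria are meant to give.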

Indeed, great summary of the clinical picture/natural history of HIV/AIDS. Just a few points:

  • Survival while categorised as AIDS is on average 3-5 years, albeit survival as HIV+ can be on average 10-15 years (prior to AIDS) untreated in the West, and with treatment and optimal viral suppression, survival could potentially be not too dissimilar to the general population (i.e., into old age). The critical goal is a viral load suppressed below the limit of detection (of the assay) and a high CD4 count (immunological evaluation was not included in the phenotype), as the latter is a key index of progression from HIV+ to AIDS (as defined by AIDS-defining illness).
  • So for monitoring, it would be initial confirmed diagnosis (HIV Ab/PCR), then intermittent VL/CD4 testing to evaluate treatment response and/or disease progression. VL below the limit of detection (LoD) and a normal CD4 count (and certainly not below 200) are key indices of HIV suppression and optimal management with ARVs (if prescribed). A small subset of non-progressors do this naturally without ARVs. Gender, age, coinfection, and comorbidities are all co-factors in progression, but also bear in mind long-term AEs with chronic ARV therapy (e.g., early aging effects).
  • What we saw during the COVID-19 study-a-thon when evaluating repurposing of HIV drugs is that ARVs are rarely used as monotherapy; they are used in combination, so it’s really a matter of detecting the combination of drugs (numerous permutations/combinations), and that can be challenging (very much so in the study-a-thon).
  • Linked to the above point is the challenge of ARV drugs being used for other infections, especially HBV, and indeed in coinfected HIV/HBV patients, who would ordinarily be prescribed ARV combinations utilising HBV-specific drugs like tenofovir within that combination. In terms of index dates, I think it will boil down to specifically when HIV was actually diagnosed (first confirmed test), and for HBV, the first confirmed test (albeit some will be exposed/infected at the same time as HIV, or before or after it), and some will have been infected for life since around birth.
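
On the combination point, one simple way to think about "detecting the combination" is to ask which distinct ingredients are active on a given day. The Python sketch below does only that. The tuple layout and the `drug_a`/`drug_b` names are placeholders, not real products, and this is not how the study-a-thon handled it.

```python
from datetime import date

# (ingredient, exposure start, exposure end); ingredient names are placeholders.
Exposure = tuple[str, date, date]


def regimen_on(exposures: list[Exposure], day: date) -> frozenset[str]:
    """Distinct ingredients whose exposure interval covers `day` (inclusive)."""
    return frozenset(ing for ing, start, end in exposures if start <= day <= end)


def is_combination(exposures: list[Exposure], day: date) -> bool:
    """True when two or more distinct ingredients are active on `day`."""
    return len(regimen_on(exposures, day)) >= 2


exposures = [
    ("drug_a", date(2021, 1, 1), date(2021, 3, 31)),
    ("drug_b", date(2021, 1, 1), date(2021, 3, 31)),
    ("drug_a", date(2021, 4, 1), date(2021, 6, 30)),
]
print(sorted(regimen_on(exposures, date(2021, 2, 1))))
print(is_combination(exposures, date(2021, 5, 1)))
```

Using a `frozenset` means the order in which drugs were recorded does not matter, so the many permutations of the same combination collapse into one regimen.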

Oh, and sorry, in terms of transmission routes, there is also mother-to-baby, which can be reduced to almost nil with appropriate ARV therapy, and medically acquired (blood transfusions, procedures, needlesticks), which unfortunately still happens internationally.

For mother-to-baby there is a need for specific ARV intensification or combinations to protect the foetus/baby and clearly more intensive monitoring of both mother and baby prior to and after delivery. Sadly of course in resource-poor settings this transmission route is still an issue, and we need to consider people born HIV+ and living now much longer with ARV therapy too.

Great overview. One comment: we do give drugs before the disease happens, as you indicated with PrEP and even PEP. When this fails to halt disease, you develop the disease, but you wouldn’t have index date misclassification. I have not been diagnosed with CAD, but I might start statin therapy as a kind of pre-exposure prophylaxis. Pharmacoepi utopia will be integrated administrative claims data with EHR data (from across health care systems) to truly capture the lab data necessary to close gaps in our phenotypes.

Here are the PheValuator results for HIV:

Several interesting things here.

  1. While it’s not surprising that “HIV diagnosis only” has the highest sensitivity, it was a little surprising to see that “HIV diagnosis or laboratory measure AND treatment OR 2nd diagnosis” only caused a small drop in sensitivity while producing a nice bump in PPV. This occurred across the 3 databases tested. That definition seems like a clear winner here. We normally see a significant drop in sensitivity when a second code is added (assuming the lab measure added little - see below).
  2. Surprised to see the large drop in sensitivity overall in Medicaid (those with lower SES) compared to CCAE (those generally under 65YO and working) and Medicare (those generally over 65YO). Anyone have any thoughts on why this might be (as @Patrick_Ryan might say - looking at you @rmakadia, @nigehughes and @Kevin_Haynes)?
  3. The numbers here are in line with the overall cohort counts indicating that adding a lab measure (HIV diagnosis or laboratory measure) doesn’t change the sensitivity. Similar findings for “HIV diagnosis or laboratory measure AND treatment” and “HIV diagnosis AND laboratory test OR treatment”.

I think HIV and Medicaid take a lot of unpacking to really understand. Medicaid and HIV | KFF highlights some demographics of this population. Medicaid beneficiaries with HIV are more likely to be…dually eligible for Medicare. What implications does that have on coding practices? Lots of Medicaid and Medicare dollars go towards HIV spending: “30% of all federal spending on HIV care and representing the second largest source of public financing for HIV care in the U.S., after Medicare.” There is a lot more to unpack given the benefit distributions.

Very interesting Joel.

I suspect unlike many other diseases that due to the nature of HIV diagnosis, wrt it being very specific to treatment access with a diagnostic test (unlike with so many diseases there being vagaries in the notes or not even a code), the HIV code/diagnosis is much more reliable.

Having the HIV code in the notes is a gateway to services, not to mention public health surveillance and funding requirements.

For Medicaid SES, I wonder as to the variation in coverage for this population, particularly with use of Ryan White Care Act funding for those with limited or no insurance via designated clinics. This is indeed odd.

Medicaid is one of the largest sources of insurance coverage for perhaps many HIV+ people in the US, and its site states:

In the United States, there are more than 1.1 million Americans living with HIV and Medicaid is a major source of health coverage for those of them who are eligible. Before the Affordable Care Act, most individuals living with HIV were ineligible for Medicaid unless they had very low incomes, or were deemed permanently disabled due to an AIDS diagnosis. Starting in 2014, under the Affordable Care Act, states can receive federal Medicaid payments to provide coverage for the lowest income adults in their states, without regard to disability, parental status, or most other categorical limitations. States that implement the Medicaid expansion are likely to provide coverage to some people with HIV, providing care that can help them manage their condition and promote individual well-being. The Affordable Care Act also includes a variety of options for states to offer services to help those who need long-term services and supports at home and in the community.

Perhaps barking up the wrong tree, but can’t think of any obvious reasons for the discrepancy. Do we pick up those funded via Ryan White and/or Medicaid?

I’ve been studying HIV in Medicare and Medicaid populations for a while now. Medicaid eligibility speaks to access to certain subsidies that make meds more affordable. Moreover, Part D (used by both Medicare and Medicaid) has to cover ARVs (at least two per drug class; ARVs are one of 6 protected drug classes). That does not mean that the tier of coverage translates to a low cost. That said, work I’ve done with an all-payer claims database shows that a largely dually eligible Medicare population uses Part D and gets their medications covered nearly fully.

In terms of coding in Medicare, dual eligibility means that their Medicare will be the primary payer because Medicaid is a payer of last resort. That said, there are flaws in Medicare coding of course, particularly once someone is admitted to a nursing home and bundled payments are used for the SNF portion of the stay. At that time it is impossible to look at drugs as they are part of the bundle.

The Ryan White program funds medication access only for the un- or under-insured. Being on Medicare/Medicaid would not usually create an opportunity for HDAP access. Access to HDAP is also challenging in most states other than Massachusetts, which is the only state to use a third party to administer the funds, making the funds more accessible to the people who need them.

I hope this was helpful; happy to talk more. Aside: I am very interested in the OHDSI HIV work and would love to talk more about how to do some more work in this arena through OHDSI.

Great points. This dual eligibility has significant implications across our OHDSI data resources and lays a foundational example of where a phenotype may have the need for local modifications given the data availability vs. global implementation across all OHDSI resources.

Absolutely, and given that Medicaid and who it covers are managed at a state level, there is a lot of complexity in working with Medicaid. Medicare is a little easier, but even then people can come in and out of Medicare, and increasingly people are using Part C, which is not the same as original Medicare; it is privately administered.

To further complicate things, in Medicaid there are the chronic condition special needs plans that are administered by managed care organizations. For HIV, these SNPs provide care coordination and a myriad of services to manage HIV healthcare for low-income folks. From what I know, once you become eligible for Medicare, you are no longer eligible for SNPs.

I only today requested login to the phebruary atlas. I don’t see the HIV cohort in this github. https://github.com/ohdsi-studies/PhenotypePhebruary/blob/master/inst/settings/CohortsToCreate.csv

Without the special atlas login, how can I see the cohort definitions?

I am curious to review the concept set for measurement definition. We did a 2 year project around HIV. One publication was A Descriptive Study of HIV Patients Highly Adherent to Antiretroviral

Very interesting discussion!

Right here: This is the type of tacit knowledge we need to capture (how, apart from asking @Kevin_Haynes?) and incorporate into the phenotype design description. Any ATLAs mechanism (as @Patrick_Ryan calls it now) to address these effects would have to be tagged, and made to work in other databases in other countries where we don’t have artifacts like that.

It goes both ways, as well. Medicare eligibility for people with HIV comes only after they can prove that they are disabled, which means they have to prove that they can’t work for at least 24 months. So they are often eligible for Medicaid first, by low income, and then after 2 years of SSDI they get Medicare. There is an innate latency in the design of their Medicare eligibility. Even still, the different types of programs, like special needs plans vs. original vs. Advantage plans, all impact accessibility of care. This sort of applies to some of the conversations raised in the Characterization meeting yesterday, where considering the context of care and quality is important, not just the condition or the outcomes.

That’s an interesting paper! And it’s great to see that it agrees with my paper on the same subject but with regard to access to ART while admitted to the nursing home setting. Economic Barriers to Antiretroviral Therapy in Nursing Homes

The cost of drugs is largely covered by Part D.

I am already doing more work using CMS source data but would love to explore this more from an OHDSI angle!