Phenotype Phebruary Day 21- Prostate Cancer

Hi everyone,

It is Phenotype Phebruary Day 21 and it is time to talk about prostate cancer. Some of the material I am presenting here is based on previous work from the PIONEER study-a-thon and the Oncology WG (specifically the work @mgurley is leading on deriving the initial disease episode from discrete diagnoses), as well as amazing discussions and teamwork. A shout out to @dkosareva, @Ajit_Londhe and the Amgen team, @mmayer, @mgurley, @Adam_Black.

Clinical Definition
Prostate cancer is a cancer of the prostate gland, an organ of the male reproductive system located below the bladder and surrounding the urethra. It is the second most common malignancy in men and the fifth leading cause of cancer death among men worldwide. Median age at diagnosis is approximately 67 years.

Clinical Presentation

  • Most prostate cancer patients are asymptomatic at the time of diagnosis and are diagnosed during screening.
  • Local growth of the tumor can lead to symptoms of urinary obstruction, including urgency, hesitancy, nocturia, incomplete bladder emptying and decreased urinary stream. These symptoms, however, are nonspecific.
  • Some patients may also present with symptoms of metastasis such as bone pain and pathologic fracture.

Diagnosis

  • Screening for prostate cancer is the primary way to detect localized prostate cancer in asymptomatic individuals, the stage at which the disease is potentially curable. Screening methods primarily involve measurements of the blood serum biomarker prostate-specific antigen (PSA).
  • Digital rectal exam (DRE), PSA levels and MRI are standard diagnostic tools for detection of prostate cancer. PSA measurement is a better independent predictor of prostate cancer than DRE and complements prostate cancer detection efforts. However, both diagnostic procedures (DRE and PSA testing) can be abnormal without prostate cancer being present (false-positive) and can be normal despite the presence of prostate cancer (false-negative).
  • A prostate biopsy is used to assess the presence of prostate cancer if DRE and/or imaging results are suspicious or if the PSA value is confirmed to be elevated or rising without any other explanation.

Prognosis and survival:
The prognosis of the disease varies by tumor grade and stage at the time of primary diagnosis:

  1. 80% organ-confined disease with a 5-year overall survival (OS) between 60% and 99%
  2. 15% locoregional disease. 5-year OS is between 60% and 80%
  3. 5% late-stage disease with distant metastases. The OS is poor in this group: 5-year OS is 30-40%

The survival of patients with prostate cancer is related to several factors, including tumor extent, grade, patient’s age and general health and PSA level.

Treatment
Treatment decision is based on several factors, including tumor stage, histopathological and molecular features of the tumor, level of risk, anticipated life expectancy, overall health, and personal preference.

Rebello et al provide a great overview of prostate cancer management.

Resources:

  1. Rebello, R.J., Oing, C., Knudsen, K.E. et al. Prostate cancer. Nat Rev Dis Primers 7, 9 (2021). https://doi.org/10.1038/s41572-020-00243-0
  2. PDQ Adult Treatment Editorial Board. Prostate Cancer Treatment (PDQ®): Health Professional Version. 2022 Feb 2. In: PDQ Cancer Information Summaries [Internet]. Bethesda (MD): National Cancer Institute (US); 2002-. Available from: Prostate Cancer Treatment (PDQ®) - PDQ Cancer Information Summaries - NCBI Bookshelf

Review of the published literature on prostate cancer definition: The table below provides a summary of the validated algorithms for identification of incident prostate cancer.

| Author, Year | Definition | Other inclusion/exclusion criteria | Performance |
|---|---|---|---|
| Parlett, 2017 | A diagnosis for prostate cancer (ICD-9 = 185) within 28 days after a prostate biopsy (HCPCS/CPT = G0416, G0417, G0418, G0419, 55700, 55705, or 55706). Date of prostate cancer biopsy was defined as the diagnosis date. | Men with codes indicating a diagnosis or history of prostate cancer, medical findings strongly correlated with prostate cancer, or procedures (diagnostic and therapeutic) associated with prostate cancer prior to their start of follow-up were excluded, to ensure prevalent prostate cancer cases and those not at risk for prostate cancer are excluded from the cohort. | Sensitivity = 91% and PPV = 82%, using Georgia Comprehensive Cancer Registry as a gold standard |
| Hollenbeck, 2016 and Shahinian, 2017 | At least two diagnoses of prostate cancer (ICD-9 = 185) + biopsy in the 180 days prior to the first prostate cancer diagnosis. | Men with any claim in the preceding 12-month period that was associated with an ICD-9 diagnosis code of ‘185’ (prostate cancer) or ‘V10.46’ (personal history of malignant neoplasm of prostate) were excluded. | PPV = 99.8% and sensitivity = 88.7%, validated using SEER data |

In addition to these two validated algorithms, there are several other algorithms in the literature that use the same combination of diagnosis codes and biopsy. The major difference between these algorithms is the gap between biopsy and diagnosis date, which ranges from 28 to 180 days prior to the diagnosis date. In their study, Parlett et al looked at different intervals between biopsy and diagnosis date and observed a 3% increase in sensitivity and a 1% decrease in PPV associated with lengthening the gap between biopsy and diagnosis. However, increasing the interval between biopsy and diagnosis increases the likelihood of including a subset of prevalent prostate cancer patients who receive a biopsy during their follow-up. A good example is prostate cancer patients on an active surveillance protocol.
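For readers less familiar with the two performance measures quoted in the table, here is a minimal sketch of how sensitivity and PPV are computed against a registry gold standard. The counts are hypothetical and chosen only for illustration; they do not come from any of the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of registry-confirmed cases that the algorithm also flags."""
    return true_pos / (true_pos + false_neg)


def ppv(true_pos: int, false_pos: int) -> float:
    """Share of algorithm-flagged cases that the registry confirms."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (assumption, not study data).
tp, fp, fn = 80, 20, 20

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"PPV         = {ppv(tp, fp):.2f}")
```

Lengthening the biopsy-to-diagnosis window tends to move cases from false negatives to positives (raising sensitivity) while also admitting more false positives (lowering PPV), which is the trade-off described for Parlett et al.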

A couple of questions came up during the phenotype development process and after reviewing the published literature:

  1. What is the appropriate gap between biopsy and diagnosis?
    To address this, we drew on some of the prior work on prostate cancer phenotyping in PIONEER and the Oncology WG and looked at the data to get a better understanding of the temporal relation between biopsy and date of diagnosis in claims and EHR. The figure below shows the distribution of the gap between biopsy and initial prostate cancer diagnosis in OpenClaims, suggesting that most biopsies happen within 30 days of the first prostate cancer diagnosis (+/- 30 days).


    We observe a similar pattern in EHR. It looks like most patients receive their biopsy within 30 days of their initial diagnosis date (below).

    [Figure: distribution of the gap between biopsy and initial prostate cancer diagnosis in EHR data]
    Based on what we have seen from different data sources, it looks like a 30 day gap between date of diagnosis and biopsy is a reasonable one.

  2. Can we accurately identify patients who received a prostate cancer biopsy in the data? Prior work on the Ontario Health Insurance Plan suggests that the claims code for prostate biopsy is valid for identifying a patient’s first prostate biopsy in Ontario (86% sensitivity and >95% specificity). The generalizability of these results to other databases is not clear, though.

  3. Should we exclude patients with prostate cancer related diagnostic or therapeutic procedures prior to index date to ensure we are not capturing prevalent cancer cases? The treatment landscape of prostate cancer is constantly evolving and there are differences in treatment in different parts of the world. If we want to include such criteria, we need to constantly check for changes in the diagnostic and therapeutic approaches to the disease across the globe and update the definition accordingly. But is it necessary to add these criteria? In other words, are we including patients undergoing PCa treatment prior to index date if we rely on the combination of prostate cancer biopsy and diagnosis codes? The good news is that we have an easy way to answer this question. :slight_smile: All we need is to create our cohorts in Atlas, run simple characterizations on the cohorts and look at the frequency of PCa related procedures and medications prior to index date.

  4. Are there cases of prostate cancer that are diagnosed without a biopsy? If so, should we be concerned about excluding a set of patients that are different in their characteristics and outcomes from those diagnosed with biopsy? Are there other criteria we should consider in order to have a broader and more inclusive definition? This was one of the topics that was discussed at length during the PIONEER study-a-thon. We were advised to include an additional criterion of PSA>50 ng/ml, since patients with very high PSA levels do not need a biopsy for diagnosis. This is a good example of the importance of close collaboration with experts in the field. This is a recurring question for almost all cancer phenotypes. @Adam_Black has another good use case :wink:

  5. What should be considered the cohort entry event: the date of biopsy or the date of diagnosis (the date of the encounter with a prostate cancer diagnosis)? One can argue that a diagnosis after a biopsy indicates that the disease was already present at the time of biopsy, and that to better understand the nature of the disease we should use the date of biopsy as the date of diagnosis. But if that is the case, why don’t we go back to the first date a patient presents with a symptom indicative of prostate cancer, or the first abnormal PSA, and use that as the date of initial diagnosis? At the same time, this sounds like creating a narrative based on what we observe and making several assumptions.
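The gap analysis behind the first question can be sketched roughly as follows. This is only an illustration under stated assumptions: the patient records are made up, and `gap_days` and `share_within` are hypothetical helper names, not part of any OHDSI package. The gap is signed, so a negative value means the biopsy came before the first diagnosis.

```python
from datetime import date
from typing import List, Optional


def gap_days(first_dx: date, biopsies: List[date]) -> Optional[int]:
    """Signed days from first diagnosis to the closest biopsy (negative = biopsy first)."""
    if not biopsies:
        return None  # no biopsy observed for this person
    return min(((b - first_dx).days for b in biopsies), key=abs)


def share_within(gaps: List[int], window: int = 30) -> float:
    """Fraction of gaps that fall within +/- `window` days of the first diagnosis."""
    return sum(abs(g) <= window for g in gaps) / len(gaps)


# Made-up example patients: (first diagnosis date, biopsy dates).
patients = {
    "A": (date(2020, 1, 10), [date(2020, 1, 3)]),
    "B": (date(2020, 3, 1), [date(2020, 3, 20), date(2020, 9, 1)]),
    "C": (date(2020, 5, 1), [date(2020, 1, 15)]),
    "D": (date(2020, 6, 1), []),
}

gaps = [g for dx, bx in patients.values() if (g := gap_days(dx, bx)) is not None]
print(gaps)
print(f"share within +/-30 days: {share_within(gaps):.2f}")
```

Plotting a histogram of `gaps` per data source would give the kind of distribution discussed above; people without any biopsy (patient D here) drop out of the plot and need to be counted separately.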

Let’s build some phenotypes:

We are going to build three definitions for incident prostate cancer using different biopsy-diagnosis gaps: 28 days prior and 180 days prior, similar to prior publications, and +/- 30 days based on our data-driven approach. What we are not doing here is creating phenotypes for subtypes of prostate cancer based on risk category, stage and extent of the disease.

Definition 1.
Cohort Entry Events
People with continuous observation of 365 days before event may enter the cohort when observing any of the following:
procedure occurrences of ‘[PIONEER] Biopsy’.
Limit cohort entry events to the earliest event per person.

Inclusion Criteria

  1. Age >= 18
    Entry events with the following event criteria: who are >= 18 years old.
  2. Male
    Entry events with the following event criteria: who are male.
  3. PCa Dx within 28 days of Biopsy
    Entry events having at least 1 condition occurrence of ‘[PIONEER] PCa’ for the first time in the person’s history, starting between 0 days after and 28 days after cohort entry start date.
  4. No history of PCa
    Entry events with all of the following criteria:
    • having no condition occurrences of ‘[PIONEER] Prior Prostate Ca related obs/condition’, starting in the 365 days prior to cohort entry start date.
    • having no observations of ‘[PIONEER] Prior Prostate Ca related obs/condition’, starting in the 365 days prior to cohort entry start date; with value as concept: “known present”.

Cohort Exit
The person exits the cohort at the end of continuous observation.

Definition 2.
Cohort Entry Events
People with continuous observation of 365 days before event may enter the cohort when observing any of the following:
condition occurrence of ‘[PIONEER] PCa’ for the first time in the person’s history.
Limit cohort entry events to the earliest event per person.

Inclusion Criteria

  1. Age >= 18
    Entry events with the following event criteria: who are >= 18 years old.
  2. Male
    Entry events with the following event criteria: who are male.
  3. PCa biopsy
    Entry events with at least 1 of the following criteria:
    having at least 1 procedure occurrence of ‘[PIONEER] Biopsy’, starting between 30 days before and 30 days after cohort entry start date.
  4. No history of PCa
    Entry events with all of the following criteria:
    • having no condition occurrences of ‘[PIONEER] Prior Prostate Ca related obs/condition’, starting in the 365 days prior to cohort entry start date.
    • having no observations of ‘[PIONEER] Prior Prostate Ca related obs/condition’, starting in the 365 days prior to cohort entry start date; with value as concept: “known present”.

Cohort Exit
The person exits the cohort at the end of continuous observation.

Definition 3.
Cohort Entry Events
People with continuous observation of 365 days before event may enter the cohort when observing any of the following:
condition occurrence of ‘[PIONEER] PCa’ for the first time in the person’s history.
Limit cohort entry events to the earliest event per person.

Inclusion Criteria

  1. Age >= 18
    Entry events with the following event criteria: who are >= 18 years old.
  2. Male
    Entry events with the following event criteria: who are male.
  3. 2nd PCa Dx
    Entry events having at least 1 condition occurrence of ‘[PIONEER] PCa’, starting 1 days after cohort entry start date.
  4. Biopsy within 180 days of first PCa
    Entry events having at least 1 condition occurrence of ‘[PIONEER] PCa’, starting between 180 days before and 0 days after cohort entry start date.
  5. No history of PCa
    Entry events with all of the following criteria:
    • having no condition occurrences of ‘[PIONEER] Prior Prostate Ca related obs/condition’, starting in the 365 days prior to cohort entry start date.
    • having no observations of ‘[PIONEER] Prior Prostate Ca related obs/condition’, starting in the 365 days prior to cohort entry start date; with value as concept: “known present”.

Cohort Exit
The person exits the cohort at the end of continuous observation.
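To make the temporal logic of the three definitions easier to compare, here is a simplified sketch of their windows applied to one person's event dates. It is not the Atlas/CIRCE implementation: it ignores concept sets, age, sex, the 365 days of prior observation and the "no history" rule, and the biopsy window in `definition_3` follows the rule name ("Biopsy within 180 days of first PCa"). All function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def has_event_in_window(index: date, events: List[date],
                        days_before: int, days_after: int) -> bool:
    """True if any event falls in [index - days_before, index + days_after]."""
    start = index - timedelta(days=days_before)
    end = index + timedelta(days=days_after)
    return any(start <= e <= end for e in events)


def definition_1(biopsies: List[date], diagnoses: List[date]) -> bool:
    # Index = earliest biopsy; the first-ever diagnosis must start 0 to 28 days later.
    if not biopsies or not diagnoses:
        return False
    return has_event_in_window(min(biopsies), [min(diagnoses)], 0, 28)


def definition_2(biopsies: List[date], diagnoses: List[date]) -> bool:
    # Index = first diagnosis; a biopsy within 30 days before or after.
    if not diagnoses:
        return False
    return has_event_in_window(min(diagnoses), biopsies, 30, 30)


def definition_3(biopsies: List[date], diagnoses: List[date]) -> bool:
    # Index = first diagnosis; a later second diagnosis plus a biopsy
    # in the 180 days up to and including the index date.
    if not diagnoses:
        return False
    index = min(diagnoses)
    second_dx = any(d > index for d in diagnoses)
    return second_dx and has_event_in_window(index, biopsies, 180, 0)


# Made-up person: biopsy 45 days before the first of two diagnoses.
biopsies = [date(2021, 3, 1)]
diagnoses = [date(2021, 4, 15), date(2021, 5, 20)]
for rule in (definition_1, definition_2, definition_3):
    print(rule.__name__, rule(biopsies, diagnoses))
```

In this toy case only the 180-day rule admits the person, which is the kind of patient the wider window adds to the cohort.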

Cohort counts

| Database | Definition 1 | Definition 2 | Definition 3 |
|---|---|---|---|
| AmbEMR | 22,274 | 30,589 | 216,755 |
| Hospital | 16,891 | 18,903 | 147,052 |
| OncoEMR | 16 | 23 | 26,741 |
| OpenClaims | 988,366 | 1,115,041 | 3,377,557 |
| PharMetrics Plus | 104,676 | 113,874 | 182,820 |
| UK IMRD | 4,313 | 5,096 | 9,089 |

The majority of the attrition we observe across databases is attributed to the biopsy requirement in databases and increasing the gap between biopsy and diagnosis leads to a significant increase in the size of the cohort.
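A quick way to see how much the cohort grows from one definition to the next is to compute the ratios between the counts in the table. The sketch below uses the counts exactly as reported above; `fold_change` is just a hypothetical helper name.

```python
# Cohort counts from the table: (Definition 1, Definition 2, Definition 3).
counts = {
    "AmbEMR": (22274, 30589, 216755),
    "Hospital": (16891, 18903, 147052),
    "OncoEMR": (16, 23, 26741),
    "OpenClaims": (988366, 1115041, 3377557),
    "PharMetrics Plus": (104676, 113874, 182820),
    "UK IMRD": (4313, 5096, 9089),
}


def fold_change(smaller: int, larger: int) -> float:
    """How many times larger one cohort is than another."""
    return larger / smaller


for db, (d1, d2, d3) in counts.items():
    print(f"{db:17s} D2/D1 = {fold_change(d1, d2):6.2f}   "
          f"D3/D1 = {fold_change(d1, d3):8.2f}")
```

The ratios differ widely by database, so the effect of the window choice is better judged per data source than from pooled totals.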

While we are waiting for Cohort Diagnostics results, we did a quick characterization of the three cohorts to see the prevalence of diagnostic and therapeutic procedures/medication prior to index date. We looked for the prevalence of prostate cancer treatments including radiotherapy, systemic antineoplastic therapies, radical prostatectomy, focal treatments, and hormonal therapies in addition to other prostate cancer related diagnostic procedures. The prevalence of prostate cancer related procedures and treatment 365 days prior to index is presented here:

Procedure or medication (% of cohort) Definition 1 Definition 2 Definition 3
Ultrasound, transrectal 79.65 82.14 27.85
Ultrasonic guidance for needle placement (eg, biopsy, aspiration, injection, localization device), imaging supervision and interpretation 73.50 76.00 25.83
Measurement of post-voiding residual urine and/or bladder capacity by ultrasound, non-imaging 20.77 20.34 10.71
Surgical pathology, gross and microscopic examinations, for prostate needle biopsy, any method 19.08
Ultrasound, pelvic (nonobstetric), real time with image documentation; limited or follow-up (eg, for follicles) 2.31 2.24 1.25
Ultrasound, pelvic (nonobstetric), real time with image documentation; complete 1.53 1.52 0.83
Ultrasound, scrotum and contents 1.13 1.11 0.72
leuprolide 1.42 0.09 2.05
bicalutamide 0.97 0.22 2.88

We see a high prevalence of some of the prostate cancer related diagnostic procedures in all cohorts, which can be part of the prostate cancer diagnostic work-up. We did not find any procedure or medication codes for any of the other prostate cancer treatments and only observed leuprolide and bicalutamide use in a very small proportion of the patients (<0.01%-2% for leuprolide and 0.2-3% for bicalutamide), and the prevalence is much lower for definitions 1 and 2 compared to definition 3.

There is still more to come. Wait for it :slight_smile:

Thank you @agolozar , this is really informative!

I really like the temporal co-occurrence plot highlighting time from biopsy to diagnosis. There’s so much to learn there, particularly given that both the procedure and the diagnosis are fairly specific (so unlikely to be coincidentally co-reported). I’m fascinated by the various patterns you’ve shown across databases. The OpenClaims plot shows an additional peak at ~48 days, I wonder what may have caused that? I find it interesting to think about temporal sequencing and the extent to which we can rely on ordering of observations, subject to ‘noise’ that may be induced through the data collection process or some other mechanism. Datasource 1 appears to have all positive time-to-events, while datasource 2 appears to be mostly negative and datasource 3 looks to be more spread out, so that’s substantial heterogeneity. Here, we might expect that biopsy used to confirm a diagnosis should mean that diagnoses preceding biopsy would be ‘suspected’ but diagnoses following biopsy would be ‘confirmed’. But should we discard a diagnosis preceded by a biopsy if a confirmatory diagnosis is not observed? If not, we may be accepting a ‘suspected’ diagnosis that is ruled out by a negative biopsy. And, as #4 lays out, there’s also the case of how to interpret a diagnosis in the absence of any biopsy (with or without a PSA value).

To your question #5, I would argue that our clinical description should clarify whether we are trying to identify biological onset or clinical recognition, and then we can assess the extent to which our cohort entry date may be subject to index date misclassification relative to that target. Assuming our target was clinical recognition (since biological onset is generally unknown for slow-growing tumors), then I would think setting the entry event based on the earliest of diagnosis, diagnostic procedure, or symptoms could be appropriate, subject to an inclusion criterion requiring observation of the minimum elements (here, from your definitions, it appears that’s diagnosis). I suppose my main point here is that it needn’t be either biopsy or PC diagnosis; both of those events can be candidate entry events.

Awesome - I wonder if this should be a standard diagnostic in Cohort Diagnostics. It would inform us on index date misclassification. Thank you for this illustration. Let’s get it in version 3.x of Cohort Diagnostics.

I like how you have nicely framed gaps in knowledge

But I have a few comments -

  1. All cohort definitions require a prostate biopsy, but the clinical description does not say that it is essential. Is it possible to diagnose prostate cancer without a prostate biopsy, e.g. with imaging studies? How about the missing data problem for prostate biopsy?
  2. I have heard, anecdotally, that a prostate cancer diagnosis is sometimes recorded for persons suspected to have prostate cancer to justify payment for the prostate biopsy. Would it be possible that some of these individuals have a negative pathology result (same point as @Patrick_Ryan)?
  3. As we know, not all prostate cancer is the same. Some people say that a lot of men die with prostate cancer and not from prostate cancer, i.e. the prostate cancer in many is indolent and goes undetected. Because this is such an important idea, it would be useful to clarify in the clinical description whether this phenotype does or does not differentiate this, as it is known to be subject to surveillance bias.

It is not possible to diagnose prostate cancer, or in general any kind of cancer, with certainty without a biopsy. It is usual protocol that a cancer diagnosis cannot be included in the tumour registry if it is not supported by the result of a biopsy. The image can be very indicative or even pathognomonic, but certainty is based only on a biopsy that determines the type of cancer cells characterizing the particular histological type of cancer.

In addition, it is known that many men, if they are very old when dying, can have prostate cancer that is observed at autopsy but obviously is not the cause of death. Autopsy-detected incidental prostate cancers are typically small, low grade, and only occasionally locally advanced or metastatic. “Prostate cancer prevalence increases with age, and is detected in over half of men aged ≥90 years.”

This is a great point! I have another definition that requires a second diagnosis and I am doing comparison of the two. I am looking at the two definitions (with and without second diagnosis) to see the differences and similarities. Will share the findings sooooon.

What I did not show here was the timing of biopsy in relation to first diagnosis in tumor registries. We know that tumor registrars follow SEER coding instructions (in the US) for identifying cancer cases and their date of diagnosis. So it made a lot of sense to take a look at the distribution of the gap between biopsy and diagnosis in tumor registries. Similar to what we observe in the EHR data, the first biopsy can occur before or after the diagnosis in the tumor registry. The distribution is tighter than, but similar to, what we saw for EHR.

Agree. The description should indicate what the phenotype is trying to represent: the diagnosis of the disease or the first indication of the presence of the disease.

Hi @agolozar,

First, 1 alteration I’ve made to all 3 definitions:

The “Known present” concept (4253628) used as a filter for the “No prior history of PCa” inclusion rule is not in any of our CDMs. It’s not in value_as_string either, so it doesn’t appear to be an ETL issue, but rather the native data simply doesn’t have any values tagged to these observation records. So I removed this requirement. This will make this inclusion rule more restrictive.
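A toy sketch of why dropping the value filter makes the rule more restrictive: with the filter, only prior observations tagged with the “Known present” concept trigger exclusion; without it, any prior observation in the concept set does. The function and its input format are hypothetical simplifications, not how Atlas evaluates the rule.

```python
from typing import List, Optional

KNOWN_PRESENT = 4253628  # "Known present" concept ID quoted in the post


def excluded(prior_values: List[Optional[int]], require_value: bool) -> bool:
    """Decide exclusion from value_as_concept_id values of matching prior observations.

    `prior_values` holds one entry per matching observation record in the
    365 days before index; None means the record carries no value.
    """
    if require_value:
        return any(v == KNOWN_PRESENT for v in prior_values)
    return len(prior_values) > 0


# A person with one matching observation that has no value attached.
person = [None]
print("with value filter:   ", excluded(person, require_value=True))
print("without value filter:", excluded(person, require_value=False))
```

When no records carry the value at all, the filtered rule never excludes anyone, while the unfiltered rule excludes everyone with a matching observation.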

From our early results of the PIONEER algorithm (dx + biopsy within +/- 30 days)

  1. We see that biopsy will result in 90% attrition in a GP based source, ostensibly because patients would see different physicians for biopsy.
  2. For US claims, we see ~40% of patients also fall out due to this inclusion rule. Japan claims, the attrition is ~70%.
  3. If we remove the biopsy completely, attrition is of course minimal. I’d like to look at our distribution of days from index to biopsy in our data as well to see if +/- 30 days is still the best choice.

Thanks,
Ajit

As @mmayer mentioned, a definite diagnosis of prostate cancer is based on histopathological confirmation through biopsy. But to your point, this is an area where clinical expertise can help the most. How patients are actually managed in the clinic is not always reflected in guidelines and cannot be easily found through a literature search. We discussed this at length with experts in the field and their recommendation was to include PSA>50 as an additional criterion for prostate cancer diagnosis. We can easily test this out and see how this will impact our cohort counts and the diagnostics.
