Cancer statistics across OHDSI networks: ONCO-ACHILLES

SCYou · July 10, 2018, 10:42pm

Dear colleagues,

As I mentioned earlier, we decided to convert the data of all Korean cancer patients from the National Insurance database (2007-2017) into the CDM.

As a first study, I will extract three kinds of information from it:

Quarterly incidence of each cancer from 2008 to 2017, by birth year (in 5-year bands) and sex (and, hopefully, ethnic group).
All-cause mortality within 1 year, 3 years and 5 years after cancer diagnosis from 2008 to 2017 in these quarterly cohorts, by birth year and sex (and ethnic group).
Total medical expenditure, the amount paid by the insurer, and the amount paid by patients within 1 month, 6 months, 1 year, 3 years and 5 years after cancer diagnosis from 2008 to 2017 in these quarterly cohorts, by birth year and sex.
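
As a rough illustration of the first item, the quarterly stratification could be sketched as follows. This is only a minimal sketch in plain Python: the record layout and the function names (`quarter_of`, `birth_cohort`, `quarterly_counts`) are hypothetical and not part of ACHILLES or the CDM, and the result is a raw count per stratum, so a person-time denominator would still be needed to turn it into an incidence rate.

```python
from collections import Counter
from datetime import date


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2008Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def birth_cohort(birth_year: int, width: int = 5) -> str:
    """Map a birth year to a band of `width` years, e.g. 1963 -> '1960-1964'."""
    start = birth_year - birth_year % width
    return f"{start}-{start + width - 1}"


def quarterly_counts(diagnoses):
    """Count first diagnoses per (cancer, quarter, birth cohort, sex)."""
    counts = Counter()
    for cancer, first_dx, birth_year, sex in diagnoses:
        key = (cancer, quarter_of(first_dx), birth_cohort(birth_year), sex)
        counts[key] += 1
    return counts


# Hypothetical records: (cancer, first diagnosis date, birth year, sex)
records = [
    ("lung", date(2008, 2, 14), 1963, "F"),
    ("lung", date(2008, 3, 30), 1961, "F"),
    ("breast", date(2008, 5, 1), 1975, "F"),
]
counts = quarterly_counts(records)
```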

Because this resembles ACHILLES (Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems), I want to call this project ‘Onco-Achilles’.

With this, we can address the following questions:

Identify trends in cancer incidence by birth year and sex. As Jemal et al. suggested in their NEJM paper, we can describe the incidence of lung cancer among young women compared with men across the OHDSI network (my impression is that the incidence of lung cancer and breast cancer is rising in young Korean women).
Compare the survival trends of each cancer across birth cohorts, insurance types and countries.
Describe the trends in total cost, cost paid by the insurer, and cost paid by patients after cancer diagnosis.
Overall, identify the strengths and limitations of the health care system for oncology patients in terms of incidence, mortality and economic burden.
Assess the overall impact of novel treatments on survival and cost for cancer patients.

The results of this study will definitely be a stepping stone for future oncology research in OHDSI.

If you can join this research, please let me know. @Gowtham_Rao @Patrick_Ryan @rchen @rimma

estone96 · July 10, 2018, 8:12pm

@SCYou
Great job!!!

SCYou · August 24, 2018, 2:07am

Hi all,
We’ve just started the ONCO-ACHILLES project in Korea.

As I mentioned above, we will extract all data, including costs, for patients with cancer from the National Health Insurance database. I had originally aimed to extract the total cost after cancer diagnosis, but I think we can estimate the net cost of cancer if we use large-scale propensity score matching.
If we match the target cohort (patients with a specific cancer) to the general population on age group, gender, ethnicity, all previous medical history except the specific disease (the specific cancer), and previous medical cost, we can estimate the net cost as the difference in cost between the target and comparator cohorts (we can also estimate the difference in survival this way). In a previous study, Yabroff's group estimated the net cost by matching targets and controls on age, gender and socio-economic status only (https://www.ncbi.nlm.nih.gov/pubmed/19536010).

In this way, we can generate evidence on the net medical cost of various diseases at large scale across the world.

What do you think about this, @schuemie @msuchard @Gowtham_Rao @Patrick_Ryan ?

Mark_Danese · August 24, 2018, 3:26am

Just keep in mind that cost (amount paid) is a cumulative quantity over time. The methods for estimating cost can be tricky due to censoring, but there are methods using inverse probability of censoring weights. Note that when a person dies, they are NOT censored when analyzing costs (you know their total cost if they die). Also, there are issues of adjusting for different years (to account for inflation) and different geographies (within and across countries).
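
To make the censoring point concrete, here is a minimal sketch of a simple inverse-probability-of-censoring-weighted mean cost, written in plain Python. It rests on several assumptions not stated in this thread: each person is summarized as a hypothetical tuple `(follow_up_time, died, total_cost)`, costs are already expressed in a common currency and year, censoring is independent of cost, and deaths are taken to precede censorings at tied times. People who died count as complete observations and are up-weighted by the inverse of the Kaplan-Meier probability of still being uncensored just before their death.

```python
def ipcw_mean_cost(subjects):
    """Weighted mean cost; subjects are (follow_up_time, died, total_cost)."""
    n = len(subjects)
    censor_times = sorted({t for t, died, _ in subjects if not died})

    # Kaplan-Meier estimate of the censoring distribution: here censoring
    # is the "event" and death removes a person from the risk set.
    km_steps = []  # (time, probability of being uncensored just after time)
    k = 1.0
    for u in censor_times:
        censored_here = sum(1 for t, died, _ in subjects if t == u and not died)
        at_risk = sum(1 for t, _, _ in subjects if t > u) + censored_here
        k *= 1.0 - censored_here / at_risk
        km_steps.append((u, k))

    def k_before(t):
        """Probability of remaining uncensored just before time t."""
        value = 1.0
        for u, k_u in km_steps:
            if u < t:
                value = k_u
        return value

    # Only complete (uncensored) cost histories contribute, each weighted up.
    total = sum(cost / k_before(t) for t, died, cost in subjects if died)
    return total / n


# Hypothetical data: three deaths with known total cost, one censored person.
subjects = [(1.0, True, 100.0), (2.0, False, 50.0),
            (3.0, True, 300.0), (4.0, True, 400.0)]
estimate = ipcw_mean_cost(subjects)
naive = sum(c for _, died, c in subjects if died) / 3
```

In this toy data the weighted estimate is higher than the naive mean over the three complete cases, because the two deaths after the censoring time also stand in for the censored person. A real analysis would usually restrict costs to a fixed time horizon and could use more efficient partitioned estimators.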

Also keep in mind that non-cancer patients get cancer, so you have to define your control group carefully.

To be clear, I am not saying this is a bad idea. It is, in fact, an interesting idea. I just want to help point out some methodological issues to consider.

Regards,
Mark

SCYou · August 24, 2018, 6:43am

Indeed, the handling of costs at death and the adjustment for different years should be considered.
Also, I should select a comparator cohort that does not develop the specific disease after the index date either.

Thank you for the helpful comment, @Mark_Danese !

Christian_Reich · August 28, 2018, 10:52am

Do you really want to take on such a complex endeavor, @SCYou? You are going to have two things against you:

It is really hard to distinguish what is cancer-related and what isn’t. Chemotherapy is easy; all the treatment to manage the side effects is not. Think of the 3 infusion units for nephrotoxic chemo - that’s saline.
Usually, the cancer-related costs dwarf everything else, which means you will find out that the net delta is almost equal to the total.

SCYou · August 29, 2018, 11:13pm

@Christian_Reich Good point! As you mentioned, if diabetic patients with cancer start to use gabapentin, we cannot tell whether this is because of diabetic neuropathy or cancer-related pain. Moreover, cancer itself can aggravate the progression of diabetes mellitus and diabetic neuropathy.

That’s why I want to match ‘cancer patients’ with ‘never-ever-cancer patients’ using the CohortMethod package (or large-scale propensity score matching).
If we can match two diabetic patients at a similar stage of diabetes, one with cancer and one without, then I think we can estimate the net cancer cost by subtracting one patient's medical costs from the other's.

(This method is a slightly more advanced version of the one in Yabroff’s paper:
https://www.ncbi.nlm.nih.gov/pubmed/19536010 )
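
The subtraction idea can be sketched as follows. This is a plain-Python illustration, not the CohortMethod API: it assumes propensity scores have already been estimated elsewhere, uses greedy 1:1 nearest-neighbor matching without replacement, and the function name `match_and_net_cost`, the caliper value and the toy numbers are all hypothetical.

```python
def match_and_net_cost(cases, controls, caliper=0.05):
    """Greedy 1:1 matching on a precomputed propensity score.

    cases and controls are lists of (person_id, propensity_score, cost).
    Returns (mean cost difference over matched pairs, number of pairs).
    """
    available = list(controls)
    diffs = []
    for _, ps, cost in sorted(cases, key=lambda c: c[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        best = min(available, key=lambda c: abs(c[1] - ps))
        if abs(best[1] - ps) <= caliper:
            available.remove(best)  # match without replacement
            diffs.append(cost - best[2])
    if not diffs:
        raise ValueError("no matched pairs within the caliper")
    return sum(diffs) / len(diffs), len(diffs)


# Hypothetical cancer patients and never-cancer comparators.
cases = [("a", 0.30, 12000.0), ("b", 0.60, 20000.0), ("c", 0.95, 9000.0)]
controls = [("x", 0.31, 2000.0), ("y", 0.58, 3000.0), ("z", 0.10, 500.0)]
net_cost, n_pairs = match_and_net_cost(cases, controls)
```

Patient "c" has no comparator within the caliper and is dropped, so the estimate applies only to the matched subset. Costs here are treated as fully observed; censoring would need separate handling.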

Christian_Reich · August 30, 2018, 10:31am

@SCYou:

If anybody can figure it out, it is you.

You need to overcome a bunch of problems:

What’s the index event? If you use the cancer, your comparator will be non-cancer, which has the problem of index date. If it is diabetes onset and you wait for the cancer to happen you need to have data with long follow-up time, because it won’t happen anytime soon by chance.
Usually, when a patient falls ill with a neoplastic disease, all attention and treatment is focussed on that. In fact, the chemo will prevent a whole bunch of other treatments (e.g. surgery for microvascular complications). So, even though both populations start the same and are nicely matched, after the cancer diagnosis you probably can’t make the assumption that you can just subtract one from the other.
You still might run into a trivial result, where the cancer treatment with all the repeated diagnostics, surgery, chemo and radiation is so much larger than the diabetes treatment (anti-diabetic drugs or insulin) that the comparison ends up as big-sum minus almost-nothing = big-sum.

SCYou · August 30, 2018, 11:02am

First of all, thank you for the helpful comment, @Christian_Reich

Indeed, the index event would be the major problem of this design. Since I’ll extract the incidence or prevalence of cancer quarterly, I can extract a large comparator set as an age/sex-matched population without cancer before and after the index date (the index date will be the first day of the quarter). These patients can then be matched with cancer patients by large-scale propensity score matching.
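
A minimal sketch of that comparator selection, assuming a hypothetical per-person record with an observation period and a list of cancer diagnosis dates (these field names are illustrative and are not CDM column names):

```python
from datetime import date


def quarter_start(year: int, quarter: int) -> date:
    """First day of a calendar quarter, used as the comparator index date."""
    return date(year, 3 * (quarter - 1) + 1, 1)


def comparator_pool(persons, year, quarter):
    """Return (person_id, index_date) for people who never have a cancer
    diagnosis and are under observation on the first day of the quarter."""
    index = quarter_start(year, quarter)
    return [
        (p["id"], index)
        for p in persons
        if not p["cancer_dates"] and p["obs_start"] <= index <= p["obs_end"]
    ]


# Hypothetical persons
persons = [
    {"id": 1, "obs_start": date(2007, 1, 1), "obs_end": date(2017, 12, 31),
     "cancer_dates": []},
    {"id": 2, "obs_start": date(2007, 1, 1), "obs_end": date(2017, 12, 31),
     "cancer_dates": [date(2012, 6, 3)]},
    {"id": 3, "obs_start": date(2010, 1, 1), "obs_end": date(2017, 12, 31),
     "cancer_dates": []},
]
pool = comparator_pool(persons, 2008, 3)
```

Requiring no cancer after the index date conditions on future information, which is one of the design issues raised in this thread.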

As you said, doctors usually target a higher glucose level in diabetic patients with cancer than in diabetic patients without cancer, which means diabetic patients with cancer require fewer anti-diabetic medications or tests. I cannot overcome this problem…

That’s true.

As you know, there is no perfect way to estimate ‘net cost’. The strength of this approach is its scalability. We can apply this method to estimate the ‘net medical cost’ of any other chronic disease, such as gout, psoriasis, rheumatoid arthritis, lupus, and so on. Or we can estimate the cost caused by diabetes mellitus in cancer patients (in this case, though, we would need to set many excluded_concept_ids for matching).

Christian_Reich · August 30, 2018, 11:13am

What I don’t understand is why you wouldn’t make this a self-controlled design: the same patient before and after. All the chronic diseases are still there. Why use a cohort design, with all the trouble?

SCYou · August 30, 2018, 11:21am

@Christian_Reich Of course, that would be another good method!
We can try both.

rwpark · August 30, 2018, 12:08pm

Good suggestion! It is worth trying. One thing to consider is that a self-controlled design is not good for measuring long-term effects.

Mark_Danese · August 30, 2018, 1:48pm

You can do a self-controlled design – it depends on what you are trying to do. But most people use a control cohort of some kind. We try to match on location and date of diagnosis (using a pseudo diagnosis date) to capture things that are hard to “control” for. Then we use statistical control for these kinds of analyses.

Part of the problem is that cancer isn’t an on-off thing. Utilization (and cost) for cancer begins prior to whatever day you pick for “diagnosis”. Also for small samples, self-control can be unstable. Also, self-control requires a lot more observation before diagnosis. Often you only have a year or two if you are lucky, at least in the data I work with. So, as Rae says, it doesn’t really work well for long-term costs.

Mark_Danese · August 30, 2018, 4:16pm

@Christian_Reich makes some interesting points. Often, we don’t worry about having a reference group for costs for the reasons cited. Big cost - little cost = big cost.

However, it isn’t always that simple. For metastatic cancer, survival is short. So while average survival might be 2-3 years at best, even at $100,000 per year, that is only $200,000 to $300,000. The same person without cancer may have a 20 year life expectancy at $5,000 per year (imagine an older person with comorbid conditions and some expensive medications). So, their lifetime costs might be $100,000. In that case, the little cost isn’t so little. Another way to view this is that there is also a cost of surviving. Depending on the question, this may or may not matter. See below from Stokes, et al.