
Cyclops Discrepancy with coxph log-likelihood calculations

Hello all,

I have a dataset that I was hoping to run Cox PH regressions on using Cyclops, but while comparing results against the same regression in coxph I found that the two disagree. The coxph results agree with previous SAS modeling I’m validating against. The difference seems to come down to the two functions returning different values for the log-likelihood: I’m getting values on the order of -67000 with coxph and -91000 with Cyclops.

I’ve run both with every output and modeling option I can find, but I can’t figure out where they differ. It’s not an issue with how they handle ties or normalization. They both report the same number of rows and events, and I’m passing the same data.frame and formula object to both functions. To my understanding, the log-likelihood should be computed the same way no matter which library I’m using; the only difference should be how the iteration is performed. The gap seems far too large to be some kind of simplification or rounding that Cyclops is doing.
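
For reference, here is a sketch of the quantity in question, assuming no tied event times and rows of the form (start_j, stop_j, delta_j, x_j). The notation is mine, not taken from either package's documentation:

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ x_i^\top \beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^\top \beta\right) \right],
\qquad
R(t) = \{\, j : \mathrm{start}_j < t \le \mathrm{stop}_j \,\}
```

Here t_i is the stop time of the i-th event row and R(t) is the risk set at time t. As far as I understand, two implementations evaluated at the same beta should only disagree on this value if they build R(t) differently, or differ in things like tie handling, strata, or weights.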

I wanted to run my situation by the forum to see if anyone more experienced has a suggestion of where to look.

Thank you for your time.

Small update: I was doing some testing and found that my issue might be with the formula. As it stands, my formula is roughly “Surv(age_start, age_stop, status) ~ …”, where I’m giving R the start and end of each interval and the status within it, followed by a list of covariates. Testing on subsets of my dataset, I found that whenever I use an interval the results are completely off. However, if both functions are given just a single time, Cyclops and coxph match much more closely.

I tested several built-in datasets and found no issue with an interval versus a single time in those data frames. Is there something particular about the interval notation that could throw Cyclops off that much?

Another update: I think the issue is that Cyclops is treating my interval as [0,time2) instead of [time1,time2). I tested with the survival::cgd dataset and found that coxph and Cyclops disagreed on subsets where the interval duration was much shorter than the ending time, and agreed on subsets where the interval covered most of the ending time. My original data is yearly observations, which would explain why it was so far off. Is this an issue anyone else has resolved, or am I better off just switching libraries?
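
To make the hypothesis concrete, here is a minimal sketch in plain Python (not R, and not the actual Cyclops or coxph code) that evaluates a Breslow-style log partial likelihood at a fixed coefficient, once with proper entry times and once with every row treated as if it started at 0. The function name `cox_log_partial_likelihood`, the `use_entry_time` flag, and the toy rows are all made up for illustration. I'm assuming the usual counting-process convention that a row is at risk at time t when start < t <= stop.

```python
import math

def cox_log_partial_likelihood(rows, beta, use_entry_time=True):
    """Breslow log partial likelihood for one covariate at a fixed beta.

    rows: list of (start, stop, status, x) tuples.
    use_entry_time=True  -> a row is at risk at t when start < t <= stop
    use_entry_time=False -> a row is at risk at t when t <= stop (start ignored)
    """
    loglik = 0.0
    for start_i, stop_i, status_i, x_i in rows:
        if status_i != 1:
            continue  # only event rows contribute a term
        t = stop_i
        denom = 0.0
        for start_j, stop_j, _status_j, x_j in rows:
            entered = (start_j < t) if use_entry_time else True
            if entered and t <= stop_j:
                denom += math.exp(beta * x_j)
        loglik += beta * x_i - math.log(denom)
    return loglik

# Hypothetical toy data: three subjects observed yearly, one row per year.
yearly_rows = [
    (0, 1, 0, 1.0), (1, 2, 0, 1.0), (2, 3, 1, 1.0),  # subject A, event at 3
    (0, 1, 0, 0.0), (1, 2, 1, 0.0),                  # subject B, event at 2
    (0, 1, 0, 1.0), (1, 2, 0, 1.0), (2, 3, 0, 1.0),  # subject C, censored at 3
]

# The same subjects collapsed to a single (0, stop] row each.
single_rows = [
    (0, 3, 1, 1.0),  # subject A
    (0, 2, 1, 0.0),  # subject B
    (0, 3, 0, 1.0),  # subject C
]

beta = 0.0
yearly_with_entry = cox_log_partial_likelihood(yearly_rows, beta, True)
yearly_ignoring_entry = cox_log_partial_likelihood(yearly_rows, beta, False)
single_with_entry = cox_log_partial_likelihood(single_rows, beta, True)
single_ignoring_entry = cox_log_partial_likelihood(single_rows, beta, False)

print(yearly_with_entry, yearly_ignoring_entry)
print(single_with_entry, single_ignoring_entry)
```

In this toy case, ignoring the start time lets a subject's earlier yearly rows stay in later risk sets, so the denominators grow and the log-likelihood becomes more negative. With one row per subject starting at 0 the two versions coincide. That is the same pattern I'm seeing, although it doesn't prove this is what Cyclops does internally.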
