
Treatment cohorts based on closest indication to index


I’ve been trying to replicate an algorithm in Atlas that would specify a treatment cohort based on indication. The logic for assigning the indication relies on finding the most proximal indication relative to the index date. For instance, in this rough sketch, assume conditions A, B, and C occur within a year prior to index day 0. Patients should be in our Indication A cohort because A occurs closest to index:


Here, patients should be in indication cohort B because B occurs closest to index:

Is there any way in Atlas/CapR to handle this algorithm? I think the challenge is needing a temporal window between the indication of interest and the index date 0 in order to look for other indications.

For Indication cohort A, we can nest criteria for 0 occurrences of Indication B relative to A, but that check isn’t bounded between A and index day 0; we can only look relative to A’s date.

Any thoughts?


Short answer is no. I don’t think it’s possible, for the following reasons:

  1. Cohorts are defined based on specific criteria, so you can’t do something like ‘if A is closest to index, assign cohortId 100, and if B is closest, assign cohortId 200’. You’d have to make separate cohorts for your different indications, as you suggest with your nested criteria approach.

  2. As you pointed out, the query/count/group logic doesn’t let you refer to ‘external’ date information (for example, the B nested inside A can’t refer to the index date, only to A’s start/end dates).

We’ve done complicated things outside of Atlas/Circe for the type of rules that you’re describing. For example, we tried to classify an IBD diagnosis as UC or Crohn’s based on which happened most recently, or based on the number of diagnosis codes prior to index. We created a sort of ‘scoring’ mechanism to determine which category the person belongs to and then put them into the correct cohortId based on the category. This doesn’t help you with your problem, but I’m just sharing my experience.
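
As an illustration only (this is not the code used for the IBD work, and it is not something Atlas/Circe does), here is a minimal Python sketch of the ‘closest indication to index wins’ rule applied outside Atlas. The function name, the `COHORT_IDS` mapping, the 365-day lookback, counting an event on day 0 as eligible, and the alphabetical tie-break are all assumptions made for the example.

```python
from datetime import date

# Hypothetical mapping from indication label to cohort ID
# (100 and 200 echo the example IDs mentioned above; 300 is made up).
COHORT_IDS = {"A": 100, "B": 200, "C": 300}


def assign_indication_cohort(index_date, condition_events, lookback_days=365):
    """Return the cohort ID of the indication recorded closest to index.

    condition_events is a list of (indication_label, event_date) tuples for
    one person. Only events from lookback_days before index up to and
    including the index date are considered. Returns None if no indication
    falls inside that window.
    """
    candidates = []
    for label, event_date in condition_events:
        gap = (index_date - event_date).days
        if label in COHORT_IDS and 0 <= gap <= lookback_days:
            candidates.append((gap, label))
    if not candidates:
        return None
    # Smallest gap wins; ties fall back to alphabetical label order,
    # which is an arbitrary choice for this sketch.
    _, closest = min(candidates)
    return COHORT_IDS[closest]


if __name__ == "__main__":
    events = [
        ("C", date(2020, 1, 15)),
        ("B", date(2020, 3, 1)),
        ("A", date(2020, 6, 1)),
    ]
    print(assign_indication_cohort(date(2020, 7, 1), events))
```

In this sketch each person ends up with at most one cohort ID, which is the part that a single cohort definition cannot express; a real implementation would more likely do the same ranking in SQL against the CDM tables.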

I’m scratching my chin about how I’d extend the query/count/group function to do what you want, or introduce a new function (I’m not sure what that is yet). But as you build a custom solution to this query, please share the logic, and even better, describe how you might want to use atlas to implement your logic, and that might provide an idea about how to extend the functionality.


Thanks Chris! I don’t have the technical implementation in mind yet… maybe after a few coffees :) But before getting into the nitty-gritty: conceptually, do you have initial concerns about altering CIRCE so that nested criteria can depend on either their direct parent or the overall index date?

Certainly, there’s a challenge of compatibility. Looks like currently we only have a CDM Version attribute, but that doesn’t ensure that the installed version of CIRCE can handle this JSON. Maybe a CIRCE version attribute would need to be added, and then checked for in Atlas/CapR/other clients.

This sounds like a different topic? But I’ve avoided embedding version information into the circe expression because in some circles, making these documents ‘versionless’ has some advantages. For example, let’s say you have 10,000 cohort definitions, and you update circe to a new version that adds new features but is fully backwards compatible. Do I need to update 10,000 cohort definitions to use the new version? I know there are arguments on both sides, so I don’t want to get into it here, but I haven’t had any issues with version-less cohort expressions. If you have encountered an issue, I’d ask that you post it on GitHub.

I guess I’m worried that if we add logic to CIRCE to handle this need, sites with older CIRCE versions couldn’t leverage this definition. They’d import into their Atlas and the parser might error out, or alter the intended definition. But maybe the minimum CIRCE version is just documentation that needs to be shared along with the JSON…

It’s just a technical concern: nested criteria can nest multiple times, and I’m not sure of the best way to specify ‘which parent’ in the nesting should be used as the index date. It’s a UI concern (how do you present the user with the choice) and a back-end concern (how do I give each subquery a unique alias so that children can reference a parent context subquery)… plus there’s the question of how these sorts of operations would perform.

The nested criteria in circe-be are elegant in that the recursive structure that appears in the JSON is the same recursive structure that yields the queries. But that design lacks the flexibility to arbitrarily reference prior recursive iterations of the query.


Yes, I understand. We’ve been very careful that you can take an older definition and apply it to a new version of circe-be and it will work. I think you are describing taking a newer definition and running it on an older implementation; what would happen in that case is that the new attributes introduced in the new version will be silently ignored by the older version. So I totally understand your concern about that happening, but the trade-off would be having to assign a version to each expression and then update the version of all your expressions when you want to run them on a newer version of the library. I feel like it’s a more typical use case to take an old definition and run it on a new version of the library than to get a definition based on a new version of the library and want to run it on an older version.

So, thinking on this further, I don’t think there’s any harm in tagging an expression with a version number that records what the ‘current’ version of Circe is when you save. This information could be used when you generate cohort SQL: if the version doesn’t match, you can log a warning. We could also code a ‘checker’ on the server side to detect when the version in the expression doesn’t match the installed version of the library. I think this is pretty straightforward.

A few questions on it: when does the version get updated in the expression? Should it always happen as part of saving the cohort expression to the db? Does the user need to make a choice? I favor adding a step to the persistence mechanism so that saving the circe expression stamps it with the ‘currently installed’ version of circe. If a save action occurs, then someone probably looked at it and made the adjustment. In the context of ‘read-only’ expressions, where they are part of a study package or something, you’ll never update the version info in the expression (unless you want to take additional steps).
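
To make the stamp-on-save and warn-on-mismatch idea concrete, here is a minimal Python sketch. It is not circe-be or WebAPI code: the `circeVersion` attribute name, the version strings, and both function names are hypothetical, and the expression is treated as plain JSON.

```python
import json
import logging

logger = logging.getLogger("circe_version_check")

# Hypothetical attribute name and installed version, for illustration only.
VERSION_KEY = "circeVersion"
INSTALLED_VERSION = "1.0.0"


def stamp_on_save(expression_json, installed_version=INSTALLED_VERSION):
    """Return the expression JSON with the installed version recorded in it.

    Meant to be called from the persistence step, so every explicit save
    re-stamps the expression with whatever version is installed right now.
    """
    expression = json.loads(expression_json)
    expression[VERSION_KEY] = installed_version
    return json.dumps(expression)


def check_version(expression_json, installed_version=INSTALLED_VERSION):
    """Return True if the stamped version matches the installed one.

    A mismatch (or a missing stamp) only logs a warning; it never raises,
    so unversioned or read-only expressions can still be used.
    """
    expression = json.loads(expression_json)
    saved_version = expression.get(VERSION_KEY)
    if saved_version is None:
        logger.warning("Expression carries no %s attribute.", VERSION_KEY)
        return False
    if saved_version != installed_version:
        logger.warning(
            "Expression was saved with version %s but version %s is installed.",
            saved_version,
            installed_version,
        )
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    stamped = stamp_on_save('{"ConceptSets": []}', "1.2.0")
    print(check_version(stamped, "1.2.0"))
    print(check_version(stamped, "1.1.0"))
```

Because the check only warns, an expression from a study package that is never re-saved keeps its original stamp and still generates SQL, which matches the ‘read-only’ case described above.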


I agree with this approach. Saving is an explicit step, so it makes sense to me that on save, the definition and underlying expression are valid for that CIRCE version (at least).

Hi @Ajit_Londhe and @Chris_Knoll, this is a very interesting topic for me too! Did you manage to figure something out? I can contribute little from the developer side, but I was wondering whether one could think of a package that assigns an indication to a prescription in a probabilistic way, based on time proximity as you suggest and on other factors, such as prior analogous prescriptions related to the same indication (e.g., metformin close to T2D and, years later, an SGLT2 inhibitor close to T2D) or the absence of similar records in the past (e.g., first-time CKD vs. recurrent T2D in the past with no Tx changes, making it likely that CKD triggered the SGLT2i). Could one think of developing predictive models in this direction? Just brainstorming a little to resuscitate the discussion :)