Standardized exchangeable cohort definition - Polling interest

Friends:

Recently, we had a bunch of industry folks come together, hosted by J&J, to discuss OHDSI and what is needed to help adoption and drive use cases. We ended up with a good list of challenges. I am sure the public sector has exactly the same kind of belly ache, so it makes sense to me to deal with this at the community level.

One of the high-priority items is the ability to define cohort definitions in a standardized way, to share them, and to use them to build libraries. We have a whole lot of Cohort Builders based on the OMOP CDM floating around, amongst them of course ATLAS (used to be Circe). It does have a cohort definition representation in JSON. Some of the other Cohort Builders have similar solutions, like QuintilesIMS’s E360. However, none of it is public, transparent, or has formal support and maintenance.

The beauty of the OMOP CDM is that you can do such a thing - globally and across a large number of different types of data. But is anybody thinking about this or working on it, so that we could create a community solution?

If you consider how open source software efforts work, there is at least one possible avenue for making a living, i.e. selling support contracts. Within the open source community “free” means free to copy, use, and modify; not necessarily completely free of charge.

We need to come up with some idea of how someone’s time can be rewarded.

@Christian_Reich,

Great topic and very timely. Odysseus is actively working on creating a framework to enable Arachne platform integration with various cohort builders and thus this problem is very important to us.

We have discussed a possible approach with @Frank and his team and other folks on the OHDSI Architecture calls on a number of occasions, and have started to outline a set of common metadata attributes that could be used to make a cohort interchangeable and platform independent. We are thinking of two parts: metadata attributes that describe the cohort in general (e.g. name, description, author, date time stamp, ID, etc.) and the actual cohort design definition.

Today, we have started our development by focusing on integrating Arachne with ATLAS, with the intent of outlining a model that we can propose to the rest of the community as version 0.1. To minimize our initial effort, we are taking a cohort definition as outlined by ATLAS and wrapping it in a more general JSON envelope that contains the common metadata fields mentioned above. This should make it possible to transport the ATLAS-defined cohort as a message. We have also started thinking about a design for an execution engine that would translate this cohort message into platform-specific executable code, similar to SQL Renderer.
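To make the envelope idea concrete, here is a minimal sketch in Python. It is not the Arachne or ATLAS format: the field names (`envelopeVersion`, `metadata`, `sourceTool`, `definition`) and the helper functions are hypothetical, chosen only to mirror the metadata attributes listed above, and the "ATLAS" payload is an empty placeholder rather than a real export.

```python
import json
from datetime import datetime, timezone


def wrap_cohort_definition(definition, *, cohort_id, name, description,
                           author, source_tool):
    """Wrap a tool-specific cohort definition in a generic metadata envelope.

    All key names here are assumptions made for illustration only.
    """
    return {
        "envelopeVersion": "0.1",
        "metadata": {
            "id": cohort_id,
            "name": name,
            "description": description,
            "author": author,
            "createdAt": datetime.now(timezone.utc).isoformat(),
            "sourceTool": source_tool,
        },
        # The payload is carried through untouched, so the originating
        # tool can read it back exactly as it was exported.
        "definition": definition,
    }


def unwrap_cohort_definition(message):
    """Parse a JSON message and return (metadata, definition)."""
    envelope = json.loads(message)
    return envelope["metadata"], envelope["definition"]


# Placeholder standing in for a definition exported from a cohort builder.
atlas_definition = {"PrimaryCriteria": {}, "ConceptSets": []}

envelope = wrap_cohort_definition(
    atlas_definition,
    cohort_id="example-1",
    name="Example cohort",
    description="Illustrative only",
    author="someone",
    source_tool="ATLAS",
)
message = json.dumps(envelope)
metadata, definition = unwrap_cohort_definition(message)
```

The point of the design is that the envelope never interprets the payload; only an execution engine that understands `sourceTool` would need to.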

I would be happy to assist in creating a community of work around this problem, possibly as one of the sub-work streams within the OHDSI Architecture WG.

@Greg, @Frank:

Very nice. Where does that work happen? In a smoke-filled back room, or somewhere for the community to see?

We have started to discuss and define it in our Thursday OHDSI Architecture calls. But the idea of a smoke-filled back room sounds pretty attractive - it probably would be even more effective than a WebEx call :slight_smile:

In all seriousness though - I think work like this could be done very effectively in a face-to-face setting. Something for us to think about organizing?

It is an interesting idea. In my opinion, the first step in doing this is to precisely define “cohort”. We build cohorts as one part of our study builder. We think in terms of creating, saving and modifying studies, not cohorts. If people want to exchange cohorts, then it should be done in a way that people conducting studies know what has been done, and what needs to be done, about the following issues.

Simple questions include

  • incident or prevalent cohort design
  • inclusion/exclusion criteria
  • date restrictions on the observation period
  • date restrictions on the index date
  • specific enrollment periods to be selected

More complicated ideas include

  • what kinds of “lookback” intervals are defined (min, max, both?)
  • whether maximum lookback periods vary by inclusion/exclusion criteria

Even more complicated (at least in terms of implementation)

  • is the incident cohort defined by the “first event” ever, or just the first qualifying event (defined by selection criteria)

I realize this is a bit off in the weeds from the initial question, but I hope it is helpful in figuring out how to do this. I like the idea of being able to get a “cohort” from a data provider so that I know whether/how to conduct research using it.
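The questions listed above could be captured as explicit fields that travel with an exchanged cohort. The following is only a sketch: the class `CohortDesign`, its field names, and the helper `unstated_choices` are all hypothetical and do not come from any existing tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CohortDesign:
    """Design choices a recipient would want stated (hypothetical names)."""
    design: Optional[str] = None                    # "incident" or "prevalent"
    inclusion_criteria: Optional[list] = None
    exclusion_criteria: Optional[list] = None
    observation_period_dates: Optional[tuple] = None  # (start, end)
    index_date_range: Optional[tuple] = None          # (start, end)
    enrollment_periods: Optional[list] = None
    min_lookback_days: Optional[int] = None
    max_lookback_days: Optional[int] = None
    lookback_varies_by_criterion: Optional[bool] = None
    incident_event_rule: Optional[str] = None  # "first_ever" or "first_qualifying"


def unstated_choices(design):
    """Return the names of design choices that were left unspecified."""
    return [f.name for f in fields(design) if getattr(design, f.name) is None]


example = CohortDesign(
    design="incident",
    min_lookback_days=365,
    incident_event_rule="first_qualifying",
)
missing = unstated_choices(example)
```

A check like `unstated_choices` gives the person receiving a cohort a quick list of what still has to be asked of the data provider.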

Are there JSON ATLAS cohort definition files available as reference samples?

Thanks

You can generate them in http://www.ohdsi.org/web/atlas. Click the “Define a new cohort” button and you can mess with the various primary events, inclusion criteria, and time windows. Export -> JSON will then expose the cohort’s JSON rendering.

You can also export the SQL to see how it will execute.

@Chris_Knoll, @Ajit_Londhe, thank you for your suggestions. My intent is to get an exhaustive list of the OMOP column names that participate in some way in the cohort building process. It is not clear to me whether a JSON object or an exported SQL script will give me only the columns used in my specific request.

Would either method be the way to get a definitive list? Trying many different cohort settings seems likely to miss some fields.
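For what it is worth, listing the fields that appear in one exported definition is easy to script, which also shows the limitation: the result covers only what that definition uses. A sketch in Python follows. The JSON below is a small illustrative stand-in, not an actual ATLAS export, and the keys are JSON property names that would still have to be mapped to CDM columns by hand.

```python
import json


def collect_keys(node, found=None):
    """Recursively collect every property name in a parsed JSON document."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            collect_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_keys(item, found)
    return found


# Illustrative stand-in for an exported cohort definition (assumed shape).
exported = json.loads("""
{
  "PrimaryCriteria": {
    "CriteriaList": [
      {"ConditionOccurrence": {"CodesetId": 0}}
    ],
    "ObservationWindow": {"PriorDays": 0, "PostDays": 0}
  }
}
""")

keys_used = sorted(collect_keys(exported))
print(keys_used)
```

Running this over many exported definitions and taking the union of the results would get closer to a full list, but it can never prove that a field has not been missed.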

Thanks.

@Mark_Danese, it is interesting that your group considers cohorts in terms of studies, which makes perfect sense to me. If one is interested in doing cost analysis on several alternative interventions for a specific condition, different data may be required than if one is simply concerned about efficacy.

In terms of patient IDs only, it may be exactly the same cohort.

With respect to “first event ever” vs. the “first qualifying” events, I’d say that would depend on whether the condition of interest is chronic or not. If someone is studying herpes virus, it would be the “first event ever” while an occurrence of candida infection would fall under the “first qualifying” event.

I’ve missed one or two Architecture calls I think, but I eagerly want to be involved in any discussion of standardizing a cohort definition definition.

The Vocab-Viz tool I’m working on (so I am mostly non-existent until June 31) is looking more and more like a ConceptSet builder, and will probably need some tweaks to the current way of defining ConceptSets.

I’m not aware of a specification document that describes the complete JSON document generated by the ATLAS cohort definition editor, although I do believe it currently supports all the possible fields within the CDM for use in a cohort definition.

Miredot used to give this to us (it even would show the inheritance structure of the different cohort criteria objects) but it doesn’t give us that anymore.

Their SQL export does not work. I defined various limitations on the condition_occurrence table, but the SQL export hangs. Oh well.

Which SQL export do you mean? I just went to a fairly complex cohort definition on the public site:
http://www.ohdsi.org/web/atlas/#/cohortdefinition/98094
and then went to the export tab, and the sql came right up.

The SQL for the link you provided comes up without a problem, but the one I defined just says “Loading.” There is a bug somewhere.

Care to share what “the one I defined” is? Is it out on ohdsi.org somewhere? If yes, what’s the link?

I was about to, but whatever the issue was, it appears resolved now. Thanks.

The self-healing capabilities of software. Just like us people. :smile:
