
Limitations of Atlas' Cohort Definitions tool?

Dear community, as we launch Phenotype Phebruary, maybe this is a good moment to ask: do we know of specific limitations of Atlas’ Cohort Definitions from a conceptual/medical standpoint?

For instance, it seems to me it is not possible to express an “increase in measurement value,” such as:

  • “HbA1c larger than last available record”

A similar limitation would be phenotypes that require a computed value, e.g.:

  • “compute BMI, find patients with BMI > 30”
  • “compute Child-Pugh score, find patients with score > 7”
  • “HbA1c larger than average of prior 6 months”
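For what it is worth, these derived-value criteria are straightforward once the rows are outside Atlas. Here is a minimal Python sketch, assuming the measurement rows have already been extracted into simple (person_id, date, value) tuples. The sample data, the function names, and the 183-day stand-in for “6 months” are all made up for illustration; none of this is Atlas or CIRCE code.

```python
from datetime import date, timedelta

# Hypothetical, simplified stand-ins for MEASUREMENT rows: (person_id, date, value).
hba1c = [
    (1, date(2021, 1, 10), 6.8),
    (1, date(2021, 4, 12), 7.0),
    (1, date(2021, 6, 20), 7.9),
    (2, date(2021, 2, 1), 8.1),
    (2, date(2021, 5, 3), 7.4),
]

def exceeds_prior_average(rows, window_days=183):
    """Return (person_id, date) for each record whose value is above the mean
    of the same person's records in the preceding window_days.
    Records with no prior data in the window are skipped."""
    hits = []
    for pid, when, value in rows:
        window_start = when - timedelta(days=window_days)
        prior = [v for p, d, v in rows if p == pid and window_start <= d < when]
        if prior and value > sum(prior) / len(prior):
            hits.append((pid, when))
    return hits

def bmi(weight_kg, height_m):
    """Body mass index computed from two separate measurements."""
    return weight_kg / height_m ** 2

rising = exceeds_prior_average(hba1c)
```

The point is only that each criterion needs a value computed from several rows of the same person, which is exactly what a single-row comparison against a constant cannot express.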

It can be hard to claim something is not possible, especially when you don’t regard yourself as an expert. But perhaps we could allow the benefit of the doubt here, and share the barriers we seem to have encountered along the way.

Kind regards to everyone.


Thanks @fabkury, this is a great thread to initiate as part of Phenotype Phebruary :).

I’m hoping that, as we are building phenotypes and consider using ATLAS, we can expose ‘limitations’ based on real need (rather than just unavailable features that aren’t actually required to complete an applied task).

That said, some of the limitations are listed in the issue tracker on GitHub for CIRCE (the cohort component underneath ATLAS) that @Chris_Knoll maintains: https://github.com/OHDSI/Circe/issues. And if anyone encounters other limitations, this would be a good place to capture them.

As it relates to phenotypes that I personally have had to build for which I KNOW that ATLAS can’t (yet) handle:

  • currently, the entry event in ATLAS is always the START_DATE in each of the CDM tables. The user can’t change that to END_DATE. So, for example, you can create a cohort of ‘hospital admissions’ but not ‘hospital discharges’ (meaning cohort_start_date = visit_end_date).
  • Also, the user cannot add an offset to the entry event. So, for example, if you wanted a simple phenotype for pregnancy episodes amongst women with live births, you can’t make cohort_start_date = (‘live birth’ condition_start_date) - 9 months. When you use future markers to infer prior events, this could be helpful (and for the more advanced ‘pregnancy episode’ algorithm that we previously published, it would likely enable us to implement it in ATLAS rather than having to rely on our current code).
  • Currently, the exit strategy allows for selection of one event persistence option: ‘end of continuous observation’ OR ‘fixed duration relative to initial event’ OR ‘end of a continuous drug exposure’. I’ve encountered some situations where I want to end after continuous observation of non-drug events (for example, if you are dealing with a condition and want to end the cohort after the last follow-up care, you might want to era-fy the data and exit once you no longer see any symptoms, conditions, treatments or procedures indicating active management of the disease). Also, I had a recent edge case where I wanted to use multiple event persistence options (example from our GLP1 reproducibility challenge cohort: I want a person to leave an exposure cohort after EITHER they stopped exposure OR they had made it to 365d post-index, whichever came first).
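To make the three requests above concrete, here is a minimal Python sketch of the date logic only. The function names are hypothetical, and using 270 days as a stand-in for 9 months is an assumption; this is not ATLAS or CIRCE code.

```python
from datetime import date, timedelta

def discharge_cohort_start(visit_end_date):
    """Entry event anchored on END_DATE: cohort_start_date = visit_end_date."""
    return visit_end_date

def pregnancy_cohort_start(live_birth_date, offset_days=270):
    """Entry event with a negative offset: the start is inferred from a later marker.
    270 days is an assumed approximation of 9 months."""
    return live_birth_date - timedelta(days=offset_days)

def exposure_cohort_end(index_date, exposure_end_date, max_days=365):
    """Two persistence options combined: exit at the end of exposure OR at
    max_days post-index, whichever comes first."""
    return min(exposure_end_date, index_date + timedelta(days=max_days))
```

For instance, exposure_cohort_end(date(2021, 1, 1), date(2021, 3, 1)) exits at the end of exposure, while a person still exposed two years later would exit at day 365.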

Other ‘limitations’ that I am aware of but that have never personally impacted any of my phenotype development work: ATLAS doesn’t cover all CDM tables. For example, the new EPISODE tables added in v5.4 are not used. It also doesn’t support the COST, NOTE, NOTE_NLP, and FACT_RELATIONSHIP tables. And as @fabkury points out, ATLAS doesn’t do generic computation of composite scores or derived values across sets of measurement values.

All that said, I have to say that I’m always extremely impressed with what @Chris_Knoll has built, because I’ve now made hundreds, maybe thousands, of cohorts using ATLAS and it is quite rare (<1%) that I reach the conclusion of ‘can’t be done’. Sometimes it takes a little creativity to model, but often I find a solution for the data that I’m working with. So, on expectation, ATLAS is a great place to start if you are beginning your journey to build a rule-based phenotype algorithm.


Dear @Patrick_Ryan, thank you for the amazing, rich response.

It seems to me that a lot comes down to the inability to express correlations between CDM rows other than by date values. Please correct me if I’m wrong. Atlas will let you identify one record (one row) in a CDM table by:

  • a concept ID field compared against constants (concept sets), or sometimes joined to another row,
  • a value field compared against constants (concept sets or literal numbers),
  • a date field compared against constants (rare), or against dates of other rows.

So you can’t express relationships between values from two rows, such as “increase in intraocular pressure between visits.”
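Outside Atlas, that kind of row-to-row comparison is a short loop. Here is a minimal Python sketch, assuming the measurements have been pulled into (person_id, date, value) tuples; the sample data and the function name are hypothetical and only meant to illustrate the comparison.

```python
from datetime import date
from itertools import groupby

# Hypothetical intraocular pressure records: (person_id, date, value in mmHg).
iop = [
    (1, date(2021, 1, 5), 15.0),
    (1, date(2021, 3, 2), 19.0),
    (1, date(2021, 6, 1), 18.0),
    (2, date(2021, 2, 1), 21.0),
    (2, date(2021, 4, 1), 21.0),
]

def increases_over_previous(rows):
    """Return (person_id, date) for each record whose value is strictly
    greater than the same person's immediately preceding record."""
    hits = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for pid, group in groupby(ordered, key=lambda r: r[0]):
        records = list(group)
        # Pair each record with its successor and compare the two values.
        for previous, current in zip(records, records[1:]):
            if current[2] > previous[2]:
                hits.append((pid, current[1]))
    return hits

rising_iop = increases_over_previous(iop)
```

The comparison `current[2] > previous[2]` is the piece with no counterpart in the list above: both sides are values read from rows, not constants.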

About scores, such as “disseminated intravascular coagulation score >= 4,” if the clinical computation is simple enough it may be theoretically (but probably not practically) possible to “unwrap” the phenotype into a test for the occurrence of any of the possible ways a group of records can add up to >= 4. I suspect this kind of “unwrapping” may ring a bell in some people’s memories, in the context of calculating scores or in other contexts within phenotyping. The point is this: it’s not always easy to claim something is not possible to express.*
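As a rough illustration of the “unwrapping” idea, the Python sketch below enumerates every combination of component points that reaches a threshold. The component names and point ranges are hypothetical placeholders, not the actual scoring rules of any clinical score.

```python
from itertools import product

# Hypothetical components and the points each one can contribute.
COMPONENT_POINTS = {
    "platelet_count": (0, 1, 2),
    "fibrin_marker": (0, 2, 3),
    "prothrombin_time": (0, 1, 2),
    "fibrinogen": (0, 1),
}

def unwrap(component_points, threshold):
    """List every assignment of points to components whose total is >= threshold.
    Each assignment would become one 'any of' branch in an unwrapped definition."""
    names = list(component_points)
    return [
        dict(zip(names, combo))
        for combo in product(*component_points.values())
        if sum(combo) >= threshold
    ]

qualifying = unwrap(COMPONENT_POINTS, 4)
```

Even with four small components there are 54 combinations to check, and the count is the product of the options per component, which is why this is probably not practical to do by hand.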

*At conceptual level, I suppose one can argue that only a Turing-complete programming language can promise you it will be able to express any (computable) logic you want whatsoever, and that any GUI tool is usually much below that level of expressive power. So if we cover the vast majority of desired phenotypes, our “phenotyping model” has been a success.

I see Patrick’s point that true barriers (“impossible” phenotypes) are rare, which is true, but here’s my point. Even when you do find a “clever way to do it,” this is also interesting to talk about. Are you sure the workaround is guaranteed to give the same result as originally intended? Or is it an educated approximation? What functionality exactly are you missing, that would allow you to define the cohort without workarounds?

A similar kind of barrier is when you’re developing a phenotype, the query seems to never finish, and you decide to use an educated approximation, i.e. a workaround. Such situations may also give insight into the edges of our models for electronic phenotyping, although query run time (or time complexity) is a much more complex topic. It would be great if we could build a shared understanding of where the edges of our models are, based on our real-world experience. I know of at least one phenotyping tool (https://academic.oup.com/jamia/article/28/7/1468/6169466, not FOSS however) that is importantly different when it comes to query complexity, virtually dispensing with the need for workarounds regarding query run time. I do not mean to advertise ACE here (much less denigrate Atlas); I am just pointing out that our paradigms can sometimes insidiously impart noise onto our results, so let’s talk about them. :)

Kind regards!

In my PhD work, I considered a query to consist of query elements, and I defined the problem of inter-element parameter passing.

Any query that relies on a population-level SQL query as its underlying paradigm will be subject to certain limitations.
PubMed Central link to article: Implementation of workflow engine technology to deliver basic clinical decision support functionality