
Limitations of Atlas' Cohort Definitions tool?

Dear community, as we launch Phenotype Phebruary, maybe this is a good moment to ask, do we know specific limitations of Atlas’ Cohort Definitions from a conceptual/medical standpoint?

For example, it seems to me it is not possible to express an “increase in measurement value,” such as:

  • “HbA1c larger than last available record”

A similar limitation would be phenotypes that require a computed value, e.g.:

  • “compute BMI, find patients with BMI > 30”
  • “compute Child-Pugh score, find patients with score > 7”
  • “HbA1c larger than average of prior 6 months”
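To make the first example concrete, here is a minimal sketch of the row-to-row comparison that “HbA1c larger than last available record” requires. It is plain Python over in-memory tuples, not anything Atlas or Circe provides; the function name `increased_over_previous` and the sample values are made up for illustration, and it assumes at most one measurement per person per date.

```python
from datetime import date


def increased_over_previous(measurements):
    """Return (person_id, date, previous_value, value) for every record whose
    value is larger than the same person's immediately preceding record.

    `measurements` is an iterable of (person_id, measurement_date, value).
    """
    hits = []
    last_seen = {}  # person_id -> value of the most recent earlier record
    ordered = sorted(measurements, key=lambda m: (m[0], m[1]))
    for person_id, measured_on, value in ordered:
        previous = last_seen.get(person_id)
        if previous is not None and value > previous:
            hits.append((person_id, measured_on, previous, value))
        last_seen[person_id] = value
    return hits


# Hypothetical HbA1c values (%), for illustration only.
hba1c = [
    (1, date(2021, 1, 10), 6.8),
    (1, date(2021, 6, 15), 7.4),   # higher than the previous record
    (1, date(2021, 12, 1), 7.1),   # lower than the previous record
    (2, date(2021, 3, 5), 8.0),    # no earlier record to compare against
]
rising = increased_over_previous(hba1c)
```

The point of the sketch is that the comparison needs the value from another row, not just its date, which is the part that seems hard to express in the cohort definition UI.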

It can be hard to claim something is not possible, especially when you don’t regard yourself as an expert. But perhaps we could allow the benefit of the doubt here, and share the barriers we have encountered along the way.

Kind regards to everyone.


Thanks @fabkury, this is a great thread to initiate as part of PhenotypePhebruary :).

I’m hoping that, as we are building phenotypes and consider using ATLAS, we can expose ‘limitations’ based on real need (rather than just unavailable features that aren’t actually required to complete an applied task).

That said, some of the limitations are listed in the issue tracker on GitHub for CIRCE (the cohort component underneath ATLAS) that @Chris_Knoll maintains: https://github.com/OHDSI/Circe/issues. And if anyone encounters other limitations, this would be a good place to capture them.

As it relates to phenotypes that I personally have had to build for which I KNOW that ATLAS can’t (yet) handle:

  • currently, the entry event in ATLAS is always the START_DATE in each of the CDM tables. The user can’t change that to END_DATE. So, for example, you can create a cohort of ‘hospital admissions’ but not ‘hospital discharges’ (meaning cohort_start_date = visit_end_date).
  • Also, the user cannot add an offset to the entry event. So, for example, if you wanted a simple phenotype for pregnancy episodes amongst women with live births, you can’t make cohort_start_date = (‘live birth’ condition_start_date) - 9 months. When you use future markers to infer prior events, this could be helpful (and for the more advanced ‘pregnancy episode’ algorithm that we previously published, it would likely enable us to implement it in ATLAS rather than having to rely on our current code).
  • Currently, the exit strategy allows for selection of one event persistence option: ‘end of continuous observation’ OR ‘fixed duration relative to initial event’ OR ‘end of a continuous drug exposure’. I’ve encountered some situations where I want to end after continuous observation of non-drug events (for example, if you are dealing with a condition and want to end it after the last of follow-up care, you might want to era-fy the data and exit once you no longer see any symptoms, conditions, treatments or procedures indicating active management of the disease). Also, I had a recent edge case where I wanted to use multiple event persistence options (example from our GLP1 reproducibility challenge cohort: I want a person to leave an exposure cohort after EITHER they stopped exposure OR had made it to 365d post-index, whichever came first).
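The date logic behind the entry-offset and “whichever came first” bullets can be sketched in a few lines. This is only an illustration in plain Python, not a feature of ATLAS or Circe: the helper names `cohort_start` and `cohort_end` are hypothetical, and 270 days is my own stand-in for “9 months”.

```python
from datetime import date, timedelta


def cohort_start(anchor_date, offset_days=0):
    """Entry date taken from any anchor date (e.g. visit_end_date),
    optionally shifted by a signed number of days."""
    return anchor_date + timedelta(days=offset_days)


def cohort_end(index_date, exposure_end_date, max_days=365):
    """Exit at the end of exposure or at index + max_days, whichever is first."""
    return min(exposure_end_date, index_date + timedelta(days=max_days))


# 'Hospital discharge' cohort: start on the visit end date instead of the start date.
discharge_start = cohort_start(date(2020, 5, 20))

# Pregnancy episode inferred from a live birth, assuming 270 days for '9 months'.
pregnancy_start = cohort_start(date(2020, 10, 1), offset_days=-270)

# Exposure that stops early vs. exposure that runs past 365 days post-index.
stopped_early = cohort_end(date(2019, 1, 1), date(2019, 3, 1))
capped_at_365 = cohort_end(date(2019, 1, 1), date(2021, 1, 1))
```

In both cases the computation itself is trivial; the gap is only that the UI does not currently let you pick the anchor field, add the offset, or combine two persistence options.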

Other ‘limitations’ that I am aware of but that have never personally impacted any of my phenotype development work: ATLAS doesn’t cover all CDM tables. For example, the new EPISODE tables added in v5.4 are not used. It also doesn’t support the COST, NOTE, NOTE_NLP, and FACT_RELATIONSHIP tables. And as @fabkury points out, ATLAS doesn’t do generic computation of composite scores or derived values across sets of measurement values.

All that said, I have to say that I’m always extremely impressed with what @Chris_Knoll has built, because I’ve now made hundreds, maybe thousands, of cohorts using ATLAS and it is quite rare (<1%) that I reach the conclusion of ‘can’t be done’. Sometimes it takes a little creativity to model, but often I find a solution for the data that I’m working with. So, on expectation, ATLAS is a great place to start if you are starting your journey to build a rule-based phenotype algorithm.


Dear @Patrick_Ryan, thank you for the amazing, rich response.

It seems to me that a lot comes down to the inability to express correlations between CDM rows other than by date values. Please correct me if I’m wrong. Atlas will let you identify one record (one row) in a CDM table by:

  • a concept ID field compared against constants (concept sets), or sometimes joined to another row,
  • a value field compared against constants (concept sets or literal numbers),
  • a date field compared against constants (rare), or against dates of other rows.

So you can’t express relationships between values from two rows, such as “increase in intraocular pressure between visits.”

About scores, such as “disseminated intravascular coagulation score >= 4,” if the clinical computation is simple enough it may be theoretically (but probably not practically) possible to “unwrap” the phenotype into a test for the occurrence of any of the possible ways a group of records can add up to >= 4. I suspect this kind of “unwrapping” may ring a bell in some people’s memories, in the context of calculating scores or in other contexts within phenotyping. The point is this: it’s not always easy to claim something is not possible to express.*
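To show what I mean by “unwrapping,” here is a rough sketch that enumerates every combination of component points reaching a threshold. The three components and their point values are made up for illustration (they are not the actual DIC score components), and `unwrap_score` is a hypothetical helper, not part of any OHDSI tool.

```python
from itertools import product


def unwrap_score(component_points, threshold):
    """List every assignment of points to components whose total is >= threshold.

    `component_points` maps a component name to the point values it can take.
    Each returned dict is one 'way to qualify' that would need its own
    group of criteria in a rule-based cohort definition.
    """
    names = list(component_points)
    qualifying = []
    for points in product(*(component_points[name] for name in names)):
        if sum(points) >= threshold:
            qualifying.append(dict(zip(names, points)))
    return qualifying


# Hypothetical score: two components worth 0-2 points, one worth 0-1.
components = {
    "component_a": (0, 1, 2),
    "component_b": (0, 1, 2),
    "component_c": (0, 1),
}
ways_to_qualify = unwrap_score(components, threshold=4)
```

Even this toy score yields several distinct combinations, and the count grows multiplicatively with every added component, which is why I call the approach theoretically but probably not practically possible.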

*At a conceptual level, I suppose one can argue that only a Turing-complete programming language can promise you it will be able to express any (computable) logic you want whatsoever, and that any GUI tool is usually much below that level of expressive power. So if we cover the vast majority of desired phenotypes, our “phenotyping model” has been a success.

I see Patrick’s point that true barriers (“impossible” phenotypes) are rare, which is true, but here’s my point. Even when you do find a “clever way to do it,” this is also interesting to talk about. Are you sure the workaround is guaranteed to give the same result as originally intended? Or is it an educated approximation? What functionality exactly are you missing, that would allow you to define the cohort without workarounds?

A similar kind of barrier is when you’re developing a phenotype, and the query seems to never finish, and you decide to use an educated approximation – a workaround. Such situations may also give insight into the edges of our models for electronic phenotyping, although query run time (or time complexity) is a much more complex topic. It would be great if we could build a shared understanding of where the edges of our models are, from our real-world experience. I know of at least one phenotyping tool (https://academic.oup.com/jamia/article/28/7/1468/6169466, not FOSS however) that is importantly different when it comes to query complexity, virtually dispensing with the need for workarounds regarding query run time. I do not mean to advertise ACE here (much less denigrate Atlas), I am just pointing out that our paradigms can sometimes insidiously impart noise onto our results, so let’s talk about them. :)

Kind regards!

In my PhD work, I considered a query to consist of query elements, and I defined the problem of inter-element parameter passing.

Any query that relies on a population SQL query as its underlying paradigm will be subject to certain limitations.
PubMed Central link to article: Implementation of workflow engine technology to deliver basic clinical decision support functionality

Thank you for this comprehensive list highlighting some areas where Atlas might be limiting. I have a few follow-up questions:

  1. With regards to exit strategy, is there a way to censor based on age criteria? I do not think this is available in the interface, but it would also be helpful to see how to implement that in SQL.
  2. When trying to select patients in a database that are active within a certain enrollment period at a certain age, do we implement that using the observation period table? From the SQL code, it seems that when age is specified for the observation period, Atlas only includes those who have the specified age at the start of the observation period, i.e., the age calculation is anchored at the observation period start date. Is there any way to tackle that issue in the UI, or is there something that I am misinterpreting?

Thanks a lot!

Censoring Events will trigger a person to exit the cohort at the specified time. The limitation is that you need to find an event in the data to trigger the exit. You could specify a censor event using a Visit with age criteria specified for that, but the problem there is that if the person didn’t have a visit, then the person will persist in the cohort longer than you intend. What we would need to implement is a sort of ‘demographic era’ where you can specify an age range and the query will yield the set of people in the database (based on observation period) that are in the data for the period of time that corresponds to the specified age range.

You can do this in Atlas using Observation Period as the entry event. You use the ‘user defined dates’ option to specify the event start and end date, where the person’s observation period must completely cover the user-specified start/end. In this way, you could define a cohort entry event of something like 2019-01-01, and everyone who is ‘active’ in the data on Jan 1 2019 will be in the cohort with a cohort entry event starting on 2019-01-01 and ending on 2019-01-01 (or a later date if you want to enforce that they are in the data for a specific date range). Then, you can specify a ‘fixed date offset’ exit strategy to keep people in the cohort for a fixed number of days (ex: 365 days). With a 365d cohort persistence, people who are in the data past 2020-01-01 will be censored at 12/31/2019, and people who exit the database before then (ie: their observation_period_end_date is earlier than 2020-01-01) would be censored at their observation_period_end_date (cohorts will always censor at observation_period_end_date).
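As a rough illustration of that logic (not the SQL that Circe generates), the sketch below keeps a person only if their observation period covers the user-defined entry date, then ends the cohort at the fixed offset or at observation_period_end_date, whichever is earlier. The function name `fixed_date_cohort` is hypothetical, and the offset is added with plain date arithmetic, so the end date may differ by a day from how Atlas counts the 365 days.

```python
from datetime import date, timedelta


def fixed_date_cohort(entry_date, op_start, op_end, persistence_days=365):
    """Return (cohort_start_date, cohort_end_date), or None when the
    observation period does not cover the user-defined entry date."""
    if not (op_start <= entry_date <= op_end):
        return None
    planned_end = entry_date + timedelta(days=persistence_days)
    # Cohort membership never extends past the end of observation.
    return entry_date, min(planned_end, op_end)


entry = date(2019, 1, 1)

# Observed well past the offset: exit is driven by the fixed duration.
long_follow_up = fixed_date_cohort(entry, date(2015, 1, 1), date(2022, 12, 31))

# Leaves the database mid-2019: exit is the observation period end date.
early_exit = fixed_date_cohort(entry, date(2015, 1, 1), date(2019, 6, 30))

# Observation starts after the entry date: not 'active' on 2019-01-01.
not_active = fixed_date_cohort(entry, date(2019, 2, 1), date(2022, 12, 31))
```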

Thanks a lot for your quick response, very helpful! I just want to make sure that I got it right.

Does it mean that this is not yet a function in Atlas? Is there a way to alter that in the SQL code or are there any examples of it from previous work on Atlas?

This does make a lot of sense. I would essentially want to cover a period longer than a year, but individuals do not need to be at that exact age when their observation period starts - I just want to see the count of those who were active at some point within that time frame while aged within the predefined age interval. Based on what you suggested, I tried to apply it in the Atlas UI with an example of capturing those with an observation period in the database who were 18-65 between 2009 and 2013. I would really appreciate it if you could tell me whether this seems correct: https://atlas-demo.ohdsi.org/#/cohortdefinition/1779854/definition

Thanks a lot again!

It’s not a function in atlas, but we’ve written custom sql to generate person-level episodes based on age criteria:

-- Age bands (inclusive, in years) and genders that define the subgroups.
WITH ages AS (
  SELECT 2 AS age_id, 0 AS age_low, 5 AS age_high UNION ALL
  SELECT 3 AS age_id, 6 AS age_low, 17 AS age_high UNION ALL
  SELECT 4 AS age_id, 18 AS age_low, 34 AS age_high UNION ALL
  SELECT 5 AS age_id, 35 AS age_low, 54 AS age_high UNION ALL
  SELECT 6 AS age_id, 55 AS age_low, 64 AS age_high UNION ALL
  SELECT 7 AS age_id, 65 AS age_low, 74 AS age_high UNION ALL
  SELECT 8 AS age_id, 75 AS age_low, 84 AS age_high UNION ALL
  SELECT 9 AS age_id, 85 AS age_low, 114 AS age_high
),
genders AS (
  SELECT 1 AS gender_id, 8532 AS gender_concept_id, 'Female' AS gender_name UNION ALL
  SELECT 2 AS gender_id, 8507 AS gender_concept_id, 'Male' AS gender_name
)
-- One subgroup per age band x gender; subgroup_id doubles as cohort_definition_id.
SELECT ages.age_id*10+genders.gender_id AS subgroup_id, age_low, age_high, gender_concept_id, gender_name
INTO #subgroups
FROM ages, genders;

INSERT INTO @target_cohort_table (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date)
SELECT s1.subgroup_id AS cohort_definition_id, op1.person_id AS subject_id,
  CASE WHEN YEAR(op1.observation_period_start_date) - p1.year_of_birth >= s1.age_low
    THEN op1.observation_period_start_date
    ELSE DATEFROMPARTS(p1.year_of_birth + s1.age_low,1,1) END
  AS cohort_start_date,
  CASE WHEN YEAR(op1.observation_period_end_date) - p1.year_of_birth <= s1.age_high
    THEN op1.observation_period_end_date
    ELSE DATEFROMPARTS(p1.year_of_birth + s1.age_high,12,31) END
  AS cohort_end_date
FROM @cdm_database_schema.observation_period op1
INNER JOIN @cdm_database_schema.person p1 ON op1.person_id = p1.person_id
INNER JOIN #subgroups s1 ON DATEFROMPARTS(p1.year_of_birth + s1.age_low,1,1) <= op1.observation_period_end_date
  AND DATEFROMPARTS(p1.year_of_birth + s1.age_high,12,31) >= op1.observation_period_start_date
  AND p1.gender_concept_id = s1.gender_concept_id;

DROP TABLE #subgroups;

This is in OHDSI-Sql syntax, so you’d have to render and translate this sql using SqlRender.

You can run this query on your own CDM, and you’ll get a set of cohort records (if you just run the SELECT at the end) that gives you each person’s membership in each age/gender category that’s defined in the temp table #subgroups. I would use a query similar to this one to define a ‘demographic era’ criteria for building entry events based on age and gender for a period of time.

That is correct except there’s a ‘bug/feature’ in circe that may catch you unaware: when using ‘age’ criteria, it checks the age based on the record’s start date (ie: the observation_period_start_date) and not the user-defined start date.

The fix is easy, tho: move that logic to an inclusion rule like this:

What this does is it takes each entry event (which you made as Jan 1 of each year) and a person will only be included in the cohort at that date if they are the right age. The reason why it works is that the inclusion rule works off the dates yielded by the entry events (for which you defined the start and end dates), whereas if you use the age criteria directly on the observation period criteria, age is based off the start_date of the record.
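A small sketch of why the anchor date matters. It assumes age is computed as calendar year minus year_of_birth, as in the SQL earlier in this thread; the function names and the example person are made up for illustration.

```python
from datetime import date


def age_in_year(year_of_birth, on_date):
    """Age as calendar year of the anchor date minus year of birth."""
    return on_date.year - year_of_birth


def passes_age_rule(year_of_birth, anchor_date, age_low=18, age_high=65):
    """True when the person's age on the anchor date is within [age_low, age_high]."""
    return age_low <= age_in_year(year_of_birth, anchor_date) <= age_high


# Hypothetical person: born 1992, observed from 2005, entry event on 2011-01-01.
year_of_birth = 1992
observation_period_start = date(2005, 3, 1)
entry_event_date = date(2011, 1, 1)

# Age checked on the record's own start date (age 13): the person is dropped.
checked_on_record = passes_age_rule(year_of_birth, observation_period_start)

# Age checked in an inclusion rule against the entry event date (age 19): kept.
checked_on_entry = passes_age_rule(year_of_birth, entry_event_date)
```

The same person fails or passes depending only on which date the age check is anchored to, which is the difference between putting the age criteria on the observation period record and putting it in an inclusion rule.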