Limitations of Atlas' Cohort Definitions tool?

fabkury · February 2, 2022, 4:13pm

Dear @Patrick_Ryan, thank you for the amazing, rich response.

It seems to me that a lot comes down to the inability to express correlations between CDM rows other than by date values. Please correct me if I’m wrong. Atlas will let you identify one record (one row) in a CDM table by:

a concept ID field compared against constants (concept sets), or sometimes joined to another row,
a value field compared against constants (concept sets or literal numbers),
a date field compared against constants (rare), or against dates of other rows.

So you can’t express relationships between values from two rows, such as “increase in intraocular pressure between visits.”
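To make the missing capability concrete, here is a minimal sketch of a cross-row value comparison, written outside Atlas in plain Python. The rows are toy data shaped like the CDM measurement table (person_id, measurement_concept_id, measurement_date, value_as_number). The concept ID, the helper name `persons_with_increase`, the 5.0 threshold, and the choice to compare consecutive measurements (rather than measurements tied to visit records) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical concept ID standing in for "intraocular pressure";
# this is NOT a real OMOP concept ID.
IOP_CONCEPT_ID = 1001

# Toy rows: (person_id, measurement_concept_id, measurement_date, value_as_number)
measurements = [
    (1, IOP_CONCEPT_ID, date(2021, 1, 10), 15.0),
    (1, IOP_CONCEPT_ID, date(2021, 6, 12), 22.0),
    (2, IOP_CONCEPT_ID, date(2021, 2, 1), 18.0),
    (2, IOP_CONCEPT_ID, date(2021, 8, 3), 17.0),
]


def persons_with_increase(rows, concept_id, min_increase=5.0):
    """Return person_ids whose value rose by at least min_increase
    between two consecutive measurements of the given concept."""
    by_person = defaultdict(list)
    for person_id, cid, when, value in rows:
        if cid == concept_id:
            by_person[person_id].append((when, value))

    result = set()
    for person_id, series in by_person.items():
        series.sort()  # chronological order (date first, then value)
        # Compare each row's value with the value of the next row:
        # this is the value-to-value relationship between two rows.
        for (_, earlier), (_, later) in zip(series, series[1:]):
            if later - earlier >= min_increase:
                result.add(person_id)
                break
    return result


print(persons_with_increase(measurements, IOP_CONCEPT_ID))  # {1}
```

The key step is the subtraction `later - earlier`: both operands come from rows, neither is a constant, and that is the kind of criterion I am describing as not expressible.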

About scores, such as “disseminated intravascular coagulation score >= 4,” if the clinical computation is simple enough it may be theoretically (but probably not practically) possible to “unwrap” the phenotype into a test for the occurrence of any of the possible ways a group of records can add up to >= 4. I suspect this kind of “unwrapping” may ring a bell in some people’s memories, in the context of calculating scores or in other contexts within phenotyping. The point is this: it’s not always easy to claim something is not possible to express.*

*At a conceptual level, I suppose one can argue that only a Turing-complete programming language can promise to express any (computable) logic whatsoever, and that a GUI tool usually sits well below that level of expressive power. So if we cover the vast majority of desired phenotypes, our “phenotyping model” has been a success.
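As a rough illustration of the “unwrapping” idea for score thresholds, the sketch below enumerates every combination of component findings that adds up to at least 4. The component names and point values are hypothetical placeholders, not the actual disseminated intravascular coagulation scoring rules, and `unwrap` is a helper invented for this example.

```python
from itertools import product

# Hypothetical score components and the points each possible finding
# contributes. These are placeholders, NOT the real DIC scoring table.
COMPONENTS = {
    "criterion_a": [0, 1, 2],
    "criterion_b": [0, 1, 2],
    "criterion_c": [0, 1],
}


def unwrap(components, threshold):
    """List every assignment of points to components whose total
    reaches the threshold. Each assignment corresponds to one group of
    criteria that must all hold; the groups are then OR'ed together."""
    names = list(components)
    combos = []
    for points in product(*(components[name] for name in names)):
        if sum(points) >= threshold:
            combos.append(dict(zip(names, points)))
    return combos


combos = unwrap(COMPONENTS, threshold=4)
for combo in combos:
    print(combo)
print(len(combos), "qualifying combinations out of", 3 * 3 * 2)
```

With these toy components, 4 of the 18 possible combinations qualify. The number of combinations to inspect is the product of the number of levels per component, so it grows quickly as components are added, which is why I say “theoretically (but probably not practically)”.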

I see Patrick’s point that true barriers (“impossible” phenotypes) are rare, and I agree, but here is mine: even when you do find a “clever way to do it,” the workaround itself is worth talking about. Are you sure the workaround is guaranteed to give the same result as originally intended? Or is it an educated approximation? What functionality exactly are you missing that would allow you to define the cohort without workarounds?

A similar kind of barrier arises when you’re developing a phenotype, the query seems to never finish, and you decide to use an educated approximation, that is, a workaround. Such situations may also give insight into the edges of our models for electronic phenotyping, although query run time (or time complexity) is a much more complex topic. It would be great if we could build a shared understanding of where the edges of our models are, based on our real-world experience. I know of at least one phenotyping tool (ACE, https://academic.oup.com/jamia/article/28/7/1468/6169466, not FOSS however) that differs substantially when it comes to query complexity, virtually dispensing with the need for workarounds regarding query run time. I do not mean to advertise ACE here (much less denigrate Atlas); I am just pointing out that our paradigms can sometimes insidiously introduce noise into our results, so let’s talk about them.

Kind regards!