Since I was the one to open the debate about these definitions, let me propose a synthesis of all that was said to make @apotvien’s life easier. I think we have a good grasp of the elements we want to define, but we still have nomenclature problems with the term Cohort:
Phenotype: A pattern of characteristics in health data (criteria) in a set of people for a duration of time. These observables can be conditions, procedures, drug exposures, devices, observations, visits, cost information, etc.
I think that “pattern” is better than “set”, because it indicates a relationship between the observables or criteria (e.g. insulin-dependent diabetic: a patient with the Condition diabetes mellitus who is being treated with a drug containing insulin).
Phenotype Algorithm = Cohort Definition: A coded set of instructions for approximating a phenotype in a given dataset, which may or may not have complete and accurate evidence about each of the observables and their pattern. Each phenotype can have one or more phenotype algorithms (e.g. T2DM broad, T2DM narrow). The instructions could be heuristic (rule-based) or probabilistic. Heuristic algorithms consist of rules applied to concept sets. Probabilistic phenotypes are implemented using a probabilistic model.
This is similar to @apotvien’s definition, with a few changes. There is no more desire involved (desires could be a good thing, but not in the context of these abstract definitions). The algorithm doesn’t define members, but rules. Heuristic rules are also computable, so I took “computable” out as a distinguishing feature. And the model is probabilistic, not predictive.
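To make the heuristic case concrete, here is a minimal sketch of what "rules applied to concept sets" could look like for the insulin-dependent diabetic example. Everything in it is an assumption for illustration: the `Event` type, the function name, and the concept IDs are made-up placeholders, not real vocabulary concepts or any existing tool's API.

```python
from dataclasses import dataclass

# Hypothetical concept sets. The IDs are placeholders, not real concept IDs.
DIABETES_CONCEPTS = {1001, 1002}
INSULIN_CONCEPTS = {2001}


@dataclass(frozen=True)
class Event:
    person_id: int
    domain: str      # "condition" or "drug" in this sketch
    concept_id: int


def is_insulin_dependent_diabetic(events):
    """Heuristic rule: a diabetes condition AND an insulin drug exposure."""
    has_diabetes = any(
        e.domain == "condition" and e.concept_id in DIABETES_CONCEPTS
        for e in events
    )
    has_insulin = any(
        e.domain == "drug" and e.concept_id in INSULIN_CONCEPTS
        for e in events
    )
    return has_diabetes and has_insulin


# Two hypothetical patients: one matches the pattern, one has only the condition.
patient_a = [Event(1, "condition", 1001), Event(1, "drug", 2001)]
patient_b = [Event(2, "condition", 1002)]
```

Note that the rule encodes the pattern (condition together with drug), not a list of members. A "narrow" or "broad" variant of the same phenotype would simply be a different rule or a different concept set.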
Now we need to name the actual instantiated set of members identified through execution of the algorithm. We can (i) make Cohort a synonym for Phenotype and call this set a Cohort Instance, or (ii) call the set itself Cohort. The former means Cohort is the ideal desired pattern of things (insulin-dependent diabetics), the latter means Cohort denotes an actual set of people and the timelines a certain algorithm or definition has calculated in a database (cohort 123 in database XYZ).
(i) I actually like the idea of using the terms interchangeably. The reason is to avoid confusion. Folks who have a hard time calling a drug or device exposure a phenotype can call that a cohort and be happy. Folks who have a hard time calling an outcome a cohort, which is a lot of our traditional epidemiologist friends, can call that a phenotype and also be happy. If we want to be really nice we might even include Rothman’s Population as well. I don’t have a strong feeling about that.
(ii) This is how we have mostly used the word Cohort, ATLAS calls it that (even though the nomenclature in the ATLAS UI badly needs overhauling), and @apotvien et al. proposed it.
Anyway. Whatever we decide:
Cohort/Phenotype Instance (i) or Cohort (ii): An instantiation or execution of the instructions of a Phenotype Algorithm/Cohort Definition against a dataset, resulting in a set of patients and their timelines.
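As a rough illustration of the difference between a definition and its instantiation, here is a small sketch. The `Record` and `CohortRow` types, the `instantiate` function, the concept ID, and both toy datasets are hypothetical and invented for this post. The "definition" is deliberately trivial (a single concept set).

```python
from datetime import date
from typing import NamedTuple


class Record(NamedTuple):
    person_id: int
    concept_id: int
    start: date
    end: date


class CohortRow(NamedTuple):
    person_id: int
    cohort_start: date
    cohort_end: date


def instantiate(concept_set, dataset):
    """Execute a trivial definition against a dataset.

    Returns one row per person, spanning the earliest qualifying start
    to the latest qualifying end.
    """
    spans = {}
    for r in dataset:
        if r.concept_id in concept_set:
            start, end = spans.get(r.person_id, (r.start, r.end))
            spans[r.person_id] = (min(start, r.start), max(end, r.end))
    return [CohortRow(pid, s, e) for pid, (s, e) in sorted(spans.items())]


# Hypothetical definition: a placeholder concept ID, not a real one.
INSULIN_CONCEPTS = {2001}

# Two hypothetical datasets with different people and timelines.
database_xyz = [
    Record(1, 2001, date(2020, 1, 1), date(2020, 3, 1)),
    Record(1, 2001, date(2020, 6, 1), date(2020, 7, 1)),
    Record(2, 9999, date(2020, 2, 1), date(2020, 2, 15)),
]
database_abc = [
    Record(7, 2001, date(2019, 5, 1), date(2019, 5, 31)),
]

instance_xyz = instantiate(INSULIN_CONCEPTS, database_xyz)
instance_abc = instantiate(INSULIN_CONCEPTS, database_abc)
```

The same definition produces a different set of patients and timelines in each dataset, which is exactly why the instance needs its own name.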
I agree with @Patrick_Ryan that Concept is not a term we want here. Concepts are semantic entities representing medical events or facts, and they are needed for those algorithms.
Now, we still have the precious metals. @apotvien has a Gold Standard Phenotype as “one that is designed, evaluated, and documented with best practices.” What is the “one” thing here? What does it apply to: A Phenotype, as @apotvien has it? Can’t be, because that is an intended ideal we need to approximate, which means all of them are Gold. A Phenotype Algorithm? Can’t be, because the evaluation and documentation depend on an instantiation. A Cohort (Instance)? That would be the right thing, except that it makes the standard totally non-transferable, and therefore practically useless.
Also, we want Gold. Do we also want to take on Silver? Something that is not fully validated against some truth (the “chart”), but only probabilistically? Bronze: something we pull out of a sleeve after chewing the pencil and scratching our foreheads for a while (which is what 99.9% of all published phenotypes are today)?