
Cohort definition based on eligibility criteria free-text

Hi, I am a student researcher at Columbia University working on a project to seamlessly transform eligibility-criteria free text into the OHDSI cohort definition JSON format. Here are some questions I have about the cohort definition rules:

  1. How should “inclusion criteria” and “exclusion criteria” be represented separately?
    Currently I do not handle this.

  2. There are two attributes that can be linked to a concept: “temporal constraints” (e.g., has HIV for two years) and “measurement” (e.g., has XXX score over 15). Would you recommend a way to represent these two attributes in the cohort definition?
    Currently I put the “temporal constraints” in the “initial event cohort” part and the “measurement” in the “additional criteria” part, but I am not sure this is the best way to represent them. Your feedback would be much appreciated!

  3. For a concept set that has no related attributes (e.g., in “Has a history of HIV” the concept set “HIV” is extracted, but it has no attribute such as a “temporal constraint” or “measurement”), how should it be represented in the cohort definition?

Thank you!

Reply by Patrick Ryan:

  1. There is no distinction between ‘inclusion criteria’ and ‘exclusion criteria’. Rather, all rules should be specified as ‘inclusion criteria’. So, for example, if there was an exclusion criterion of ‘has diabetes’, that can be stated as an inclusion criterion of ‘having 0 occurrences of diabetes’.

  2. There are a few different dimensions around ‘temporal constraints’. Sometimes a protocol says: ‘must not have [X] within [time period]’ (ex: must not have a diagnosis of cancer in the past 3 years), in which case you can represent that as: ‘having 0 occurrences of: a condition of cancer’ ‘starting between 1095 days before and 0 days before index date’. I see this a lot in protocols that say: ‘must have active disease of XXX’ (so you might frame it as at least 1 occurrence in 180d before to 0d before) or ‘must not have had a major surgery within 30 days of study entry’ (so 0 occurrences of a surgery procedure in 30d before to 0d before), etc.

Now, for the example you provide, ‘has HIV for 2 years’: I have seen criteria like this interpreted as ‘has HIV for at least 2 years’, in which case I represent it as ‘at least 1 occurrence of HIV’ with the additional attribute ‘first diagnosis in history’, and then I’d specify the time period as ‘all days before to 730d before index date’ to communicate that the first HIV diagnosis had to appear more than 2 years in the past.

For instances where a criterion involves a measurement with some value (ex: BMI > 31), you can represent that as ‘at least 1 occurrence of BMI’ with ‘value_as_number greater than 31’. A few things to point out about using measurements:
1) Each measurement may have units associated with it, and if you know the unit, you can specify it using the additional attribute unit_concept_id. I suspect this will not be something we can accomplish fully automatically, but it will be a real issue when we go to apply these to data if we don’t think about how to handle units appropriately.
2) Looking for an occurrence of a measurement in some time horizon is tricky: just because you did have a BMI > 31 doesn’t mean your LATEST BMI value relative to your index date is > 31. Depending on the question, this can matter.
3) Usually, I try to frame these criteria based on what is expected vs. unexpected and undesirable. So, for example, if the criterion says ‘don’t have bilirubin above upper limit of normal’ or ‘bilirubin within normal range’, I frame both as looking for ‘0 occurrences of bilirubin’ with ‘value_as_number > 0’ and ‘range high > 0’ and ‘range high ratio > 1’.

  3. When a criterion is ‘has a history of HIV’, I frame that as ‘has at least 1 occurrence of a condition of HIV starting between all days before and 0 days before index date’. That is, ‘history of HIV’ says you had HIV somewhere in your history (and without an explicit statement of how far back that history may go, we can assume we can use all available data).
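To make these framings concrete, here is a minimal Python sketch of how such criteria could be evaluated against one person’s records. It is a toy evaluator under stated assumptions, not ATLAS/Circe: the `Event` and `Criterion` types, the `satisfies` and `above_uln` functions, and the example dates are all hypothetical, and windows are written as day offsets relative to the index date (negative = before).

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class Event:
    """A toy clinical record (hypothetical; not the OMOP CDM table layout)."""
    concept: str                        # stand-in for a concept set, e.g. "HIV"
    start: date                         # record start date
    value: Optional[float] = None       # analogous to value_as_number
    range_high: Optional[float] = None  # analogous to range high (upper limit of normal)


@dataclass
class Criterion:
    """Hypothetical, simplified criterion; NOT the ATLAS/Circe JSON schema."""
    concept: str
    mode: str                           # "at_least", "at_most" or "exactly"
    count: int
    window_start: Optional[int] = None  # days relative to index; None = all days before
    window_end: int = 0                 # e.g. -730 means 730 days before index
    first_in_history: bool = False      # only consider the person's earliest record
    value_filter: Optional[Callable[[Event], bool]] = None


def satisfies(c: Criterion, events: List[Event], index: date) -> bool:
    matches = sorted((e for e in events if e.concept == c.concept), key=lambda e: e.start)
    if c.first_in_history:
        matches = matches[:1]  # earliest record of the concept only
    if c.value_filter is not None:
        matches = [e for e in matches if c.value_filter(e)]
    # Count records whose start date falls inside the window around the index date.
    n = sum(
        1 for e in matches
        if (c.window_start is None or (e.start - index).days >= c.window_start)
        and (e.start - index).days <= c.window_end
    )
    if c.mode == "at_least":
        return n >= c.count
    if c.mode == "at_most":
        return n <= c.count
    if c.mode == "exactly":
        return n == c.count
    raise ValueError(f"unknown mode: {c.mode}")


def above_uln(e: Event) -> bool:
    # value_as_number > 0, range high > 0 and value / range high > 1
    return (e.value is not None and e.range_high is not None
            and e.value > 0 and e.range_high > 0 and e.value / e.range_high > 1)


# The framings from the reply, expressed with the toy types.
no_diabetes = Criterion("diabetes", "exactly", 0)  # exclusion -> 0 occurrences
no_recent_cancer = Criterion("cancer", "exactly", 0, window_start=-1095, window_end=0)
hiv_2_years = Criterion("HIV", "at_least", 1, window_end=-730, first_in_history=True)
history_of_hiv = Criterion("HIV", "at_least", 1)  # all days before to 0 days before
normal_bilirubin = Criterion("bilirubin", "exactly", 0, value_filter=above_uln)

index = date(2020, 1, 1)
patient = [
    Event("HIV", date(2016, 5, 1)),
    Event("HIV", date(2019, 11, 1)),
    Event("cancer", date(2015, 3, 1)),  # more than 1095 days before index
    Event("bilirubin", date(2019, 12, 1), value=0.9, range_high=1.2),
]
print([satisfies(c, patient, index) for c in
       (no_diabetes, no_recent_cancer, hiv_2_years, history_of_hiv, normal_bilirubin)])
# prints [True, True, True, True, True]
```

Note that a value filter like this counts any qualifying measurement in the window; as Patrick points out, that is not the same as checking the latest value before the index date, which would need a separate rule.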

Hi @Patrick_Ryan,

Thanks so much for your detailed reply, it helps a lot!
There is one more question I am still confused about: what is the difference between the “initial event” and the “initial event inclusion criteria”? I understand that the criteria information (“temporal constraints” and “measurement”) should be represented in the “initial event inclusion criteria”, but what is the “initial event” for, and what kind of information should I put there? Thank you!

Sincerely,
Yixuan.

Hi, @Yixuan_Guo, I think I can answer:

The initial event is used to establish your entry into the cohort (also known as the index date). You could say that the person enters the cohort based on a condition, drug or procedure (and I quite literally mean ‘either a condition, drug or procedure’). Being able to specify multiple initial events means you can look across domains to establish when a person starts their presence in the cohort.

The initial event inclusion criteria is used to further qualify or validate the initial events with temporal criteria.

For example:
A patient begins their entry into the cohort if they are diagnosed with X or treated with Y (this is the initial event).

But, only use this diagnosis/exposure as the start if there is a measurement of Z within 14d before and 7d after the initial event. (This is the initial event inclusion criteria).

So, in the data you may have a patient who has the diagnosis and the treatment, but only the treatment had the measurement within the 14d before and 7d after. So, for this patient, the drug exposure date will begin their entry into the cohort. Another patient may have the measurement in the temporal proximity to the diagnosis, so that other patient will use the diagnosis date as the entry into the cohort.
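The example can be sketched in Python as follows, assuming a toy record type rather than the OMOP CDM tables. The `Record` type, the `INITIAL_EVENTS` set, `has_measurement_near`, `cohort_entry` and the concept labels X, Y and Z are hypothetical, and picking the earliest qualifying event per person is an assumption made only for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    domain: str   # "condition", "drug", "procedure" or "measurement"
    concept: str  # hypothetical concept-set label, e.g. "X", "Y", "Z"
    start: date


# Initial events may come from several domains: diagnosis of X or treatment with Y.
INITIAL_EVENTS = {("condition", "X"), ("drug", "Y")}


def has_measurement_near(records: List[Record], concept: str, anchor: date,
                         days_before: int, days_after: int) -> bool:
    """True if a measurement of `concept` starts within [anchor - days_before, anchor + days_after]."""
    return any(
        r.domain == "measurement" and r.concept == concept
        and -days_before <= (r.start - anchor).days <= days_after
        for r in records
    )


def cohort_entry(records: List[Record]) -> Optional[date]:
    initial = [r for r in records if (r.domain, r.concept) in INITIAL_EVENTS]
    # Initial event inclusion criteria: keep only events with a measurement of Z
    # between 14 days before and 7 days after that event.
    qualified = [r for r in initial if has_measurement_near(records, "Z", r.start, 14, 7)]
    # Assumption: if several events qualify, the earliest becomes the index date.
    return min((r.start for r in qualified), default=None)


# First patient: has both X and Y, but only the drug exposure has Z nearby.
patient_a = [
    Record("condition", "X", date(2020, 3, 1)),
    Record("drug", "Y", date(2020, 6, 10)),
    Record("measurement", "Z", date(2020, 6, 5)),
]
# Second patient: Z was measured 10 days before the diagnosis.
patient_b = [
    Record("condition", "X", date(2020, 3, 1)),
    Record("measurement", "Z", date(2020, 2, 20)),
]
print(cohort_entry(patient_a), cohort_entry(patient_b))  # prints 2020-06-10 2020-03-01
```

The point of the sketch is that one person can have several candidate initial events from different domains, and the initial event inclusion criteria decide which of them are allowed to set the index date.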

I recognize that this ‘layer’ of criteria adds a bit of confusion and complexity, but with every inch of flexibility you get a mile of complexity :).

Hope this helps!

-Chris
