Import i2b2 query cohort

I am a software developer working for Columbia Univ. I work mostly with i2b2 software and have a few questions about how to translate an i2b2 query into an OHDSI cohort.

Some introductory remarks: an i2b2 query is composed of panels, and each panel has a collection of concepts (called items) or criteria. In the SQL code, the relation between items in a panel is “OR”; among panels the relation is “AND”.
Each panel may have an observation period and a number of occurrences (or events) that a patient needs to satisfy to be in the cohort. You may also exclude items (or criteria) for a given panel.
You can also require that two or more panels occur in the same visit.
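
A minimal Python sketch of the panel logic described above, assuming each item has already been resolved to a set of patient IDs. The panel names, item names and IDs are hypothetical, and occurrence counts, date constraints and exclusions are left out:

```python
from functools import reduce

# Hypothetical data: patient IDs matched by each item (concept) in each panel.
panels = {
    "panel_1": {"item_a": {1, 2, 3}, "item_b": {3, 4}},
    "panel_2": {"item_c": {2, 3, 5}},
    "panel_3": {"item_d": {3, 6}, "item_e": {2}},
}


def patients_in_panel(items):
    """OR across items: a patient qualifies if any item in the panel matches."""
    return set().union(*items.values())


def patients_in_query(panels):
    """AND across panels: a patient must qualify in every panel."""
    per_panel = [patients_in_panel(items) for items in panels.values()]
    return reduce(set.intersection, per_panel)


result = patients_in_query(panels)
print(sorted(result))
```

Note that the result is only a set of patients. Nothing in it says which event date would serve as an index.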

There is nothing in i2b2 that I could call the PrimaryCriteria (PC). You could say that all panels should be added to the PC, and the index date could be the first event for any of the panels. Or you could choose the first panel, because when designing the query you usually put the most restrictive panel (by number of patients) first; this is entirely optional and only makes the query more efficient. The result is an INTERSECT of the patients across all panels, and within a panel it is a UNION ALL.
Question 1: How should I design the PC: add all panels, or pick one?
Question 2: Since I only care about the event happening in a certain time period, I could choose continuous observation prior and after = 0, and earliest event per person. Will this work even if the number of occurrences required per patient in a given panel is larger than 1?
The exit strategy, I would say, is the end of the observation period.

If I put all panels in the PC, then I need inclusion rules that establish the occurrence for each panel. Say panel 1 needs to happen during a certain length of time and at least once per patient, which is the default (occurrence); panel 2 has another time period, or maybe the same as panel 1, and at least two events per person; and so on.
It seems to me that the PC picks the first event among the codesets that are part of the PC and then sets the index date for the patient. If they come from one panel only, the UNION ALL of patients is fine. If I have many panels, then I need to find the intersection of the patients in the different panels, besides establishing the index event.
Question 3: Where should the inclusion rules go?
Can I add them to the PC, since it has a Criteria array (for each panel and each domain) that contains a CriteriaGroup, where I define the rules?
Or can I add those rules to AdditionalCriteria or InclusionRules?
Looking at the circe-be software in OHDSI, technically speaking AdditionalCriteria and InclusionRules are the same and differ only in some String description. They are both CriteriaGroups where the rules could be established.
Any help and comments are appreciated.

Hello, @Elena_Villalon,

There do seem to be some definite similarities with your i2b2 builder: where we say ‘Criteria’ you say ‘Panel’, and where you say ‘items in a panel’ we’d say ‘criteria attributes’. Although I think you might be referring to the codes used in a panel as the ‘items’, because you say the relation between items in a panel is ‘OR’. In the case of Circe, the codes in a codeset are also ‘OR’ (as in: any of the codes in the codeset will be used to find the records). But the ‘criteria attributes’ are ‘AND’ed together, as in:

Diagnosis of: {Any of the codes found in codeset}
AND Age > 18
AND Gender = Female

So, with that comparison between Panels and Criteria, I can walk you through the cohort definition process:

Step 1: Cohort Entry Events

You can think of these as a set of Panels/Criteria which are used to locate events in the patient record which are the basis of when the person enters the cohort. You can specify multiple Panels for when a person enters the cohort, and for purposes of the event selection, it is a UNION (OR) of one or more Panels. For example:

Person enters Cohort when:
Is Diagnosed with X
OR is Tested for Y
OR is treated with Z

These 3 Criteria/Panels above are UNIONed because INTERSECT (AND) doesn’t really make sense here: what is the intersection of a Diagnosis, a Test and a Drug Exposure? That they all happen at the same time? It doesn’t make sense to use these as anything except a UNION (OR), and each record found using the Criteria/Panel is used as a moment the person can enter the cohort.

Now that you have the Cohort Entry Events, you can then apply Inclusion/Exclusion rules relative to the cohort entry events. You keenly observed that, technically speaking, AdditionalCriteria and InclusionRules are the same.

The main difference is that AdditionalCriteria is applied before the Inclusion Rules, so that you can establish a ‘baseline’ population to which you apply your Inclusion Rules. There are reports on the impact of the inclusion rules, so you could consider AdditionalCriteria as ‘absolute requirements’ and Inclusion Rules as a ‘maybe-maybe-not’ that you evaluate to determine if you want to relax the rules. The other difference is that you can name your Inclusion Rules, while ‘AdditionalCriteria’ are just implicit rules about the cohort entry events. But you can move all the criteria from AdditionalCriteria into an Inclusion Rule, and it will have the exact same effect.

Inclusion Rules are AND/INTERSECTED across each other: in order for a cohort entry event to be used in the cohort, it must pass all the Inclusion Rules. Each Inclusion Rule is based on the cohort entry event’s start/end dates (called the ‘index’ in the Circe UI). Note, we don’t distinguish Inclusion and Exclusion rules: if you want to say ‘they must have something’, then you say ‘Has at least 1 {Criteria} between X days before and Y days before index’. If you want to say ‘they can’t have something’ (i.e., exclude), you say ‘Has exactly 0 {Criteria} between X days before and Y days before index’.
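
To make the ‘at least 1’ versus ‘exactly 0’ phrasing concrete, here is a small Python sketch that counts events in a window before the index date. The function names are hypothetical, the window is assumed to be inclusive on both ends, and this is not how Circe itself generates SQL:

```python
from datetime import date, timedelta


def count_in_window(event_dates, index_date, days_before_start, days_before_end):
    """Count events between `days_before_start` and `days_before_end` days before index."""
    window_start = index_date - timedelta(days=days_before_start)
    window_end = index_date - timedelta(days=days_before_end)
    return sum(window_start <= d <= window_end for d in event_dates)


def has_at_least(event_dates, index_date, n, days_before_start, days_before_end):
    """Inclusion-style rule: 'Has at least n {Criteria} ... before index'."""
    return count_in_window(event_dates, index_date, days_before_start, days_before_end) >= n


def has_exactly_zero(event_dates, index_date, days_before_start, days_before_end):
    """Exclusion-style rule: 'Has exactly 0 {Criteria} ... before index'."""
    return count_in_window(event_dates, index_date, days_before_start, days_before_end) == 0


index = date(2020, 1, 1)
diagnoses = [date(2019, 6, 1), date(2017, 1, 1)]  # made-up event dates

print(has_at_least(diagnoses, index, 1, 730, 0))   # one diagnosis falls in the window
print(has_exactly_zero(diagnoses, index, 730, 0))  # so the exclusion form fails
```

A cohort entry event is kept only if every rule evaluated this way returns true for its index date.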

So, after all the Cohort Entry Events are identified, and Inclusion rules applied, you can then move to step 2:

Step 2: Cohort Exit

For each event that was selected from step 1, you have to specify how you want the events to ‘persist’ or ‘extend over time’. By default, the event extends over time until the end of the observation period (aka: the continuous observation time after the index). There are also options to limit it to a fixed time duration, which I won’t go into in detail here. The point of Step 2 is to establish the duration the person is considered in the cohort.

Step 3: Cohort Eras

So you found all your cohort entry events, and they extend to some cohort exit. It is a rule that a given person may not have overlapping time periods in a cohort: they can only appear within a cohort exactly 1 time at any moment, and if you allowed overlapping times, the person would appear multiple times at a given moment. So, we always ‘collapse’ the individual cohort events so that there are no overlaps. But you also have the option to ‘combine’ 2 distinct cohort event periods if they are within a ‘gap’ of each other. The Cohort Era section allows you to specify a gap that can bring 2 distinct ‘cohort episodes’ together.
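
The collapsing step can be illustrated with a short Python sketch that merges one person's periods when they overlap or sit within a gap of each other. `collapse_eras` and `gap_days` are hypothetical names, and the dates are made up; Circe's generated SQL does this differently, so treat it only as an illustration of the idea:

```python
from datetime import date, timedelta


def collapse_eras(periods, gap_days=0):
    """Merge (start, end) periods that overlap or are at most `gap_days` apart."""
    eras = []
    for start, end in sorted(periods):
        if eras and start <= eras[-1][1] + timedelta(days=gap_days):
            # Overlaps (or is within the gap of) the previous era: extend it.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras


periods = [
    (date(2020, 1, 1), date(2020, 1, 31)),
    (date(2020, 1, 15), date(2020, 2, 15)),  # overlaps the first period
    (date(2020, 3, 1), date(2020, 3, 10)),   # 15 days after the merged era ends
]

print(collapse_eras(periods))               # two eras: overlap collapsed, gap kept
print(collapse_eras(periods, gap_days=30))  # one era: the 15-day gap is bridged
```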

For simplicity, you’ll usually just have your criteria from Step 1 establish cohort entry, and not worry about Step 2 or Step 3 (just letting all the events persist until the end of the observation period, which means there are no gaps to worry about). But with more complicated scenarios (such as re-attempts at interventions), the options in Step 2 and Step 3 allow you to create more complicated cohort episodes.

Hope this was helpful, and I’m happy to answer any questions you have.



With all that information above, I’ll try to answer your specific questions:

Q1: You design the PC to be the events that may (if rules are satisfied) start a person’s presence in the cohort. If it’s only ever based on one type of event (a drug exposure), then you just need 1 panel. If it is the choice of multiple types of events, then you’ll have one panel per type of event you are looking for.

Q2: If you say ‘earliest event per person’, then if your criteria found more than 1 event per person, it will filter out all the later events so that only 1 event per person is returned. However: by doing this, you are only giving the person one chance to enter the cohort. Sometimes this is desired. Other times, you want to give the person multiple attempts, so you’d use ‘all events per person’ instead of ‘earliest event per person’. If you did decide to use ‘all events’ per person, then all the events found will be evaluated against your Inclusion Rules, and any remaining events will be constructed into cohort episodes.
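
A rough Python sketch of the difference between the two limits, assuming events are simple (person_id, event_date) pairs. The `limit_events` function and its `limit` values are hypothetical names used only for illustration:

```python
from datetime import date


def limit_events(events, limit="all"):
    """Apply an 'all events' or 'earliest event' per-person limit to (person_id, date) pairs."""
    ordered = sorted(events)
    if limit == "all":
        return ordered
    if limit == "earliest":
        seen = set()
        kept = []
        for person_id, event_date in ordered:
            if person_id not in seen:  # first (earliest) event for this person
                seen.add(person_id)
                kept.append((person_id, event_date))
        return kept
    raise ValueError(f"unknown limit: {limit}")


events = [
    (1, date(2019, 5, 1)),
    (1, date(2018, 2, 1)),
    (2, date(2020, 1, 1)),
]

print(limit_events(events, "earliest"))  # one chance per person to enter the cohort
print(limit_events(events, "all"))       # every event is evaluated against the rules
```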

Inclusion rules are just a way to help you organize your criteria. You can take 10 inclusion rules and stuff them into a single inclusion rule where you AND all the criteria together. But the benefit of having separate inclusion rules is that you can see how the population satisfies each rule’s criteria. I think one of the main differences between Circe and the i2b2 tool is that the i2b2 tool doesn’t have the ‘identify cohort entry -> specify cohort exit -> collapse cohort episodes’ process. If you are saying that the i2b2 tool only has Panels that are either INTERSECTed or UNIONed together, that doesn’t quite align with the thinking in Circe: in Circe, you find ‘potential’ start dates, and then filter them using Inclusion Rules. In finding ‘potential start dates’ you are establishing an index time; the inclusion rules do not establish one. An Inclusion Rule says ‘yes or no’ to whether the date is a date on which the person was present in the cohort. It’s not clear to me where in the i2b2 tool you establish ‘this panel defines the date and this other panel decides if that date is valid’.


Thank you for your reply.
As an example, I mean:
Patients with Diabetes (either type 1 or 2) AND Limb Amputation (say hand or leg), with two amputations in the observation period, AND some sort of cardiovascular problem.
Panel 1 (codesetid=1): Diabetes codes
Panel 2 (codesetid=2): Procedure Amputation codes
Panel 3 (codesetid=3): Cardiovascular disease
I want patients that satisfy the three conditions, Panel 1 and Panel 2 and Panel 3, during the observation period, say the last 2 years.
The PC will be any event for any of the panels (or codesets), and the earliest defines the index (it could be diabetes from condition_occurrence, amputation from procedure_occurrence, or heart disease from condition_occurrence). Any of them will specify the index date.
Then I need the Inclusion Rules to make sure that the patients are diabetic AND had 2 amputations AND had some cardiovascular disease in the observation period of the last 2 years.
Thus, within a panel (or codesetid) I need OR: any one of the concepts in the observation period. Among panels I need AND: those patients that are included in all three codesetids (or panels). Sorry if I do not explain myself clearly.
My interest is to create a JSON file that contains what I am describing above, i.e. diabetic patients with 2 amputations and some cardiovascular disease diagnosis during the past two years. It does not really matter what comes first, but they need to be in the three groups (panels) or the three codesetids.

This statement is very clear and, in my mind, establishes that the cohort entry is that they are diabetic, and that at the condition_occurrence date of diabetes, you find cardiovascular disease and at least 2 amputations between 730 days and 0 days prior to index (the diabetes diagnosis date).

You say ‘order doesn’t matter’, but if you were to say ‘Patients suffering cardiovascular disease with diabetes diagnosis and at least 2 amputations in the prior 2 years’, wouldn’t that phrasing mean that your index is cardiovascular disease, and you look for the other 2 things between 730 days and 0 days prior? The order of phrasing determines what the index represents.

If, however, you are saying ‘I just want to identify the patients who had all 3 of these things in a 2 year period of time’, then you’d have to select each of the 3 elements as index, and then define 3 inclusion rules that require each of the things within 2 years of the index (in this case, the index will be either a Diabetes diagnosis, an amputation, or Cardiovascular Disease). You’d construct this cohort like so:

Cohort Entry

Patients having any of the following:

  • Condition Occurrence of Diabetes (codeset 1)
  • A Procedure Occurrence of Amputation (codeset 2)
  • A Condition Occurrence of Cardiovascular disease (codeset 3)

Limit to all events per person

Inclusion Rules

1: At least 1 Condition Occurrence of Diabetes between 730 days and 0 days prior to index
2: At least 2 procedures of amputation between 730 days and 0 days prior to index
3: At least 1 Condition Occurrence of Cardiovascular disease between 730 days and 0 days prior to index

You can leave the Cohort Exit and Cohort Eras at their default behavior, which makes the cohort_end_date the end of the observation period that the index date belongs to.

The result of this cohort will be: the patient’s cohort_start_date will be the date on which they satisfy all 3 criteria: a Diabetes condition, 2 amputations, and cardiovascular disease.

The key here is first establishing which dates to evaluate (diabetes diagnosis, amputation procedure, cardiovascular diagnosis) and then enforcing that the 3 things existed within 2 years of that date with inclusion rules.

I can hardly imagine a question where any of the three panels can be used as the index event.

Next, let’s try to clarify the following:
You mentioned that:

But then:

In my understanding, as soon as you have chosen the index event as the earliest of those 3 panels, neither of the other 2 panels can have happened in the 2 years before. So, if the earliest defines the index, then the other panels should occur within the 2 years after.
If that’s right, then I see the design as follows:
Cohort Entry:
Patients having any of the following:

  • Condition Occurrence of Diabetes
  • Procedure Occurrence of Amputation
  • Condition Occurrence of Cardiovascular disease

Limit initial events to earliest per patient

Inclusion Criteria:

  • At least 1 Condition Occurrence of Diabetes between 0 days after and 730 days after index
  • At least 2 Procedure Occurrences of Amputation between 0 days after and 730 days after index
  • At least 1 Condition Occurrence of Cardiovascular disease between 0 days after and 730 days after index

Limit qualifying events to earliest per patient

i2b2 also has temporal queries where, for example, diabetes needs to happen before amputation. For now, I am just focusing on getting the non-temporal queries translated into JSON, where order in time does not matter provided the three panels (or 3 codeset ids) are met in the 2-year period.

You could say the first panel with codeSetid=1 is the PC event, in the example diabetes. As I explained, in i2b2 you put in the first panel the most restrictive condition or set of codes, which limits the number of patients that the second and third panels will consider in the queries. Thus the second panel will only look at the patients that are diabetic, and the third panel will only look at patients that are diabetic and have 2 amputations.
You can design the query differently by swapping the order of the panels and the result should be the same, if your i2b2 query is non-temporal, which is my case for now.

Also, I found that some of my codes are from non-standard vocabularies like ICD9CM, ICD10CM, etc. So I need to query the concept_relationship table with relationship_id=‘Maps to’ to get the concept_id of the standard concept, and that concept_id is what I put in the conceptSets.
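
The ‘Maps to’ lookup described above could be sketched as follows, using Python's sqlite3 with an in-memory toy copy of the concept and concept_relationship tables. Only a few columns are modeled, the concept IDs are made up, and `standard_concept_ids` is a hypothetical helper; a real vocabulary also carries validity dates and invalid_reason that you may want to filter on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_code TEXT,
    vocabulary_id TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
-- Toy rows: the concept IDs below are invented for this example.
INSERT INTO concept VALUES (1001, 'E11.9', 'ICD10CM', NULL);
INSERT INTO concept VALUES (2001, '44054006', 'SNOMED', 'S');
INSERT INTO concept_relationship VALUES (1001, 2001, 'Maps to');
INSERT INTO concept_relationship VALUES (2001, 1001, 'Mapped from');
""")


def standard_concept_ids(conn, vocabulary_id, concept_code):
    """Return the standard concept_ids that a source code 'Maps to'."""
    rows = conn.execute(
        """
        SELECT target.concept_id
        FROM concept AS source
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = source.concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept AS target
          ON target.concept_id = cr.concept_id_2
         AND target.standard_concept = 'S'
        WHERE source.vocabulary_id = ? AND source.concept_code = ?
        ORDER BY target.concept_id
        """,
        (vocabulary_id, concept_code),
    ).fetchall()
    return [row[0] for row in rows]


print(standard_concept_ids(conn, "ICD10CM", "E11.9"))
```

The returned IDs are what would go into the concept set; a source code can map to more than one standard concept, which is why the helper returns a list.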

I also have some codes, like LOINC ones, with standard_concept=‘C’; they correspond to the LP folders of the i2b2 ontology. Thus I usually go down the tree until I get to the standard leaves.
It seems to me that Atlas is always using concept_id of standard concepts, otherwise the query will not work.

I should explain that i2b2 has a request-xml file, which is the counterpart of the JSON file, and that defines the query. My interest is to translate the request-xml file into a JSON file that can be uploaded into Atlas.
In i2b2 it is usually easier to create cohorts, but it is not as detailed and does not have as many capabilities as Atlas.
Thanks for your reply. Best, Elena

Well, in the OHDSI philosophy a cohort is not just a pool of patients, but also time spans :)
So you’ll need to deal with it.

‘C’ concepts are classification concepts. You can use these concepts, but make sure you enable ‘descendants’ for them in your concept set definition.

Speaking of non-standard concepts (ICD9CM, ICD10CM), we kindly advise using the corresponding standard concepts.
But if for some reason that doesn’t work for you, you can use source concepts.
NB! Source concept_ids are stored in the _source_concept_id fields (e.g. condition_source_concept_id, not condition_concept_id).
So, if you want to use source concepts, add a criteria attribute (e.g. Add Condition Source Concept)
and put your concept set there.

Also, please keep in mind that when creating concept sets, ‘descendants’ works only for standard or classification concepts. So, to grab all ICD10CM concepts for Type 1 diabetes mellitus, you can’t just use
E10 Type 1 diabetes mellitus and descendants. Instead, you need to add all underlying concepts one by one.

The only problem here is that you should say ‘Limit initial events to all events per person’.

If you limit to the earliest of the three, you won’t be able to find anything 2 years earlier for any of the others. So, use all the events among the three, find those events that have all 3 kinds of events within 2 years, and of whatever remains, use the earliest (your last ‘Limit qualifying events to earliest per person’ is fine).

To put this visually:

P1:     A D   C  A
P2:   A    A D C
P3:    C       A    D   A
P4:   A  A   D   C

A = Amputation, D = Diabetes, C = Cardio

If you say ‘limit initial events to earliest per person’, then you’d never evaluate the second through fourth events for inclusion. But you need to, so you use all events per person.

Then the inclusion rules apply:
For P1: the second amputation is the moment where the person has diabetes, cardio and 2 amputations on or before index within 2 years.
For P2: the cardio event is the moment when all 3 rules are satisfied.
For P3: the second amputation event is when they are in.
For P4: the cardio event is when all 3 rules are satisfied.

But here’s a case where it wouldn’t satisfy:

                 1Y            2Y         3Y
          A              C          D  A

For this person, there is no event that has all 3 rules satisfied within the prior 2 year period.
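
The timelines above can be checked with a small Python simulation: every event is treated as a candidate index, the three rules are evaluated over the 730 days up to and including that date, and the earliest qualifying date is kept. The dates are invented to mimic the P1 timeline and the failing case, and the function names are hypothetical; this is a sketch of the logic, not of what Circe executes:

```python
from datetime import date, timedelta

WINDOW_DAYS = 730
# Minimum number of events of each kind required in the window.
REQUIRED = {"D": 1, "A": 2, "C": 1}  # D = diabetes, A = amputation, C = cardio


def qualifying_index_dates(events):
    """Return the dates of events at which all three rules are satisfied."""
    qualifying = []
    for index_date, _ in sorted(events):  # 'all events per person'
        window_start = index_date - timedelta(days=WINDOW_DAYS)
        counts = {kind: 0 for kind in REQUIRED}
        for event_date, kind in events:
            if window_start <= event_date <= index_date:
                counts[kind] += 1
        if all(counts[kind] >= n for kind, n in REQUIRED.items()):
            qualifying.append(index_date)
    return qualifying


def cohort_start(events):
    """'Limit qualifying events to earliest per person'; None if nothing qualifies."""
    dates = qualifying_index_dates(events)
    return min(dates) if dates else None


# Shaped like P1: A, D, C, A within one year.
p1 = [(date(2019, 1, 10), "A"), (date(2019, 3, 1), "D"),
      (date(2019, 6, 15), "C"), (date(2019, 9, 20), "A")]

# Shaped like the failing case: the two amputations are more than 2 years apart.
p_fail = [(date(2018, 7, 1), "A"), (date(2019, 7, 1), "C"),
          (date(2020, 9, 1), "D"), (date(2021, 1, 1), "A")]

print(cohort_start(p1))      # the second amputation date
print(cohort_start(p_fail))  # None
```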