I was wondering what the best way is to construct a general population cohort from a certain year onward (e.g., the number of adults in the database after 2010). I need this to get prevalence estimates. I have tried defining the cohort based on observation periods (since I understand that they are meant to capture entry into the database) and based on the occurrence of any condition or observation; however, these led to very different results. Please also keep in mind that I will later add inclusion criteria using disease-specific concept sets.
Related to that, I am also wondering why defining age in the inclusion criteria versus in the restriction of initial events leads to different results when I limit initial events to the earliest event. I suspect there is something I do not understand conceptually about these criteria, so any further insight would be appreciated.
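One common source of this kind of difference is the order in which an age filter and an "earliest event" limit are applied. The plain-Python sketch below uses made-up data; the `Event` class and helper names are hypothetical and are not Atlas or Circe code. It shows that filtering events by age and then keeping each person's earliest event is not the same as keeping the earliest event and then checking age. Which order a particular Atlas setting corresponds to depends on where the age criterion and the limit options sit in the definition, so treat this as an illustration of the principle only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    person_id: int
    start_date: date
    age_at_event: int  # hypothetical: age precomputed at the event date


def earliest_per_person(events):
    """Keep only each person's earliest event."""
    first = {}
    for e in sorted(events, key=lambda e: e.start_date):
        first.setdefault(e.person_id, e)
    return list(first.values())


def is_adult(e):
    return e.age_at_event >= 18


events = [
    Event(1, date(2008, 3, 1), 16),  # person 1: first event as a minor
    Event(1, date(2012, 6, 1), 20),  # ...and a later event as an adult
    Event(2, date(2011, 1, 1), 40),
]

# Age filter applied first, then the earliest-event limit: earliest *adult* event.
filter_then_limit = earliest_per_person([e for e in events if is_adult(e)])
# Earliest-event limit applied first, then age checked (like an inclusion rule).
limit_then_filter = [e for e in earliest_per_person(events) if is_adult(e)]

print(len(filter_then_limit), len(limit_then_filter))  # prints: 2 1
```

Person 1's first event happened before age 18, so the second approach drops them, while the first approach enters them at their later, adult event.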
What is the provenance of the data you’re using (e.g., EHR, claims)? It makes total sense that you get different estimates, since many people enter the database but don’t actively use healthcare services. If you have claims data, you can generally trust your observation periods and simply use them as your inclusion criteria. For EHR data you would probably also use observation periods, although ensuring sufficient coverage in EHR (i.e., that your included people receive care only at your institution, so that your prevalence estimates are accurate) is a separate, hard question. Also, if you don’t need specific years, you can simply use the Data Sources tab in Atlas to get your prevalence estimates.
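To make the utilization point concrete, here is a small sketch with made-up records (plain Python; the simplified table layouts are assumptions, not OMOP CDM tooling or Atlas output). The observation-period denominator counts everyone observed during 2010, while the "any condition" approach counts only people with a recorded condition in 2010, so person 2, who is observed but has no recorded care that year, drops out. The age restriction is left out for brevity.

```python
from datetime import date

# Hypothetical toy tables: (person_id, start, end) and (person_id, condition_date).
observation_periods = [
    (1, date(2005, 1, 1), date(2015, 12, 31)),
    (2, date(2009, 6, 1), date(2012, 3, 31)),
    (3, date(2011, 1, 1), date(2019, 12, 31)),
]
condition_occurrences = [
    (1, date(2010, 5, 4)),
    (3, date(2011, 2, 2)),
]


def overlaps(start, end, win_start, win_end):
    """True if [start, end] and [win_start, win_end] share at least one day."""
    return start <= win_end and end >= win_start


win_start, win_end = date(2010, 1, 1), date(2010, 12, 31)

# Denominator from observation periods: everyone observed during the window.
observed = {p for p, s, e in observation_periods if overlaps(s, e, win_start, win_end)}
# Denominator from any recorded condition: only people who used care in the window.
utilizers = {p for p, d in condition_occurrences if win_start <= d <= win_end}

print(sorted(observed), sorted(utilizers))  # prints: [1, 2] [1]
```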
Thank you very much for your comprehensive reply; I was indeed looking for an answer for both EHR and claims data. This helped me understand what the numbers actually correspond to. Regarding my other question: when setting age limits for the population, do you recommend using inclusion criteria or the restriction of initial events?
I also needed to define a general population cohort in Atlas to use as a denominator for incidence rates.
Suppose we have a study start date of Jan 1, 2010 and a study end date of Jan 1, 2020.
If there is an observation period that contains the study start date, then we want to use the study start date as the index date. If there is no observation period that contains the study start date, then we want to use the first observation period start date that occurs within the study period as the index date.
We also need to apply inclusion criteria based on this index date.
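The index-date rule described here can be written as a small function. This is a plain-Python sketch of the logic, not how Atlas implements it; the function name `index_date`, the per-person list of `(start, end)` tuples, and treating both study dates as inclusive are assumptions for illustration.

```python
from datetime import date
from typing import Optional

# Study dates from the example above; inclusive bounds are an assumption.
STUDY_START = date(2010, 1, 1)
STUDY_END = date(2020, 1, 1)


def index_date(periods: list[tuple[date, date]]) -> Optional[date]:
    """Hypothetical helper: index date for one person given their observation periods.

    - If any period contains STUDY_START, index at STUDY_START.
    - Otherwise, index at the earliest period start inside the study window.
    - Otherwise, return None (the person does not enter the cohort).
    """
    if any(start <= STUDY_START <= end for start, end in periods):
        return STUDY_START
    starts = [start for start, _ in periods if STUDY_START <= start <= STUDY_END]
    return min(starts, default=None)


covered = index_date([(date(2005, 1, 1), date(2012, 1, 1))])
late_entry = index_date([(date(2015, 1, 1), date(2016, 1, 1)),
                         (date(2013, 4, 1), date(2014, 1, 1))])
not_observed = index_date([(date(2001, 1, 1), date(2005, 1, 1))])
print(covered, late_entry, not_observed)  # prints: 2010-01-01 2013-04-01 None
```

Inclusion criteria would then be evaluated relative to the returned date.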
I think you’re asking for: all observation periods that start between 2010-01-01 and 2020-01-01, earliest per person.
If that’s it, then you only need the second entry event you specified. The first one will just return a cohort entry/exit date of 2010-01-01 for anyone whose observation period ‘covers’ that date (i.e., starts before 2010-01-01 and ends after 2010-01-01).
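A toy comparison of the two entry events discussed here (plain Python with hypothetical data, not Atlas output): event A, "earliest observation period starting between 2010-01-01 and 2020-01-01", only picks up person 2, who enters the database during the study, while event B, "observation period covering 2010-01-01", only picks up person 1, who was already observed at the study start.

```python
from datetime import date

STUDY_START, STUDY_END = date(2010, 1, 1), date(2020, 1, 1)

# Hypothetical observation periods: person_id -> list of (start, end).
periods = {
    1: [(date(2005, 1, 1), date(2015, 1, 1))],  # observed before and after study start
    2: [(date(2012, 7, 1), date(2021, 1, 1))],  # enters the database during the study
}

# Entry event A: observation period starting inside the window, earliest per person.
starts_in_window = {
    pid: min(s for s, _ in ops if STUDY_START <= s <= STUDY_END)
    for pid, ops in periods.items()
    if any(STUDY_START <= s <= STUDY_END for s, _ in ops)
}
# Entry event B: observation period covering the study start, entry at STUDY_START.
covers_start = {
    pid: STUDY_START
    for pid, ops in periods.items()
    if any(s <= STUDY_START <= e for s, e in ops)
}
print(starts_in_window)  # {2: datetime.date(2012, 7, 1)}
print(covers_start)      # {1: datetime.date(2010, 1, 1)}
```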
You can also add a censoring event of observation periods with a user-defined date of 2020-01-01, so that anyone whose OP extends past 2020-01-01 exits the cohort at 2020-01-01. Depending on your version of Atlas, you may not see the option to censor at an observation period, but there is a workaround I can show you if you run into this issue.
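In effect, the censoring event caps each person’s exit date: the exit is whichever comes first, the end of their observation period or 2020-01-01. A minimal sketch of that rule (the `cohort_exit` helper is hypothetical, not an Atlas API):

```python
from datetime import date

# Censor date taken from the example in this thread.
STUDY_END = date(2020, 1, 1)


def cohort_exit(op_end: date, censor_date: date = STUDY_END) -> date:
    """Hypothetical helper: exit at the observation period end or the censor date, whichever is earlier."""
    return min(op_end, censor_date)


print(cohort_exit(date(2023, 5, 1)))   # 2020-01-01 (censored)
print(cohort_exit(date(2015, 6, 30)))  # 2015-06-30 (observation ends first)
```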
Alternatively, you can use the ‘study window’ option of cohort definitions to specify the 2010-2020 date range. This excludes people whose entry is before the earlier date and censors people at the later date. However, I’d caution against using it, because we may remove this feature from cohort definitions.
The reason I say we might remove it is that a study window isn’t really a phenotype concern; it’s a study concern. What I’d like to see is people creating a general cohort that is used as a phenotype (in this case, ‘everyone in the database’) and specifying the study window at analysis time: in one study it’s 2010-2015, in another 2014-2018, and so on, but the same ‘database population’ cohort is used across all the different study contexts.
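A sketch of the "one database population cohort, many study windows" idea in plain Python, with hypothetical data. Here the window is applied at analysis time by clipping each cohort row to the window and dropping rows that fall entirely outside it. Clipping is an assumed analysis choice for illustration; it differs from the cohort-definition study window described above, which drops people whose entry precedes the window start.

```python
from datetime import date

# Hypothetical "database population" cohort: one (person_id, start, end) row per observation period.
database_population = [
    (1, date(2005, 1, 1), date(2016, 12, 31)),
    (2, date(2013, 6, 1), date(2019, 6, 30)),
    (3, date(2017, 2, 1), date(2022, 1, 1)),
]


def apply_window(cohort, win_start, win_end):
    """Clip each row to [win_start, win_end]; drop rows with no overlap."""
    clipped = []
    for pid, start, end in cohort:
        s, e = max(start, win_start), min(end, win_end)
        if s <= e:
            clipped.append((pid, s, e))
    return clipped


# The same cohort reused for two study contexts.
study_a = apply_window(database_population, date(2010, 1, 1), date(2015, 12, 31))
study_b = apply_window(database_population, date(2014, 1, 1), date(2018, 12, 31))
print([pid for pid, _, _ in study_a], [pid for pid, _, _ in study_b])  # prints: [1, 2] [1, 2, 3]
```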
I was actually trying to capture both observation periods that start after the study start date and those that start before the study start date. As an example, if this was my database population
The reason for doing this in Atlas is to let the PI define any inclusion criteria they wish relative to the study start date.
It makes sense that this is an analysis concern and not a phenotype concern though.
Maybe I can just use the censoring boxes in Cohort Eras to get this cohort. But then the inclusion criteria will not be applied based on the study start date.
OK, in that case your initial approach of placing the entry event exactly on Jan 1 makes sense, but you want the cohort exit to be at the end of the observation period, and then use the right censor you’re showing at the end to right-censor at the study end date.
Inclusion criteria are based on the start_date of your entry events, so what you initially proposed works.
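Because inclusion criteria are anchored on the entry event’s start date, a person entering on 2010-01-01 has every lookback window measured back from 2010-01-01. A plain-Python sketch with a hypothetical inclusion rule, "no condition record in the 365 days before index"; the rule, the data, and treating the window as ending strictly before the index date are assumptions for illustration, not Atlas behavior.

```python
from datetime import date, timedelta

# Hypothetical cohort entries after the entry-event step: person_id -> cohort start date.
entries = {1: date(2010, 1, 1), 2: date(2013, 4, 1)}
# Hypothetical condition records: (person_id, condition_date).
conditions = [(1, date(2009, 8, 15)), (2, date(2011, 2, 1))]


def no_condition_in_lookback(pid, index, days=365):
    """Hypothetical inclusion rule: no condition record in the `days` before the index date."""
    window_start = index - timedelta(days=days)
    return not any(p == pid and window_start <= d < index for p, d in conditions)


included = {pid: idx for pid, idx in entries.items() if no_condition_in_lookback(pid, idx)}
print(included)  # {2: datetime.date(2013, 4, 1)}
```

Person 1 is excluded because their 2009-08-15 record falls within the year before their 2010-01-01 entry date.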