OHDSI Home | Forums | Wiki | Github

Randomized data - Atlas Cohort Generation

Hello @Chris_Knoll - We created our own custom concepts for our data. However in Atlas, we found something unusual, so thought of checking with you

  1. Cohort Generation returns records successfully when I explicitly mention the custom created concept_ids for my observation.

Ex: Here, I have created around 10 records in observation table with my concept_ids in “Value_as_Concept_id” column. When I follow this approach, cohort is generated successfully and returns records. Here HDB 1-2 room and HDB 3-room flat are custom concepts

  1. However, when I create a concept set using the above concept_ids and use them during cohort generation, it returns 0 records. Instead of using the “Add Attribute” option, I am using the concept set (which contains the same two concepts) but it returns 0 records. Can you please help us understand what might be the reason?

  1. Finally when we use our custom created vocabulary_id as a search term (ex: Survey_PS) in search field of vocabulary search in Atlas, it does not display all our custom created concepts under the vocabulary Survey_PS. Is this expected?

  2. I also tried the advanced filters option where I checked my Survey_PS and clicked on Search Icon (Microscope icon), still it doesn’t return all custom concept records under my vocabulary

Can you please help us with these issues and let us know whether we are making any mistakes?

Thanks
Selva

The first image shows looking for the 2 concepts in the Value As Concept field. The second image shows looking for those concepts in the observation_cocnept_id. What did you put in the ‘observation_concept_id’ for those records?

As far as the vocabulary search, If the values are inserted into the ‘concept’ table, the vocabulary search should find it. You should write a direct query against your vocabulary tables to ensure that you can find the data. If you are saying that you selected the specific vocabulary and clicked on the search icon without typing any search terms into the textbox, I am pretty sure Atlas will not perform the search unless there is at least 3 characters of text in the text box.

Thank you @Chris_Knoll. Thanks for correcting my mistake.

Okay got it. So when we use concept_sets, it filters based on observation_concept_id (Tablename_concept_id ex: Drug_concept_id, Measurement_concept_id)? Am I right to understand that it is because of the same reason that we don’t see these column names in drop down of ‘Add Attribute’. Should we wish to filter records based on these concepts, only way is to concept sets

  1. Our data is de-identified (Person_ids, dates etc), it’s a hospital EHR records where we don’t have any info about patients before they visited hospital. In our case, do you see any use of observation period window?

is there any way to bypass this observation period (date) criteria or choose a default one which will ensure that all my records would be returned based on my Cohort entry criteria (Concept and default observation period) and inclusion criteria. Atlas allows a window of 1095 days before and 1095 days after event index data. So am I right to understand that it would be best to have this 1095 days window set up to ensure that there is no loss in records?

Thanks
Selva

I wouldn’t say that using concept_set automatically mans you’re taking about {domain}_concept_id, although that is the majority of the cases in the tool. Instead, I would look at is as: when you use a concept set for a criteria attribute, it’s because the intention of the attribute is to allow you to specify a set of cocnept_ids that are built from a concept set expression which involves using elements of concept_ancestor to pull in descendants. So, you’ll provide a concept set for things like the {domain}_concept_id fields, but also {domain}_source_concept. In other places, you would specify a ‘concept list’ which means a literal list of concept IDs which is not derived from any concept_ancestor lookup. Certain attributes it makes sense to ask for the explicit concept Ids (ie: VALUE_AS_CONCEPT: would using concept ancestor for this make any sense? or UNIT_CONCEPT_ID: would using descendants of a UNIT_CONCEPT make sense?). Other places ask for explicit concept IDs where in the future this may change. Example: gender/sex is specified as a concept list (you don’t pick descendants of some gender concept). however, in the future we may have support for ‘gender you identify with’ which you may want to pick a higher level gender identity concept that covers a broader range of lower level concepts. When that comes into the CDM, then we’ll make the change that when you want to check the gender of the person, you don’t specify a list of concepts, and instead you have to define a concept set expression that yields a wider range of gender concepts.

Part of defining a cohort definition is being able to specify the duration from when they enter the cohort and how long they can persist in the cohort. By default, they would persist until their end of the continuous observation period. This means that any event that will bring them into a cohort must exist within an observation period so that we can determine how long they can ‘persist’ in the cohort. Observation Periods make this possible and they are needed for the cohort definition mechanism.

Your comment about ‘Atlas allows a window of 1095 days before and 1095 days after’…is confusing me. You don’t have to enforce that in your definition, the editor is just providing a few ‘presets’ but you could put 123 in the input field. If you don’t want any prior/post continuous observation of the person, just set both those values to 0, and it will require the enclosing observation period to begin at least 0 days before and end at least 0 days after the cohort entry event. But it will require an enclosing observation period.

Determining the Observation Periods in your CDM is one of the most challenging/interesting decisions you need to make about how you want to represent your source data in the CDM. If you make individual observation periods for your hospital visits, then you’ll only be able to design a study that covers the time period that the person is physically at the hospital. Why? Because your observation periods are declaring when you actually have a person under direct observation period, and when you leave the observation period, all bets are off, and you can no longer follow this person for study. However, if you want to declare that your observation periods for patients is between their earliest visit and latest visit, you are sacrificing the ‘truth’ that you have a person under direct observation as a trade off for being able to consider a person for a study that spans across visits.There is no right or wrong answer here, just the consequences of your decision. For claims-based data sources, we use ‘enrollment periods’ to define the observation periods. The assumption here is that if they are covered by an insurance plan, then we will probably know about any relevant clinical experience (drug scripts, diagnoses, lab tests) because there will be a claim for it. Is it perfect? no. You could potentially get some sort of medical intervention that isn’t seen by the insurance provider. It ultimately becomes a limitation of the observational study, and people decide if they will trust the scientific evidence yielded from the analysis or not.

I can not tell you what the right answer is for you. I can tell you that you could easily have 2 different CDMs that have different Observtion Period rules (one would be based directly off visits, where a person has many OPs, and the other uses MIN/MAX of visits, thus giving each person one OP each).

1 Like

@Chris_Knoll - Thanks for the response and I just have a follow up question

Let’s consider the below example from our EHR data (our only data source)

Person ID = 123 has 4 visits to the hospital and they are as follows

a) 1st Jan 2009 - 2nd Jan 2009 - Prescribed Metformin (just for example)
b) 21st Jan 2010 - 25th Jan 2010 -Prescribed Metformin
c) 23rd Mar 2014 - 03rd Apr 2014 -Prescribed Metformin
d) 27th Jun 2015 - 21st Jul 215 -Prescribed Metformin

  1. Am I right to understand that drug exposure start and end dates will be same as visit dates or some one spent effort to manually derive their exposure start and end dates from prescription? I mean only these two can be the valid methods to have acquired drug dates considering we don’t monitor patients once they are out of hospital

  2. Talking about observation periods, again our visit dates will be the observation periods. Am I right? Hence you have got the equal to condition for dates here as shown below in the cohort query to handle cases like above. Am I right?

  1. Let’s say I am trying to build a cohort and I would like to identify patients who have been prescribed Metformin. So, am I right to understand that it doesn’t make sense to use preset interval (1095,365,548 etc) in observation window for our scenario. Because our index date.cohort entry date is the observation period/visit/Drug exposure start date and cohort exit date will be the observation end date/visit/drug exposure end date. These are only date info that we have about the patients.
    These intervals would make sense only if I had observed patients outside hospital as well. Am I right?

  2. So, in this case, all the records (4 records) of person ID = 123 who have been prescribed Metformin will be present in Cohort. So as a ordinary layman I might interpret this as our cohort contains 4 records/4 patients but it is 1 patient occurring 4 times in the cohort. But I think it is possible and valid for a person to occur multiple times in the cohort

  3. I understand the meaning of terms like “Latest events”, “Earliest events” and “All events” but can you help me understand how does choosing ‘Earliest events’ or any other be helpful for analysis. This might be an elementary question but your inputs/simple example will be helpful. Or am I right to understand that its basically about whether we want to retain duplicate person_ids or not? Choosing Earliest or Latest doesn’t make any difference?

  4. In our case Index date is the Drug exposure date which is again same as visit/observation dates. All dates are equal when we don’t have any info about patients when they are out of hospital. Am I right?

I would not ask my questions regarding min/max visit dates as of now. I feel based on your responses for the above questions, I should be able to make sense of how the min/max scenario works. If I have any difficulty then I will reach out to you. Thank you for all your patience and time in clarifying my doubts. Much appreciated.

This is an ETL specification question. Did you make the drug start/end dates match the visit dates in your ETL? Was the person perscribed a 14 day perscription on the metformin (is it oral?) and therefore the exposure would last 14 days? Are you only considering them exposed if you actively are observing them in the hospital (ie: do you lock them in a room and say 'take all these pills and you can’t leave until you do!)? Will you assume that if you prescribed the drug the patient took all the drug? There is no right or wrong answer here, you just have to decide for yourself what you want to do. I can’t answer this question for you. You have to decide.

Again, the visits will be your observation periods if you decide to do it that way. It’s up to you to make the decision, and live with the assumptions and consequences.

The cohort query you are showing is just applying the ‘prior and post continuous observation’. If you make all your observation periods line up with your visits, then you will never have any prior observation the visit to determine any baseline characteistics. If you look at how you are arranging your observation periods (ie: they are the visits) when someone comes into the hospital, and you can only look at the current observation perod’s events for prior history, you can’t say anything like ‘did this patient not have acute MI event in the past 6 months?’. How know this if you don’t have continuous observation prior to entering the visit for the same observation period? So, that’s the consequence of making Visits = Observation periods. You could make Observation Periods Min/Max of all visits, then you can look at the last visit and consider all the events that happened in the prior visits for this person. If you make one OP per visit, you are eliminating that possiblity. Is it right? Is it wrong? Depends on what you want to do in your study. Again, I can’t answer this question for you.

Yes, it wouldn’t make sense because you don’t allow any prior observation time for your visits. You use the ‘continuous observation’ values to require that a person has been observed for a minimum amount of time if you are going to use the data for the cohort.

Yes, it is valid for a person to appear multiple times in the cohort. We take additional steps that the person has no overlapping time in the same cohort. In other words: if the patient was prescribed a 7 day script for metformin on 1/1, 1/5, 1/14, and 1/20, you can see that there’s overlap between the 1/1 - 1/8 exposure and the 1/5 - 1/12 exposure. If you just used the raw exposure records in the cohort, you would have the person ‘double exposed’ on 1/5, 1/6 and 1/7. But you don’t want that, so the record in the cohort table should show the 1/1 -> 1/27 episode of drug exposure (if you wanted the cohort to represent continuous exposure).

These are used for performance purposes. If you only need to evaluate the earliest event in data, then why bring in all events per person? Reasons to use earliest event per person: new user cohort where they enter the cohort at their earliest exposure and continue in the cohort until end of observation period. Again, it all depends on your scientific question. I can’t answer ‘is it right to do this’ without knowing the context of what you are trying to accomplish.

You don’t have any continuous observation about patients when they are out of the hospital, if that’s how you’re defining your observation periods.

I think you would benefit from the online OHDSI ETL videos, since it’s very difficult to answer in detail your very complicated questions, as well as not knowing the full picture about the details of your ETL to give you the proper response in the right context. I would try to join in on specific work groups so you can ask calls, learn from other people activities, and then leverage that to produce your own fully specified ETL document. The ETL specification will answer all of the questions you raised.

Hello @Chris_Knoll - Just would like to know, if I would like to see the list of members in my cohort, do I have to manually run the query step by step to find out the final cohort set or is there any option to download my cohort members (person_ids)?

Hi, @SELVA_MUTHU_KUMARAN,
The person_ids are stored in the {result_schema}.cohort table. There currently isn’t a mechanism to expose person IDs in the application due to sensitivity to exposing patient level data in the UI. However, you can directly select the person_ids as:

select * from {result_schema}.cohort where cohort_definition_id = {cohortId}

Note: the person_id is stored in column subject_id. The reason for this is that there was the idea that you could create cohorts based on other entity identifiers (such as provider ID) so the ‘subject_id’ moniker was chosen to be more abstract.

1 Like

Hello @Chris_Knoll - Can you please help me with this post?

@Chris_Knoll @anthonysena

I have created a cohort in Atlas based on this paper (https://www.ncbi.nlm.nih.gov/pubmed/27923172). Can you let us know whether we have got the inclusion/exclusion criteria right as mentioned in the paper?

You can find the criteria in “Material and Methods” section

This is the link (http://www.ohdsi.org/web/atlas/#/cohortdefinition/1770877) of my cohort in public version of OHDSI

Your inputs would definitely be helpful. we would like to reproduce this study and show it as our demo

Thanks
Selva

Hi, @SELVA_MUTHU_KUMARAN,
I’d definitely solicit feedback from the broader community for suggestions for cohort criteria that can help identify T2DM, renal diseases, etc that are the clinical concepts that are specified in the paper. However, I’ll give you some general, logical things to consider.

The paper sates that it is a prospective cohort study, so they will be collecting data about patients starting from their enrollment in the study and following them for as long as possible. They state that they will drop people that have less than 2 years of follow up. However, you are doing study on already collected data, so you’ll be conducting a ‘retrospective’ study, so, you’ll be approaching the problem of cohort definition from that perspective.

Continuous Observation specification.

You specify that you require 0 days of prior observation, but you have inclusion rules that require 0 occurrences of certain diseases. But if your patient has 0 days of prior observation, how confident can you be that the person in question did not have the disease? Would you need to have the person exist in the data for 6 months? 1 year? 2 years? If you require more prior continuous observation, you will limit your study population, but you will have confidence that your inclusion criteria is based on information that you know. Note: there’s no real reason to require 2 years of prior observation if you are only looking for something in the prior year.

Using the event windows in inclusion criteria

All if your inclusion criteria (that specify a event window) specify ‘events starts between All days before and All days after index’. This means that you’re looking forward in time for the presence of some event. Depending on the study, this could be inappropriate. Since you’re trying to replicate the study, you shouldn’t look for anything in the future: you should model the cohort selection as using anything at the index and prior to make the determination about cohort inclusion. It’s not clear in the study what happens if someone develops an infection, cancer, renal diseases, etc. Do they get censored? Does there data get dropped completely from the analysis? It would be good to hear from the OHDSI community about how those cases are handled. But, you should review your inclusion rules and apply the lookback window that makes sense for the inclusion rules: does a pregnancy 5 years ago exclude them from the study, or just a pregnancy that started within the past 365 days? Does the term ‘active infection’ imply recent diagnosis of an infection (like 60 days) or is it something longer term?

Use of measurements

For me, use of measurement data has always been tricky. if you’re going to use those concepts you should verify that your datasource actually has those concepts in the data, and also check to see what units are associated with the measurement. If you are looking for a value, then i think it makes sense to also specify the unit concept ID in the criteria.

That’s my quick review, hopefully others can contribute some ideas as well. I’m sorry I can’t give you a deep review of the cohort, as these types of activiteis are very time intensive and you need to balance what is required from the prospective context vs. what is actually captured in the data to recreate the study as a retrospective study.

Good fortunes!

-Chris

Hello Chris,

Thanks for the response. I will wait to hear from the community but few quick questions based on your comments and you can let me know whether my understanding is correct or not

Let me clarify better. My aim of this task was only to explore Atlas and see whether we can implement the criteria from real time study in Atlas.

I agree that w.r.t to clinical concepts, i could have added more and it can be improved further. But one thing I would like to get to know/understand from you is whether my understanding of fields are correct

  1. Continuous Observation - From the paper, I am not able to find any info on observation period. So I have kept it as zero. Now that, I have as zero, I believe that all patients with T2DM will be present in the cohort. As we have as zero (which I understand as no restriction/any criteria to be qualified for the cohort. Whoever has got T2DM gets in the cohort. I mean same patient multiple times or unique patients (based on what I choose “Limit Initial events” to be) Am I right? let’s say, we have 100 patients who satisfy this primary event criteria (Cohort entry criteria), So all these 100 unique patients come with their ids and condition start and end date at this stage

  2. Event windows in inclusion criteria - Here now when I apply the inclusion criteria

Pregnant criteria - As we have to exclude patients who are pregnant, we need to look at patients to know whether they are pregnant or not. As we really can’t look back (we didn’t observe them in level 1(primary events)), based on my condition “All days before and All days after Index date”, this is not going to cause any impact anyway. Am I right? I am strictly speaking w.r.t my primary events result. Or do you think it would have been more appropriate to have “0 days before and 0 days after index date” as we don’t really have any records to look back, it is like basically saying, "see whether patients who were diagnosed with T2DM are pregnant. If yes, exclude them for the cohort. This is what I am trying to do. Exclude patients who are pregnant from a group of 100 T2DM patients. I understand your pt about where I shouldn’t look anything into the future but index and prior. In my case there is no prior, my index date becomes the inclusion criteria date (I mean date to check whether patient is pregnant or not).

Another important question is, though I have 0 prior continuous observation in the primary criteria, when I apply “event starts 730 days before and 0 days after index start date” on inclusion criteria, does this help us to look back patient records here by overriding the primary criteria?

  1. For criteria on “Follow up period of 2 years”, can you let me know what is the “event” here? . I mean in pregancy criteria, “being pregnant or not” is the event. Similarly in follow up criteria, what is the event?

I would like to let you know that our observation period dates were derived from min and max of visit occurrence table. When I give “Period duration > 730 days” after selecting “observation period” as criteria as shown below, am I right to understand that it only retains patients (from 100 records) who have observation period duration from observation_period table ((ob.end date - ob.start date) > 730). Let’s say out of 100 patients, 95 patients had their observation period more than 1000 days, 5 who were less than 730 were excluded. So, its the like the subset after filtering our pregnant women will be joined with observation period table to find the observation period duration. IF IT’S > 730 retain else reject it. Is this how it works? But how is the “event starts” condition used here. This is what confuses me.

  1. Atleast 3 measurement of eGFR criteria - I think to address this condition, we should have made sure that our “Limit Initial events” is “All events”, only then we can get multiple records of the same person. With my lookback period as 0 days, can you let me know how to implement this in Atlas?

Thanks

The paper is doing a prospective study, you are doing a retrospective study. In a prospective study, the people are evaluated if they qualify for the study, and if they qualify, then they are enrolled and followed. In a retrospective study, you look at pre-captured data, and have to look for continuous observation prior to be able to determine if any exclusion events actually happened (you can’t determine if an event did not happen if you do not have prior observation for the patient). From a cohort definition perspective, you are just looking for the population that ‘satisifies study qualification’, which is: they have T2DM, are not pregnant, have no infenxtions/cancer, have no renal disease, etc. you can’t determine that they aren’t pregnant if you don’t have prior observation. So you will need at least 9 months of prior observation so that you can determine that you do not see them pregnant in the past 9 months (worst case scnerio, you have a record of pregnancy the day they got pregnant). You also can’t determine if they are clear of renal disease if you don’t have prior observation. How far back do you need to look to ensure no renal diease? 1 year? then you’ll need a minimum of 365d prior observation.

I’m a little confused about you taking about the ‘same patient multiple times’, but what I’d say related to continuous observation is: if you have a person that enters your data, and on day 3 of their continuous observation they have their T2DM. On day 3 of their continuous observation, you can’t detemrine if they have no renal disease, pregnancy, etc in the past year (you do not have 365d of prior observation on day 3). However, this person may be getting monthly T2DM diagnosis, and after a year of this hapepnign, you will find a T2DM that has 365d or prior observation (maybe on day 425)… It is at this T2DM diagnosis you can confirm that there’s no pregnancy or renal disease, and at that point you can say they qualify for the study. They could have multiple T2DM even after that DX on day 425 but you only really care about the first time the person qualified for the study (on day 425). What will happen by default in the cohort defintion is the 425d DX and all the others that qualify after will be collapsed into a single cohort era taht starts on day 425, and continues until the end of the observation period.

Ok, so you found 100 patients that had the primary event (T2DM), but then you will only include those patients that satisfy all of the inclusion criteria. And, by default, if a person satisfies the inclusion criteria and multiple dates, the final cohort episodes will be from the first qualifiing event up to their observation period end date.

Yes, if you add a inclusion criteria that says the Observation Period must start between all days before and 730 days before the index (the T2DM dx), that is the same thing as requiring 730d of continuous observation, because you are sayign that the observation period must start at least 730 days before the T2DM DX, which means that, if it is true, the person had 730d prior observation. Conversly, you could say that the observation period ENDS 730d after the index date, which means you have 730d of continous observation AFTER. What you put in your inclusion criteria, hwoever, was that the length of the observation period wasy 730d. That is saying somethign different: that is sayign that the duration of continous observation spaned 730d, but it doesn’t say anyting about if that span was before or after the T2DM dx date.

Again, the follow up period is something they specified in their prospective study. When someone enrolled int he study, you don’t know if they are going to be with you 2 years (as you can’t see into the future). Instead, they said that they would drop people if they didn’t make it 2 years in the study (which is a bit odd for me). I suppose that if you wanted to recreate that rule in your retrospective study, you could require that people’s observation period must end between 730d after and all days after the DX date. I just think it’s strange, and more of a characteristic of a prospective study, so I was suggesting you don’t apply that rule. But, if you want to follow it, you can do it.

To answer your question, the ‘event’ to determine ‘follow up’ is the observation period event. You want to require that the observation period ends 730d after the T2Dx date in order to require that the person was continuously observed for at least 730d.

Yes, you understand correctly that when you say period dration > 730d, you are saying datediff(d, observation_start, observation_end) > 730. However, that is not saying that they had 730d of continuous observation following the T2DM diagnosis, it is saying that at the time of the T2DM, there existed a period of at least 730d of continuous observation surrounding the T2DM diagnosis. Since the paper is requiring 730d of foloow up, that means that when you are checking your T2DM diagnosis, you want to see the surrounding observation perioed end between 730d after and all days after the T2DM. that means an observation that ends 730d, 790d, 1024d, 2048d after the T2DM all qualify the T2DM diagnosis, but if the OP ends 60d after the T2dm, then you do not have 730d of follow up.

First part is right: if 5 people didn’t have 730d of post-observation, then they are not included there (note: ALL of their T2DM diagnosis would have to fail this check. If at least 1 T2DM passes, then that T2DM event will be considered. But, then you start talking about pregnant women…but pregnancy is a diffrent inclusion rule. Each inclusion rule is applied independently to the cohort entry events and at the end, we chose the cohort entry events that satisfy all inclsuion rules:
entry event in cohort if: (Inclusion 1 = TRUE && Inclsuion 2 = TRUE && Inclusion 3 = TRUE)

Let’s say you ahve a person with multiple T2DM diagosis. Let’s say the first one only had 30d of prior observation, and you require 365d. Let’s say they had another diagnosis on 765d, so they satisfy the inclusion rule related to observation period. But, ut oh! they got pregnant on day 732, so while they had the right prior observation on their 765d diagnosis, they failed the pregnancy check! so, they didn’t meet all the inclusion criteria, so the person is not in the cohort as of day 765. But let’s say they have another diagnosis on day 2750. At this time, there’s no pregnancy within 365d, the have pleanty of prior observation time, they don’t have any renal problems, so, on day 2750, they are in! See how that works? Each inclusion rule makes a different statement about if they qualify, and they ONLY qualify for the cohort on the date, if ALL the inclusion rules pass on that date.

I understand your confusion, but it’s hard for me to put into words: when you say ‘all events per person’ you’re just talking about those cohort entry events (in this case, the T2DM diagnosis events) using ‘all events’ or ‘earliest event’ simply means are you going to allow a person to use multiple T2DM diagnosis events or only their earliest T2DM event for inclusion in the cohort. After you’ve made that decision, the inclusion rules will use all the events that you specify to look for in the inclusion criteria. Meaning: You said use earliest event per person for the T2DM diagnosis. Then you want to look for eGFR measurements. Looking for eGFR measuremetns is not limited to a single event per person. When you create your inclusion rule looking for eGFR, it is going to find all eGFR measurements, and count them up (based on the time window before-after you specify, and other criteria like the concept ID, measurement value, etc). It is going to count up how many it found and that count will be used in the inclusion criteria: do you want at least 1? (that’s an inclusion)…do you want exactly 0? (that is an exlcusion)…do you want at least 3? (that means that the person had 3 measurements taken). What the toold oesn’t do (yet) is let you say 'there was some X increase over the next 3 measurements). You’ll have to do something like ‘had a value in range A-B’ and then a second measurement in range C-D within 90d…which I acknowledge is not the same thing, but it’s a workaround.

-Chris

Hello @Chris_Knoll - Thanks a ton for your detailed response. Much appreciated

I would like to know whether you have any examples where you have tried/showcased to implement the cohort definitions of a study (article like what I shared) through atlas. That would be helpful to solidify my understanding as well.

Check out the studies found in the OHDSI github study protocols repo:
Example:

@Chris_Knoll - Hello Chris, I was trying to debug the SQL query which was generated for Cohort. Can I know what is the use of “date offset strategy” or temp table called “strategy_ends”?

Date offset strategy

This is used to specify the cohort_end_date for the cohort entry events that qualified for the cohort. Using the date offset strategy, you can specify that the cohort_end_date is offset by a fixed number of days from either the entry event’s start date or entry event’s end date. For example, you may have a procedure (such as chemotherapy) which you’d like to make a fixed duration for all people, and so you’d set the date offset strategy to add 7 days to the entry event’s start date. Or, maybe you want to specify a 30d surveillance window to cohort entry events based on drug exposures, so you would specify a offset strategy to add 30 days to the entry event’s end date.

strategy_ends temp table

The strategy_ends temp table is used calculate each person’s adjusted exit_date based on the strategy, also taking into account observation period ends. Example, the offset strategy may specify adding 365d to the cohort start date, but a person may only have 128d of post-index continuous observation. The strategy_ends will contain the offset date of event_start + 365d and also the observation_period_end_date that encloses the entry event. Which ever is first for that person will be used. The logic is: if the observation period ends before the date offset, then the person should exit at the observation period end date.

1 Like

@Chris_Knoll Thank you for the response Chris. In addition, Can you let me know what happens when we generate the custom features for measurement values? I mean Systolic Blood Pressure, eGFR values. When I mean values, I talk about value_as_number field. So when I see the output as below and if they don’t indicate value_as_number field (came to know that from github) , can I know what does this result indicate or how to interpret this? 2 questions

  1. When cohort size is 12177, why do I see different count for different features when we are generating features for the same cohort (which has 12177 records). Ex: LDL has only 12142 records. (Similar post here -
    Atlas - Issue with Characterization display)

  2. What does Min (0.0) and Max (38.00) indicate in this scenario. We already have count.

I posted in github as well but thought of posting in forum as well for the use of community.

Hi,
Let’s keep the discussion confined to github about the question on characterization. I understand you posted here to get broader view, but this thread is now touching on multiple different issues, and anyone attempting to find information about “Randomized data - Atlas Cohort Generation” is going to get a thread talking about characterizations related to distributed values.

At the very least, a separate thread specific to the characterization of distributions could have been created, just to keep the topics separately. But, if you do start the discussion on github, we should leave the discussion there. I think the forum is a good place to explain the functionality, and github should be a place to report actual issues. In the case of this topic about distributions, I’m not sure there’s an actual issue in code, rather you have a question about what the numbers mean, but you’re raising the question in the context that you think that there is a bug. That’s fine, and it looks like you’ve gotten some information already over on git.

-Chris

Hi @SELVA_MUTHU_KUMARAN,

Can you please help me with an issue I’m facing as it seems similar to yours and you probably found the solution.

ISSUE : Unable to view report for cohort characterization

I tried generating report for cohort characterization and it successfully generated but when I try to click on view reports then Atlas just keeps loading it forever

I’ve tried the following things :

  1. I’ve looked at the chrome console and I see an error which I’m unable to comprehend(see screenshot below

  1. I’ve tried looking at the network tab within chrome console and I don’t see any error messages there either.

  2. I tried looking at the tomcat localhost_access logs but I dont see any statement with FAILED status over there either.

I’m using Atlas 2.7.4 along with webAPI 2.8.0 since it is backward compatible I dont think there are any issues there.

Lastly there is a table results.cc_results already present in my DB and the records are being inserted there by the cohort characterization job so I dont think that is the problem either.

Any help would be greatly appreciated.

@Chris_Knoll could you please provide your suggestions as well. Thanks in advance !!

t