Thinking about calculating gap days (if anyone is interested in this enhancement):
The trick is going to be creating the era in two passes. The first pass collects all the individual drug exposures and flattens them (with a gap window of 0) so that any overlapping drug exposure days are not double counted. The raw exposures look like this:
|------------|
       |------------|
              |------------|
                                 |------------|
Creating a 0-gap era from the above gives us:
|--------------------------| 15d |------------|
So with this first pass we have the non-overlapping, continuous exposures. Note the gap of 15 days between them.
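To make the first pass concrete, here is a minimal Python sketch of the 0-gap flattening. The function name `flatten_exposures`, the dates, and the treatment of each exposure as a half-open (start, end) interval are all my assumptions for illustration; none of this comes from the ETL spec.

```python
from datetime import date

def flatten_exposures(exposures):
    """Merge overlapping or touching (start, end) intervals.

    Assumption: each interval is half-open, i.e. 'end' is the first
    day NOT covered, so two intervals that touch have no gap.
    """
    merged = []
    for start, end in sorted(exposures):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous sub-era: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Starts after the previous sub-era ended: new sub-era.
            merged.append((start, end))
    return merged

# Hypothetical exposures: three overlapping, then one after a 15-day gap.
exposures = [
    (date(2020, 1, 1), date(2020, 1, 15)),
    (date(2020, 1, 8), date(2020, 1, 22)),
    (date(2020, 1, 15), date(2020, 1, 29)),
    (date(2020, 2, 13), date(2020, 2, 27)),
]
sub_eras = flatten_exposures(exposures)
for start, end in sub_eras:
    print(start, end)
```

With these made-up dates the four exposures collapse into two sub-eras, matching the picture above.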
The second pass will create the final drug_era with the 30-day gap window. Because the 15-day gap is within that window, the two sub-eras are joined:
|---------------------------------------------|
The last era will be what’s put into the drug era start_date and end_date. To calculate the gap_days for the drug era, we simply take the total days in the drug era and subtract the duration of each ‘sub-era’ found between the final drug_era start and end. From the above picture, we know that there are 15 days that were not covered by any sub-era. So, if I am interpreting the ETL spec correctly, the gap_days would store 15 days. A final example:
After overlaps are removed:
|------------| 12d |----------------| 21d |-------------------|
Final Era:
|---------------------------------------------------------------|
Gap Days = 21d + 12d = 33d
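Here is a rough Python sketch of the second pass and the gap-day arithmetic, in case it helps the discussion. It takes the already-flattened sub-eras as input. The name `build_eras`, the sub-era lengths (14, 18 and 21 days), the half-open interval convention, and the choice to join sub-eras when the gap is less than or equal to the window are my assumptions, not something stated in the spec.

```python
from datetime import date, timedelta

def build_eras(sub_eras, gap_window_days=30):
    """Join non-overlapping sub-eras into eras and compute gap days.

    Assumptions: intervals are half-open (start, end), and a gap equal
    to the window still joins the two sub-eras.
    """
    eras = []
    for start, end in sorted(sub_eras):
        covered = (end - start).days
        if eras and (start - eras[-1]["end"]).days <= gap_window_days:
            # Gap is within the window: extend the current era.
            eras[-1]["end"] = max(eras[-1]["end"], end)
            eras[-1]["covered"] += covered
        else:
            eras.append({"start": start, "end": end, "covered": covered})
    for era in eras:
        total_days = (era["end"] - era["start"]).days
        # Gap days = total era length minus the days covered by sub-eras.
        era["gap_days"] = total_days - era["covered"]
    return eras

# Hypothetical sub-eras separated by gaps of 12 and 21 days.
d0 = date(2020, 1, 1)
s1 = (d0, d0 + timedelta(days=14))
s2 = (s1[1] + timedelta(days=12), s1[1] + timedelta(days=12 + 18))
s3 = (s2[1] + timedelta(days=21), s2[1] + timedelta(days=21 + 21))

eras = build_eras([s1, s2, s3])
print(eras[0]["start"], eras[0]["end"], eras[0]["gap_days"])
```

Both gaps fall inside the 30-day window, so the three sub-eras become one era whose gap days are 12 + 21 = 33, the same number as in the example above.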
@DTorok: would you agree with this logic?