I’ve noticed some issues in Atlas (Cohort Definitions section).
Could someone provide any help?
Person counts observed in Reporting (Procedure report) exceed the Person counts from Generation (Cohort features - Procedures) report.
In the 1st report the background is described as follows: Features are baseline characteristics (e.g collected before /on cohort start) Long Term: 365 day lookback. Short Term: 30d lookback. Overlapping: Event spans cohort start date.
Generation (Cohort features - Procedures) report shows the list of concepts with repeating duplicates (from 2 to 15 repeats) with the same (mostly) or slightly different counts.
Device reports are not represented in both of sections. There is no way to look through the devices, right?
Issues 1-2 are not focused on Procedure reports only. Another reports are affected as well.
For #1, is it possible that your cohort definition allows people to have mulitple eras? Ie: you allow ‘all events per person’ and you specify some sort of end date that would let people leave the cohort and then re-enter the cohort?
If so, I believe what happens is that if your cohort has a person with multiple eras, the feature report will count up the procedures in the prior 365d for each era for a person. This does not happen in the cohort reporting.
For #2: here is the query that is used to produce the results in the UI:
select
f.covariate_id,
fr.covariate_name,
ar.analysis_id,
ar.analysis_name,
ar.domain_id,
ar.start_day,
ar.end_day,
fr.concept_id,
c.concept_name,
f.sum_value as count_value,
f.average_value as stat_value
from @cdm_results_schema.cohort_features f
join @cdm_results_schema.cohort_features_ref fr on fr.covariate_id = f.covariate_id and fr.cohort_definition_id = f.cohort_definition_id
JOIN @cdm_results_schema.cohort_features_analysis_ref ar on ar.analysis_id = fr.analysis_id and ar.cohort_definition_id = fr.cohort_definition_id
LEFT JOIN @cdm_database_schema.concept c on c.concept_id = fr.concept_id
where f.cohort_definition_id = @cohort_definition_id AND f.average_value > .005 @criteria_clauses
ORDER BY f.average_value DESC
To run this on your own env, you will need to replace the results schema and cohort ID to your specific case. Additionally, the @criteria_clauses is used to filter the results to a specific domain ID. You should take this query, apply a WHERE domain_id = ‘Procedure’, and then look at the rows to understand where your duplicates are coming from.