Hi,
[Context]: I am conducting a descriptive analysis aimed at measuring the proportion of inappropriate antibiotic use among patients diagnosed with upper respiratory infections. I created a cohort for this purpose and want to characterize this group of patients using various metrics. The primary task is to identify which care sites are associated with these inappropriate uses.
[Question]: However, when I included the following SQL to stratify the measure by care site, the sum of the number of patients for the care site variable is much less than the total number of patients identified as inappropriately using the medication. For example, I identified 9000 patients who were supposed to come from five clinics (A, B, C, D, E). However, using the SQL code, I got N(A) + N(B) + N(C) + N(D) + N(E) = 500, rather than 9000.
select cast(c.care_site_id as bigint)1000 + @analysis_id as covariate_id,
c.care_site_name as covariate_name,
c.care_site_id as concept_id,
count() as sum_value,
count()1.0 / stat.total_cnt1.0 as average_value
from (select distinct care_site_id,
cohort.subject_id,
cohort.cohort_start_date
from @cohort_table cohort
inner join @cdm_database_schema.visit_occurrence on cohort.subject_id = visit_occurrence.person_id
where care_site_id != 0
and cohort.cohort_definition_id = @cohort_id) visit_entries
join @cdm_database_schema.care_site c on visit_entries.care_site_id = c.care_site_id
cross join (select count() total_cnt
from @cohort_table
where cohort_definition_id = @cohort_id) stat
group by c.care_site_id,
c.care_site_name,
stat.total_cnt
[Thoughts]: I suspect the reasons for this discrepancy could be:
- Missing care site values in the original dataset
- Incorrectly customized SQL code
Any advice and thoughts will be greatly appreciated!