Need help with customized SQL for stratifying measures by care site

Shanshan4Q33 · June 10, 2024, 7:03pm

Hi,

[Context]: I am conducting a descriptive analysis aimed at measuring the proportion of inappropriate antibiotic use among patients diagnosed with upper respiratory infections. I created a cohort for this purpose and want to characterize this group of patients using various metrics. The primary task is to identify which care sites are associated with these inappropriate uses.

[Question]: However, when I included the following SQL to stratify the measure by care site, the sum of the number of patients for the care site variable is much less than the total number of patients identified as inappropriately using the medication. For example, I identified 9000 patients who were supposed to come from five clinics (A, B, C, D, E). However, using the SQL code, I got N(A) + N(B) + N(C) + N(D) + N(E) = 500, rather than 9000.

select cast(c.care_site_id as bigint)1000 + @analysis_id as covariate_id,
c.care_site_name as covariate_name,
c.care_site_id as concept_id,
count() as sum_value,
count()1.0 / stat.total_cnt1.0 as average_value
from (select distinct care_site_id,
cohort.subject_id,
cohort.cohort_start_date
from @cohort_table cohort
inner join @cdm_database_schema.visit_occurrence on cohort.subject_id = visit_occurrence.person_id
where care_site_id != 0
and cohort.cohort_definition_id = @cohort_id) visit_entries
join @cdm_database_schema.care_site c on visit_entries.care_site_id = c.care_site_id
cross join (select count() total_cnt
from @cohort_table
where cohort_definition_id = @cohort_id) stat
group by c.care_site_id,
c.care_site_name,
stat.total_cnt

[Thoughts]: I suspect the reasons for this discrepancy could be:

Missing care site values in the original dataset
Incorrectly customized SQL code

Any advice and thoughts will be greatly appreciated!

MPhilofsky · June 11, 2024, 12:29am

Disclaimer: I am not a SQL expert

I suspect you don’t have care site information for every visit. I would do a simple query into your data to quickly test this theory:

SELECT count(*)
FROM Visit Occurrence
WHERE care_site_id is NULL - - this is a nullable field, care_site_id = 0 could be a legitimate care site

For your cohort, are you requiring the Condition (upper respiratory infection) and the Drug (antibiotic) occur during the same visit? Or could these be separate visits? IF they are separate visits, which visit are you using to identify the care site? Do you have ordered drug data? Or are you utilizing pharmacy claims for this query?

This snippet is incorrect because most persons in a dataset usually have > 1 visit occurrence. And you are trying to identify the care site for the visit with the Condition & Drug, not every visit for a person in your cohort.