yes, but I think option 2 is the way to go. Let me give more detail:
FeatureExtraction supports a few modes of execution: patient-level and aggregate, and within those two, you can get ‘features’ within a time window (indicated by a time_id), or you can get a result without a time_id (null) where the analysis_id would inform the time window (short/medium/long). The primary usecase of FeatureExtraction (surprisingly) is to extract features, or patient-level characteristics that can be used to build models for prediction, propensity scores, etc. A secondary use of these features is to get population-level summary statistics.
Cohort Characterization is all about population-level summaries (as in, what’s the proportion of condition X within a time window relative to index (cohort start)). A particular FeatureExtraction analysis can summarize if it has the ‘aggregated’ param, which controls how to group by - sum to summarize. Atlas lets you use those ‘preset’ analyses that support aggregation to get the summary statistics. The custom SQL and criteria type of features ONLY work in aggregate, so your sql needs to produce the following columns:
covariate_id : usually computed as :
cast(condition_concept_id as bigint)*1000 + @analysis_id
covariate_name: usually the concept name
concept_id: the concept ID that represents this observational data you are summarizing
sum_value, the number of people you counted
average_value: the proportion of people with the concept calcualted as:
count(*)*1.0 / stat.total_cnt*1.0
Fun fact: it may not be obvious calling this column ‘average value’ but the average is the sum of values / number of values, but when dealing with binary (0 and 1) covariates, then the average is the proportion of people with the covariate (sum of 1’s = people with covariate / total people (number of values) = people with cases / total people = proportion of people with case. We use these types of binary features to produce the characerization reports in atlas to show 'proportion of condition/drug/procedure within a time window (short medum long). We would not use something like average age in this way (because the values sumed would be ages between 0 and 99) but instead we can put people into ‘age groups’ and have yes/no flags for if a person is in an age group, and create proportions by age group (ie: 35% of peole are between 30 and 40).
Circling back to the sql you want: if you want to provide the sql to yield the staristics you want to summarize, just make sure you yield those specific columns, and it will show up in the characterization report. Start with the same I gave you (which just counts people who have any condition concept within 30d before and 30d after index) and then play around with expanding that to locate (and count) the people you would like to summarize in the report.