What would it take to augment Achilles so that it calculates person counts for non-standard concepts and source concepts, and have those values flow into achilles_result_concept_count? I may be able to draft a solution and submit a pull request, but I need some guidance.
Parts of this look pretty straightforward. There are existing SQL scripts that generate person counts for standard concepts (for example, for conditions). Adapting that for condition_source_concept_id would be easy, but each new analysis would need its own analysis ID.
From there, I’d have basic questions such as:
Can I pick a suffix of 40 for person counts of *_source_concept_id? For example, 440, 640, 840 for conditions, procedures, and observations, respectively? If not, what is a good numeric suffix to use?
How should such new scripts in inst/sql/sql_server/analyses be added to the workflow (e.g. so that they are executed from R)? I’m not clear on which R function calls those scripts.
Where should de-duplication be done (since some source concepts are also standard concepts)?
What is the proper way to take that output and incorporate it into the WebAPI for achilles_result_concept_count? Is it simply a matter of adding the new analysis_ids to its script?
Are there other needed tasks (or stumbling blocks) to watch out for?
I believe the latest version of Achilles already produces the source-concept record counts. The problem is that the achilles_result_concept_count script in Achilles does not create the source counts.
In WebAPI 2.13, we’ve introduced a script that populates achilles_result_concept_count with person counts and source code counts. Specifically, it pulls counts from the condition_source_concept_id, drug_source_concept_id, etc. analyses and aggregates them.
If this table is populated from this script, Atlas/WebAPI will read from this table and produce the counts for the UI.
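To make the aggregation step concrete, here is a minimal Python sketch of one way person counts could be collapsed per concept_id from Achilles result rows. It is not the WebAPI script. The row shape (analysis_id, stratum_1, count_value) mirrors the achilles_results columns, the analysis IDs are the ones discussed in this thread (including the proposed 440/640/740), and taking the maximum when a concept appears under both a standard and a source analysis is an assumed de-duplication rule, chosen because summing would count the same person twice.

```python
from collections import defaultdict

# Person-count analysis IDs discussed in this thread: the existing standard-concept
# analyses (400, 600, 700) and the proposed source-concept analyses (440, 640, 740).
PERSON_COUNT_ANALYSES = {400, 440, 600, 640, 700, 740}


def person_counts_by_concept(achilles_results):
    """Collapse (analysis_id, stratum_1, count_value) rows into {concept_id: person_count}.

    ASSUMPTION: if a concept_id appears under both a standard and a source
    analysis, keep the larger count rather than summing, so persons counted by
    both analyses are not double counted. This is an illustrative rule, not
    necessarily what the WebAPI achilles_result_concept_count script does.
    """
    counts = defaultdict(int)
    for analysis_id, stratum_1, count_value in achilles_results:
        if analysis_id not in PERSON_COUNT_ANALYSES:
            continue  # skip record-count analyses such as 425
        concept_id = int(stratum_1)  # stratum_1 stores the concept_id as text
        counts[concept_id] = max(counts[concept_id], count_value)
    return dict(counts)


# Made-up rows: concept 1001 is counted by both 400 and 440, concept 2000000001 only by 440.
rows = [(400, "1001", 120), (440, "1001", 118), (440, "2000000001", 40), (425, "2000000001", 95)]
print(person_counts_by_concept(rows))  # {1001: 120, 2000000001: 40}
```

Taking the maximum gives a lower bound on the true number of distinct persons; an exact count would need a query over both concept columns at once, which is why the question of where de-duplication happens is worth settling early.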
@Chris_Knoll, I just tested the latest scripts you mentioned on our data. I now see person counts (PC) and descendant person counts (DPC) for custom codes (e.g. local codes greater than 2 billion) that are considered standard codes. However, PC and DPC are not generated for non-standard codes greater than 2 billion, or for non-standard source codes (such as ICD-10 codes).
So, I’d be happy to augment Achilles to create person counts for *_source_concept_id, but I need some guidance on how to incorporate those new scripts into the Achilles workflow.
The current analysis (425 for condition_occurrence) only calculates record counts, not person counts. Whether a concept is standard or non-standard is not relevant here: it counts condition_occurrence records by condition_source_concept_id and writes the results to the Achilles results as analysis_id = 425. This analysis is referenced in the achilles_result_concept_count script.
However, if you check the section on the counts_person CTE, it selects a much smaller list of analysis_ids: those that count distinct persons with at least one concept_id from the given domain (for example, analysis 400 is conditions, 600 is procedures, 700 is drugs, etc.).
The change would be to add a new analysis that calculates person counts (like 400) but uses the condition_source_concept_id column. You can take the analysis 400 SQL and just change it to use condition_source_concept_id. You’ll need to define a new analysis ID; I think 440 works, and you can do the same for the other domains (600 -> 640, 700 -> 740, etc.). This makes the analyses in files ending in *40.sql the analyses for distinct persons by source concept.
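The difference between the 425-style record count and the 400/440-style person count comes down to COUNT(*) versus COUNT(DISTINCT person_id), grouped by whichever concept column you choose. The sketch below uses Python's built-in sqlite3 module and a toy three-column condition_occurrence table with made-up concept IDs. The real Achilles scripts in inst/sql/sql_server/analyses are templated and write into the results schema, so treat this only as an illustration of the query shape, not as the Achilles SQL.

```python
import sqlite3


def build_demo_db():
    """In-memory toy condition_occurrence table with made-up concept IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER, "
        "condition_source_concept_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
        [
            (1, 1001, 5001),  # person 1 has two records with the same source concept
            (1, 1001, 5001),
            (2, 1001, 5002),  # person 2 reaches the same standard concept via another source concept
            (3, 1001, 5001),
        ],
    )
    return conn


# Column names are interpolated from fixed strings only, never from user input.
def record_count_sql(concept_column):
    # Records per concept: the shape of analysis 425 when grouped by the source column.
    return (
        f"SELECT {concept_column} AS stratum_1, COUNT(*) AS count_value "
        f"FROM condition_occurrence GROUP BY {concept_column} ORDER BY stratum_1"
    )


def person_count_sql(concept_column):
    # Distinct persons per concept: analysis 400 with condition_concept_id,
    # the proposed 440 with condition_source_concept_id. Only the column changes.
    return (
        f"SELECT {concept_column} AS stratum_1, COUNT(DISTINCT person_id) AS count_value "
        f"FROM condition_occurrence GROUP BY {concept_column} ORDER BY stratum_1"
    )


if __name__ == "__main__":
    conn = build_demo_db()
    for label, sql in [
        ("425-style records by source:", record_count_sql("condition_source_concept_id")),
        ("400-style persons by standard:", person_count_sql("condition_concept_id")),
        ("440-style persons by source:", person_count_sql("condition_source_concept_id")),
    ]:
        print(label, conn.execute(sql).fetchall())
```

On this toy data, source concept 5001 has 3 records but only 2 distinct persons, which is exactly the gap between a record count and a person count.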
The next step is to make Achilles execute those analyses during the call to achilles(), and this is controlled in the achilles_analysis_details.csv file.
So, the order of changes is:
Copy the existing person count analysis into a new file, and name that file after the new analysis ID. Example: copy the condition_occurrence condition_concept_id analysis (400.sql) to 440.sql, and modify the script to use the condition_source_concept_id column.
Update achilles_analysis_details.csv by copying the row of the analysis you copied from into a new row with the new analysis ID, and specify {domain}_source_concept as the stratum_1 value. Also set the name of the analysis to 'Number of persons with at least one {domain} by {domain}_source_concept'.
Once you have the results generated, we’ll need to modify the WebAPI achilles_result_concept_count script to incorporate the new analysis_ids you’ve defined (440, 640, 740, etc.).
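The first two steps above can be scripted. The following Python sketch is hypothetical helper code, not part of Achilles: copy_analysis_sql and append_details_row are illustrative names. The ID substitution assumes every occurrence of the old ID that is not part of a longer number should change, and the column headers ANALYSIS_ID, ANALYSIS_NAME and STRATUM_1_NAME are assumptions to check against the real achilles_analysis_details.csv header.

```python
import csv
import re
from pathlib import Path


def copy_analysis_sql(analyses_dir, old_id, new_id, old_column, new_column):
    """Create {new_id}.sql from {old_id}.sql, swapping the concept column and the ID.

    Plain text substitution is only a starting point: diff the result against
    the original file by hand before committing.
    """
    src = Path(analyses_dir) / f"{old_id}.sql"
    dst = Path(analyses_dir) / f"{new_id}.sql"
    if dst.exists():
        raise FileExistsError(dst)
    text = src.read_text()
    # Whole-identifier match, so condition_source_concept_id is never re-matched.
    text = re.sub(rf"\b{re.escape(old_column)}\b", new_column, text)
    # Replace the ID wherever it is not part of a longer number (also catches "_400").
    text = re.sub(rf"(?<!\d){old_id}(?!\d)", str(new_id), text)
    dst.write_text(text)
    return dst


def append_details_row(csv_path, template_id, new_id, new_name, stratum_1_name):
    """Append a copy of the template analysis row with a new ID, name and stratum_1 name.

    ASSUMPTION: the header names used here are illustrative. Rewriting with the
    csv module may also change the file's quoting style, so check the diff.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    template = next((r for r in rows if r["ANALYSIS_ID"] == str(template_id)), None)
    if template is None:
        raise ValueError(f"analysis {template_id} not found in {csv_path}")
    new_row = dict(
        template,
        ANALYSIS_ID=str(new_id),
        ANALYSIS_NAME=new_name,
        STRATUM_1_NAME=stratum_1_name,
    )
    rows.append(new_row)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return new_row
```

For example, copy_analysis_sql("inst/sql/sql_server/analyses", 400, 440, "condition_concept_id", "condition_source_concept_id") would produce 440.sql, and append_details_row would then add the matching row for 440 by copying row 400. Other columns in the template row (such as any default-run flag) are carried over unchanged, which is the point of copying the existing row.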
Hope that makes sense! Happy to help if you have any other questions.
Thanks, @Chris_Knoll, that was easy. I made the changes, confirmed that they work as expected on my Databricks instance, and submitted linked pull requests for Achilles and WebAPI.