How to create thousands of concept sets for ATLAS?

I need very granular ATLAS concept sets for my project. For example, a concept set could be “albumin” (just a few concept_ids for a lab).

Is there a way I could derive these from the vocabulary tables, or does something like this already exist in the OHDSI community? I ask because I need thousands of concept sets for labs, drugs, procedures, etc.

Thank you for your help.

Could you share a bit more about your use case? More context could help in figuring out how best to accomplish your goal.

Without knowing more about your use case, and assuming that generating thousands of concept sets is the right approach for it, here are some suggestions for starters:

  1. Create an input CSV that has the keyword(s), domain, and a pre-determined concept_set_id.
  2. Search for the standard concepts that match the keywords.
  3. For each concept, find the mapped concepts.
  4. Expand on this using the concept_relationship, concept_ancestor, and concept_synonym tables.
  5. Export each concept set to its own folder containing the three CSV files that ATLAS would export: one with metadata (conceptSetExpression.csv), one with all the standard concepts in the set (includedConcepts.csv), and one with the mapped concepts (mappedConcepts.csv).
  6. Create some type of summary report for observability.
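Steps 2 to 4 above could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 with a tiny made-up vocabulary (the concept IDs and names are toy values, not real OMOP concepts), and the helper names `make_toy_vocab` and `build_concept_set` are hypothetical. The table and column names follow the OMOP vocabulary tables, and the SQL is plain enough that it should carry over to another database, but that is untested here.

```python
import sqlite3


def make_toy_vocab():
    """Build an in-memory stand-in for the OMOP vocabulary tables (toy rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER, concept_name TEXT,
            domain_id TEXT, standard_concept TEXT);
        CREATE TABLE concept_ancestor (
            ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
        INSERT INTO concept VALUES
            (1, 'albumin measurement (toy)', 'Measurement', 'S'),
            (2, 'toy descendant lab', 'Measurement', 'S'),
            (3, 'local albumin code (toy)', 'Measurement', NULL),
            (4, 'albumin injection (toy)', 'Drug', 'S');
        INSERT INTO concept_ancestor VALUES (1, 1), (1, 2), (2, 2), (4, 4);
        INSERT INTO concept_relationship VALUES
            (1, 1, 'Maps to'), (2, 2, 'Maps to'), (3, 2, 'Maps to');
    """)
    return conn


def build_concept_set(conn, keywords, domain):
    """Return seed, included (seed + descendants), and mapped concept IDs."""
    # Step 2: standard concepts in the domain whose name contains a keyword.
    seeds = set()
    for kw in keywords:
        kw = kw.strip().lower()
        if not kw:
            continue  # an empty keyword would otherwise match every concept
        rows = conn.execute(
            "SELECT concept_id FROM concept "
            "WHERE standard_concept = 'S' AND domain_id = ? "
            "AND lower(concept_name) LIKE ?",
            (domain, f"%{kw}%"),
        )
        seeds.update(r[0] for r in rows)

    # Step 4: expand the seeds with their descendants from concept_ancestor.
    included = set(seeds)
    for cid in seeds:
        rows = conn.execute(
            "SELECT descendant_concept_id FROM concept_ancestor "
            "WHERE ancestor_concept_id = ?",
            (cid,),
        )
        included.update(r[0] for r in rows)

    # Step 3: source concepts that map to any included standard concept.
    mapped = set()
    for cid in included:
        rows = conn.execute(
            "SELECT concept_id_1 FROM concept_relationship "
            "WHERE relationship_id = 'Maps to' "
            "AND concept_id_2 = ? AND concept_id_1 <> concept_id_2",
            (cid,),
        )
        mapped.update(r[0] for r in rows)

    return {
        "seeds": sorted(seeds),
        "included": sorted(included),
        "mapped": sorted(mapped),
    }


if __name__ == "__main__":
    print(build_concept_set(make_toy_vocab(), ["albumin"], "Measurement"))
```

Plain substring matching on concept_name is a crude search; it will pull in unrelated concepts for short keywords, so the results would still need review before use.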

Is this what you are looking for, with the output being one concept set for each row of the input CSV?

EDIT:
This seems like it would be possible with Capr as @katy-sadowski mentioned, but it may require a new feature to support the bulk case. Thought it would be fun to write this out in Python (please don’t use these for production purposes).

These scripts assume that you have a local DuckDB database containing the OMOP CDM (db.ddb).

The input csv should look something like this:

"keywords","domain","concept_set_id","set_name"
"diabetes mellitus, type 2 diabetes","Condition","1001","Diabetes Conditions"
"aspirin, ibuprofen, naproxen","Drug","2001","Common NSAIDs"
"pneumonia, bronchitis, respiratory infection","Condition","1002","Respiratory Infections"
"coronary artery disease, myocardial infarction, angina","Condition","1003","Cardiac Conditions"
"appendectomy, cholecystectomy","Procedure","3001","Common Abdominal Surgeries"
"metformin, insulin, glipizide","Drug","2002","Diabetes Medications"
"fracture, bone injury","Condition","1004","Bone Injuries"
"vaccination, immunization","Procedure","3002","Immunization Procedures"
"breast cancer, lung cancer, colon cancer","Condition","1005","Common Cancers"
"statins, atorvastatin, simvastatin","Drug","2003","Lipid-Lowering Medications"

generate_concept_sets.py
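For readers without the attachment, reading the sample input shown above might look roughly like this. It is a minimal sketch and not the attached script: the function name `read_concept_set_specs` is hypothetical, and it assumes the four column names from the sample header and that keywords within a cell are separated by commas.

```python
import csv
import io


def read_concept_set_specs(handle):
    """Parse the input CSV into one spec dict per requested concept set."""
    specs = []
    for row in csv.DictReader(handle):
        specs.append({
            "concept_set_id": int(row["concept_set_id"]),
            "set_name": row["set_name"].strip(),
            "domain": row["domain"].strip(),
            # One quoted cell holds several comma-separated keywords.
            "keywords": [k.strip() for k in row["keywords"].split(",") if k.strip()],
        })
    return specs


if __name__ == "__main__":
    sample = (
        '"keywords","domain","concept_set_id","set_name"\n'
        '"aspirin, ibuprofen, naproxen","Drug","2001","Common NSAIDs"\n'
    )
    for spec in read_concept_set_specs(io.StringIO(sample)):
        print(spec["concept_set_id"], spec["set_name"], spec["keywords"])
```

With a real file, the same function can be called on `open(path, newline="", encoding="utf-8")`.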

convert_concept_set_csv_to_json.py
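As a rough idea of what the CSV-to-JSON conversion involves (again, an illustration and not the attached script), the sketch below wraps a list of concepts in a concept set expression. The JSON shape, an `items` list whose entries hold a `concept` object plus `isExcluded`, `includeDescendants`, and `includeMapped` flags, is my understanding of what ATLAS exports; treat it as an assumption and compare against a real export from your ATLAS instance. The function name `to_expression` and the input dict keys are hypothetical.

```python
import json


def to_expression(concepts, include_descendants=True, include_mapped=False):
    """Wrap concept rows in an ATLAS-style concept set expression (assumed shape)."""
    items = []
    for c in concepts:
        items.append({
            "concept": {
                "CONCEPT_ID": c["concept_id"],
                "CONCEPT_NAME": c["concept_name"],
                "DOMAIN_ID": c["domain_id"],
                "STANDARD_CONCEPT": c.get("standard_concept", "S"),
            },
            "isExcluded": False,
            "includeDescendants": include_descendants,
            "includeMapped": include_mapped,
        })
    return {"items": items}


if __name__ == "__main__":
    # Toy row with a made-up ID, not a real OMOP concept.
    toy = [{"concept_id": 1, "concept_name": "albumin measurement (toy)",
            "domain_id": "Measurement"}]
    print(json.dumps(to_expression(toy), indent=2))
```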

I wrote a basic starting point out in R as well if that is more helpful:
programmatic_concept_set.pdf (24.6 KB)

There is some functionality in Capr we could extend to support this better. I will look into it with @mdlavallee92 and @Adam_Black.