Hi @lilipeng! You may find this discussion we had in the past on a similar topic interesting to read.
I read your draft protocol, and I definitely think you’re heading in the right direction. I’m not entirely sure exactly which statistics you want to capture, though. You focus on drug-outcome pairs (“for each unique condition…number of people exposed to a drug”). That will create a matrix in which many cells have a count of 1. A cell like that effectively identifies a single patient, and we usually don’t share patient-level information. Would it be OK to just capture the number of people with a condition (for all condition concepts) and the number of people with a drug (for all drug concepts) separately?
Personally, I wouldn’t use the ACHILLES summary statistics as the input for a study, as your protocol currently suggests. I don’t like the extra dependency, and the numbers are just as easily computed directly from the CDM. But I’m sure others will disagree.
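
To make the "computed directly from the CDM" point concrete, here is a rough sketch of the separate per-concept person counts. It is only an illustration under a few assumptions: an in-memory SQLite database stands in for a real CDM instance, the rows and concept IDs are made up, and `make_toy_cdm`, `person_counts`, and the `MIN_CELL_COUNT` threshold are hypothetical names and values of mine, not something from your protocol. Only the table and column names (`condition_occurrence`, `drug_exposure`, `person_id`, `condition_concept_id`, `drug_concept_id`) come from the CDM itself.

```python
import sqlite3

# Hypothetical threshold: cells with fewer persons than this are not reported.
MIN_CELL_COUNT = 2

# Only these (table, concept column) pairs may be interpolated into the SQL.
ALLOWED = {
    ("condition_occurrence", "condition_concept_id"),
    ("drug_exposure", "drug_concept_id"),
}


def make_toy_cdm():
    """Build a tiny in-memory stand-in for two CDM tables (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?)",
        [(1, 101), (2, 101), (2, 101), (3, 101), (1, 102)],
    )
    conn.executemany(
        "INSERT INTO drug_exposure VALUES (?, ?)",
        [(1, 901), (2, 901), (3, 902)],
    )
    return conn


def person_counts(conn, table, concept_column, min_cell_count=MIN_CELL_COUNT):
    """Return {concept_id: distinct person count}, dropping small cells."""
    if (table, concept_column) not in ALLOWED:
        raise ValueError(f"unexpected table/column: {table}.{concept_column}")
    sql = (
        f"SELECT {concept_column}, COUNT(DISTINCT person_id) "
        f"FROM {table} "
        f"GROUP BY {concept_column} "
        f"HAVING COUNT(DISTINCT person_id) >= ?"
    )
    return dict(conn.execute(sql, (min_cell_count,)).fetchall())


if __name__ == "__main__":
    cdm = make_toy_cdm()
    print(person_counts(cdm, "condition_occurrence", "condition_concept_id"))
    print(person_counts(cdm, "drug_exposure", "drug_concept_id"))
```

Counting distinct `person_id` values rather than rows means a person with repeated records for the same concept is counted once, and the `HAVING` clause is one simple way to keep single-patient cells out of whatever gets shared.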