Hello everyone,
Consider the following scenario:
- Patient A took `Drug A`, `Drug B`, `Drug C`, `Drug D`
- Patient B took `Drug C`, `Drug D`, `Drug E`, `Drug F`
Let’s say that:
- Drugs A, B, and C belong to the category *Drugs used in Diabetes* (similar to an ATC 2nd level).
- Drugs D, E, and F belong to the category *Drugs used in Hypertension* (another ATC 2nd level).

I group drugs this way to reduce the feature space for computational reasons.
From the documentation, and from running it myself, I found that FeatureExtraction produces output like this:
Current output (FeatureExtraction):

| | Drug A | Drug B | Drug C | Drug D | Drug E | Drug F |
|---|---|---|---|---|---|---|
| Patient A | 1 | 1 | 1 | 1 | 0 | 0 |
| Patient B | 0 | 0 | 1 | 1 | 1 | 1 |
However, I want the output to look like this:
Expected output (FeatureExtraction):

| | Drugs used in Diabetes | Drugs used in Hypertension |
|---|---|---|
| Patient A | 1 | 1 |
| Patient B | 1 | 1 |
Can someone guide me on how to do this with the FeatureExtraction package?
a) I can write SQL to get the higher-level terms, but how do I make the FeatureExtraction package produce output like the expected table above? Normally, given a database connection and a cohort ID, FeatureExtraction computes binary or frequency features for each individual drug. I would like it to do the same at the class/group level instead.
b) Has anyone done this before?
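Related to point (a), the "get the higher-level terms first" approach amounts to mapping each exposure record to its category before computing any features, so only category-level features are ever produced. The sketch below illustrates that computation in plain Python; it does not use FeatureExtraction. It assumes one exposure record per (patient, drug) pair from the scenario, and `category_counts` / `category_binary` are hypothetical helpers for illustration.

```python
from collections import Counter

# Hypothetical drug-to-category mapping, from the scenario.
DRUG_TO_CATEGORY = {
    "Drug A": "Drugs used in Diabetes",
    "Drug B": "Drugs used in Diabetes",
    "Drug C": "Drugs used in Diabetes",
    "Drug D": "Drugs used in Hypertension",
    "Drug E": "Drugs used in Hypertension",
    "Drug F": "Drugs used in Hypertension",
}

# Assumption: one exposure record per (patient, drug) pair listed in the scenario.
EXPOSURES = [
    ("Patient A", "Drug A"), ("Patient A", "Drug B"),
    ("Patient A", "Drug C"), ("Patient A", "Drug D"),
    ("Patient B", "Drug C"), ("Patient B", "Drug D"),
    ("Patient B", "Drug E"), ("Patient B", "Drug F"),
]


def category_counts(exposures, drug_to_category):
    """Count exposure records per (patient, category), skipping unmapped drugs."""
    counts = Counter()
    for patient, drug in exposures:
        category = drug_to_category.get(drug)
        if category is not None:
            counts[(patient, category)] += 1
    return counts


def category_binary(counts):
    """Binary feature: 1 for every (patient, category) with at least one record."""
    return {key: 1 for key, n in counts.items() if n > 0}


if __name__ == "__main__":
    counts = category_counts(EXPOSURES, DRUG_TO_CATEGORY)
    print(dict(counts))
    print(category_binary(counts))
```

The binary view reproduces the expected table, while the count view keeps some of the information that the binary roll-up discards (Patient A has three diabetes drugs but only one hypertension drug).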