
Feature Extraction - FAQs

Hello Everyone,

Let’s consider the scenario below.

Patient A - had drugs `DRUG A`, `DRUG B`, `DRUG C`, `DRUG D`

Patient B - had drugs `DRUG C`, `DRUG D`, `DRUG E`, `DRUG F`

Let’s say that Drugs A, B and C belong to the category "Drugs used in Diabetes", i.e. an ATC 2nd-level class. I group drugs this way to reduce the feature space (for computational reasons).

Drugs D, E and F belong to the category "Drugs used in Hypertension", i.e. another ATC 2nd-level class.

From the documentation, I found that feature extraction produces output like the table below (which I also confirmed by running it):

Current output - feature extraction

| | Drug A | Drug B | Drug C | Drug D | Drug E | Drug F |
|---|---|---|---|---|---|---|
| Patient A | 1 | 1 | 1 | 1 | 0 | 0 |
| Patient B | 0 | 0 | 1 | 1 | 1 | 1 |

But I expect my output to look like this:

Expected output - feature extraction

| | Drugs used in Diabetes | Drugs used in Hypertension |
|---|---|---|
| Patient A | 1 | 1 |
| Patient B | 1 | 1 |
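To make the roll-up I have in mind concrete, here is a minimal sketch in plain Python. It is not the Feature Extraction package's API, only an illustration of the drug-to-class aggregation on the toy data above. The names `DRUG_TO_CLASS`, `PATIENT_DRUGS` and `roll_up` are hypothetical, and the mapping is simply the grouping assumed in this post rather than a real ATC lookup.

```python
# Hypothetical mapping from each drug to its class, as assumed in this post
# (in practice this would come from an ATC lookup in the vocabulary tables).
DRUG_TO_CLASS = {
    "DRUG A": "Drugs used in Diabetes",
    "DRUG B": "Drugs used in Diabetes",
    "DRUG C": "Drugs used in Diabetes",
    "DRUG D": "Drugs used in Hypertension",
    "DRUG E": "Drugs used in Hypertension",
    "DRUG F": "Drugs used in Hypertension",
}

# Toy exposure data for the two example patients.
PATIENT_DRUGS = {
    "Patient A": ["DRUG A", "DRUG B", "DRUG C", "DRUG D"],
    "Patient B": ["DRUG C", "DRUG D", "DRUG E", "DRUG F"],
}


def roll_up(patient_drugs, drug_to_class):
    """Return one binary feature per drug class for each patient.

    A class feature is 1 if the patient had at least one drug in that class,
    otherwise 0. Drugs missing from the mapping are ignored.
    """
    classes = sorted(set(drug_to_class.values()))
    features = {}
    for patient, drugs in patient_drugs.items():
        exposed = {drug_to_class[d] for d in drugs if d in drug_to_class}
        features[patient] = {c: int(c in exposed) for c in classes}
    return features


class_features = roll_up(PATIENT_DRUGS, DRUG_TO_CLASS)
for patient, row in class_features.items():
    print(patient, row)
```

With this grouping, both patients get a 1 in both class columns, which is the output I am trying to get from the package directly.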

Can someone guide me on how to do this using the Feature Extraction package?

a) I can write SQL to get the higher-level terms, but how do I make the Feature Extraction package produce the output shown above? Usually, given a database connection and a cohort ID, feature extraction computes binary/frequency features at the individual drug level. I would like to do the same at the class/group level.

b) Has anyone done this before?

To my knowledge, the Feature Extraction package lets us build custom covariates, so specific cohorts can be used as covariates. @RossW did this in a COVID-19 prediction study. I’m not sure it is exactly what you want, but I think you can apply it here :grinning:
