Feature Extraction - FAQs

Akshay · August 3, 2021, 10:25am

Hello Everyone,

Consider the following scenario:

Patient A - had drugs `DRUG A`, `DRUG B`, `DRUG C`, `DRUG D`

Patient B - had drugs `DRUG C`, `DRUG D`, `DRUG E`, `DRUG F`

Let’s say that drugs A, B and C belong to the category "Drugs used in Diabetes", similar to an ATC 2nd-level class. I group drugs to reduce the feature space (for computational reasons).

Drugs D, E and F belong to the category "Drugs used in Hypertension", another ATC 2nd-level class.

From the documentation, I found that feature extraction produces output like the one below (which I also confirmed by running it).

Current output - feature extraction

	Drug A	Drug B	Drug C	Drug D	Drug E	Drug F
Patient A	1	1	1	1	0	0
Patient B	0	0	1	1	1	1

But I expect my output to look like this:

Expected output - Feature extraction

	Drugs used in Diabetes	Drugs used in Hypertension
Patient A	1	1
Patient B	1	1
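
To make the expected roll-up concrete, here is a minimal sketch in plain Python. It is not the FeatureExtraction API; it only illustrates the transformation from the drug-level table to the class-level table. The mapping `DRUG_TO_CLASS` and the function `roll_up` are hypothetical names introduced for this illustration, and the data is copied from the two tables above.

```python
# Hypothetical drug-to-class mapping, taken from the example in this post.
# In practice this would come from the vocabulary (e.g. ATC 2nd-level classes).
DRUG_TO_CLASS = {
    "Drug A": "Drugs used in Diabetes",
    "Drug B": "Drugs used in Diabetes",
    "Drug C": "Drugs used in Diabetes",
    "Drug D": "Drugs used in Hypertension",
    "Drug E": "Drugs used in Hypertension",
    "Drug F": "Drugs used in Hypertension",
}

# Drug-level binary features, copied from the "current output" table.
drug_features = {
    "Patient A": {"Drug A": 1, "Drug B": 1, "Drug C": 1,
                  "Drug D": 1, "Drug E": 0, "Drug F": 0},
    "Patient B": {"Drug A": 0, "Drug B": 0, "Drug C": 1,
                  "Drug D": 1, "Drug E": 1, "Drug F": 1},
}


def roll_up(features, drug_to_class):
    """Collapse drug-level binary flags into class-level binary flags.

    A class flag is 1 if the patient has at least one drug in that class.
    A drug missing from the mapping raises KeyError, so gaps are not hidden.
    """
    classes = sorted(set(drug_to_class.values()))
    result = {}
    for patient, flags in features.items():
        row = {cls: 0 for cls in classes}
        for drug, flag in flags.items():
            if flag:
                row[drug_to_class[drug]] = 1
        result[patient] = row
    return result


class_features = roll_up(drug_features, DRUG_TO_CLASS)
for patient, row in class_features.items():
    print(patient, row)
```

With the data above, both patients end up with a 1 in each class, which matches the expected table. For frequency features instead of binary ones, the same loop could sum the flags per class rather than setting them to 1.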

Can someone guide me on how to do this using the Feature Extraction package?

a) I can write SQL to get the higher-level terms, but how do I make the Feature Extraction package produce the output shown above? Usually, given a database connection and a cohort id, feature extraction computes binary/frequency features at the individual drug level. I would like to do the same at the class/group level.

b) Has anyone done this before?

Chungsoo_Kim · August 3, 2021, 12:14pm

Hi,
To my knowledge, the feature extraction package lets us build custom covariates, so we can use specific cohorts as covariates. @RossW used this approach in a COVID-19 prediction study. I’m not sure it is exactly what you want, but I think you can apply it here.