Handling the covariate matrix

Hi everyone,

I am currently trying to implement a reweighting algorithm, and I need some help or advice on how the covariate matrix should be handled.

The packages I use (and my custom implementation) expect features organized as:
samples × features
i.e., each row represents a patient and each column a feature.

However, the Cyclops library documentation is rather unclear about how features are originally represented. I inspected some internal variables, and it seems that:

  1. Most features are transformed into binary (one-hot) indicators: for example, lab values become features like “high sodium within 1 year of index”.
  2. Only a few selected features, such as the comorbidity index, are integer-valued, and these are regularized afterwards.
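To make the observation above concrete, here is a minimal sketch of how such features might be stored as sparse (row, covariate, value) entries: binary indicators carry the value 1.0, integer-valued features carry their count, and a patient without a qualifying record simply gets no entry. Everything here is hypothetical and for illustration only: the `CovariateEntry` type, the covariate ids, the lab records, and the sodium threshold are not taken from Cyclops or FeatureExtraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CovariateEntry:
    row_id: int        # one row per patient (hypothetical ids)
    covariate_id: int  # which feature this value belongs to
    value: float       # 1.0 for binary indicators, a count for integer-valued features


# Hypothetical feature dictionary, for illustration only.
COVARIATE_NAMES = {
    1001: "high sodium within 1 year of index",
    1002: "diabetes diagnosis any time prior",
    2001: "comorbidity index",
}

# Hypothetical lab records: (patient row_id, days before index, sodium in mmol/L).
SODIUM_RECORDS = [(1, 30, 148.0), (2, 400, 150.0), (3, 10, 139.0)]
HIGH_SODIUM_THRESHOLD = 145.0  # assumed cut-off, not taken from any package


def high_sodium_entries(records):
    """Emit a binary indicator only for patients with a qualifying record."""
    flagged = {
        row
        for row, days_before_index, sodium in records
        if days_before_index <= 365 and sodium > HIGH_SODIUM_THRESHOLD
    }
    return [CovariateEntry(row, 1001, 1.0) for row in sorted(flagged)]


entries = high_sodium_entries(SODIUM_RECORDS)
entries.append(CovariateEntry(1, 2001, 3.0))  # an integer-valued feature for patient 1
```

With these records only patient 1 gets the high-sodium indicator; patients 2 and 3 have no entry at all for covariate 1001, rather than an explicit 0.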

My questions are:

  1. Are these one-hot features recorded only for patients who have the corresponding records?
  2. Would it make sense to transform this into a covariate matrix as described above? I plan to simply set the value to 0 when a patient does not have the corresponding record.
  3. How are these variables regularized?
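The conversion in question 2 could look roughly like the sketch below. It assumes the covariates come as a long table of (rowId, covariateId, covariateValue) triples in which only non-zero values are stored, which is how I understand FeatureExtraction exports them (please verify against its documentation). The function name `to_dense` and its parameters are hypothetical.

```python
def to_dense(entries, row_ids, covariate_ids):
    """Build a samples x features matrix (list of lists) from sparse triples.

    entries: iterable of (row_id, covariate_id, value) triples.
    row_ids: every patient in the cohort, in the desired row order.
    covariate_ids: every feature, in the desired column order.
    Any (row, covariate) pair absent from entries is left at 0.0 ("no record").
    """
    row_of = {rid: i for i, rid in enumerate(row_ids)}
    col_of = {cid: j for j, cid in enumerate(covariate_ids)}
    matrix = [[0.0] * len(covariate_ids) for _ in row_ids]
    for row_id, covariate_id, value in entries:
        matrix[row_of[row_id]][col_of[covariate_id]] = value
    return matrix


# Hypothetical example: patient 2 has no records at all.
triples = [(1, 1001, 1.0), (1, 2001, 3.0), (3, 1002, 1.0)]
dense = to_dense(triples, row_ids=[1, 2, 3], covariate_ids=[1001, 1002, 2001])
```

The cohort's row ids are passed explicitly because a patient with no records at all would not appear in the triples and would otherwise be silently dropped from the matrix. For a large number of covariates, a sparse structure such as a `scipy.sparse` matrix would be a more memory-friendly target than a dense list of lists.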

Thanks in advance for any help.

Hi Min-Gyu,

You can find some clues about how covariates are constructed in the FeatureExtraction package, and about how they are regularized in the Cyclops or PatientLevelPrediction packages.
