When reviewing the output of a Lasso patient-level prediction (PLP) model, we found that, among other things, it included pairs of covariates with identical values per person (meaning the two variables are technically identical).
After investigating those covariates, we found that this was caused by one-to-many concept mapping of a single source variable. In other words, when one source variable is mapped to more than one (typically two) standard concepts and neither is excluded from the baseline covariates, the PlpData object stores two covariates carrying exactly the same information. Since regularization is applied during training, we expected the model to push one of the two coefficients towards zero. However, after the PLP run completed, the model contained both covariates with different, non-zero coefficients. A small toy example of this behaviour is sketched below.
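As a minimal sketch (in Python with scikit-learn rather than the PLP/Cyclops pipeline, and with made-up data and variable names), this illustrates why L1 regularization does not necessarily zero out one of two identical covariates: when two columns are exact duplicates, any split of the combined coefficient with the same sign gives the same penalty and the same fit, so the solution is not unique and the solver can keep both non-zero.

```python
# Hypothetical toy example, not the actual PLP workflow.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1000

x_source = rng.integers(0, 2, size=n)   # one binary source variable
x_other = rng.normal(size=n)            # an unrelated covariate
logit = -1.0 + 1.5 * x_source + 0.5 * x_other
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# The same source variable mapped to two standard concepts ->
# two identical covariate columns in the design matrix.
X = np.column_stack([x_source, x_source, x_other])

model = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000)
model.fit(X, y)

print("coefficients:", model.coef_.ravel())
# Depending on the solver and starting point, the weight for x_source may be
# split across both duplicate columns instead of being concentrated on one,
# so neither coefficient is pushed to zero.
```

The exact split will differ from what Cyclops produces inside PLP, but it shows that with perfectly collinear covariates the Lasso solution is not unique, which seems consistent with what we observed.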
We were wondering whether there are any guidelines, or any experience, on how one-to-many concept mappings should be handled in PLP models, if any special handling is necessary at all?
@cssdenmark @Karoline_Bendix_CSS
Thank you
Best, Julie and Eldar