Adding PRS and variant-carrier status as features in PatientLevelPrediction — encoding, leakage, and cross-ancestry calibration

With the PatientLevelPrediction workgroup picking back up, it seems worth opening a thread specifically on genetic covariates. FeatureExtraction, which PLP uses to build covariates, is designed around time-stamped clinical events in lookback windows before the index date. Polygenic risk scores (PRS) and germline variant-carrier flags break several of those assumptions: they’re time-invariant, computed off-platform, and (in most networks) live outside the OMOP tables that FeatureExtraction reads from. A few questions where consensus would be useful:

1. Where do genetic features live in the CDM, and how are they pulled into PLP?
Most teams seem to store PRS and carrier status either as measurement rows with custom concept IDs or as a parallel genomic_* table joined on person_id. The choice determines how (or whether) FeatureExtraction picks them up: measurement rows are visible to the standard windowed measurement covariates but carry a measurement_date, whereas a side table needs a custom covariate builder. Has anyone published a working covariateSettings recipe that includes PRS deciles + a set of carrier flags alongside the standard demographic / condition / drug covariates?
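
To make the side-table option concrete, here is a minimal Python sketch of the reshape involved: one-row-per-person genomic records turned into the sparse long layout (rowId, covariateId, covariateValue) that, as far as I understand it, FeatureExtraction uses for covariates. The input layout, the gene list, and the covariate ID scheme are all made up for illustration; this is not a covariateSettings recipe.

```python
# Hypothetical ID scheme: placeholders only, not an OHDSI convention.
PRS_DECILE_BASE_ID = 9_000_001_000   # + decile (1..10)
CARRIER_BASE_ID = 9_000_002_000      # + position of the gene in carrier_genes


def genomic_to_covariates(genomic_rows, carrier_genes):
    """Reshape per-person genomic records into sparse long-format covariates.

    genomic_rows: iterable of dicts with keys
        "row_id", "prs_decile" (1..10), "carriers" (set of gene symbols).
    Returns (covariates, covariate_ref), where covariates is a list of
    (row_id, covariate_id, value) tuples and covariate_ref maps IDs to names.
    """
    gene_index = {gene: i for i, gene in enumerate(carrier_genes)}
    covariates = []
    covariate_ref = {}
    for row in genomic_rows:
        # One-hot PRS decile: emit only the decile the person falls in.
        decile_id = PRS_DECILE_BASE_ID + row["prs_decile"]
        covariates.append((row["row_id"], decile_id, 1))
        covariate_ref[decile_id] = f"PRS decile {row['prs_decile']}"
        # Carrier flags: emit a row only for genes on the agreed list.
        for gene in sorted(row["carriers"]):
            if gene not in gene_index:
                continue
            carrier_id = CARRIER_BASE_ID + gene_index[gene]
            covariates.append((row["row_id"], carrier_id, 1))
            covariate_ref[carrier_id] = f"Carrier: {gene}"
    return covariates, covariate_ref
```

Non-carriers produce no carrier rows at all, which keeps the output sparse in the same way absent conditions or drugs are.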

2. Encoding choices: continuous, binned, or stratified?
For PRS: raw z-score, deciles, top-x% indicator? For carrier status: binary, allele dosage (0/1/2), or gene-burden collapsed across rare variants? The choice interacts with downstream calibration and with how interpretable the model is to clinicians. Curious what others have found stable across validation cohorts.
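
For comparing the PRS encodings side by side, a small Python sketch may help. It assumes the mean, SD, and decile cut points are fitted once on a reference (training) cohort and then applied unchanged elsewhere; that is one possible convention, not an agreed one, and the function names are hypothetical.

```python
import statistics
from bisect import bisect_right


def fit_prs_encoder(reference_scores):
    """Learn standardisation and decile cut points from a reference cohort."""
    sd = statistics.pstdev(reference_scores)
    if sd == 0:
        raise ValueError("reference scores have zero variance")
    return {
        "mean": statistics.fmean(reference_scores),
        "sd": sd,
        # Nine cut points that split the reference distribution into deciles.
        "cuts": statistics.quantiles(reference_scores, n=10),
    }


def encode_prs(score, encoder):
    """Return z-score, decile (1..10), and a top-decile flag for one person."""
    decile = bisect_right(encoder["cuts"], score) + 1
    return {
        "z": (score - encoder["mean"]) / encoder["sd"],
        "decile": decile,
        "top_decile": int(decile == 10),
    }


def encode_carrier(dosage):
    """Collapse allele dosage (0/1/2) into a binary carrier flag."""
    if dosage not in (0, 1, 2):
        raise ValueError(f"unexpected dosage: {dosage!r}")
    return int(dosage > 0)
```

Fitting on the reference cohort means a validation site with a shifted PRS distribution will not have 10% of people in each decile, which is itself a useful diagnostic.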

3. Temporal leakage despite time-invariant genotype.
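Genotype itself doesn’t leak, but the record of it often does. A ClinVar-classified variant note entered into the EHR after diagnosis, or a genetic-test-result observation timestamped at workup, can encode the outcome, because the test was often ordered in response to it. Re-dating such a record to birth hides the timing but not the fact that the patient was tested, so re-anchoring seems safe only for genotypes ascertained independently of the outcome (for example, biobank-wide arrays). How are people anchoring genetic features to a pre-index “baseline-at-birth” timestamp rather than the EHR observation date, and is there a clean convention for this in the CDM?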
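
One candidate rule, sketched in Python: treat a genotype as available at baseline only when it was ascertained independently of the outcome, and otherwise fall back to the date the result was recorded. The provenance labels and record layout below are hypothetical; the CDM has no such field that I know of.

```python
from datetime import date

# Hypothetical provenance labels; a real CDM would need its own convention.
COHORT_WIDE = "cohort_wide_genotyping"      # e.g. biobank array, outcome-blind
CLINICAL = "clinically_ordered_test"        # may have been ordered at workup


def usable_at_baseline(record, index_date):
    """Decide whether a genetic record may be used as a pre-index feature.

    record: dict with keys "provenance" and "recorded_on" (datetime.date).
    """
    if record["provenance"] == COHORT_WIDE:
        # Ascertainment did not depend on the outcome, so the record date
        # is irrelevant and the genotype can be treated as known from birth.
        return True
    # Clinically ordered tests: existence of the record is informative,
    # so only accept results recorded strictly before the index date.
    return record["recorded_on"] < index_date
```

For example, a clinically ordered result recorded two weeks after the index date is excluded, while an array genotype loaded into the warehouse years later is kept.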

4. Cross-ancestry portability when validating across the network.
PRS built from European-ancestry GWAS weights lose substantial predictive accuracy in non-European cohorts, so a PLP model that performs well at a single site can degrade sharply when externally validated across an ancestry-diverse network. Are teams (a) stratifying PRS by genetically inferred ancestry, (b) recalibrating per site, (c) using multi-ancestry methods like PRS-CSx, or (d) reporting per-ancestry performance as part of the standard PLP output?
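
As a reference point for option (d), here is a minimal Python sketch of a per-ancestry report with discrimination (AUC) and a crude calibration summary (observed over expected events). It is plain Python rather than anything PLP produces today, and the group labels are whatever the site uses for inferred ancestry.

```python
from collections import defaultdict


def auc(y_true, y_score):
    """Probability that a random case outranks a random non-case (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return None  # undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_group_report(groups, y_true, y_score):
    """Summarise discrimination and calibration separately for each group."""
    by_group = defaultdict(lambda: ([], []))
    for g, y, s in zip(groups, y_true, y_score):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    report = {}
    for g, (ys, ss) in by_group.items():
        expected = sum(ss)  # sum of predicted risks = expected event count
        report[g] = {
            "n": len(ys),
            "auc": auc(ys, ss),
            "observed_over_expected": sum(ys) / expected if expected else None,
        }
    return report
```

An observed-over-expected ratio well away from 1 in one group but not another is the pattern that per-site recalibration alone would miss.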

5. Reproducibility of the PRS itself.
Sharing a trained PLP model is straightforward; sharing the PRS that feeds it is not. Should the OHDSI PRS convention be a model card pointing at PGS Catalog IDs, with site-level recomputation, or actual score_* concept IDs anchored in the vocabulary?
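
If the model-card route is taken, site-level recomputation needs some way to confirm that every site scored with the same weights. A minimal Python sketch, assuming the card carries a checksum of the scoring file next to the PGS Catalog ID; the field names are hypothetical and the values in the usage note are placeholders, not real scores.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PrsModelCard:
    """Hypothetical minimal model card for a PRS feeding a PLP model."""
    pgs_id: str          # PGS Catalog score identifier
    genome_build: str    # build the weights are expressed in
    weights_sha256: str  # checksum of the scoring file contents


def sha256_of(weights_text):
    """Checksum of a scoring file's text content."""
    return hashlib.sha256(weights_text.encode("utf-8")).hexdigest()


def site_matches_card(card, local_weights_text, local_build):
    """True if a site's local scoring file and build agree with the card."""
    return (
        card.genome_build == local_build
        and card.weights_sha256 == sha256_of(local_weights_text)
    )
```

A site would build the card once from the reference scoring file, for instance `PrsModelCard("PGS_PLACEHOLDER", "GRCh38", sha256_of(text))`, and every other site would call `site_matches_card` before recomputing.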

I’d also be interested in whether the WG sees this as in scope for the restart or as adjacent work that needs its own subgroup.