Hi, just adding my 2 cents:
I think we need to draw a line between ‘features’ and ‘phenotypes’. Features, in my experience, do not have precise dates around them (otherwise, you might sort a person’s features by time relative to index, which I don’t think anyone does). Phenotypes are supposed to represent the period of time during which a clinical concept, such as a disease state or treatment status, is in an ‘active state’.
I see a few problems with H/O codes with respect to ‘active state’. May I use the example of ‘History of nosebleeds’? OK, so you have a visit, and you record a history of nosebleeds. Is this an active status? No; if it were, they would record a nosebleed. Does the H/O code represent a single event? Probably not: if you say ‘I have a history of nosebleeds’, it probably means it’s a recurring problem. How do you get an index date from a statement about recurring problems? This is what @Gowtham_Rao is getting at with ‘not using H/O codes as entry events of a cohort’, because an H/O code states only one thing about timing: the actual event happened some time before the date the H/O code was recorded. From an ‘active state of a phenotype’ perspective, treating that recorded date as the start of the active state is erroneous.
However, if you have an entry event (index date) that identifies something ‘active’, like ‘exposed to blood thinners’, and you want to limit these index events to those with a prior nosebleed, you would have an inclusion criterion of either ‘active nosebleeds’ or ‘H/O nosebleeds’, and so you would use both active codes and H/O codes in your phenotype. If you want to say something about ‘recent’ nosebleeds, however, you have to think carefully about whether an H/O code can represent ‘recent’ with enough fidelity, and this is where you choose whether or not to use the code.
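To make the blood thinner example concrete, here is a rough Python sketch of how I picture that inclusion criterion. Everything in it is made up for illustration: the `Record` type, the `"active"`/`"history"` labels, and the `has_prior_nosebleed` helper are not from any OHDSI tool, and ‘prior’ is assumed to mean ‘recorded on or before the index date’.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical nosebleed record: the kind of code and the date it was recorded."""
    code_type: str      # "active" (a nosebleed) or "history" (an H/O nosebleed code)
    recorded_on: date


def has_prior_nosebleed(index_date: date,
                        records: List[Record],
                        recent_days: Optional[int] = None) -> bool:
    """Inclusion criterion for an index event such as a blood thinner exposure.

    With recent_days=None, any record on or before the index date qualifies,
    whether it is an active code or an H/O code.
    With recent_days set, only active codes qualify, because the date of an
    H/O code only tells us the real event happened some time before it.
    """
    for record in records:
        if record.recorded_on > index_date:
            continue  # recorded after index, so it says nothing about 'prior'
        if recent_days is None:
            return True
        is_recent = index_date - record.recorded_on <= timedelta(days=recent_days)
        if record.code_type == "active" and is_recent:
            return True
    return False


index = date(2020, 6, 1)
history_only = [Record("history", date(2020, 5, 20))]
print(has_prior_nosebleed(index, history_only))                  # any prior nosebleed
print(has_prior_nosebleed(index, history_only, recent_days=30))  # recent nosebleed only
```

Dropping H/O codes entirely from the ‘recent’ check is just one possible choice; the point is that the choice has to be made deliberately.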
Prevalence vs. Incidence:
I have found that there’s a lot of debate about prevalence and incidence, but one thing I’ve realized is that both should be determined from the ‘active state’ of a phenotype. The difference between the two is simply how the ‘active state’ relates to the specified time-at-risk: if the active state starts during the time-at-risk, it’s incidence; if it overlaps with the time-at-risk, it’s prevalence. I’m sure this perspective is controversial, but it’s the definition I use, and from this perspective, dates matter.
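As a sketch of the definition above (my own working definition, not an official one), the two checks can be written as interval comparisons. The `Interval` type and function names are hypothetical, and both interval ends are assumed to be inclusive.

```python
from datetime import date
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical closed date interval (both ends inclusive)."""
    start: date
    end: date


def is_prevalent(active: Interval, at_risk: Interval) -> bool:
    """Prevalence: the active state overlaps the time-at-risk at all."""
    return active.start <= at_risk.end and active.end >= at_risk.start


def is_incident(active: Interval, at_risk: Interval) -> bool:
    """Incidence: the active state starts during the time-at-risk."""
    return at_risk.start <= active.start <= at_risk.end


time_at_risk = Interval(date(2020, 1, 1), date(2020, 12, 31))
started_before = Interval(date(2019, 11, 1), date(2020, 2, 15))
started_during = Interval(date(2020, 3, 1), date(2020, 3, 10))

print(is_prevalent(started_before, time_at_risk), is_incident(started_before, time_at_risk))
print(is_prevalent(started_during, time_at_risk), is_incident(started_during, time_at_risk))
```

Note that under this definition anything that counts as incident also counts as prevalent, since starting inside the time-at-risk implies overlapping it. Neither check can be computed from an H/O code alone, because it gives no start date for the active state.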
Features vs. Phenotypes:
So, with the above said, I think H/O codes have utility in the context of features. I also think phenotypes (which identify an active state) can be leveraged to create features of their own. The principle that ‘phenotypes are not code lists’ should resonate with everyone here: you can create features for predictive models from H/O codes, and you can create them from phenotypes. But when you push H/O codes into phenotypes, I agree with @Gowtham_Rao that H/O codes as entry events have limitations that should be carefully considered.