Let me answer in detail, because we keep running into these arguments, and I am pretty passionate about not going down a spiral that turns a pragmatic and useful model into a nightmare.
If you happen to know a “human algorithm” please make an introduction. I would love to talk about his/her/its feelings, maybe over lunch.
Here it is, right here: “Researchers”. Look: the CDM has to support algorithms that run on data no Researcher has seen. For example, in our distributed studies we develop the code on one database, and then it has to run on all the others without further intervention, because most databases will be off limits to the Researcher. All content has to be organized so that it can be relied upon blindly. That is why we cannot invent database-specific conventions as we go. It’s a standard.
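To make the "write once, run blind everywhere" point concrete, here is a minimal sketch, assuming each site's PERSON rows are available locally as plain dicts with a `year_of_birth` key. The function names `age_band_counts` and `run_network_study` and the in-memory site data are hypothetical; in a real network each site executes the code on its own database and only the aggregates leave the site.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical stand-in for a site-level PERSON row.
Person = Dict[str, int]


def age_band_counts(persons: List[Person], index_year: int) -> Dict[str, int]:
    """Count persons per 10-year age band at index_year.

    Relies on the CDM contract that year_of_birth is always populated,
    so there is no site-specific handling of missing or flagged values.
    """
    bands: Counter = Counter()
    for p in persons:
        age = index_year - p["year_of_birth"]
        low = (age // 10) * 10
        bands[f"{low}-{low + 9}"] += 1
    return dict(bands)


def run_network_study(sites: Dict[str, List[Person]], index_year: int) -> Dict[str, Dict[str, int]]:
    # The same function runs unchanged at every site; the researcher
    # only ever sees the aggregate counts that come back.
    return {name: age_band_counts(rows, index_year) for name, rows in sites.items()}
```

Nothing in `age_band_counts` checks for a missing birth year. That only works because the standard guarantees it is there; the moment one site deviates, the code breaks without anybody having looked at the data.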
We could do that, but each time we introduce such a detail, all tools and methods have to change to incorporate these options with if/then/else statements, whereas right now they simply rely on the birth year. So there is a tradeoff between capturing every possible piece of information and keeping the CDM from becoming unwieldy. Unless we have a really good use case for this imputed/non-imputed thing, I would veto it.
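A small sketch of the if/then/else cost, assuming a tool that computes age at an index year. `age_at` reflects today's situation; `age_at_with_flag` shows what every tool would have to grow if the birth year could be missing or carry an imputation flag. The `birth_year_imputed` flag and `allow_imputed` option are hypothetical and not part of the CDM.

```python
from typing import Optional


def age_at(year_of_birth: int, index_year: int) -> int:
    # Today: the birth year is always known, so one line does the job.
    return index_year - year_of_birth


def age_at_with_flag(
    year_of_birth: Optional[int],
    birth_year_imputed: bool,  # hypothetical flag, not a CDM field
    index_year: int,
    allow_imputed: bool,       # hypothetical per-tool policy switch
) -> Optional[int]:
    # With optional/imputed birth years, every tool needs extra branches...
    if year_of_birth is None:
        return None  # ...and every caller must now handle a missing age too
    if birth_year_imputed and not allow_imputed:
        return None
    return index_year - year_of_birth
```

The second version is not hard to write once. The problem is that the same branching, and the same policy decision about imputed values, has to be repeated consistently in every tool and every study.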
Remember: The CDM is there to project real patients and their healthcare experience. Not the idiosyncrasies of all sorts of databases collected for all sorts of reasons.
If “Researchers” do that post-CDM, meaning in some piece of code they write for themselves, fine. But they should not do it when filling out the CDM. Patients with no birth year should be excluded. If folks impute anyway, they can; however, it will make a lot of the algorithms fail, because those are written with the expectation that the birth year is known. C*** in, c*** out.
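On the ETL side, "patients with no birth year should be out" amounts to a filter before loading PERSON. A minimal sketch, assuming source patients arrive as dicts with an optional `year_of_birth` value; the function name `split_persons` is hypothetical. Excluded rows are kept separately so the ETL can report how many were dropped instead of silently imputing.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical source-patient row; year_of_birth may be missing or None.
SourceRow = Dict[str, Optional[int]]


def split_persons(rows: List[SourceRow]) -> Tuple[List[SourceRow], List[SourceRow]]:
    """Split source patients into rows loadable into PERSON and rows excluded.

    Rows without a birth year are excluded, never imputed.
    """
    keep: List[SourceRow] = []
    dropped: List[SourceRow] = []
    for row in rows:
        if row.get("year_of_birth") is None:
            dropped.append(row)
        else:
            keep.append(row)
    return keep, dropped
```

For example, given rows for persons 1 (born 1970), 2 (`year_of_birth` set to `None`) and 3 (no `year_of_birth` key at all), only person 1 is loaded and the other two end up in the excluded list, which can then be counted in the ETL documentation.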
So, unless you have a good reason otherwise, I’d say no Jesuses and no null birth years. If people have data that cannot be used for our type of work, don’t use the CDM.