I think until that is fleshed out we can provide conventions for the setup we currently have. Any new proposal can of course change the conventions we set.
Given everyone’s feedback above, I think people agree that there are allowable scenarios for eliminating a PERSON, but suggesting that one of those reasons is “unknown gender” seems to give people some heartburn. Let me suggest an update to the verbiage:
It is not required that all subjects from the raw data be carried over to the CDM; in fact, removing people whose data are not of high enough quality may help researchers using the CDM. Example scenarios for removing subjects include: a person’s year of birth or age is unreasonable (e.g. born in 1800 or 2999), the person lacks prescription or health benefits in a claims database (i.e. you do not have a complete picture of their record), or the raw data states that the person may not be of high research quality (e.g. CPRD will actually suggest which people not to use in research). Removing persons is not required, and the decision should be made in consideration of the raw data source. Reasons for removal of persons should be documented in the ETL documentation and the METADATA table.
This convention lets people know that it is okay to remove persons and gives a few examples to get them thinking about when it might make sense in their own CDM. In the end, if you want to drop people in your ETL for a given reason you can, and if you want to keep everyone you can. The best place for these notes is the METADATA table (@Ajit_Londhe ), but while that table is still maturing the ETL document can also record the choices made.
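To make the examples in the proposed wording a bit more concrete, here is a rough sketch of how an ETL step might apply rules like these and hold on to the reasons so they can be written up afterwards. Everything in it is an assumption on my part for illustration only: the `RawPerson` fields, the `min_year` default, the reason strings, and the function names are hypothetical and not part of the CDM specification or any OHDSI tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class RawPerson:
    # Hypothetical raw-data fields; a real source will look different.
    person_source_value: str
    year_of_birth: Optional[int]
    has_medical_benefit: bool = True
    has_prescription_benefit: bool = True
    source_flags_acceptable: bool = True  # e.g. a research-quality flag from the source


def exclusion_reasons(
    p: RawPerson, min_year: int = 1900, max_year: Optional[int] = None
) -> List[str]:
    """Return the reasons (possibly none) this sketch would drop the person.

    The year bounds are assumptions; each ETL should choose its own.
    """
    if max_year is None:
        max_year = date.today().year
    reasons = []
    # A missing year of birth is left alone here; only out-of-range values count.
    if p.year_of_birth is not None and not (min_year <= p.year_of_birth <= max_year):
        reasons.append("unreasonable year of birth")
    # Lacking either benefit means the record is not a complete picture.
    if not (p.has_medical_benefit and p.has_prescription_benefit):
        reasons.append("incomplete benefit coverage")
    if not p.source_flags_acceptable:
        reasons.append("flagged as low research quality by source")
    return reasons


def split_persons(
    persons: List[RawPerson],
) -> Tuple[List[RawPerson], List[Tuple[str, List[str]]]]:
    """Separate kept persons from removed ones, keeping reasons for documentation."""
    kept, removed = [], []
    for p in persons:
        reasons = exclusion_reasons(p)
        if reasons:
            removed.append((p.person_source_value, reasons))
        else:
            kept.append(p)
    return kept, removed
```

The `removed` list pairs each dropped person with the reasons for dropping them, which is the kind of information that could be summarized in the ETL document or the METADATA table. If you want to keep everyone, you simply skip this step.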
Still interested in people’s thoughts . . .