It is always helpful if a query produces data that are
guaranteed to be free of protected health information. Many EHR-based
(non-claims) warehouses contain test patients, and some have a way of tagging
them. Test patients are also useful for seeing how facts entered on the
screen in an EHR end up in the warehouse (that is, for seeing the impact of the
ETL on the data).
Perhaps the CDM should allow tagging certain patients (person_ids) as test patients.
If the same query always returned dummy test patient data at every CDM site,
that could be useful.
We should not put meaning into an ID, but one very simple option is that
person_id = 0 is always a test patient (we already use a vocabulary_id = 0 trick
once). I think this approach is not the best, and it would allow only one test patient.
A more flexible way might be to define a “test-patient” location in the LOCATION table (perhaps with
location_id = 0) and assign this location to test patients. The PERSON table in the CDM already
has a location_id column that refers to the LOCATION table. This would allow multiple test patients (we have many unreal, dummy test patients in our NIH CC BTRIS warehouse).
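To make the location-based idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is only an illustration under assumptions: the PERSON and LOCATION tables are cut down to a few columns, location_id = 0 is the proposed convention and not an existing CDM rule, and the helper names (build_demo_db, list_test_person_ids, list_real_person_ids) and all sample rows are made up.

```python
import sqlite3

# Assumed convention from this proposal; NOT part of the CDM specification.
TEST_LOCATION_ID = 0


def build_demo_db():
    """Create a tiny in-memory stand-in for the PERSON and LOCATION tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE location (
            location_id INTEGER PRIMARY KEY,
            address_1   TEXT
        );
        CREATE TABLE person (
            person_id     INTEGER PRIMARY KEY,
            year_of_birth INTEGER,
            location_id   INTEGER REFERENCES location(location_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO location VALUES (?, ?)",
        [(TEST_LOCATION_ID, "TEST PATIENT LOCATION"), (1, "Some real address")],
    )
    # Hypothetical rows: two real patients (one without a location)
    # and two test patients tagged via the test-patient location.
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(101, 1970, 1), (102, 1985, None),
         (9001, 1900, TEST_LOCATION_ID), (9002, 1900, TEST_LOCATION_ID)],
    )
    return conn


def list_test_person_ids(conn):
    """Return person_ids that carry the test-patient location."""
    rows = conn.execute(
        "SELECT person_id FROM person WHERE location_id = ? ORDER BY person_id",
        (TEST_LOCATION_ID,),
    )
    return [r[0] for r in rows]


def list_real_person_ids(conn):
    """Return all other person_ids, keeping persons with no location at all."""
    rows = conn.execute(
        "SELECT person_id FROM person "
        "WHERE location_id IS NULL OR location_id <> ? ORDER BY person_id",
        (TEST_LOCATION_ID,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print("test patients:", list_test_person_ids(conn))
    print("real patients:", list_real_person_ids(conn))
```

One detail worth noting: a plain location_id <> 0 filter would silently drop persons whose location_id is NULL, so the sketch keeps those rows explicitly when excluding test patients.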
(The cleanest approach would obviously be a test patient flag in the schema (PERSON table), but that requires a schema change rather than just a new “CDM convention rule”.)
Do you currently allow test patients into your CDM data view?
Do you see value in keeping them in the dataset?
Do you think we should have the same way of tagging test patients across sites?