I’m currently working with an NHS Trust in the North West of England to kick off the implementation of a local OMOP CDM, and I’m at the stage where I would like to use White Rabbit to assess the readiness of our data. However, a few concerns have been raised internally about the information governance (IG) aspects of the White Rabbit tool, given that it accesses live patient data in our Snowflake environment.
I would be interested to hear from other NHS Trusts that have used this tool connected directly to their Snowflake environment.
I work at The Hyve, where we use WhiteRabbit frequently, and we are also involved in maintaining it. The issue you raise comes up in projects, and also in relation to our own security standards.
On a sufficiently large and heterogeneous data source, WhiteRabbit generally does not expose data that can be used to identify individuals or derive sensitive information. We do advise reviewing the output (the scan reports) before sharing them outside of a trusted environment; you may want to exclude certain tables or values from a scan report.
The scan reports will contain frequently occurring values from your database, and can also expose some characteristics of the population in your source data, such as the gender distribution.
WhiteRabbit scans each database table individually. It reports the data types of individual columns, frequent values, and statistics (e.g. on the uniqueness of values in a column). It does not use, or expose, relations between tables or other structural information about the source data.
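To make concrete what kind of information ends up in a scan report, here is a minimal sketch in Python of the per-column profiling described above. This is not WhiteRabbit's actual code; the sample data and the `min_cell_count` threshold of 5 are made up for illustration.

```python
from collections import Counter

# Illustrative sketch (not WhiteRabbit's actual code) of per-column
# profiling: frequent values plus a uniqueness statistic.
def profile_column(values, min_cell_count=5):
    counts = Counter(values)
    return {
        # Only values occurring at least min_cell_count times are
        # reported, so rare (potentially identifying) values never
        # appear in the report.
        "frequent_values": {v: n for v, n in counts.items() if n >= min_cell_count},
        # Uniqueness statistic: fraction of distinct values in the column.
        "fraction_unique": len(counts) / len(values),
    }

# A column of gender codes: the common codes are reported, while a rare
# entry is suppressed by the threshold.
gender = ["F"] * 60 + ["M"] * 55 + ["unknown"] * 3
report = profile_column(gender, min_cell_count=5)
print(report["frequent_values"])          # {'F': 60, 'M': 55} – 'unknown' suppressed
print(round(report["fraction_unique"], 3))
```

WhiteRabbit itself exposes similar knobs in its scan settings (a minimum cell count, the number of rows sampled per table, and whether field values are scanned at all), which is another way to limit what a scan report can expose.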
From a technical point of view, there is no reason to avoid using WhiteRabbit in a trusted environment. It only reads a sample of the source data, does not write into the database, and writes its scan reports locally. It does not make any network connections other than the one it needs to access the source database.
Using database credentials that do not have permission to alter the source data can help address some of the IG concerns.