I am not well versed in BigQuery, so I’m not sure what the mechanism for data distribution is. But for things like Redshift and PDW, one important consideration is your ‘distribution key’ that is used to divide the data across the cluster. A good distribution means data is evenly distributed across the nodes so that your distributed queries process similar volumes of data across nodes. At the same time, you want to distribute on the same key as your joins so that you don’t do cross-node joins.
What this means for a CDM context is that you’ll distribute your data on person_id: person IDs are going to be even distributed across patients (it’s a unique ID), it’s likely that patient-level data will be evenly distributed across all people (some people will have a lot, some people a little, but that distribution will be random across person_id), and most importantly: a given patient_id is linked to all domains of data via their person_id, so you’ll be joining on this column the most, so you want a persons health records all co-located on the same node of a cluster.
Beyond that, from a BigQuery perspective, I would recommend @tom.white.md and @Ajit_Londhe for their input because they have technical expertise.