My two cents:
In terms of content: if you have multiple sources that feed information
for the same patient, linkable by PERSON_ID and covering the same
OBSERVATION_PERIOD, then there’s likely analytical value in bringing
those sources together, since that should give you a more complete view
of a person’s medical history, thereby improving your ability to
classify exposures, outcomes, and other relevant covariates.
If you have multiple sources that feed different patients, then the choice
of whether to bring them together or keep them separate should be driven by your
analytical use case and whether you are comfortable with pooling under the
assumption that the different sources and different patients are
effectively comparable. In my experience, most data sources I have worked
with have grossly violated that assumption (2 EHRs are never alike, nor are
2 claims databases). Therefore, for my analytical use cases - clinical
characterization of disease natural history and treatment utilization,
population-level effect estimation, patient-level prediction - I recommend
keeping different sources with different populations in different CDM
instances. Then, after an analysis is performed on each CDM, you can
evaluate whether it is appropriate to pool the results (but not the
patient-level data) for any given analysis. But if you opt to pool all
data into one instance, then you won’t be able to separate the
source-specific data back out unless the sources divide cleanly by the
TYPE_CONCEPT_ID in each domain.
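
To make the TYPE_CONCEPT_ID point concrete, here is a minimal sketch of a check you could run before pooling. It assumes you have already listed, per source, which type concept IDs that source populates in a given domain. The function name, the source names, and the IDs are all hypothetical placeholders, not real vocabulary concepts.

```python
from collections import defaultdict


def overlapping_type_concepts(type_concepts_by_source):
    """Return {type_concept_id: sorted source names} for every ID used by
    more than one source. An empty result means the sources would still be
    separable by type concept after pooling (in this one domain)."""
    sources_by_concept = defaultdict(set)
    for source, concept_ids in type_concepts_by_source.items():
        for concept_id in concept_ids:
            sources_by_concept[concept_id].add(source)
    return {
        concept_id: sorted(sources)
        for concept_id, sources in sources_by_concept.items()
        if len(sources) > 1
    }


# Hypothetical inventory for one domain; IDs are placeholders.
example = {
    "ehr_a": {1001, 1002},
    "claims_b": {2001, 1002},
}
overlaps = overlapping_type_concepts(example)
```

If `overlaps` is non-empty for any domain, records carrying those type concepts could not be attributed back to a single source once everything sits in one instance.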
As a general principle within OHDSI network studies, we expect there may be
heterogeneity across the data network and require that we be fully
transparent. We therefore always report source-specific estimates
and consider whether a composite summary is additionally warranted, but we
never report only a single aggregate summary.
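
As an illustration of pooling results rather than patient-level data, here is a rough sketch that takes one estimate per CDM (on the log scale, with its standard error) and computes a composite alongside a heterogeneity measure. The choice of a DerSimonian-Laird random-effects model and the I² statistic is my assumption for the example, not a prescribed OHDSI method, and the function name and inputs are hypothetical.

```python
import math


def pool_estimates(log_estimates, standard_errors):
    """Inverse-variance pooling of per-source estimates with a
    DerSimonian-Laird between-source variance (tau2) and I-squared."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w

    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau2 to each source's variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_estimates)) / sum_re
    return {
        "pooled_log_estimate": pooled,
        "pooled_se": math.sqrt(1.0 / sum_re),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical per-source log hazard ratios and standard errors.
result = pool_estimates([0.0, 1.0], [0.1, 0.1])
```

A high `i2` is a signal that the composite may not be a meaningful summary, which is one reason the source-specific estimates are always reported next to it.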