@andrew, I think the language here could be ambiguous to newcomers in the community, so let me try to restate @Christian_Reich’s comment:
One CDM instance represents one observational database that contains a set of persons with some capture of clinical observations about those persons. Some organizations have access to multiple observational databases (e.g. one could license CPRD, MarketScan, Optum, PharMetrics), and our recommendation is to maintain each of these disparate populations as a separate CDM. From time to time, someone considers instead building only one CDM instance that contains all data from all sources (e.g. stacking all persons from CPRD, MarketScan, Optum, PharMetrics in one massive database), and then asks “where’s the field in the CDM tables that lets me preserve the provenance of which source the patient came from?”. The answer: there are no such fields in the OMOP CDM, because this is not recommended behavior. Instead, we recommend that you treat each database as a separate collection of patients and a distinct vantage point on the healthcare system and its associated data capture process. Rather than running one analysis against an amalgamated database, we suggest you run the same analysis consistently across each database, and then synthesize the evidence that arises from your data network.
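To make the "same analysis, each database" idea concrete, here is a minimal sketch in Python. It uses tiny in-memory SQLite databases as stand-ins for separate CDM instances; the source names, the helper functions `make_cdm` and `prevalence`, and the concept ID `1001` are all made up for illustration, and the tables are cut down to the two columns the example needs.

```python
import sqlite3

CONCEPT = 1001  # placeholder condition concept, not a real OMOP concept ID


def make_cdm(person_ids, condition_rows):
    """Build a tiny in-memory stand-in for ONE CDM instance (one source database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER)"
    )
    conn.executemany("INSERT INTO person VALUES (?)", [(p,) for p in person_ids])
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?)", condition_rows)
    return conn


def prevalence(conn, concept_id):
    """The analysis itself: written once, run unchanged against any CDM instance."""
    (n_persons,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
    (n_cases,) = conn.execute(
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (concept_id,),
    ).fetchone()
    return n_cases / n_persons


# Each source stays in its own CDM. Note that person_id 1 exists in both,
# but these are NOT the same person: IDs are only meaningful within a source.
cdms = {
    "source_a": make_cdm([1, 2, 3, 4], [(1, CONCEPT), (2, CONCEPT), (2, CONCEPT)]),
    "source_b": make_cdm([1, 2, 3, 4, 5], [(1, CONCEPT)]),
}

# One result per database; evidence synthesis happens on these estimates,
# not on a stacked patient table.
results = {name: prevalence(conn, CONCEPT) for name, conn in cdms.items()}
print(results)
```

The point of the design is that the source label lives outside the CDM (here, as the dictionary key), so no provenance column is needed inside the tables.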
But I want to separate this notion of ‘pooling populations’ from the idea that @Andrew is raising, which is ‘linking persons’. Certainly, it is reasonable (and generally expected) that a given population may have patient-level data that comes from disparate sources. Simple examples: an administrative claims system is typically made up of disparate feeds of medical service claims and pharmacy dispensing claims, and may further be linked with laboratory measurement data or health risk assessments. A clinical registry may represent multiple data feeds brought together at the person level, and registry-claims linkages (like SEER-Medicare) combine these ideas. If you are looking to maintain this type of provenance (e.g. ‘where did the clinical observation for this patient come from?’), then that’s the explicit intent of the _TYPE_CONCEPT_ID fields in the OMOP CDM clinical event tables (e.g. CONDITION_TYPE_CONCEPT_ID in CONDITION_OCCURRENCE).
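As a rough illustration of record-level provenance inside a single CDM, the sketch below stores a `condition_type_concept_id` on each row of a cut-down `condition_occurrence` table and counts records by type. The values `1` and `2` are placeholders standing in for "came from a claim" and "came from an EHR"; they are not real OMOP type concept IDs, which you would look up in the vocabulary.

```python
import sqlite3

# Placeholder type concepts for illustration only (not real OMOP concept IDs).
CLAIM_TYPE = 1
EHR_TYPE = 2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence ("
    "person_id INTEGER, "
    "condition_concept_id INTEGER, "
    "condition_type_concept_id INTEGER)"
)

# Person 1 has the same condition recorded by two linked feeds;
# person 2 only appears in the claims feed.
rows = [
    (1, 1001, CLAIM_TYPE),
    (1, 1001, EHR_TYPE),
    (2, 1001, CLAIM_TYPE),
]
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", rows)

# "Where did the clinical observations come from?" answered per record type.
provenance = dict(
    conn.execute(
        "SELECT condition_type_concept_id, COUNT(*) "
        "FROM condition_occurrence "
        "GROUP BY condition_type_concept_id"
    ).fetchall()
)
print(provenance)
```

Here the linked feeds share one person table and one set of person IDs, which is what distinguishes ‘linking persons’ from ‘pooling populations’.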