I will probably repeat some things that people have already said.
When our team is testing the results of our OMOP CDM ETL (or validating someone else's OMOP CDM), we do it in multiple steps:
- Test the data in the OMOP CDM from an integrity perspective, e.g. referential integrity, key uniqueness, constraints, allowed value ranges, etc. Some databases offer DB constraints that should definitely be enabled after the data is loaded.
- Check mapping rates, e.g. the number of records successfully mapped to standard concepts
- Check compliance with THEMIS business rules
- Look for any anomalies in the ACHILLES reports, for example data shifts or spikes, absence of data, etc.
(with the DQ Dashboard that our OHDSI team is working on, we will be able to standardize and automate several of the things I listed above)
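To make the first two checks concrete, here is a minimal sketch in Python, assuming the CDM sits in a SQLite database. The table and column names (`condition_occurrence`, `condition_occurrence_id`, `condition_concept_id`) follow the OMOP CDM spec, where an unmapped record conventionally has `concept_id = 0`; the sample data is purely illustrative.

```python
import sqlite3

def check_key_uniqueness(conn, table, pk):
    """Return the number of duplicate primary-key values (should be 0)."""
    row = conn.execute(
        f"SELECT COUNT(*) - COUNT(DISTINCT {pk}) FROM {table}"
    ).fetchone()
    return row[0]

def mapping_rate(conn, table, concept_col):
    """Share of records mapped to a standard concept (concept_id != 0)."""
    total, mapped = conn.execute(
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {concept_col} != 0 THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    return (mapped or 0) / total if total else 0.0

# Tiny illustrative dataset: three condition records, one unmapped.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence ("
    "condition_occurrence_id INTEGER, "
    "person_id INTEGER, "
    "condition_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [(1, 10, 201826), (2, 10, 0), (3, 11, 201826)],
)

print(check_key_uniqueness(conn, "condition_occurrence",
                           "condition_occurrence_id"))          # 0
print(round(mapping_rate(conn, "condition_occurrence",
                         "condition_concept_id"), 2))           # 0.67
```

In practice you would run the same pattern over every event table and compare the mapping rate against a threshold you agreed on with the data owner, rather than hard-coding one.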
Then, as a part of UAT, we often take some of the existing queries and analyses that were successfully completed on the source (“raw”) data and replicate them on the OMOP CDM data, then validate the results. This will not necessarily produce the same results (and most likely will not), but it gives a good idea of what changed during the ETL process. You can use ATLAS for designing these cohorts or analyses, but you do not have to. So, if you already have some existing queries against your Semantic database, you could convert them into the OMOP CDM dialect, run them, and compare the output in both cases.
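The source-versus-CDM comparison above can be sketched as a small harness that runs an equivalent query against each database and reports the delta. This is only an illustration: the table names (`raw_diagnoses` on the source side) and the queries are hypothetical stand-ins for whatever analyses you already have, and here the CDM side deliberately has one fewer patient to mimic a record dropped during ETL.

```python
import sqlite3

def compare_counts(source_conn, cdm_conn, source_sql, cdm_sql):
    """Run two queries that each return a single count; report the delta."""
    src = source_conn.execute(source_sql).fetchone()[0]
    cdm = cdm_conn.execute(cdm_sql).fetchone()[0]
    return {"source": src, "cdm": cdm, "delta": cdm - src}

# Hypothetical "raw" source data: two distinct patients with diagnoses.
source_conn = sqlite3.connect(":memory:")
source_conn.execute(
    "CREATE TABLE raw_diagnoses (patient_id INTEGER, icd_code TEXT)"
)
source_conn.executemany(
    "INSERT INTO raw_diagnoses VALUES (?, ?)",
    [(1, "E11"), (2, "I10"), (2, "E11")],
)

# CDM side: only one patient made it through (one record was unmappable).
cdm_conn = sqlite3.connect(":memory:")
cdm_conn.execute(
    "CREATE TABLE condition_occurrence ("
    "person_id INTEGER, condition_concept_id INTEGER)"
)
cdm_conn.execute("INSERT INTO condition_occurrence VALUES (1, 201826)")

result = compare_counts(
    source_conn, cdm_conn,
    "SELECT COUNT(DISTINCT patient_id) FROM raw_diagnoses",
    "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence",
)
print(result)  # {'source': 2, 'cdm': 1, 'delta': -1}
```

A non-zero delta is not automatically a bug; the point of this step is that every difference should be explainable by a documented ETL decision (vocabulary gaps, de-duplication, date filters, etc.).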