I will. Let me tell you a little about our business because it influences our decisions.
Our (University of Colorado’s) reason for using the OMOP CDM is a little different from that of most active forum users. We use the OMOP CDM as our primary data warehouse and deliver datasets (we are just starting this phase now) to our customers: providers, managers in the hospital system doing QA/QI, PhD students, professors, and all our data collaborators. We do incremental loads from the source through the pipeline to OMOP every day. We do not do the analytics ourselves. If we couldn’t use the OMOP CDM to deliver datasets to our customers, the cost of maintaining OMOP would be prohibitive. All of these factors give us a different perspective.
We did this mostly in parallel, with the goal of mapping codes first. The Person table was easy. The Visit table took more work, but only because the source has a different view of a visit than the OHDSI community does. A couple of days in Usagi took us from 86% to 99% of drugs administered to a patient being mapped to a standard concept; that part was easy because drug names are fairly standardized. Conditions and Procedures are tough because the source terminology differs considerably from the target concepts. And social history (tobacco, alcohol, drugs) is a beast. We are also working on other data that belongs in the Observation table. That data is mostly unmapped, but it is also not used in network studies (pain, scoring systems, ventilator settings, etc.), @Christian_Reich.
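For anyone who wants to track a coverage number like that 86% → 99%, here is a minimal sketch of how we would compute it. Everything in it is an assumption for illustration: the file names, the `source_code` column in the source extract, and the usual Usagi export columns (sourceCode, mappingStatus, conceptId); substitute your own files.

```python
import pandas as pd

# Hypothetical file names; substitute your own source extract and Usagi export.
admins = pd.read_csv("drug_administrations.csv")    # one row per drug administration
mappings = pd.read_csv("usagi_drug_export.csv")     # Usagi export: sourceCode -> conceptId

# Treat a source code as "mapped" when Usagi has an approved, non-zero target concept for it.
approved = mappings[(mappings["mappingStatus"] == "APPROVED") & (mappings["conceptId"] != 0)]
mapped_codes = set(approved["sourceCode"])

is_mapped = admins["source_code"].isin(mapped_codes)
coverage = 100.0 * is_mapped.mean()
print(f"{coverage:.1f}% of drug administration records map to a standard concept")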
Organizing the data is a lot of work and very error-prone. As @mgkahn stated, we take very large spreadsheets of local codes and put them into Usagi one domain at a time. Then the output is reshaped to fit the Concept and Concept Relationship tables (@Christian_Reich helped here), which is also an error-prone step. Next comes uploading the rows to the necessary tables and testing for accuracy. All of this is very time-consuming. Yes, it is a lot of work, but the time savings, especially from using the hierarchies, should make the effort worth it, I hope. We haven’t put this into production yet.
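To make that “reshape the Usagi output” step concrete, here is a minimal sketch of the kind of transformation we mean. The file name, local vocabulary_id, and starting concept id are all illustrative, and the column names are the ones a typical Usagi export carries (sourceCode, sourceName, domainId, mappingStatus, conceptId). It builds local source concepts in the 2-billion id range plus “Maps to” rows pointing at the standard concepts chosen in Usagi.

```python
import pandas as pd

# Hypothetical Usagi export; keep only approved mappings.
usagi = pd.read_csv("usagi_condition_export.csv")
approved = usagi[usagi["mappingStatus"] == "APPROVED"].copy()

# Local (source) concepts conventionally get ids above 2 billion; starting id is illustrative.
approved["local_concept_id"] = range(2_000_000_001, 2_000_000_001 + len(approved))

concept_rows = pd.DataFrame({
    "concept_id": approved["local_concept_id"],
    "concept_name": approved["sourceName"].str.slice(0, 255),
    "domain_id": approved["domainId"],
    "vocabulary_id": "UC Local",          # illustrative local vocabulary name
    "concept_class_id": "Undefined",
    "standard_concept": None,             # local concepts are non-standard
    "concept_code": approved["sourceCode"],
    "valid_start_date": "1970-01-01",
    "valid_end_date": "2099-12-31",
    "invalid_reason": None,
})

relationship_rows = pd.DataFrame({
    "concept_id_1": approved["local_concept_id"],
    "concept_id_2": approved["conceptId"],
    "relationship_id": "Maps to",
    "valid_start_date": "1970-01-01",
    "valid_end_date": "2099-12-31",
    "invalid_reason": None,
})

concept_rows.to_csv("local_concept.csv", index=False)
relationship_rows.to_csv("local_concept_relationship.csv", index=False)
```

From there it is a bulk load into CONCEPT and CONCEPT_RELATIONSHIP and then the accuracy testing mentioned above.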
During the ETL, I would put only the necessary data into the source_value column: not everything, just what is going to be used. I suggest putting in the local code and the name, separated by a delimiter. You will want to get onto the standard side of the model as quickly as possible to leverage the vocabulary. The power of the OMOP CDM is in the vocabularies! Leveraging one concept and its hierarchy is much easier than using a string search to find all relevant concepts and then listing out each one (which is error-prone).
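Here is a minimal sketch of both ideas, with made-up code/name values and a hypothetical ancestor concept id; the tables are read from tab-delimited CSV exports here, but the same filter works as a SQL join against the database.

```python
import pandas as pd

# Building a combined source_value during the ETL (local code and name, pipe-delimited).
# The code/name pair is made up; source_value columns are varchar(50) in CDM v5,
# so keep only what will actually be used.
local_code, local_name = "TOB-01", "Current every day smoker"
source_value = f"{local_code}|{local_name}"

# Leveraging the hierarchy instead of string-searching concept names:
# one ancestor concept id stands in for its whole family of descendants.
ancestor_concept_id = 123456789                       # hypothetical ancestor concept id
concept_ancestor = pd.read_csv("CONCEPT_ANCESTOR.csv", sep="\t")
descendants = concept_ancestor.loc[
    concept_ancestor["ancestor_concept_id"] == ancestor_concept_id,
    "descendant_concept_id",
]

condition_occurrence = pd.read_csv("condition_occurrence.csv", sep="\t")
relevant = condition_occurrence[
    condition_occurrence["condition_concept_id"].isin(descendants)
]
```

One ancestor id now captures every relevant concept, instead of a string search plus a hand-maintained concept list.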
Put the data where it belongs or the tools won’t work. But if it doesn’t belong in any other domain, it goes into the Observation table and SQL will pull it out.
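And a minimal sketch of “SQL will pull it out,” assuming a hypothetical connection string and a made-up local concept id for something like a scoring system parked in OBSERVATION:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; point it at your own CDM database.
engine = create_engine("postgresql://user:password@localhost/cdm")

# Made-up local concept id (2-billion range) for a scoring system stored in OBSERVATION.
query = """
    SELECT person_id, observation_date, value_as_number, value_as_string
    FROM observation
    WHERE observation_concept_id = 2000000042
"""
scores = pd.read_sql(query, engine)
```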
I’m happy to share or elaborate, @TCKeen!