tl;dr
Are there any approaches to mapping source data to recognised concepts iteratively, in parallel with (or after) building the ETL to OMOP?
What is it like to map a large (uncoded) dataset by hand with Usagi?
—
Hi Everyone, I work in a teaching hospital in the UK, and am in need of some advice about the OMOP CDM.
We are just in the process of deciding which data model to use for a new clinical data warehouse, coming as part of a programme of investment in IT. The first stage of the process has been to pull all of the data we need (ADT, pharmacy, etc.) into a NoSQL staging area. From there, we want to ETL the data into something mere mortals like me can access…
We’ve been considering a few options, and two main contenders have emerged. One is to load all of the data into OMOP. The other is to use some sort of derivative of i2b2, with a dimensional structure and a single EAV fact table (plus a rudimentary conceptual taxonomy).
Between the two, OMOP is our preferred model for interoperability and tooling. Nevertheless, we have some reservations about the amount of work required to complete the semantic portion of the ETL. In particular, our project team is drawn from one specialty, while our dataset also spans several adjacent departments - so we are concerned that mapping concepts from (predominantly uncoded) source data will be unfeasibly time-consuming.
Are there any tricks we can use to gradually build this semantic work (for adjacent specialties) into our ETL?
From what I understand, each OMOP clinical event table contains columns for the source value and source concept of each record, but each fact still has to be attached to a recognised (if not standard) concept. Is there a way we could tweak this to accept our local dictionaries (in the interim)? Along similar lines, if we were to fall back on the EAV-like structure of the OBSERVATION table for facts we initially struggle to place in the other domains - would that be heretical?
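To make that question concrete, here is a rough sketch (Python) of what I'm imagining for the interim: keep the raw code in the *_source_value column, point *_source_concept_id at a local concept (using the >= 2,000,000,000 ID range conventionally reserved for site-specific concepts), and leave the standard concept_id at 0 ("No matching concept") until a proper mapping exists. The dictionary contents, concept IDs, and helper names below are invented purely for illustration:

```python
# Illustrative sketch only - local dictionary contents, concept IDs, and helper names are made up.

LOCAL_CONCEPT_BASE = 2_000_000_000  # IDs >= 2 billion are conventionally reserved for local concepts

# local_code -> (local_source_concept_id, standard_concept_id or None if not yet mapped)
local_dictionary = {
    "HB":   (LOCAL_CONCEPT_BASE + 1, 3000000),  # placeholder standard concept, already mapped
    "XYZ1": (LOCAL_CONCEPT_BASE + 2, None),     # not yet mapped
}

def to_measurement_row(person_id, local_code, value, measured_on):
    """Stage one lab result as an OMOP MEASUREMENT row, tolerating unmapped codes."""
    local_id, standard_id = local_dictionary.get(local_code, (0, None))
    return {
        "person_id": person_id,
        "measurement_concept_id": standard_id or 0,  # 0 = "No matching concept" until mapped
        "measurement_source_value": local_code,       # raw local code is always preserved
        "measurement_source_concept_id": local_id,    # points at our local (2-billion-range) concept
        "value_as_number": value,
        "measurement_date": measured_on,
    }
```

The appeal, for us, is that remapping later would just mean updating measurement_concept_id from the dictionary rather than re-running the whole ETL - but I don't know whether that counts as acceptable practice.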
Or from a different angle, can anyone share experiences of mapping (and then verifying) large datasets by hand with Usagi?