OMOP limitations

fabbondanza · February 9, 2021, 8:04am

Good morning,

I’m Filippo, currently a 3rd-year PhD student working on learning disorders in the UK. For my PhD I have access to several biobanks (UK Biobank, ALSPAC, and some small in-house clinical ones). I was thinking about moving the data into OMOP, but I have a couple of questions.

I know the advantages of using OMOP (or other CDMs like i2b2, Sentinel or PCORnet), but I was wondering what the current limitations of OMOP are, apart from storing genomic data?
I’ve seen that some effort has been put into transforming UK Biobank data into OMOP, but it seems there is a lot of data loss. What are the best practices when a lot of information is lost in mapping multiple biobanks to OMOP? I guess you would just use the source_id and leave the concept_name missing?

Thank you!
Filippo

Christian_Reich · February 9, 2021, 12:04pm

@fabbondanza:

OMOP is probably the most encompassing of the four, meaning you can always “downmap”. It has the largest community, with the tallest technology stack and tool suite. And genomic variants are now supported as well. But you were asking for limitations, not for a sales job. A standard OMOP job tends to be more work, particularly compared with i2b2, because of the deeper level of harmonization. If you were part of PCORnet you would be funded, but you would know that already. OHDSI doesn’t typically fund collaborators directly.
Data loss? The UK Biobank has a lot of information that is only useful in the context of that particular initiative. Obviously, that is lost if you take the context away. Other than that, folks have mapped the longitudinal clinical data and the survey data, as far as they are queryable out of context. What are you alluding to?

fabbondanza · February 9, 2021, 2:00pm

Thanks @Christian_Reich for the answers!

Makes sense, I saw OHDSI has a lot of R tools and packages … good sales pitch! Yes, indeed it seems like quite a bit of work to harmonise the data.
Sorry for not being clearer! I’m trying to understand what researchers normally do when they have issues mapping multiple cohorts to OMOP (e.g. they cannot find standard vocabularies/codings to use, or the data may not fit the OMOP tables). Is there any best practice around this, or any resource I could use? I appreciate there can be a lot of different issues, but I was wondering if there are any standardised practices. For example, when a standard concept_name is not available for the data, do researchers normally map the data to OMOP but set the concept_name to an empty value and use the source_value instead for downstream processing? I gave UK Biobank as an example because, from the GitHub repo, it seemed that a lot of data could not be mapped, and I was wondering how researchers would be able to use that data.

Thank you again!

Christian_Reich · February 9, 2021, 4:02pm

If the vocabularies are public, bring them on. There is a Vocabulary Team who can map them in. If they are local, you can still get help with the mapping, or you can spend some money on people who will do it for you.

Hm. Not sure. Whatever is relevant to the healthcare experience of a patient should be covered. The rest: OMOP does not cover facts about running a healthcare institution (e.g. hourly rates of nurses), and it does not cover data about other parts of the patient’s life, unless it has an effect on the health. Do you have something in mind?

That’s a possibility, but it’s ugly and makes it non-standard, i.e. not useful in a network. Remember: converting to the standard enables systematic and standardized research in a network. If you want to research only the UKB - no need to use the OMOP route.
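
To make the trade-off above concrete, here is a minimal Python sketch of what keeping unmapped data looks like in practice. It relies on the OMOP convention that a record without a matching standard concept gets concept_id 0 while the original code is kept in the source_value field. The lookup table, the source codes and the concept IDs are all made-up placeholders, and the row type only mimics two columns of the observation table; treat it as an illustration, not as how any particular UK Biobank ETL works.

```python
from dataclasses import dataclass

# Hypothetical lookup from local source codes to standard concept IDs.
# The IDs are placeholders, not real OMOP concept IDs.
SOURCE_TO_STANDARD = {
    "ukb_field_A": 1001,
    "ukb_field_B": 1002,
}

# OMOP convention: concept_id 0 means "no matching standard concept".
UNMAPPED_CONCEPT_ID = 0


@dataclass
class ObservationRow:
    """Simplified stand-in for a row of the OMOP observation table."""
    person_id: int
    observation_concept_id: int
    observation_source_value: str


def to_observation(person_id: int, source_code: str) -> ObservationRow:
    """Map a source code to a standard concept, falling back to 0."""
    concept_id = SOURCE_TO_STANDARD.get(source_code, UNMAPPED_CONCEPT_ID)
    # The source code is always retained, mapped or not.
    return ObservationRow(person_id, concept_id, source_code)


def mapping_coverage(rows: list[ObservationRow]) -> float:
    """Fraction of rows that reached a standard concept."""
    if not rows:
        return 0.0
    mapped = sum(
        1 for r in rows if r.observation_concept_id != UNMAPPED_CONCEPT_ID
    )
    return mapped / len(rows)


rows = [
    to_observation(1, "ukb_field_A"),
    to_observation(1, "local_questionnaire_item"),
    to_observation(2, "ukb_field_B"),
    to_observation(2, "another_local_item"),
]

for row in rows:
    print(row)
print(f"mapping coverage: {mapping_coverage(rows):.0%}")
```

The unmapped rows are still there and can be queried locally through the source value, but any standardized network study that selects on concept IDs will never see them. That is why getting the vocabulary mapped is preferable to relying on source values.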

fabbondanza · February 9, 2021, 10:50pm

Thanks a lot @Christian_Reich, very useful insights!