
Who is working with UK Biobank?

(Patrick Ryan) #1

I have heard in various conversations that researchers have been working with UK Biobank and did/planned to/want to convert their data instance to the OMOP CDM. Since this is a valuable data resource that can be used by many organizations, it seems a natural opportunity for community collaboration around a common ETL for this data. Does anyone have anything they could share to get this ball rolling?

(ella) #2

This is very exciting to hear! We are mapping from Cerner Millennium to OMOP, so our ability to help depends on the format/nature of the source data coming from the biobank… Do they use standard vocabularies? Keep us posted!

(Gregory Klebanov) #3

@Patrick_Ryan we do not currently work with UK Biobank data, but we are definitely aware of this resource, would be interested in joining this effort, and can contribute resources for both ETL and vocab mappings.

(Kyriakos Schwarz) #5

It would also be interesting for us.

(Nicholas Tatonetti) #6

Hi @all – Just picking this back up to see if anyone has worked on getting the UKBB into the CDM? HMU! :slight_smile:

(Michael Cantor) #7

We are also interested in getting UKB into the CDM; we are doing a lot of work with it at Regeneron.

Maybe a side meeting at AMIA (or the Columbia reception) to figure out how to get this moving?

(Spiros Denaxas) #8

Hey @all, we are actively working on converting the UK Biobank to OMOP. We’ve done quite a lot of work on the EHR component (hospital and primary care), as it’s similar to another dataset we’ve converted to OMOP (called CALIBER), but there’s still loads to do. Happy to coordinate around this and maybe have a meeting/discussion over the next month or so?

(Seng Chan You) #9

@spiros I'd love to join :slight_smile:

(Michael Cantor) #10

Hi Spiros and all-
We are working within the UKB Pharma consortium to map the UKB to OMOP. We are in the process of getting the group of interested companies together and will most likely start the work (with a vendor) within the next month or two. We have been talking with the UKB leadership and they are supportive of the project. It would be great to hear how far along you all are, since we may be able to limit the scope of our project and get it done faster.

(Gregory Klebanov) #11

Hi Spiros / Michael,

Thanks for bringing this up. We are also converting UK Biobank to OMOP, I would say in the early stages at this point. Would be great to connect.

(Seng Chan You) #12

So great!!

Is there anything we can contribute to? This is one of my reasons for working on the genomic CDM. :smiley:

(Spiros Denaxas) #13

Dear all, this is great, thanks for the enthusiastic responses.

Can I suggest, as a first step, we get together on a call to discuss who is doing what and try to coordinate? We are doing this work as part of a larger IMI project called BigData@Heart and working with an SME in the Netherlands (The Hyve) plus some in-house developers at University College London.

Could interested parties please send me an email (s.denaxas at ucl.ac.uk) and I will organize a call to discuss next steps.


I am interested in joining. Already sent an email to @spiros

(Vojtech Huser) #15

We are also working with it (we have a pending project application; see below).

The list of their Data Elements is public here: http://biobank.ctsu.ox.ac.uk/showcase/browse.cgi

For those who are further along, I would be curious to know how the files are organized. In what language are you developing (or planning to develop) your ETL? We may use just R (mostly tidyverse) and skip SQL if possible.

(Nicole Washington) #16

I would be interested in joining the working group as well. I have been working with UKB data for the last year and a half and have access to thousands of data elements, but haven’t yet transformed them to OMOP. I would like to coordinate and/or use this as a great test case for developing a new ETL. (We’re a Python house, so that’s probably the language I’d prefer.) @spiros

(Spiros Denaxas) #17

Files are in CSV. In theory they are usable in both R and Python; in practice they are a bit of a pain, as the baseline data are in “wide format”, i.e. the base table is 500000x9000 or so. That is challenging to load in Pandas, even if you specify dtypes manually. We had more luck using Dask, but SQL is still much, much faster and more intuitive.
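For anyone who wants to stay in Python, one way around the wide layout described above is to never load the whole table: stream the CSV row by row and emit long-format (participant, column, value) records for just the columns of interest. The following is only a minimal sketch using the standard library, not anyone's actual ETL. The function name `wide_to_long`, the `eid` identifier column, the `field-instance.array` style column names, and the sample values are all assumptions made for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple


def wide_to_long(
    lines: Iterable[str],
    id_column: str = "eid",  # assumed name of the participant ID column
    keep_prefixes: Tuple[str, ...] = (),  # empty tuple means "keep every column"
) -> Iterator[Tuple[str, str, str]]:
    """Stream a wide CSV and yield (id, column_name, value) for non-empty cells."""
    reader = csv.reader(lines)
    header = next(reader)
    id_idx = header.index(id_column)
    # Decide once which column positions to keep, so each row is cheap to process.
    wanted = [
        (i, name)
        for i, name in enumerate(header)
        if i != id_idx and (not keep_prefixes or name.startswith(keep_prefixes))
    ]
    for row in reader:
        for i, name in wanted:
            value = row[i] if i < len(row) else ""
            if value != "":  # wide tables are mostly empty; skip blank cells
                yield row[id_idx], name, value


# Tiny made-up sample standing in for the real file (hypothetical columns and values).
sample = io.StringIO(
    "eid,31-0.0,21001-0.0,21001-1.0\n"
    "1000001,0,24.5,\n"
    "1000002,1,,27.1\n"
)
long_rows = list(wide_to_long(sample, keep_prefixes=("21001-",)))
for record in long_rows:
    print(record)
```

With a real file you would pass an open file handle instead of the `io.StringIO` sample and write the yielded records to a database table or a narrower CSV. Memory use stays roughly constant because only one row is held at a time, though a SQL-based load may well still be faster, as noted above.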

(Spiros Denaxas) #18

Hey Nicole, thanks, please drop me an email so I can add your address to the mailing list.

(Spiros Denaxas) #19

Dear all, I’ve created a Doodle to help us find a suitable date/time to have an initial discussion. Could you please have a look and mark your availability accordingly?


p.s. apologies to EU friends, I’ve set timeslots late in the PM to enable US colleagues to join.

(Vojtech Huser) #20

Some notes from the first UKBB WG meeting:

Spiros, please announce when our next meeting is.

(Vojtech Huser) #21

We continue to explore the CDEs in UKBB.

@MaximMoinat - we would like to join forces with your team to work on them.

At the link, we created an overview using the R package referenced in the past.

When is the next meeting?
You had a Google Drive folder with nice outputs as well. Can you please get in touch with me so that we don’t duplicate effort?