Database Catalogue Developments in the EHDEN Project: Input Needed

Rijnbeek · August 4, 2019, 1:01pm

Dear all,

In the European Health Data and Evidence Network (EHDEN) project we are working on a Database Catalogue. We think it is important to have a tool in which all OMOP CDM databases can be exposed, making them Findable in FAIR terms. We will use it in the EHDEN project for the European Data Network, but we would like to expand it to full global database coverage in OHDSI. We have made considerable progress in the past months and would like to share this with the OHDSI community and get your feedback.

The Database Catalogue hosts a searchable, fully customisable questionnaire with extrinsic metadata, e.g. contact details and governance procedures. It also hosts intrinsic metadata, i.e. information we can extract automatically from the CDM. Finally, we are working on adding visualizations across the databases in the network.

Extrinsic Metadata

Extrinsic metadata is information that cannot be measured directly from the data and must be provided by the data custodian. The extrinsic metadata we aim to collect is listed in this Google Sheet, and we would like feedback from the OHDSI community on it, for example:

Are we missing any extrinsic metadata that would be valuable when selecting a database during study design?
Is the current information easy to understand, or would you suggest formulating some items differently?

Please add any remarks next to the item or in this Forum post.

Intrinsic Metadata

The aim of the intrinsic information in the Database Catalogue is to provide high-level database characteristics, e.g. CDM version and vocabulary version, but also the number of patients. This intrinsic metadata will be made available through visualizations or tables, whichever is more appropriate. We will provide this information at both the data source and the data network level.

Data source level

In ATLAS we have the Data Source profiles (also known as the Achilles profiles). These profiles contain a lot of detailed information. For some data partners this level of detail is too much, and they do not want to share it for various reasons. We will make a selection from the current Achilles graphs, and will also add new graphs if our stakeholders request them.

The graphs in the Database Catalogue are meant to be used as a first eligibility check for a study. They are not meant to replace feasibility studies, in which, for example, cohort definitions are distributed to the data partners in a federated way through Arachne.
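To make the idea of a "first eligibility check" concrete, here is a minimal sketch of screening catalogue entries on a few high-level intrinsic characteristics. The `CatalogueEntry` fields, the `eligible` function, and the example databases are all hypothetical, chosen for illustration; they are not part of the Database Catalogue. A real feasibility study would still run actual cohort definitions.

```python
from dataclasses import dataclass


@dataclass
class CatalogueEntry:
    """Hypothetical summary of one database's intrinsic metadata."""
    name: str
    cdm_version: str
    person_count: int
    country: str


def eligible(entries, min_persons, cdm_versions, country=None):
    """Return the names of databases that pass a coarse first-pass screen.

    Only high-level characteristics are checked; this does not replace a
    feasibility study that executes real cohort definitions.
    """
    result = []
    for e in entries:
        if e.person_count < min_persons:
            continue  # too few persons for the planned study
        if e.cdm_version not in cdm_versions:
            continue  # CDM version not supported by the study code
        if country is not None and e.country != country:
            continue  # restrict to one country if requested
        result.append(e.name)
    return result


# Made-up example entries, for illustration only.
catalogue = [
    CatalogueEntry("DB_A", "5.3", 1_200_000, "NL"),
    CatalogueEntry("DB_B", "5.2", 300_000, "ES"),
    CatalogueEntry("DB_C", "5.3", 50_000, "NL"),
]
print(eligible(catalogue, min_persons=100_000, cdm_versions={"5.3"}))  # ['DB_A']
```

Keeping the screen to a handful of coarse filters reflects the intent stated above: the catalogue narrows the candidate list, and the detailed assessment happens afterwards.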

The intrinsic metadata can be extracted using the Achilles Web approach: the data custodian runs R code against the data, which produces JSON files that are uploaded to the Database Catalogue, which then renders the visualisations on the fly. For our purpose we think this is a much easier, more flexible, and faster approach than the one currently used in ATLAS, where the data is stored in the results tables.
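The real extraction uses R code, but the shape of the idea (query the CDM locally, write a small JSON summary, upload only that file) can be sketched in Python. This is an assumption-laden illustration: an in-memory SQLite database stands in for the CDM, only a few columns of the OMOP CDM `cdm_source` and `person` tables are created, and the JSON field names and `export_intrinsic_metadata` function are hypothetical, not the Achilles output format.

```python
import json
import sqlite3


def export_intrinsic_metadata(conn):
    """Summarise a few intrinsic characteristics of a CDM database as JSON.

    Output field names are illustrative only, not the Achilles JSON format.
    """
    cur = conn.cursor()
    source_name, cdm_version, vocab_version = cur.execute(
        "SELECT cdm_source_name, cdm_version, vocabulary_version FROM cdm_source"
    ).fetchone()
    (person_count,) = cur.execute("SELECT COUNT(*) FROM person").fetchone()
    gender_rows = cur.execute(
        "SELECT gender_concept_id, COUNT(*) FROM person "
        "GROUP BY gender_concept_id ORDER BY gender_concept_id"
    ).fetchall()
    summary = {
        "source_name": source_name,
        "cdm_version": cdm_version,
        "vocabulary_version": vocab_version,
        "person_count": person_count,
        # JSON object keys must be strings, so concept IDs are converted.
        "persons_by_gender_concept": {str(g): n for g, n in gender_rows},
    }
    return json.dumps(summary, indent=2)


# Toy stand-in for a CDM with a subset of columns (8507 = male, 8532 = female).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cdm_source (cdm_source_name TEXT, cdm_version TEXT, vocabulary_version TEXT);
CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER, year_of_birth INTEGER);
INSERT INTO cdm_source VALUES ('Demo CDM', 'v5.3', 'demo-vocab');
INSERT INTO person VALUES (1, 8507, 1970), (2, 8532, 1985), (3, 8532, 1990);
""")
print(export_intrinsic_metadata(conn))
```

Only aggregate counts leave the data partner's environment, which is what keeps the process under the custodian's control.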

This process is fully controlled by the data custodian. The upload functionality is already integrated in the Database Catalogue tool: the user can log in and has the rights to update the profile by uploading new JSON files. To render the graphs we will reuse the OHDSI visualisation code, and we are also experimenting with Vega as a library.
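As an illustration of rendering on the fly from uploaded counts, the sketch below turns a `{category: count}` mapping into a minimal Vega-Lite bar-chart specification that a browser-side Vega renderer could display. The helper name `bar_chart_spec`, the choice of the Vega-Lite v5 schema, and the input field names are assumptions for this example, not the catalogue's actual code.

```python
import json


def bar_chart_spec(counts, field_name, title):
    """Build a minimal Vega-Lite bar-chart spec from a {category: count} dict."""
    values = [{field_name: k, "count": v} for k, v in counts.items()]
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": values},  # data is inlined, so no server-side tables
        "mark": "bar",
        "encoding": {
            "x": {"field": field_name, "type": "nominal"},
            "y": {"field": "count", "type": "quantitative"},
            # Interactive hover hints, similar in spirit to the test page.
            "tooltip": [
                {"field": field_name, "type": "nominal"},
                {"field": "count", "type": "quantitative"},
            ],
        },
    }


spec = bar_chart_spec({"8507": 1, "8532": 2}, "gender_concept_id", "Persons by gender")
print(json.dumps(spec, indent=2))
```

Because the specification is plain JSON with the data inlined, a new upload only has to regenerate the spec; nothing needs to be written into results tables.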

An example can be found here: https://test.ehden.eu/DatabaseDashboard

Note that this is just a test environment for trying out Vega functionality. It has some cosmetic updates to Achilles Web, such as the country flag, more detail in the table, interactive graphs, and hints per table. This is all under development and will change considerably in the coming months, partly based on your input. For now, the page is there just to trigger ideas.

Data Network Level Visualisations

Patrick and I have spent time thinking through visualizations at the data network level. We created a crude proof of concept in Spotfire, which we could build for real in D3, R Shiny or Vega-Lite; it is based on simple SQL scripts running against ACHILLES results compiled across a network of different databases. The main idea was to lay out a series of graphs that could represent each data source but also show the totality, breadth, and diversity of a given network. We also wanted to build a tool that helps researchers assess feasibility, so they can decide how to design studies and which sources could be useful. Below is a video Patrick recorded that walks you through the visualizations and explains our thoughts. I highly recommend watching it!
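A network-level view can start from something as simple as one SQL query over compiled Achilles results. The sketch below assumes, hypothetically, that each partner's `achilles_results` rows have been combined into one table with an extra `source_name` column, and that `analysis_id = 1` holds the number of persons (our understanding of the Achilles convention; please verify). It reports each source's size and its share of the network. The database names and counts are made up.

```python
import sqlite3


def network_summary(conn):
    """Persons per source and percentage of the network total, largest first."""
    return conn.execute(
        """
        SELECT source_name,
               count_value AS persons,
               ROUND(100.0 * count_value /
                     (SELECT SUM(count_value) FROM achilles_results
                      WHERE analysis_id = 1), 1) AS pct_of_network
        FROM achilles_results
        WHERE analysis_id = 1          -- assumed: number of persons
        ORDER BY persons DESC
        """
    ).fetchall()


# Hypothetical compiled results from three data sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE achilles_results (
    source_name TEXT, analysis_id INTEGER, stratum_1 TEXT, count_value INTEGER
);
INSERT INTO achilles_results VALUES
    ('DB_A', 1, NULL, 1200000),
    ('DB_B', 1, NULL, 300000),
    ('DB_C', 1, NULL, 500000);
""")
for row in network_summary(conn):
    print(row)
```

The same pattern (filter on one analysis, group or rank across `source_name`) extends to other analyses, which is what makes it possible to show both each source and the breadth of the network in one graph.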

https://drive.google.com/drive/folders/1e-X1WmdW5jmWaV-fZlrZ4KUGD-bBeQCf?usp=sharing

How to provide feedback?

For the questionnaire please provide comments in the Google Sheet or here on the forum.

For both the database level and data network level visualizations, we have written a short document to collect ideas. We invite you to provide feedback in the document or here on the forum, whichever is easier for you. We will also involve other stakeholders who are not active on the OHDSI forum in this process.

We realise this is a lot of information, but we hope you enjoy it as much as we do, and we look forward to all your valuable insights!

The EHDEN Consortium.