Dataset DOI and metadata - for source datasets, outputs, and for datasets represented in OMOP CDM

mb_ardc · July 22, 2021, 1:57am

Hello all. I am seeking information on anything related to DOIs for data that is represented within the OHDSI framework. For example, how would a DOI attached to raw source data be represented within the OMOP CDM and/or its associated metadata? Another scenario: if a DOI is minted for data that is represented in OMOP CDM, how might the DataCite metadata map to the metadata held within the model? Also, how might the DOI of the OMOP CDM data be recorded as related to the DOI of any raw source data? And how might outputs carry the DOIs of all source data, for eventual citation, reuse, and possibly reproduction of results?

I am very interested to hear anything along these lines. If a DOI is minted at any level, I would also like to know where its landing page URL resolves to (for example, a metadata catalogue that supports data discovery) and whether there are examples of this that I can see. I am also interested in examples of mapping to other metadata schemas (not just DataCite), and really anything that relates OMOP CDM or its infrastructure (e.g. Atlas) to the FAIR principles https://www.go-fair.org/fair-principles.

That's all for now - many thanks in advance. If this conversation is already going on somewhere that I just can't find, please feel free to point me to that topic instead of answering it all again here. I did have a look around but haven't found anything in the levels that I can access.

JayGee · July 22, 2021, 10:00am

FAIR is mentioned in The Book of OHDSI, Chapter 3, "Open Science".

Here there is reference to an EHDEN project initiative and the Hyve. This poster was produced: https://www.ohdsi.org/wp-content/uploads/2020/05/Implementing-FAIR-in-OHDSI.pdf

And this conversation began: Implementing the FAIR principles in the OHDSI approach and tools

The Book of OHDSI also discusses another initiative that moves towards FAIR: ARACHNE, in Chapter 20, "OHDSI Network Research". ARACHNE promises a "database catalog", although it may still be a work in progress. It is developed by Odysseus: https://www.ohdsi.org/wp-content/uploads/2015/04/OHDSI_Odysseus_ARACHNE_Platform.pdf

In fact, along with the Hyve and Odysseus, CODATA is studying this issue now in the context of an actual use case. We are considering building a very lightweight catalog of cohort specifications that could serve the needs of a federated study in progress in East Africa. The catalog could also serve as a model for other federated studies that use OHDSI in sub-Saharan Africa (SSA). In such a catalog we would follow the Hyve's recommendations and produce DOIs at several levels of specificity. One or two papers describing this use case are currently in progress; our publication plans are still in flux.

mb_ardc · July 23, 2021, 3:02am

Thank you for all of those references. That all looks good and I would like to follow this progress. I am down in the details at the moment, working out how to populate DataCite 4.3 DOI metadata from the OMOP CDM tables CDM_SOURCE and METADATA. We are working this out as best we can, so I would be very pleased to see any mappings that have already been done, so that we can reuse them. The ARACHNE database catalogue may be along the same lines, as we would also like to explore and understand possible DOI metadata at various levels, i.e. for the database itself and/or for the datasets within it. If anyone would like to see what we have done so far and provide advice (or a full solution :D), please let me know here, or comment in the spreadsheet where the mapping has only just begun: Public - OMOP CDM Dataset Mapping To DataCite 4.3 - Google Sheets
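
To make the question concrete, here is a minimal sketch in Python of what such a mapping might look like. It is not an agreed mapping and not the contents of the spreadsheet. The input keys are CDM_SOURCE column names and the output keys are DataCite 4.3 property names, but every pairing between them is an assumption for illustration (for example, using cdm_holder as both creator and publisher). The function name, the dict layout, and the DOIs in the example are hypothetical placeholders. Related source DOIs are expressed with the DataCite relation type IsDerivedFrom.

```python
from datetime import date


def cdm_source_to_datacite(row, doi, source_dois=()):
    """Build a DataCite-style metadata dict from one CDM_SOURCE row.

    `row` is a dict keyed by CDM_SOURCE column names. Each pairing
    below is an illustrative assumption, not an agreed mapping.
    """
    release = row["cdm_release_date"]  # expected to be a datetime.date
    return {
        "identifier": {"identifier": doi, "identifierType": "DOI"},
        # Assumption: the CDM holder acts as both creator and publisher.
        "creators": [{"creatorName": row["cdm_holder"]}],
        "titles": [{"title": row["cdm_source_name"]}],
        "publisher": row["cdm_holder"],
        "publicationYear": str(release.year),
        "resourceType": {"resourceTypeGeneral": "Dataset"},
        "descriptions": [
            {"description": row["source_description"],
             "descriptionType": "Abstract"}
        ],
        "version": row["cdm_version"],
        "dates": [{"date": release.isoformat(), "dateType": "Issued"}],
        # One entry per raw source dataset that already has a DOI.
        "relatedIdentifiers": [
            {"relatedIdentifier": d,
             "relatedIdentifierType": "DOI",
             "relationType": "IsDerivedFrom"}
            for d in source_dois
        ],
    }


# Example row with made-up values; the DOIs are placeholders only.
example_row = {
    "cdm_source_name": "Example Hospital EHR (OMOP)",
    "cdm_holder": "Example Research Institute",
    "source_description": "Illustrative description of the source data.",
    "cdm_release_date": date(2021, 7, 1),
    "cdm_version": "v5.3",
}
record = cdm_source_to_datacite(
    example_row, "10.5072/example-omop", ["10.5072/example-raw"]
)
```

A record like this would still need review against the mandatory DataCite properties before it could be registered. Fields that CDM_SOURCE cannot supply (subjects, rights, geolocation, and so on) would have to come from the METADATA table or from outside the CDM.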

JayGee · July 23, 2021, 12:53pm

Yes, the CODATA group working on this would very much like to see your work. We are putting together the specification for a registry of OHDSI cohort specifications modeled after DCAT. It is part of a larger paper for which we will have a draft in the next couple of weeks, and which we hope to submit for pre-publication by the end of August. After we review your spreadsheet, let's talk some more. Thank you.

mb_ardc · July 27, 2021, 2:56am

Thank you - I look forward to reading that paper.
I would like to know how a column for DCAT would look within the spreadsheet that we have begun, and whether there are additional source tables to add. Eventually, I would also like to look at mappings of vocabulary terms.
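
As a starting point for discussion, a DCAT column could sit next to the DataCite one as in the sketch below. The rows are assumptions for illustration only: they pair a few CDM_SOURCE columns with DataCite properties and with Dublin Core terms that DCAT reuses for a dcat:Dataset. The names CROSSWALK and crosswalk_csv are hypothetical, and the real spreadsheet may be laid out differently.

```python
import csv
import io

# Illustrative crosswalk rows: (CDM_SOURCE column, DataCite property, DCAT term).
# Every pairing here is an assumption offered for discussion.
CROSSWALK = [
    ("cdm_source_name", "Title", "dct:title"),
    ("source_description", "Description", "dct:description"),
    ("cdm_holder", "Publisher", "dct:publisher"),
    ("cdm_release_date", "Date (dateType=Issued)", "dct:issued"),
]


def crosswalk_csv(rows=CROSSWALK):
    """Render the crosswalk as CSV text that can be pasted into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["cdm_source_column", "datacite_property", "dcat_term"])
    writer.writerows(rows)
    return buffer.getvalue()


print(crosswalk_csv())
```

Additional source tables (for example METADATA) could be added as further rows once their candidate targets are agreed.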