
Metadata WG: Call for actual metadata examples

Metadata enthusiasts,

As discussed last time, before we can move from simple metadata use cases to nitty-gritty data modelling, we should fully catalog actual metadata examples that we think should be stored in the metadata schema. I’ve created a spreadsheet to help us with this step. The goal is to identify patterns in the examples so that we can design appropriate tables.

Could you add your examples to the spreadsheet before our next meeting, which I’ve rescheduled to Tuesday 6/26? For each metadata example, you can provide short and long descriptions, category (Temporal Event, CDM Design, Source, Content), and perspective (Database, Domain, Concept, Cohort, Person). For reference, the categories are from last time’s discussion (PPT slides). Additionally, try to capture the message of the metadata artifact, along with the basic elements of it, to help us determine ways to structure it.
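To make the requested fields concrete, here is a minimal Python sketch of how one spreadsheet row could be represented. The class, field, and column names (`MetadataExample`, `short_description`, and so on) are assumptions made up for illustration, not an agreed schema; only the category and perspective values come from the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Values taken from the category list in this post.
    TEMPORAL_EVENT = "Temporal Event"
    CDM_DESIGN = "CDM Design"
    SOURCE = "Source"
    CONTENT = "Content"


class Perspective(Enum):
    # Values taken from the perspective list in this post.
    DATABASE = "Database"
    DOMAIN = "Domain"
    CONCEPT = "Concept"
    COHORT = "Cohort"
    PERSON = "Person"


@dataclass(frozen=True)
class MetadataExample:
    # Hypothetical field names; the spreadsheet's real column headers may differ.
    short_description: str
    long_description: str
    category: Category
    perspective: Perspective


def parse_row(row: dict) -> MetadataExample:
    """Build a MetadataExample from one row given as a dict of strings.

    Raises ValueError if the category or perspective is not one of the
    values listed above.
    """
    return MetadataExample(
        short_description=row["short_description"].strip(),
        long_description=row["long_description"].strip(),
        category=Category(row["category"].strip()),
        perspective=Perspective(row["perspective"].strip()),
    )


# Invented row, for illustration only (not taken from the spreadsheet).
example = parse_row({
    "short_description": "Source system changed",
    "long_description": "Hypothetical: the site switched EHR vendors.",
    "category": "Temporal Event",
    "perspective": "Database",
})
```

Restricting category and perspective to fixed sets of values is one way to keep entries comparable when we look for patterns; free-text values would be rejected by `parse_row` rather than silently accepted.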

Spreadsheet: https://docs.google.com/spreadsheets/d/1X7HxFf1qnLr3NEDNDoGxrcuaa1eTz6O9EB0L9msRMG4/edit?usp=sharing

Thanks,
Ajit

Hello Ajit,

Interesting, this makes it very concrete what you mean by metadata. This is indeed key information for observational research, and providing this would be in line with FAIR Data Principle R1.2: (Meta)data are associated with detailed provenance (https://www.go-fair.org/fair-principles/r1-2-metadata-associated-detailed-provenance/).
Metadata in the context of FAIR Data is also understood in a wider sense, e.g. to provide information about persistent identifiers, authorship, license, access provisions, links with other datasets, etc., but I think it’s a good idea to start with the data provenance and influence metadata that you propose. It would probably make sense to add this directly into the CDM, since this metadata is quite tightly linked with the data itself, especially the Perspective that you propose.
I’m not sure whether there are any existing RDF / semantic web standards that we could leverage for that specific purpose. Dublin Core is not so suitable, but maybe we could borrow some concepts from PROV (https://www.w3.org/TR/2013/REC-prov-dm-20130430/) and PROV-O (https://www.w3.org/TR/prov-o/). Representing OMOP CDM metadata as RDF / semantic web is a broader issue, though, which we would also like to address in the upcoming EHDEN project to improve compliance with the FAIR Data Principles.

Greetings,

Kees van Bochove
The Hyve


Hi Ajit,

I have a conflicting meeting on Tuesday, so I will post my thoughts here. I hope to join again soon.

Our EHR database has a few examples of “Data Quality” issues that we would like to note. Most of the issues are things that Achilles will find in the database. We should tie the Achilles reports to the Metadata table. The results don’t have to be in the Metadata table itself, since that may make it a huge table, but maybe just some rows to denote that Achilles found X number of Y errors, or that an Achilles report is available for further information about the source’s data quality.

Thanks,
Melanie


Agreed. Not only should we annotate the actual Heel results in the Metadata/Annotations schema, but we should also produce a piece of metadata that denotes the presence of an Achilles execution, its location, and a summary of Heel findings.
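As a rough illustration of that piece of metadata, here is a minimal Python sketch. Everything in it is hypothetical: the `AchillesRunMetadata` record, its field names, and the assumption that each Heel message starts with a severity label followed by a colon. It is not an existing Achilles API or an agreed table design.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AchillesRunMetadata:
    # Hypothetical record: was Achilles executed, where are the results,
    # and how many Heel findings of each kind were reported.
    executed: bool
    report_location: str
    heel_counts: dict = field(default_factory=dict)


def summarize_heel(messages):
    """Count Heel messages by the text before the first colon.

    Assumes that text is a severity label; messages without a colon
    are counted as UNKNOWN.
    """
    counts = Counter()
    for message in messages:
        severity, sep, _ = message.partition(":")
        counts[severity.strip().upper() if sep else "UNKNOWN"] += 1
    return dict(counts)


# Invented messages and location, for illustration only.
sample_messages = [
    "ERROR: example finding one",
    "WARNING: example finding two",
    "ERROR: example finding three",
]
run = AchillesRunMetadata(
    executed=True,
    report_location="results_schema (placeholder)",
    heel_counts=summarize_heel(sample_messages),
)
```

A summary like this would keep the Metadata table small, with the full Heel results staying wherever Achilles wrote them.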

Do you have any concrete examples you would like to add to the spreadsheet?

I’ll add in some examples from our Achilles report.

Hi @keesvanbochove,

Interesting ideas! I think at this point, we’re still fleshing out real-world examples, but I do think it is important that we start to consider non-SQL based approaches when we move into the implementation phase of this effort.

I’m afraid I have 0 experience with semantic web, and I’m not sure how much experience there is in the Metadata WG. Would you mind joining us during an upcoming session and providing a little crash course through semantic web / RDF and how we could align with current EHDEN efforts?

Thanks,
Ajit

All – due to travel and meeting conflicts, I’m shifting our next call to July 24 at 2 pm EST.
