Metadata and Annotations WG

Ajit_Londhe · April 4, 2018, 5:01pm

Hi all,

We’ve discussed some pretty good ideas around metadata and annotations over the past few months and have tried some site-specific proofs of concept, but we haven’t made real progress so far on a community effort. One of my main goals for 2018 is to push this forward.

To recap:

I presented a poster at the last Symposium with @ericaVoss and @Vojtech_Huser on the value of metadata and annotations in preventing poor study design. This poster focused on data-source-specific artifacts: data collection anomalies, vocabulary mapping changes, and Achilles Heel annotations. I also demonstrated a version of Atlas/WebAPI that could consume metadata and annotations in order to help users steer clear of danger with their study design.
After posting a thread in the Forums (Annotations in the CDM), @jon_duke presented a great use case for clinical annotation in an OHDSI community call, and demonstrated a web app used at GA Tech that relies upon a custom schema (Jon – can you share the ER diagram?). This dialogue showed that we have other valuable use cases to consider, but that we could conceivably unify the various levels of metadata into one construct.

From that call, 2 key questions came up:

1. Couldn’t fact_relationship handle most of our metadata needs?

@hripcsa raised this question, and it’s a good one. Could we store metadata in fact_relationship rather than create new table(s)? It’s certainly possible to add facts about patients or concepts, but it seems a bit too rigid as it contains purely numeric fields that do not allow human-authored annotation strings or temporal bounds. It feels like it would be challenging to force fit some of the types of information we’re looking to store, such as our classic examples of a drop-off in death data in 2011, or an unmappable clinical note about a patient made retroactively.
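To make the rigidity concern concrete, here is a minimal sketch in Python. The first class mirrors the five integer columns of fact_relationship. The second, `Annotation`, is purely hypothetical (its name and fields are assumptions for illustration, not a proposed standard) and shows the kind of free-text note and temporal bounds that the integer-only table has no place for.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class FactRelationship:
    """Columns of the CDM fact_relationship table; all are integers."""
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


@dataclass
class Annotation:
    """HYPOTHETICAL record shape, not part of any OHDSI specification."""
    target_table: str            # assumed naming, e.g. "death"
    target_id: Optional[int]     # None when the note covers the whole table
    note: str                    # human-authored free text
    valid_start: Optional[date]  # temporal bounds of the anomaly
    valid_end: Optional[date]


def has_text_or_date_field(cls) -> bool:
    """Return True if any field is typed as something other than int."""
    return any(f.type is not int for f in fields(cls))


# Illustrative values only; the start date is an assumption.
death_dropoff = Annotation(
    target_table="death",
    target_id=None,
    note="Drop-off in death data in 2011",
    valid_start=date(2011, 1, 1),
    valid_end=None,
)
```

Under these assumptions, `has_text_or_date_field(FactRelationship)` is false while `has_text_or_date_field(Annotation)` is true, which is the gap described above: the note string and the dates would each need to be encoded as a concept or fact ID to fit into fact_relationship.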

Additionally, metadata and annotations are data not collected during the patient’s observation period. As such, I would be hesitant to store them in the CDM schema, which is intended to “include all observational health data elements (experiences of the patient receiving health care)”; instead, I feel they would be better served in the results schema, or, perhaps better still, their own metadata schema.

2. Okay, assuming we don’t store this in the CDM…should we form a WG? Or should this be a part of the CDM WG?

Personally, I think we should form a new WG. Metadata and annotations are valuable artifacts at every level of a data source, but agreeing on standards for generating and storing them will be challenging. We will need to capture a diversity of use cases and understand how sites could realistically benefit from this new metadata to ensure that we’re not just creating more data for the sake of it. The CDM WG has a lot of key decisions to make that require significant discussion, and I worry that adding metadata and annotations to the mix will be detrimental to both efforts.

If there are no objections to creating a new WG, I volunteer to lead it. I would ask that we establish a GitHub repo to propose, vote on, and disseminate our Metadata/Annotation standards, similar to how @clairblacketer has done with the CDM repo. As with Themis, once metadata/annotation standards are agreed upon by this WG, we would take them to the CDM WG for final ratification into the CDM specification.

So who’s interested in joining this journey about the journey?

Tagging some additional folks from previous posts (only 10 tags allowed? c’mon!): @mgurley @jenniferduryea @lilipeng @Gowtham_Rao @Evan_Minty

Thanks,
Ajit

Frank · April 10, 2018, 4:08pm

I would like to join!

Andrew · April 10, 2018, 4:10pm

Thanks for starting this. I would like to join.

Vojtech_Huser · April 10, 2018, 5:14pm

I would like to actively contribute.

To add to the discussion:

There are two types of metadata: (A) metadata on a specific row (e.g., “I converted the value in this row to kilograms from lb in the source system”),

and (B) metadata on a dataset (e.g., the count of persons removed due to missing birth year during the source-to-target data transformation, the target being the OMOP CDM).

For type (B), the fact_relationship “hack/overload” would not work (or would be “stretched” a bit).
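The two types can be sketched as two record shapes. This is only an illustration: the class names, field names, and values below are all made up, not an agreed design. The point is that a type (A) record has a row to point at, while a type (B) record does not, so there is no fact ID to put into fact_relationship.

```python
from dataclasses import dataclass


@dataclass
class RowMetadata:
    """Type (A): a note attached to one row of one CDM table."""
    table_name: str  # e.g. "measurement"
    row_id: int      # primary key of the annotated row
    note: str


@dataclass
class DatasetMetadata:
    """Type (B): a fact about the whole ETL run; there is no row to reference."""
    name: str
    value: int


# All names and values here are hypothetical examples.
row_note = RowMetadata("measurement", 1001, "Converted from lb to kg during ETL")
etl_note = DatasetMetadata("persons_removed_missing_birth_year", 42)
```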

Yurang_Park · April 11, 2018, 4:48am

I would like to join this working group.
I am currently working on metadata extraction of clinical items from unstructured text materials (e.g., institution-specific data such as pathology results or imaging results).
Thank you for starting.

mgurley · April 16, 2018, 6:53am

I would like to join.

Ajit_Londhe · April 16, 2018, 8:39pm

Thanks all. @MauraBeaton has set us up with a new WG page. I will send out updates tomorrow evening on the meeting scope, roadmap, and logistics.

MPhilofsky · April 17, 2018, 3:28pm

Count me in!

razzaghih · April 17, 2018, 3:34pm

Hi - I’d like to also be included in this. Thanks!

DocTuppy · April 18, 2018, 1:03pm

I’d like to participate as well. Thanks for starting this group!

Ajit_Londhe · April 18, 2018, 5:56pm

All – the new WG page is now up:
http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:metadata_and_annotations

We’ll meet every other Friday, starting April 27, at 12 PM EST / 9 AM PST.

Some initial discussion points for our first meeting:

WG Goals and Deliverables
Logistics (including: how do we disseminate our ideas, how do we make them standard practice?)
What are your use cases? What challenges do researchers at your site face when consuming your CDM(s)?
Roadmap for 2018

MPhilofsky · April 18, 2018, 7:44pm

I know you can’t please everyone, but as an FYI, this meeting conflicts with a standing PEDSnet meeting. @razzaghih, @mgkahn and I are regulars on the PEDSnet call.

And thank you for organizing and getting this group started. It is very important work.

Ajit_Londhe · April 18, 2018, 7:49pm

Does that PEDSnet meeting end at 1 pm est? Perhaps we just slot this right after?

razzaghih · April 19, 2018, 1:58am

Thanks @MPhilofsky for the tag. I would prefer 2 PM on Fridays if that works for people but if 1 is more convenient, I could try to make that work sometimes (I have another conflict then, but could try to work around it some weeks). Thanks!

jon_duke · April 20, 2018, 7:49pm

1 or 2 ET works for me. Have another OHDSI meeting at 12p!

Andrew · April 21, 2018, 8:35pm

Both 1 and 2 work for me as well.

DocTuppy · April 21, 2018, 8:45pm

Friday afternoon is a tough time for me. Would it be possible to do it on Friday morning (before noon EST) or Thursday afternoon?

razzaghih · April 24, 2018, 9:13pm

Hi - just wanted to follow up on this. @Ajit_Londhe did you decide on a time?

Ajit_Londhe · April 24, 2018, 9:25pm

Hi all – let’s go for 2 pm est and adjust if necessary.

razzaghih · April 24, 2018, 9:26pm

Thanks! Looking forward to this!