
Metadata and Annotations WG

Hi all,

We’ve discussed some pretty good ideas around metadata and annotations over the past few months, and have tried some site-specific proofs of concept, but haven’t made real progress so far on a community effort. One of my main goals for 2018 is to push this forward.

To recap:

  1. I presented a poster at the last Symposium with @ericaVoss and @Vojtech_Huser on the value of metadata and annotations in preventing poor study design. This poster focused on data-source-specific artifacts: data collection anomalies, vocabulary mapping changes, and Achilles Heel annotations. I also demonstrated a version of Atlas/WebAPI that could consume metadata and annotations in order to help users steer clear of danger with their study design.

  2. After posting a thread in the Forums (Annotations in the CDM), @jon_duke presented a great use case for clinical annotation in an OHDSI community call, and demonstrated a web app used at GA Tech that relies upon a custom schema (Jon – can you share the ER diagram?). This dialogue showed that we have other valuable use cases to consider, but that we could conceivably unify the various levels of metadata into one construct.

From that call, 2 key questions came up:

1. Couldn’t fact_relationship handle most of our metadata needs?

@hripcsa raised this question, and it’s a good one. Could we store metadata in fact_relationship rather than create new table(s)? It’s certainly possible to add facts about patients or concepts, but the table seems too rigid: it contains purely numeric fields, which leave no room for human-authored annotation strings or temporal bounds. It feels like it would be challenging to force-fit some of the types of information we’re looking to store, such as our classic examples of a drop-off in death data in 2011, or an unmappable clinical note about a patient made retroactively.
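To make the rigidity concern concrete, here is a minimal Python sketch. `FactRelationship` mirrors the five integer columns of the CDM fact_relationship table. `Annotation` is purely hypothetical: its field names, the `applies_on` helper, and the January 1, 2011 start date are assumptions for illustration, not a proposed standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FactRelationship:
    """The five columns of the CDM fact_relationship table; all are integers."""
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; every field name is an assumption."""
    target_table: str              # table the note is about, e.g. "death"
    target_row_id: Optional[int]   # None when the note covers the whole table
    note: str                      # human-authored free text
    valid_start: Optional[date] = None  # None means open-ended
    valid_end: Optional[date] = None    # None means open-ended

    def applies_on(self, day: date) -> bool:
        """Return True if `day` falls inside the (possibly open-ended) bounds."""
        after_start = self.valid_start is None or day >= self.valid_start
        before_end = self.valid_end is None or day <= self.valid_end
        return after_start and before_end


# The death data example from this post; the exact start date is illustrative.
death_dropoff = Annotation(
    target_table="death",
    target_row_id=None,
    note="Drop-off in death data in 2011.",
    valid_start=date(2011, 1, 1),
)
```

Neither the free-text `note` nor the date bounds has a natural home in the five integer columns, which is the gap described above.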

Additionally, metadata and annotations are data not collected during the patient’s observation period. As such, I would be hesitant to store it in the CDM schema, which is intended to “include all observational health data elements (experiences of the patient receiving health care)”; instead, I feel it would be better served in the results schema, or, perhaps better still, its own metadata schema.

2. Okay, assuming we don’t store this in the CDM…should we form a WG? Or should this be a part of the CDM WG?

Personally, I think we should form a new WG. Metadata and annotations are valuable artifacts at every level of a data source, but agreeing on standards for generating and storing it will be challenging. We will need to capture a diversity of use cases and understand how sites could realistically benefit from this new metadata to ensure that we’re not just creating more data for the sake of it. The CDM WG has a lot of key decisions to make that require significant discussion, and I worry that adding metadata and annotations to the mix will be detrimental to both efforts.

If there are no objections to creating a new WG, I volunteer to lead it. I would ask that we establish a GitHub repo to propose, vote on, and disseminate our Metadata/Annotation standards, similar to how @clairblacketer has done with the CDM repo. Similar to Themis, once metadata/annotation standards are agreed upon by this WG, we would take them to the CDM WG for final ratification into the CDM specification.

So who’s interested in joining this journey about the journey?

Tagging some additional folks from previous posts (only 10 tags allowed? c’mon!): @mgurley @jenniferduryea @lilipeng @Gowtham_Rao @Evan_Minty



I would like to join! :smile:


Thanks for starting this. I would like to join.


I would like to actively contribute.

To add to the discussion:

There are two types of metadata:

(A) Metadata on a specific row (e.g., the value in this row was converted to kilograms from lb in the source system).

(B) Metadata on a dataset (e.g., the count of persons removed due to missing birth year during the source-to-target data transformation, the target being the OMOP CDM).

For type (B), the fact_relationship “hack/overload” would not work (or would be “stretched” a bit).
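A rough Python sketch of the two types, to show why (B) is the awkward one. All class names, field names, and sample values (row id 1001, count 42) are hypothetical and only illustrate the distinction.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class RowMetadata:
    """Type (A): a note attached to one specific row."""
    table_name: str
    row_id: int
    note: str


@dataclass(frozen=True)
class DatasetMetadata:
    """Type (B): a fact about the whole transformation; there is no row to point at."""
    name: str
    value: Union[int, str]
    note: str


def fact_relationship_anchor(item: Union[RowMetadata, DatasetMetadata]) -> Optional[int]:
    """Return the row id a fact_relationship-style link could reference, if any."""
    return item.row_id if isinstance(item, RowMetadata) else None


# Sample values below are made up for illustration.
unit_conversion = RowMetadata(
    table_name="measurement",
    row_id=1001,
    note="Converted to kilograms from lb in the source system.",
)
dropped_persons = DatasetMetadata(
    name="persons_removed_missing_birth_year",
    value=42,
    note="Removed during source-to-target (OMOP CDM) transformation.",
)
```

Type (A) carries a row id that a link could reference; type (B) has none, so `fact_relationship_anchor(dropped_persons)` returns `None`.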

I would like to join this working group.
I am currently working on metadata extraction of clinical items from unstructured text materials (e.g., institution-specific data such as pathology results or imaging results).
Thank you for starting.

I would like to join.

Thanks all. @MauraBeaton has set us up with a new WG page. I will send out updates tomorrow evening on the meeting scope, roadmap, and logistics.

Count me in!

Hi - I’d like to also be included in this. Thanks!

I’d like to participate as well. Thanks for starting this group!

All – the new WG page is now up:

We’ll meet every other Friday, starting April 27, at 12 PM EST / 9 AM PST.

Some initial discussion points for our first meeting:

  1. WG Goals and Deliverables
  2. Logistics (including: how do we disseminate our ideas, how do we make them standard practice?)
  3. What are your use cases? What challenges do researchers at your site face when consuming your CDM(s)?
  4. Roadmap for 2018

I know you can’t please everyone, but as an FYI, this meeting conflicts with a standing PEDSnet meeting. @razzaghih, @mgkahn and I are regulars on the PEDSnet call.

And thank you for organizing and getting this group started. It is very important work.

Does that PEDSnet meeting end at 1 pm est? Perhaps we just slot this right after?

Thanks @MPhilofsky for the tag. I would prefer 2 PM on Fridays if that works for people but if 1 is more convenient, I could try to make that work sometimes (I have another conflict then, but could try to work around it some weeks). Thanks!

1 or 2 ET works for me. Have another OHDSI meeting at 12p!

Both 1 and 2 work for me as well.

Friday afternoon is a tough time for me. Would it be possible to do it on Friday morning (before noon EST) or Thursday afternoon?

Hi - just wanted to follow up on this. @Ajit_Londhe did you decide on a time?

Hi all – let’s go for 2 pm est and adjust if necessary.

Thanks! Looking forward to this!